Re: Coref training format

Jörn Kottmann Wed, 28 Mar 2012 05:19:32 -0700

One training document could look like this:

(TOP (S (NP#21 (NP#3 (NNP President) (person (NNP Barack) (NNP Obama)) (POS 's)) (NN campaign)) (VP (VBZ is) (VP (VBG reconfiguring) (NP (NP (NML#21 (PRP$ its)) (NN approach)) (PP (TO to) (NP (JJ powerful) (JJ super) (NNS PACs)))) (, ,) (S (VP (VBD worried) (SBAR (S (NP (NP#3 (DT the) (NN president) (POS 's)) (NN re-election) (NNS prospects)) (VP (MD could) (VP (VB be) (VP (VBN overwhelmed) (PP (IN by) (NP (NP (JJ conservative) (NNS groups)) (VP (VBG raising) (CC and) (VP (VBG spending) (NP (NP (JJ unlimited) (NNS amounts)) (PP (IN of) (NP (NN money))))))))))))))))) (. .)) )
(TOP (S (S (NP#3 (DT The) (NN president)) (VP (MD will) (RB not) (VP (VB attend) (NP (DT those) (NNS events))))) (, ,) (NP (DT a) (NN source)) (VP (VBN confirmed)) (. .)) )
(TOP (S (S (NP#3 (person (NNP Obama))) (VP (VBD was) (RB staunchly) (NP (JJ anti-outside) (NN money)) (PP (IN during) (NP (NML#3 (PRP$ his)) (JJ pre-White) (NNP House) (JJ political) (NN career))))) (, ,) (CC and) (S (NP (NP (JJ first) (NN ran)) (PP (IN for) (NP (DT the) (organization (NNP White) (NNP House))))) (VP (VBG encouraging) (NP (JJ deep-pocketed) (NNPS Democrats)) (S (VP (TO to) (VP (VB send) (NP (NNS checks)) (PP (ADVP (RB only)) (IN through) (NP (NML#3 (PRP$ his)) (NN campaign)))))))) (. .)) )
(TOP (S (NP#3 (PRP He)) (VP (VBD wanted) (SBAR (S (NP (NP (DT a) (ADJP (RB consistently) (JJ coordinated)) (NN message)) (CC and) (NP (NML#3 (PRP$ his)) (NNS advisers))) (VP (VBD were) (ADJP (JJ willing) (S (VP (TO to) (VP (VB starve) (NP (NP (JJ non-campaign) (NNS organizations)) (PP (IN of) (NP (NN cash)))) (SBAR (IN in) (NN order) (S (VP (TO to) (VP (VB achieve) (NP (PRP it)))))))))))))) (. .)) )
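To make the role of the #id tags concrete, here is a minimal Java sketch (not OpenNLP code) that reads one such parse and groups the tagged phrases by id. It assumes the id is whatever follows the first '#' in a constituent label (as in NP#3 or NML#21) and that words never contain parentheses; the class CorefMentionExtractor and its method extractMentions are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: collects the words covered by each #id-tagged constituent.
final class CorefMentionExtractor {

    private final List<String> tokens;
    private int pos = 0;
    // id -> mention strings; ids appear in the order their first mention is closed
    private final Map<String, List<String>> mentionsById = new LinkedHashMap<>();

    private CorefMentionExtractor(String parse) {
        this.tokens = tokenize(parse);
    }

    static Map<String, List<String>> extractMentions(String parse) {
        CorefMentionExtractor extractor = new CorefMentionExtractor(parse);
        extractor.parseNode();
        return extractor.mentionsById;
    }

    // Splits "(NP#3 (PRP He))" into "(", "NP#3", "(", "PRP", "He", ")", ")".
    private static List<String> tokenize(String parse) {
        List<String> out = new ArrayList<>();
        StringBuilder atom = new StringBuilder();
        for (char c : parse.toCharArray()) {
            if (c == '(' || c == ')' || Character.isWhitespace(c)) {
                if (atom.length() > 0) {
                    out.add(atom.toString());
                    atom.setLength(0);
                }
                if (!Character.isWhitespace(c)) {
                    out.add(String.valueOf(c));
                }
            } else {
                atom.append(c);
            }
        }
        if (atom.length() > 0) {
            out.add(atom.toString());
        }
        return out;
    }

    // Parses one bracketed node starting at tokens[pos] and returns the words it covers.
    private List<String> parseNode() {
        if (!tokens.get(pos).equals("(")) {
            throw new IllegalArgumentException("expected '(' at token " + pos);
        }
        pos++;
        // A node may have an empty label, e.g. "( (S ...) )".
        String label = tokens.get(pos).equals("(") ? "" : tokens.get(pos++);
        List<String> words = new ArrayList<>();
        while (!tokens.get(pos).equals(")")) {
            if (tokens.get(pos).equals("(")) {
                words.addAll(parseNode());
            } else {
                words.add(tokens.get(pos++)); // a word under a POS tag
            }
        }
        pos++; // consume ")"
        int hash = label.indexOf('#');
        if (hash > 0 && hash < label.length() - 1) { // ignores the POS tag "#" itself
            String id = label.substring(hash + 1);
            mentionsById.computeIfAbsent(id, k -> new ArrayList<>()).add(String.join(" ", words));
        }
        return words;
    }

    public static void main(String[] args) {
        String parse = "(TOP (S (S (NP#3 (DT The) (NN president)) (VP (MD will) (RB not) "
            + "(VP (VB attend) (NP (DT those) (NNS events))))) (, ,) (NP (DT a) (NN source)) "
            + "(VP (VBN confirmed)) (. .)) )";
        System.out.println(extractMentions(parse)); // {3=[The president]}
    }
}
```

Run on the second sentence of the example, it prints {3=[The president]}. Within a document, all phrases sharing an id would then form one entity, which is what a trainer needs to build coreferent and non-coreferent mention pairs.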


Jörn

On 03/28/2012 02:15 PM, Jörn Kottmann wrote:

Hi all,

After all my issues with WordFreak, I now believe the easiest way to
train the coref component is to define a training format and just
write the missing training code.

I suggest the following training format:

- Documents are separated by an empty line, as is done for the name finder

- One parse per line (as is done in the parser format)

- An additional #id tag on noun phrases in the parse, so that the coref component
  knows which noun phrases are coreferent and which are not
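Reading this layout needs little more than line handling. A minimal Java sketch, assuming blank (or whitespace-only) lines separate documents and every other line holds exactly one parse; CorefDocumentReader and readDocuments are hypothetical names, not existing OpenNLP classes:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical reader for the proposed format: one parse per line,
// documents separated by empty lines.
final class CorefDocumentReader {

    static List<List<String>> readDocuments(List<String> lines) {
        List<List<String>> documents = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                // An empty line closes the current document; repeated empty lines are ignored.
                if (!current.isEmpty()) {
                    documents.add(current);
                    current = new ArrayList<>();
                }
            } else {
                current.add(line.trim()); // one parse per non-empty line
            }
        }
        if (!current.isEmpty()) {
            documents.add(current); // the last document need not end with an empty line
        }
        return documents;
    }

    public static void main(String[] args) {
        String text = "(TOP (S (NP#1 (PRP He)) (VP (VBD left)) (. .)))\n"
            + "(TOP (S (NP#1 (PRP He)) (VP (VBD returned)) (. .)))\n"
            + "\n"
            + "(TOP (S (NP#1 (NNP Obama)) (VP (VBD spoke)) (. .)))\n";
        List<List<String>> docs = readDocuments(Arrays.asList(text.split("\n", -1)));
        System.out.println(docs.size() + " documents"); // 2 documents
    }
}
```

In practice the lines could come from java.nio.file.Files.readAllLines(path, StandardCharsets.UTF_8); each inner list is then one document whose parses can be scanned for #id tags.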

Any comments or suggestions?

Jörn
