One training document could look like this:
(TOP (S (NP#21 (NP#3 (NNP President) (person (NNP Barack) (NNP Obama)) (POS 's)) (NN campaign)) (VP (VBZ is) (VP (VBG reconfiguring) (NP (NP (NML#21 (PRP$ its)) (NN approach)) (PP (TO to) (NP (JJ powerful) (JJ super) (NNS PACs)))) (, ,) (S (VP (VBD worried) (SBAR (S (NP (NP#3 (DT the) (NN president) (POS 's)) (NN re-election) (NNS prospects)) (VP (MD could) (VP (VB be) (VP (VBN overwhelmed) (PP (IN by) (NP (NP (JJ conservative) (NNS groups)) (VP (VBG raising) (CC and) (VP (VBG spending) (NP (NP (JJ unlimited) (NNS amounts)) (PP (IN of) (NP (NN money))))))))))))))))) (. .)) )
(TOP (S (S (NP#3 (DT The) (NN president)) (VP (MD will) (RB not) (VP (VB attend) (NP (DT those) (NNS events))))) (, ,) (NP (DT a) (NN source)) (VP (VBN confirmed)) (. .)) )
(TOP (S (S (NP#3 (person (NNP Obama))) (VP (VBD was) (RB staunchly) (NP (JJ anti-outside) (NN money)) (PP (IN during) (NP (NML#3 (PRP$ his)) (JJ pre-White) (NNP House) (JJ political) (NN career))))) (, ,) (CC and) (S (NP (NP (JJ first) (NN ran)) (PP (IN for) (NP (DT the) (organization (NNP White) (NNP House))))) (VP (VBG encouraging) (NP (JJ deep-pocketed) (NNPS Democrats)) (S (VP (TO to) (VP (VB send) (NP (NNS checks)) (PP (ADVP (RB only)) (IN through) (NP (NML#3 (PRP$ his)) (NN campaign)))))))) (. .)) )
(TOP (S (NP#3 (PRP He)) (VP (VBD wanted) (SBAR (S (NP (NP (DT a) (ADJP (RB consistently) (JJ coordinated)) (NN message)) (CC and) (NP (NML#3 (PRP$ his)) (NNS advisers))) (VP (VBD were) (ADJP (JJ willing) (S (VP (TO to) (VP (VB starve) (NP (NP (JJ non-campaign) (NNS organizations)) (PP (IN of) (NP (NN cash)))) (SBAR (IN in) (NN order) (S (VP (TO to) (VP (VB achieve) (NP (PRP it)))))))))))))) (. .)) )
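
To make the format concrete, here is a rough Python sketch of how such a file could be read. It is not OpenNLP code, and all function names in it (parse_trees, mentions, read_documents, coref_chains) are made up for illustration. It assumes well-formed bracketing, blank lines between documents, and that any node label ending in #<digits> marks a mention whose number identifies its coreference chain. The two sample sentences are shortened stand-ins, not the parses above.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"\(|\)|[^\s()]+")
MENTION_LABEL = re.compile(r"(.+)#(\d+)")  # assumed shape of a tagged label, e.g. NP#3


def parse_trees(text):
    """Parse every bracketed tree in `text` into (label, children) tuples."""
    tokens = TOKEN.findall(text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1                      # skip ")"
        return label, children

    trees = []
    while pos < len(tokens):
        trees.append(node())
    return trees


def leaves(node):
    """Yield the words covered by a node, left to right."""
    _, children = node
    for child in children:
        if isinstance(child, str):
            yield child
        else:
            yield from leaves(child)


def mentions(node):
    """Yield (chain_id, text) for every node whose label carries a #id."""
    label, children = node
    match = MENTION_LABEL.fullmatch(label)
    if match:
        yield match.group(2), " ".join(leaves(node))
    for child in children:
        if not isinstance(child, str):
            yield from mentions(child)


def read_documents(text):
    """Split on blank lines; each document becomes a list of trees."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [parse_trees(block) for block in blocks if block.strip()]


def coref_chains(document):
    """Group mention texts by chain id, in document order."""
    chains = defaultdict(list)
    for tree in document:
        for chain_id, span in mentions(tree):
            chains[chain_id].append(span)
    return dict(chains)


sample = (
    "(TOP (S (NP#3 (DT The) (NN president)) (VP (VB left)) (. .)))\n"
    "(TOP (S (NP#3 (PRP He)) (VP (VBD waved)) (. .)))\n"
)
for document in read_documents(sample):
    print(coref_chains(document))
```

Because the reader works on tokens rather than on lines, it does not itself enforce the one-parse-per-line rule; a stricter reader could check that each non-empty line yields exactly one tree.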

Jörn

On 03/28/2012 02:15 PM, Jörn Kottmann wrote:
Hi all,

after all my issues with WordFreak, I now believe the easiest way to
train the coref component is to define a training format and just
write the missing training code.

I suggest the following training format:
- Documents are separated by an empty line, as is done for the name finder
- One parse per line (as in the parser format)
- An additional #id tag on noun phrases in the parse, so that the coref component
  knows which noun phrases are coreferent and which are not.

Any comments or suggestions?

Jörn

