On 04/04/2012 10:45 PM, Jörn Kottmann wrote:
The current approach in DefaultParse.addMention doesn't really seem
to work. A coreferencer trained with it performs far worse than one
instantiated on the models from the old SourceForge page.
I am not (yet) sure what the problem really is.
There are event files for the old models on the SourceForge page.
They are generated when the coref component is trained with the debug
flag on.
I started to reverse engineer these event files to see how things
might have been done in the old training code (sadly, I don't have access to it).
The DefaultParse.addMention method tries to match a mention to a noun
phrase or inserts it into the parse tree.
Let's look at this snippet from the training data:
... <COREF ID="5" MIN="chairman">its chairman, Frank Stronach,</COREF> ...
From the parser we get two noun phrases:
"its chairman" and "Frank Stronach"
My addMention currently just inserts a new noun phrase into the tree.
But the old code assigned the id 5 to both noun phrases, which then
generated two Mentions with these ids, and those were used to produce
the training events.
In my case these two noun phrases don't get an id assigned by the
Mention Finder, because only the parent noun phrase has an id.
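To make the difference concrete, here is a minimal, self-contained sketch of
what the old code appears to have done: propagate the MUC COREF id to every
noun phrase whose span falls inside the mention span. This is purely
illustrative; the NounPhrase class, token offsets, and assignId method are my
own stand-ins, not the actual OpenNLP coref API.

```java
import java.util.ArrayList;
import java.util.List;

public class MentionIdAssigner {

    /** Minimal stand-in for a noun-phrase node in the parse tree. */
    static class NounPhrase {
        final int start, end;   // token offsets, end exclusive
        Integer mentionId;      // null until an id is assigned

        NounPhrase(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Assigns the given MUC id to every noun phrase contained in the
     * mention span [start, end). Returns how many NPs received the id.
     */
    static int assignId(List<NounPhrase> nps, int start, int end, int id) {
        int assigned = 0;
        for (NounPhrase np : nps) {
            if (np.start >= start && np.end <= end) {
                np.mentionId = id;
                assigned++;
            }
        }
        return assigned;
    }

    public static void main(String[] args) {
        // Tokens: its(0) chairman(1) ,(2) Frank(3) Stronach(4) ,(5)
        List<NounPhrase> nps = new ArrayList<>();
        nps.add(new NounPhrase(0, 2)); // "its chairman"
        nps.add(new NounPhrase(3, 5)); // "Frank Stronach"

        // COREF ID="5" covers the whole span [0, 6), so both NPs get id 5
        int n = assignId(nps, 0, 6, 5);
        System.out.println(n + " noun phrases got id " + nps.get(0).mentionId);
    }
}
```

With this scheme no new node is inserted, so the tree seen at training time
stays identical to the one the parser produces at test time; only ids are
attached to the existing noun phrases.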
So it looks like we need to come up with a new way of getting the mentions
into the parse tree to train a model which performs like the old one.
Inserting noun phrases based on the MUC mentions seems to be a bad idea,
because the parse tree we have at testing time then differs from the one
we had at training time.
Jörn