On 12/1/11 8:08 PM, Boris Galitsky wrote:
   I spent the last couple of weeks understanding how the OpenNLP parser does
chunking and how chunking occurs separately in opennlp.tools.chunker, and I came
to the conclusion that running an independently trained chunker on the results
of the parser gives significantly higher accuracy in the resulting parse, and
therefore makes the 'similarity' component much more accurate.
Let's look at an example (I added stars):
Two phrases, an NP and a VP, are extracted, but what kills the similarity
component is the last part of the latter:
****to-TO drive-NN****
Parse Tree Chunk list = [NP [Its-PRP$ classy-JJ design-NN and-CC the-DT 
Mercedes-NNP name-NN ], VP [make-VBP it-PRP a-DT very-RB cool-JJ vehicle-NN 
****to-TO drive-NN**** ]]

When I apply the chunker, which has its own problems (but most importantly was
trained independently), I can then apply rules to fix these cases for matching
with other sub-VPs like 'to-VB'.
I understand it works slower that way.
I would propose we have two versions of similarity: one that does without the
chunker and one that uses it (and also an additional 'correction' algo?).
I now have both versions, but only the latter passes the current tests.
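
To make the kind of correction rule mentioned above concrete, here is a minimal
sketch in plain Java. It is not the code from the similarity component: the
class name, the method name, and the rule itself (re-tag an NN that directly
follows to-TO as VB) are assumptions made for illustration, and the rule
deliberately ignores cases where "to" is a preposition followed by a real noun.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP or of the similarity component.
class InfinitiveTagCorrection {

    // Assumed rule: a token tagged NN immediately after "to-TO" is re-tagged VB,
    // so that "to-TO drive-NN" becomes "to-TO drive-VB".
    static List<String> fixInfinitives(List<String> taggedTokens) {
        List<String> fixed = new ArrayList<>(taggedTokens);
        for (int i = 1; i < fixed.size(); i++) {
            String prev = fixed.get(i - 1);
            String cur = fixed.get(i);
            if (prev.equalsIgnoreCase("to-TO") && cur.endsWith("-NN")) {
                // Drop the "-NN" suffix and attach "-VB" instead.
                fixed.set(i, cur.substring(0, cur.length() - 3) + "-VB");
            }
        }
        return fixed;
    }

    public static void main(String[] args) {
        // Tagged tokens of the VP from the example above, without the stars.
        List<String> vp = List.of("make-VBP", "it-PRP", "a-DT", "very-RB",
                "cool-JJ", "vehicle-NN", "to-TO", "drive-NN");
        System.out.println(fixInfinitives(vp));
        // prints [make-VBP, it-PRP, a-DT, very-RB, cool-JJ, vehicle-NN, to-TO, drive-VB]
    }
}
```

A real rule set would presumably look at the chunker's phrase labels as well,
so that the re-tagging only fires inside a VP.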

Ok, sounds good to me, but we should assume that the user can run the
parser and chunker themselves. Your similarity component simply accepts
a parse tree in one case and a parse tree plus chunks in the other case.
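
As a rough sketch of that shape, the two cases could be two overloads of one
entry point. Everything here is hypothetical: ParseResult is a stand-in for
whatever the parser returns (reduced to its chunk list), chunks are kept as
plain strings, and the fallback to the parser's chunks is an assumption, not
an agreed design.

```java
import java.util.List;

// Hypothetical API shape; none of these names exist in OpenNLP.
class SimilaritySketch {

    // Stand-in for the parser output, reduced to the chunk list it yields.
    static class ParseResult {
        final List<String> parserChunks;

        ParseResult(List<String> parserChunks) {
            this.parserChunks = parserChunks;
        }
    }

    // Case 1: the caller ran only the parser.
    static List<String> chunksForMatching(ParseResult parse) {
        return parse.parserChunks;
    }

    // Case 2: the caller also ran the chunker; its chunks take precedence,
    // and the parser's chunks are used only if the chunker returned nothing.
    static List<String> chunksForMatching(ParseResult parse, List<String> chunkerChunks) {
        return chunkerChunks.isEmpty() ? parse.parserChunks : chunkerChunks;
    }

    public static void main(String[] args) {
        ParseResult parse = new ParseResult(List.of("VP [to-TO drive-NN ]"));
        System.out.println(chunksForMatching(parse));
        System.out.println(chunksForMatching(parse, List.of("VP [to-TO drive-VB ]")));
    }
}
```

Either way the component never runs the parser or the chunker itself, so the
slower path is only paid for by callers who opt into it.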

What do you think?

Jörn
