My philosophy for the similarity component is that an engineer without a 
background in linguistics can do text processing.
He or she would install OpenNLP and call an assessRelevance(text1, text2) 
function, without any knowledge of what is happening inside.
That would significantly extend the user base of OpenNLP.
The problem domains I used for illustration are search (a standard domain for 
linguistic apps) and content generation (a state-of-the-art technology, in my 
opinion). Again, to incorporate these into their apps, users do not need to 
know anything about parsing, chunking, etc.
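
To make the single-call idea concrete, here is a minimal sketch of what such 
an entry point could look like from the user's side. The class name 
RelevanceFacade is hypothetical and not part of OpenNLP, and the scoring is 
only a placeholder (token overlap) standing in for the parse-and-chunk based 
matching; the point is just that the caller passes two strings and gets a 
score back.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical facade, NOT part of the OpenNLP API. It only illustrates the
 * shape of a call that hides all linguistic processing from the caller.
 */
public final class RelevanceFacade {

    private RelevanceFacade() {}

    /**
     * Returns a score in [0, 1]. ASSUMPTION: the score here is the Jaccard
     * overlap of lowercased word tokens, a stand-in for the real
     * parse/chunk-based similarity, which this sketch does not implement.
     */
    public static double assessRelevance(String text1, String text2) {
        Set<String> a = tokens(text1);
        Set<String> b = tokens(text2);
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0; // nothing to compare
        }
        Set<String> shared = new HashSet<>(a);
        shared.retainAll(b);
        Set<String> all = new HashSet<>(a);
        all.addAll(b);
        return (double) shared.size() / all.size();
    }

    /** Splits on non-word characters and drops empty tokens. */
    private static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The caller needs no knowledge of parsing or chunking.
        System.out.println(
            assessRelevance("a very cool vehicle to drive", "a cool car to drive"));
    }
}
```

A real implementation would replace the body of assessRelevance with the 
parser and chunker pipeline while keeping the same two-string signature.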
Regards
Boris





> Date: Fri, 2 Dec 2011 13:10:23 +0100
> From: [email protected]
> To: [email protected]
> Subject: Re: any hints on how to get chunking info from Parse?
> 
> On 12/1/11 8:08 PM, Boris Galitsky wrote:
> >    I spent the last couple of weeks understanding how the OpenNLP parser does 
> > chunking and how chunking occurs separately in opennlp.tools.chunker, and I 
> > came to the conclusion that using an independently trained chunker on the 
> > results of the parser gives significantly higher accuracy of the resultant 
> > parsing, and therefore makes the 'similarity' component much more accurate.
> > Let's look at an example (I added stars):
> > two chunks, an NP and a VP, are extracted, but what kills the similarity 
> > component is the last part of the latter:
> > ****to-TO drive-NN****
> > Parse Tree Chunk list = [NP [Its-PRP$ classy-JJ design-NN and-CC the-DT 
> > Mercedes-NNP name-NN ], VP [make-VBP it-PRP a-DT very-RB cool-JJ vehicle-NN 
> > ****to-TO drive-NN**** ]]
> >
> > When I apply the chunker, which has its own problems (but most importantly 
> > was trained independently), I can then apply rules to fix these cases for 
> > matching with other sub-VPs like 'to-VB'.
> > I understand it works slower that way.
> > I would propose we have two versions of similarity, one that works 
> > without the chunker and one which uses it (and also an additional 
> > 'correction' algorithm?).
> > I now have both versions, but only the latter passes the current tests.
> 
> Ok, sounds good to me, but we should assume that the user can run the
> parser and chunker themselves. Your similarity component simply accepts
> a parse tree in one case and a parse tree plus chunks in the other case.
> 
> What do you think?
> 
> Jörn
                                          
