> 
> Sorry I'm a bit late to this discussion. I think it is fine to have a
> default way that similarity is assessed, but it shouldn't be completely hidden
> from the user that there may be other choices. For example, have you done
> standard similarity based on the standard bag-of-words model?

I've got the code; it's pretty basic and will also do bag-of-words.
I will add it.
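For the baseline, a minimal bag-of-words similarity can be as simple as cosine similarity over token counts. Here is a rough sketch; the class name, the `assessRelevance` signature (borrowed from the proposal below in the thread), and the naive lowercase/whitespace tokenization are my assumptions — a real version would reuse an OpenNLP tokenizer:

```java
import java.util.HashMap;
import java.util.Map;

public class BagOfWordsSimilarity {

    // Lowercased, whitespace-split token counts for one text.
    static Map<String, Integer> counts(String text) {
        Map<String, Integer> c = new HashMap<>();
        for (String tok : text.toLowerCase().split("\\s+")) {
            if (!tok.isEmpty()) {
                c.merge(tok, 1, Integer::sum);
            }
        }
        return c;
    }

    // Cosine similarity between the two token-count vectors:
    // dot(a, b) / (|a| * |b|), 0 if the texts share no tokens.
    public static double assessRelevance(String text1, String text2) {
        Map<String, Integer> a = counts(text1);
        Map<String, Integer> b = counts(text2);
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
            normA += e.getValue() * e.getValue();
        }
        for (int v : b.values()) {
            normB += (double) v * v;
        }
        return dot == 0 ? 0.0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

That should be enough to see whether the chunk/tree matching beats a purely lexical baseline on the same pairs.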

> That's at
> least needed as a baseline to see whether the chunks and tree structures
> are helping. (Sorry in advance if this is not addressing the questions and
> such -- I seem to be missing some context in the discussion.)
> 
> FWIW, it is entirely possible for a chunker to produce better local
> structure than a full parser. This is pretty well known in the dependency
> parsing literature (e.g. see the comparison of MaltParser and MSTParser by
> Nivre and McDonald). Also, if you want unsupervised chunks, you might check
> out work that Elias Ponvert, Katrin Erk, and I did on using HMMs for this
> (and cascading them to get full parses). Code and paper available here:
> 
> http://elias.ponvert.net/upparse

Yes, will take a look

Regards
Boris


> 
> Jason
> 
> On Fri, Dec 2, 2011 at 10:12 AM, Boris Galitsky <[email protected]> wrote:
> 
> >
> > My philosophy for the similarity component is that an engineer without a
> > background in linguistics can do text processing.
> > He/she would install OpenNLP, and would call assessRelevance(text1, text2)
> > function, without any knowledge of what is happening inside.
> > That would significantly extend the user base of OpenNLP.
> > The problem domains I used for illustration are search (a standard domain
> > for linguistic apps) and content generation (a state-of-the-art technology, in
> > my opinion). Again, to incorporate these into user apps users do not need
> > to know anything about parsing, chunking, etc.
> > Regards
> > Boris
> >
> >
> >
> >
> >
> > > Date: Fri, 2 Dec 2011 13:10:23 +0100
> > > From: [email protected]
> > > To: [email protected]
> > > Subject: Re: any hints on how to get chunking info from Parse?
> > >
> > > On 12/1/11 8:08 PM, Boris Galitsky wrote:
> > > >    I spent the last couple of weeks understanding how the OpenNLP parser does
> > chunking, how chunking occurs separately in opennlp.tools.chunker, and I
> > came to the conclusion that using an independently trained chunker on the results
> > of the parser gives significantly higher accuracy of the resultant parsing, and
> > therefore makes 'similarity' component much more accurate as a result.
> > > > Let's look at an example (I added stars):
> > > > two chunks, NP & VP, are extracted, but what kills the similarity component is the
> > last part of the latter:
> > > > ****to-TO drive-NN****
> > > > Parse Tree Chunk list = [NP [Its-PRP$ classy-JJ design-NN and-CC
> > the-DT Mercedes-NNP name-NN ], VP [make-VBP it-PRP a-DT very-RB cool-JJ
> > vehicle-NN ****to-TO drive-NN**** ]]
> > > >
> > > > When I apply the chunker, which has its own problems (but most
> > importantly was trained independently), I can then apply rules to fix these
> > cases for matching with other sub-VPs like 'to-VB'.
> > > > I understand it works slower that way.
> > > > I would propose we have two versions of similarity: one that does
> > without the chunker and one which uses it (and also an additional 'correction'
> > algo ? ).
> > > > I have now both versions, but only the latter passes current tests.
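The correction rule quoted above — re-tagging the mis-parsed "to-TO drive-NN" so the chunk matches other sub-VPs like 'to-VB' — could be sketched roughly as below. This is a toy illustration of the kind of rule meant, not the actual code; the class and method names are made up, and a real rule would consult the independent chunker's output rather than tags alone:

```java
import java.util.Arrays;

public class InfinitiveTagFixer {

    // Inside a VP, a token directly following an infinitive "to" (tagged TO)
    // should be a base-form verb (VB), not a noun. Re-tags NN to VB in that
    // position, so "to-TO drive-NN" becomes "to-TO drive-VB".
    public static String[] fixTags(String[] tags) {
        String[] fixed = Arrays.copyOf(tags, tags.length);
        for (int i = 1; i < fixed.length; i++) {
            if ("TO".equals(fixed[i - 1]) && "NN".equals(fixed[i])) {
                fixed[i] = "VB";
            }
        }
        return fixed;
    }
}
```

Note the obvious limitation of this naive version: prepositional "to" followed by a noun ("to school") would be wrongly re-tagged, which is exactly why restricting the rule to VP chunks matters.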
> > >
> > > Ok, sounds good to me, but we should assume that the user can run the
> > > parser and chunker themselves. Your similarity component simply accepts
> > > a parse tree in one case and a parse tree plus chunks in the other case.
> > >
> > > What do you think?
> > >
> > > Jörn
> >
> >
> 
> 
> 
> -- 
> Jason Baldridge
> Associate Professor, Department of Linguistics
> The University of Texas at Austin
> http://www.jasonbaldridge.com
> http://twitter.com/jasonbaldridge
                                          
