Hello Grant, Lance and Joern,

I have been developing a 'similarity' component for OpenNLP that can be plugged into Solr. The component assesses relevance by matching the parse tree of a query against the parse trees of candidate answers. The idea is that a search engineer does not need to be familiar with linguistics: they just plug in SyntGenRequestHandler for longer queries or longer texts and check whether it improves relevance.

The similarity component has many other applications besides search, which currently live as JUnit tests, such as semantic filtering for speech recognition, content generation, and automatic code generation from natural language.

The component is hopefully about to be released, and is currently available here: https://issues.apache.org/jira/browse/OPENNLP-497

It sounds like it is complementary to LUCENE-2899.

Regards,
Boris

> Date: Mon, 1 Oct 2012 00:35:07 +1100
> From: j...@apache.org
> To: dev@lucene.apache.org
> Subject: [jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities
> as a module
>
>
> [ https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13466478#comment-13466478 ]
>
> mailformailingli...@yahoo.de commented on LUCENE-2899:
> ------------------------------------------------------
>
> Could you please create a new Patch for the current Trunk? I had some
> problems on applying it to my working copy...
>
> I am not entirely sure whether its the Trunk or your Code, but it seems like
> your OpenNLP-code only works for the first request.
>
> As far as I was able to debug, the create()-method of the TokenFilterFactory
> is only called every now and again (are created TokenFilters reused for
> longer than one call in Solr?).
>
> If create() of your FilterFactory was called, everything works. However if
> the TokenFilter is somehow reused, it fails.
>
> Is this a bug of Solr or of your Patch?
>
> > Add OpenNLP Analysis capabilities as a module
> > ---------------------------------------------
> >
> > Key: LUCENE-2899
> > URL: https://issues.apache.org/jira/browse/LUCENE-2899
> > Project: Lucene - Core
> > Issue Type: New Feature
> > Components: modules/analysis
> > Reporter: Grant Ingersoll
> > Assignee: Grant Ingersoll
> > Priority: Minor
> > Attachments: LUCENE-2899.patch, LUCENE-2899.patch,
> > LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch,
> > opennlp_trunk.patch
> >
> >
> > Now that OpenNLP is an ASF project and has a nice license, it would be nice
> > to have a submodule (under analysis) that exposed capabilities for it. Drew
> > Farris, Tom Morton and I have code that does:
> > * Sentence Detection as a Tokenizer (could also be a TokenFilter, although
> > it would have to change slightly to buffer tokens)
> > * NamedEntity recognition as a TokenFilter
> > We are also planning a Tokenizer/TokenFilter that can put parts of speech
> > as either payloads (PartOfSpeechAttribute?) on a token or at the same
> > position.
> > I'd propose it go under:
> > modules/analysis/opennlp
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA administrators
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>