Hi-
Unit tests to the rescue! The current unit test system in the 4.x branch
catches call-sequence problems in the code:
[junit4] Throwable #1: java.lang.IllegalStateException:
TokenStream contract violation: reset()/close() call missing, reset()
called multiple times, or subclass does not call super.reset().
Please see Javadocs of TokenStream class for more information about the
correct consuming workflow.
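For reference, here is a rough sketch of the call order that exception is checking. This is a toy stand-in, not Lucene's TokenStream: the class names ToyTokenStream and ContractDemo, the term() accessor, and the error messages are all made up for illustration. Only the order reset(), incrementToken() until false, end(), close() comes from the TokenStream Javadocs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Toy model only (hypothetical): NOT Lucene's TokenStream. It borrows the
// method names and enforces the same call order so violations fail loudly.
class ToyTokenStream {
    private enum State { CLOSED, RESET, ENDED }

    private final List<String> tokens;
    private Iterator<String> it;
    private String current;
    private State state = State.CLOSED;

    ToyTokenStream(List<String> tokens) {
        this.tokens = tokens;
    }

    // Must be called exactly once before each pass over the tokens.
    void reset() {
        if (state != State.CLOSED) {
            throw new IllegalStateException(
                "contract violation: reset() called multiple times, or close() call missing");
        }
        it = tokens.iterator();
        state = State.RESET;
    }

    // Advances to the next token; returns false when the stream is exhausted.
    boolean incrementToken() {
        if (state != State.RESET) {
            throw new IllegalStateException("contract violation: reset() call missing");
        }
        if (it.hasNext()) {
            current = it.next();
            return true;
        }
        return false;
    }

    // Hypothetical accessor standing in for Lucene's attribute mechanism.
    String term() {
        return current;
    }

    // Signals that the consumer is done iterating.
    void end() {
        if (state != State.RESET) {
            throw new IllegalStateException("contract violation: end() called out of order");
        }
        state = State.ENDED;
    }

    // Releases the stream so that it can be reset() and consumed again.
    void close() {
        it = null;
        state = State.CLOSED;
    }
}

class ContractDemo {
    // The consuming workflow: reset, loop on incrementToken, end, then close.
    static List<String> consume(ToyTokenStream ts) {
        List<String> out = new ArrayList<>();
        try {
            ts.reset();
            while (ts.incrementToken()) {
                out.add(ts.term());
            }
            ts.end();
        } finally {
            ts.close(); // always close, even if consumption fails
        }
        return out;
    }

    public static void main(String[] args) {
        ToyTokenStream ts = new ToyTokenStream(Arrays.asList("brett", "favre"));
        // Consuming the same stream twice works because each pass closes it.
        System.out.println(consume(ts));
        System.out.println(consume(ts));

        // Skipping reset() is reported immediately instead of failing silently.
        try {
            new ToyTokenStream(Arrays.asList("x")).incrementToken();
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

The point of the state check is that a stream which gets reused without a proper reset()/close() cycle fails on the first bad call rather than carrying stale state from one pass into the next.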
I'll try to get this right. But both OpenNLP and LUCENE-2899 have
deployment problems:
1) OpenNLP does not have a good source of statistical training data for the
models. For example, the NER models are trained on late-1980s newspaper
articles, so the organization finder is kind of... obsolete. That kind of
problem. I think the currency recognizer is trained on text from before the
Euro was introduced (not sure about this).
2) Solr has a basic packaging problem when the Lucene code uses external
libraries.
As to adding it to the main Solr project, I think the Marketplace Of Ideas
has spoken with deafening silence :)
On Wed, Sep 25, 2013 at 9:26 AM, rashi gandhi <gandhirash...@gmail.com> wrote:
Hi,
I am working on OpenNLP integration with Solr. I have successfully applied
the patch (LUCENE-2899-x.patch) to the latest Solr source code (branch_4x).
I have defined an OpenNLP analyzer and indexed data with it. The analyzer
declaration in schema.xml is:
<fieldType name="nlp_type" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Sequence of tokenizers and filters applied at the index time -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- Sequence of tokenizers and filters applied at the index time -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.OpenNLPFilterFactory" posTaggerModel="opennlp/en-pos-maxent.bin"/>
    <filter class="solr.OpenNLPFilterFactory" nerTaggerModels="opennlp/en-ner-person.bin"/>
    <filter class="solr.OpenNLPFilterFactory" nerTaggerModels="opennlp/en-ner-location.bin"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>
And the field declared for this analyzer:
<field name="Detail_Person" type="nlp_type" indexed="true" stored="true" omitNorms="true" omitPositions="true"/>
The problem is this: when I search on the Detail_Person field, the results
are not consistent.
When I search for Detail_Person:brett, it returns one document.
But when I fire the same query again, it returns zero documents.
Searching is not stable on the OpenNLP field: sometimes it returns documents
and sometimes it does not, even though the documents are there.
If I search on non-OpenNLP fields, it works properly and the results are
stable and correct.
Please help me make the Solr results consistent.
Thanks in advance.
--
Lance Norskog
goks...@gmail.com