Re: SOLR: Searching on OpenNLP fields is unstable

2013-10-20 Thread Lance Norskog
Hi-

Unit tests to the rescue! The unit test framework in the 4.x branch
catches call-ordering problems like this one:

  [junit4] Throwable #1: java.lang.IllegalStateException: TokenStream
  contract violation: reset()/close() call missing, reset() called
  multiple times, or subclass does not call super.reset(). Please see
  Javadocs of TokenStream class for more information about the correct
  consuming workflow.
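
For anyone unfamiliar with the contract that message refers to: a consumer is expected to call reset(), then incrementToken() until it returns false, then end(), then close(). The sketch below is a toy model in plain Java, not Lucene code. ToyTokenStream and consume are hypothetical names invented for illustration, and the state check only approximates what the real test framework enforces.

```java
import java.util.ArrayList;
import java.util.List;

class TokenStreamContractDemo {

    /** Toy stand-in for a token stream. Hypothetical; this is NOT Lucene's TokenStream. */
    static class ToyTokenStream {
        private enum State { NEW, RESET, CLOSED }

        private final String[] tokens;
        private State state = State.NEW;
        private int pos;
        private String current;

        ToyTokenStream(String... tokens) {
            this.tokens = tokens;
        }

        /** Rewinds the stream; legal only on a fresh or closed stream. */
        void reset() {
            if (state == State.RESET) {
                throw new IllegalStateException("contract violation: reset() called multiple times");
            }
            pos = 0;
            state = State.RESET;
        }

        /** Advances to the next token; refuses to run unless reset() came first. */
        boolean incrementToken() {
            if (state != State.RESET) {
                throw new IllegalStateException("contract violation: reset() call missing");
            }
            if (pos < tokens.length) {
                current = tokens[pos++];
                return true;
            }
            return false;
        }

        String term() {
            return current;
        }

        /** Signals end of stream; also requires a prior reset(). */
        void end() {
            if (state != State.RESET) {
                throw new IllegalStateException("contract violation: end() without reset()");
            }
        }

        /** Releases the stream so it can be reset and reused later. */
        void close() {
            state = State.CLOSED;
        }
    }

    /** Consumes a stream in the expected order: reset, incrementToken loop, end, close. */
    static List<String> consume(ToyTokenStream ts) {
        List<String> out = new ArrayList<>();
        try {
            ts.reset();
            while (ts.incrementToken()) {
                out.add(ts.term());
            }
            ts.end();
        } finally {
            ts.close();
        }
        return out;
    }

    public static void main(String[] args) {
        ToyTokenStream ts = new ToyTokenStream("brett", "lee");
        System.out.println(consume(ts));
        System.out.println(consume(ts)); // reuse after close() is allowed

        ToyTokenStream broken = new ToyTokenStream("brett");
        try {
            broken.incrementToken(); // skipped reset()
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

A filter or consumer that skips one of these steps leaves state behind from the previous use. In this toy model the reused stream only gives the same tokens on the second pass because close() and reset() were both called.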

I'll try to get this right. But both OpenNLP and LUCENE-2899 have
deployment problems:
1) OpenNLP does not have a good source of statistical training data for its
models. For example, the NER models are trained on late-1980s newspaper
articles, so the organization finder is kind of... obsolete. That kind of
problem. I think the currency recognizer is trained on text from before the
Euro was introduced (not sure about this).
2) Solr has a basic packaging problem when the Lucene code uses external
libraries.

As to adding it to the main Solr project, I think the Marketplace Of Ideas
has spoken with deafening silence :)



-- 
Lance Norskog
goks...@gmail.com


SOLR: Searching on OpenNLP fields is unstable

2013-09-25 Thread rashi gandhi
Hi,

I am working on OpenNLP integration with Solr. I have successfully applied
the patch (LUCENE-2899-x.patch) to the latest Solr source code (branch_4x).

I have defined an OpenNLP analyzer and indexed data with it. The analyzer
declaration in schema.xml is:



  <fieldType name="nlp_type" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <!-- Sequence of tokenizers and filters applied at index time -->
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
      <filter class="solr.SnowballPorterFilterFactory"/>
      <filter class="solr.ASCIIFoldingFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <!-- Sequence of tokenizers and filters applied at query time -->
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.OpenNLPFilterFactory"
              posTaggerModel="opennlp/en-pos-maxent.bin"/>
      <filter class="solr.OpenNLPFilterFactory"
              nerTaggerModels="opennlp/en-ner-person.bin"/>
      <filter class="solr.OpenNLPFilterFactory"
              nerTaggerModels="opennlp/en-ner-location.bin"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="stopwords.txt" enablePositionIncrements="true"/>
    </analyzer>
  </fieldType>



And the field declared with this type:

  <field name="Detail_Person" type="nlp_type" indexed="true" stored="true"
         omitNorms="true" omitPositions="true"/>



The problem: when I search on the Detail_Person field, the results are not
consistent.

When I search for Detail_Person:brett, it returns one document.

But when I run the same query again, it returns zero documents.

Searching on the OpenNLP field is not stable: sometimes it returns documents
and sometimes it does not, even though the documents are in the index.

Searches on non-OpenNLP fields work properly; the results are stable and
correct.

Please help me make the Solr results consistent.

Thanks in advance.