Re: Stop Words in SpellCheckComponent

2012-06-02 Thread Matthias Müller
> Also, generally, you should have a separate field and field type for the
> spellcheck field **so that normal text fields can use stop words.**

I've now found a solution, although I'm not sure whether it's what
you meant:

I'm now using a special fieldType WITHOUT stopwords for the spellcheck field.
So, I think, the SpellCheckComponent no longer finds "better" matches for
stopwords, because the stopwords themselves are now in its index.

Thanks for your help

Matthias

schema.xml
.
<fieldType name="spellcheckType" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="spellcheckField" type="spellcheckType" indexed="true"
       stored="false"/>
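
The snippet above declares the spellcheck field but does not show how it is populated. One common way, sketched here under the assumption that the text to be spellchecked lives in the example's "name" field (the source field is my guess, not something stated in this mail), is a copyField rule in schema.xml:

```xml
<!-- Assumption: "name" is the field whose terms should feed the spellchecker.
     copyField passes the raw input to spellcheckField, which then applies its
     own analyzer (no stop filter), so stopwords end up in the spellcheck index. -->
<copyField source="name" dest="spellcheckField"/>
```

If several source fields are copied into the same destination, the destination field has to be declared multiValued="true".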

solrconfig.xml
.
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

  <str name="queryAnalyzerFieldType">textSpell</str>

  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spellcheckField</str>
    ...
  </lst>
</searchComponent>


Re: Stop Words in SpellCheckComponent

2012-06-01 Thread Jack Krupansky
Your earlier email had this option in your spellcheck_de field type analyzer
for the StopFilterFactory:

words="german_stop_long.txt"

But your most recent email referred to stopwords.txt.

So, either add "the" to german_stop_long.txt, or change the "words" option
of your stop filter to refer to stopwords.txt.

BTW, I think you can actually have a comma-separated list of stopword files,
so you can write:

words="german_stop_long.txt,stopwords.txt"

-- Jack Krupansky
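
In context, the comma-separated form would sit on the stop filter of the spellcheck_de field type. This is only a sketch that reuses the attributes from the schema posted earlier in the thread; both files are assumed to be in the core's conf directory:

```xml
<!-- Stop filter reading two stopword files; the words from both are merged.
     Attribute values other than "words" are copied from the posted schema. -->
<filter class="solr.StopFilterFactory" ignoreCase="true"
        words="german_stop_long.txt,stopwords.txt"
        enablePositionIncrements="true" />
```

After changing the stopword files, the index (and the spellcheck index built from it) has to be rebuilt before the change takes effect.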


Re: Stop Words in SpellCheckComponent

2012-06-01 Thread Matthias Müller
> But your most recent email referred to stopwords.txt.
>
> So, either add "the" to german_stop_long.txt, or change the "words" option
> of your stop filter to refer to stopwords.txt.

Sorry for the confusion: the stop filter refers to stopwords.txt.

I'm now talking only about the Solr example webapp
(apache-solr-3.6.0.tgz/example), which I modified slightly (as
described in my last mail).

In this example, Solr also makes suggestions for stopwords.
I can't see a mistake in my configuration.

1. The stop filter refers to stopwords.txt:

<fieldType name="text_general" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    ...
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    ...
  </analyzer>
  <analyzer type="query">
    ...
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    ...
  </analyzer>
</fieldType>

2. The SpellCheckComponent refers to the field "name":

<str name="field">name</str>


Re: Stop Words in SpellCheckComponent

2012-06-01 Thread Jack Krupansky
You forgot to give us the field definition for "name". Is it the same as in
the 3.6 example, or has it been changed?

Make sure that you delete all existing data after you change the
schema/config.

Do a direct query on the spellcheck field (name:the) to verify whether "the"
is being indexed or not.

Also, generally, you should have a separate field and field type for the
spellcheck field so that normal text fields can use stop words.


-- Jack Krupansky




Stop Words in SpellCheckComponent

2012-05-31 Thread Matthias Müller
Hi,

Is it possible to configure a stopword list for the SpellCheckComponent?

For example:
When searching for "the indexs", "the" is filtered out because it is a stopword.
The SpellCheckComponent nevertheless gives me a false suggestion for "the".
It should only suggest "index", because "the" is a stopword.

Kind Regards

Matthias


RE: Stop Words in SpellCheckComponent

2012-05-31 Thread Markus Jelsma
Add a stopwordfilter to your spellcheck field.
 


Re: Stop Words in SpellCheckComponent

2012-05-31 Thread Matthias Müller
>> is it possible to configure a stopword list to the SpellCheckComponent?
>
> Add a stopwordfilter to your spellcheck field.

Hmm, I did. Could it be another mistake?

This is the schema definition:

<fieldType name="spellcheck_de" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent-nouml.txt" />
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^(.*)[\.\-\']$" replacement="$1" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="german_stop_long.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

This is the solrconfig:

<requestHandler name="search_de" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <int name="rows">10</int>
    <str name="qf">text_de title_de^5</str>
    <str name="pf">text_de title_de^5</str>

    <str name="spellcheck">true</str>
    <str name="mm">0</str>
  </lst>

  <arr name="last-components">
    <str>spellcheck_de</str>
  </arr>
</requestHandler>


<searchComponent name="spellcheck_de" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spellcheck_de</str>
    <str name="spellcheckIndexDir">spellchecker_de</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="buildOnOptimize">true</str>
  </lst>
</searchComponent>


Re: Stop Words in SpellCheckComponent

2012-05-31 Thread Jack Krupansky
Spellcheck wants a field, not a field type. You have a spellcheck_de field 
type, but you need a field as well.


<str name="field">spellcheck_de</str>

That should reference a field, not a field type.
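
To illustrate the difference, here is a minimal sketch. The field name "spell_de" is hypothetical (it does not appear anywhere in this thread); only the spellcheck_de field type comes from the posted schema:

```xml
<!-- schema.xml: spellcheck_de is a field TYPE, i.e. only an analysis chain.
     A field of that type is what actually holds indexed terms.
     "spell_de" is a made-up name used for illustration. -->
<field name="spell_de" type="spellcheck_de" indexed="true" stored="false"/>

<!-- solrconfig.xml, inside the <lst name="spellchecker"> block:
     point the spellchecker at the field, not at the type. -->
<str name="field">spell_de</str>
```

The field still has to receive content at index time, either directly from the documents or through a copy rule in the schema.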

-- Jack Krupansky




Re: Stop Words in SpellCheckComponent

2012-05-31 Thread Matthias Müller
> <str name="field">spellcheck_de</str>
>
> That should reference a field, not a field type.

Thanks for your help. But I did that, too.

Here I'll show that even the Solr example webapp makes suggestions for
stopwords. I have:

1. added "the" to stopwords.txt
2. added "thex" to an example document (field "name")
3. started Solr
4. indexed the example files (sh post.sh *.xml)
5. searched for "the solr":
   http://myhost:8983/solr/select?q=the+solr&spellcheck=true&wt=json
6. got the desired result, but also the wrong suggestion "thex"

{ "response" : { "docs" : [ {...  "name" : "Solr, thex Enterprise Search Server", ..  } ],
    "numFound" : 1,
    ...  },
  ...
  "spellcheck" : { "suggestions" : [ "the",
      {...  "suggestion" : [ "thex" ]  }
  ] }
}


Here's the complete diff between the original download and my 3 modifications:

diff -r apache-solr-3.6.0/example/exampledocs/solr.xml apache-solr-3.6.0x/example/exampledocs/solr.xml
21c21
<   <field name="name">Solr, the Enterprise Search Server</field>
---
>   <field name="name">Solr, thex Enterprise Search Server</field>
diff -r apache-solr-3.6.0/example/solr/conf/solrconfig.xml apache-solr-3.6.0x/example/solr/conf/solrconfig.xml
781a782,785
>      <arr name="last-components">
>        <str>spellcheck</str>
>      </arr>
>
1122a1127
>    <str name="buildOnCommit">true</str>
diff -r apache-solr-3.6.0/example/solr/conf/stopwords.txt apache-solr-3.6.0x/example/solr/conf/stopwords.txt
14a15,16
>
> the