Re: Stop Words in SpellCheckComponent

2012-06-02 Thread Matthias Müller
> Also, generally, you should have a separate field and field type for the
> spellcheck field **so that normal text fields can use stop words.**

I've now found a solution, although I'm not sure whether it's what
you meant:

I'm now using a separate fieldType WITHOUT stopwords for the spellcheck field.
As far as I can tell, the SpellCheckComponent no longer looks for better
matches for stopwords, because the stopwords themselves are now in its index.

Thanks for your help

Matthias

schema.xml
.

  


  


   

solrconfig.xml
.
  

textSpell


  default
  spellcheckField


Re: Stop Words in SpellCheckComponent

2012-06-01 Thread Jack Krupansky
You forgot to give us the field definition for "name". Is it the same as in 
the 3.6 example, or is it changed?


Make sure that you delete all existing data after you change the 
schema/config.


Do a direct query on the spellcheck field (name:the) to verify whether "the" 
is being indexed or not.


Also, generally, you should have a separate field and field type for the 
spellcheck field so that normal text fields can use stop words.


-- Jack Krupansky

-Original Message- 
From: Matthias Müller

Sent: Friday, June 01, 2012 4:51 AM
To: solr-user@lucene.apache.org
Subject: Re: Stop Words in SpellCheckComponent





Re: Stop Words in SpellCheckComponent

2012-06-01 Thread Matthias Müller
> But your most recent email referred to "stopword.txt".
>
> So, either add "the" to german_stop_long.txt, or change the "words" option
> of your stopfilter to refer to "stopwords.txt".

Sorry for the confusion: the stopfilter refers to stopwords.txt.

I'm now talking only about the Solr example webapp
(apache-solr-3.6.0.tgz/example), which I modified slightly (as
described in my last mail).

In this example, Solr also makes suggestions for stopwords.
I can't see a mistake in my configuration.

1. The stopfilter refers to the stopwords.txt:


  
  ...

  ...
  
  
  ...

...
  


2. The SpellCheckComponent refers to the field "name":

 name
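
For orientation, points 1 and 2 above might correspond to configuration along these lines. This is a sketch only: it assumes the "name" field uses a text_general-style field type as in the stock 3.6 example, and the filter attributes shown are assumptions rather than the poster's exact settings.

```xml
<!-- schema.xml: index-time analyzer with a stop filter reading stopwords.txt -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="name" type="text_general" indexed="true" stored="true"/>

<!-- solrconfig.xml: spellchecker built from the same "name" field -->
<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">name</str>
</lst>
```

With a layout like this, "the" is removed from the "name" field at index time, so the dictionary built from that field does not contain it and a near match such as "thex" can be offered instead.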


Re: Stop Words in SpellCheckComponent

2012-05-31 Thread Jack Krupansky
Your earlier email had this option in your spellcheck_de field type analyzer 
for the StopFilterFactory:


words="german_stop_long.txt"

But your most recent email referred to "stopword.txt".

So, either add "the" to german_stop_long.txt, or change the "words" option 
of your stopfilter to refer to "stopwords.txt".


BTW, I think you can actually have a comma-separated list of stopword files, 
so you can write:


words="german_stop_long.txt,stopwords.txt"
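
In context, that attribute would sit on the stop filter element of the analyzer, roughly as sketched here. The ignoreCase attribute is an assumption, and whether the comma-separated form works should be checked against the Solr version in use.

```xml
<!-- Stop filter reading both stopword files; ignoreCase is assumed -->
<filter class="solr.StopFilterFactory"
        ignoreCase="true"
        words="german_stop_long.txt,stopwords.txt"/>
```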

-- Jack Krupansky

-Original Message- 
From: Matthias Müller

Sent: Friday, June 01, 2012 1:44 AM
To: solr-user@lucene.apache.org
Subject: Re: Stop Words in SpellCheckComponent






Re: Stop Words in SpellCheckComponent

2012-05-31 Thread Matthias Müller
> spellcheck_de
>
> That should reference a field, not a field type.

Thanks for your help. But I did that, too.

Here I'll show that even the solr example webapp makes suggestions for
stopwords: I've ...

1. added "the" to the stopwords.txt
2. added "thex" to an example document (field name)
3. started Solr
4. indexed the example files (sh post.sh *.xml)
5. searched for "the solr"
http://myhost:8983/solr/select?q=the+solr&spellcheck=true&wt=json
6. got the desired result, but also the wrong suggestion "thex"

{ "response" : { "docs" : [ {...  "name" : "Solr, thex Enterprise
Search Server", ..  } ],
  "numFound" : 1,
...  },
...
  "spellcheck" : { "suggestions" : [ "the",
  {..."suggestion" : [ "thex" ]  }
] }
}


Here's the complete diff between the original download and my 3 modifications:

diff -r apache-solr-3.6.0/example/exampledocs/solr.xml
apache-solr-3.6.0x/example/exampledocs/solr.xml
21c21
<   Solr, the Enterprise Search Server
---
>   Solr, thex Enterprise Search Server
diff -r apache-solr-3.6.0/example/solr/conf/solrconfig.xml
apache-solr-3.6.0x/example/solr/conf/solrconfig.xml
781a782,785
>  
>spellcheck
>  
>
1122a1127
>   true
diff -r apache-solr-3.6.0/example/solr/conf/stopwords.txt
apache-solr-3.6.0x/example/solr/conf/stopwords.txt
14a15,16
>
> the


Re: Stop Words in SpellCheckComponent

2012-05-31 Thread Jack Krupansky
Spellcheck wants a field, not a field type. You have a spellcheck_de field 
type, but you need a field as well.


spellcheck_de

That should reference a field, not a field type.
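
A sketch of the distinction, assuming a hypothetical field named spell_de that uses the existing spellcheck_de field type. The field attributes and the copyField sources (text_de and title_de, taken from the query fields in the original message) are assumptions.

```xml
<!-- schema.xml: declare a FIELD of the spellcheck_de field TYPE -->
<field name="spell_de" type="spellcheck_de" indexed="true" stored="false" multiValued="true"/>

<!-- Assumed sources: fill the spellcheck field from the searched fields -->
<copyField source="text_de" dest="spell_de"/>
<copyField source="title_de" dest="spell_de"/>

<!-- solrconfig.xml: the spellchecker references the field, not the field type -->
<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">spell_de</str>
</lst>
```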

-- Jack Krupansky

-Original Message- 
From: Matthias Müller

Sent: Thursday, May 31, 2012 3:23 PM
To: solr-user@lucene.apache.org
Subject: Re: Stop Words in SpellCheckComponent





Re: Stop Words in SpellCheckComponent

2012-05-31 Thread Matthias Müller
>> is it possible to configure a stopword list to the SpellCheckComponent?

> Add a stopwordfilter to your spellcheck field.

Hmm, I did. Could it be another mistake?

This is the schema definition:


  





  


This is the solrconfig:

  
 
   edismax
   10
   text_de title_de^5
   text_de title_de^5

   true
   0
 

 
   spellcheck_de
 
  


  
textSpell

  default
  spellcheck_de
  spellchecker_de
  true
  true

  


RE: Stop Words in SpellCheckComponent

2012-05-31 Thread Markus Jelsma
Add a stopwordfilter to your spellcheck field.
 
-Original message-
> From:Matthias Müller 
> Sent: Thu 31-May-2012 18:39
> To: solr-user@lucene.apache.org
> Subject: Stop Words in SpellCheckComponent
> 
> Hi,
> 
> is it possible to configure a stopword list for the SpellCheckComponent?
> 
> For example:
> When searching for "the indexs", "the" is filtered out because it is a stopword.
> The SpellCheckComponent nevertheless gives me an unwanted suggestion for "the".
> But the SpellCheckComponent should only give a suggestion for "index"
> because "the" is a stopword.
> 
> Kind Regards
> 
> Matthias
>