Re: IndexBasedSpellChecker on multiple fields

2011-10-20 Thread Simone Tripodi
Hi James,
sorry for the noise, but I am not able to get the approach you described
working; I'm sure I'm misconfiguring something.

Basically, I have two fields, `abstract` and `subject`, and a field
`master-dictionary` into which the first two are copied.
Then, in solrconfig.xml, I configured the SpellCheckComponent to
run its checks on the master-dictionary field...
When I start Solr, it raises an exception:

Oct 20, 2011 3:51:00 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Specified dictionary does not exist.
    at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:164)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
    at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:54)
    at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1177)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:680)

Could you please help me by checking this schema [1]?

Many thanks in advance, all the best!
Simo

[1] https://gist.github.com/1301194

http://people.apache.org/~simonetripodi/
http://simonetripodi.livejournal.com/
http://twitter.com/simonetripodi
http://www.99soft.org/




RE: IndexBasedSpellChecker on multiple fields

2011-10-20 Thread Dyer, James
Here's approximately how I've got it set up to do essentially the same thing, 
in one of our production indexes:
---
schema.xml has:

<fieldType name="text_spelling" class="solr.TextField" positionIncrementGap="100">
 { whitespaceanalyzer, stopwordfilter, worddelimiterfilter, lowercasefilter ... or whatever your app needs }
</fieldType>

<field name="abstract" ... />
<field name="subject" ... />
<field name="spelling_abstract_subject" type="text_spelling" indexed="true" stored="false" multiValued="true" omitNorms="true" />

<copyField source="abstract" dest="spelling_abstract_subject" />
<copyField source="subject" dest="spelling_abstract_subject" />
---
solrconfig.xml has:

<requestHandler name="search_abstract_and_subject" class="solr.SearchHandler">
 <lst name="defaults">
  <str name="defType">edismax</str>
  <str name="echoParams">explicit</str>
  <float name="tie">0.01</float>
  <str name="qf">abstract subject</str>
  <str name="q.alt">*:*</str>
  <str name="spellcheck">true</str>
  <str name="spellcheck.dictionary">spellchecker_abstract_subject</str>
  <str name="spellcheck.count">10</str>
  <str name="spellcheck.collate">true</str>
  <str name="spellcheck.maxCollationTries">10</str>
  <str name="spellcheck.maxCollations">1</str>
  <str name="spellcheck.collateExtendedResults">true</str>
 </lst>
 <arr name="last-components">
  <str>spellcheck</str>
 </arr>
</requestHandler>

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
 <str name="queryAnalyzerFieldType">text_spelling</str>
 <lst name="spellchecker">
  <str name="name">spellchecker_abstract_subject</str>
  <str name="field">spelling_abstract_subject</str>
  <str name="fieldType">text_spelling</str>
  <str name="spellcheckIndexDir">./spellchecker</str>
 </lst>
</searchComponent>
---
You can then query across the 2 fields and get spell suggestions like this:
  q=query goes here&qt=search_abstract_and_subject

Of course if this is the first query since startup/commit, unless you're 
building automatically somehow, add:
spellcheck.build=true
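
A minimal sketch of one way to build automatically, assuming that rebuilding the spelling index on every commit is affordable for your index size: the spellchecker definition accepts a buildOnCommit flag. The names are the same ones used in the configuration above; only the buildOnCommit line is new.

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
 <str name="queryAnalyzerFieldType">text_spelling</str>
 <lst name="spellchecker">
  <str name="name">spellchecker_abstract_subject</str>
  <str name="field">spelling_abstract_subject</str>
  <str name="fieldType">text_spelling</str>
  <str name="spellcheckIndexDir">./spellchecker</str>
  <!-- Rebuild the spelling index after each commit, so that
       spellcheck.build=true is not needed on the first query. -->
  <str name="buildOnCommit">true</str>
 </lst>
</searchComponent>
```

With large indexes or frequent commits the rebuild can be expensive, in which case sending spellcheck.build=true by hand at a quiet moment may be the better trade-off.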

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311



Re: IndexBasedSpellChecker on multiple fields

2011-10-19 Thread Simone Tripodi
Hi James!
Terrific suggestion, thanks a lot! And sorry for the delay (due to
my timezone ;) ).
I'll let you know how things go. Thanks once again and have a nice day!
Simo

http://people.apache.org/~simonetripodi/
http://simonetripodi.livejournal.com/
http://twitter.com/simonetripodi
http://www.99soft.org/




IndexBasedSpellChecker on multiple fields

2011-10-18 Thread Simone Tripodi
Hi all,
I need to configure an IndexBasedSpellChecker that uses more than
one field as its spelling dictionary. Is that possible?
In the meantime I have configured two spellcheckers and let users switch
from one checker to the other via a parameter on the GET request, but it
looks like people are not particularly happy about it...
The main problem is that the fields I need to spell-check contain different
information; I mean, the intersection between the two sets of terms could be
empty.
Many thanks in advance, all the best!
Simo

http://people.apache.org/~simonetripodi/
http://simonetripodi.livejournal.com/
http://twitter.com/simonetripodi
http://www.99soft.org/


RE: IndexBasedSpellChecker on multiple fields

2011-10-18 Thread Dyer, James
Simone,

You can set up a master dictionary, but with a few caveats.  What you'll need 
to do is use copyField to copy all of the fields you want to include in your 
master dictionary into one field, and base your IndexBasedSpellChecker 
dictionary on that.  In addition, I would recommend you use the collate 
feature and set spellcheck.maxCollationTries to something greater than zero 
(5-10 is usually good).  Otherwise, you will probably get a lot of ridiculous 
suggestions from it trying to correct words from one field with values from 
another.  See 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate for more 
information.

There is still a big problem with this approach, however.  Unless you set 
onlyMorePopular=true, Solr will never suggest a correction for a word that 
exists in the dictionary.  By creating a huge master dictionary, you 
increase the chances that Solr will assume your users' misspelled words are 
in fact correct.  One way to work around this is, instead of blindly using 
copyField, to hand-pick a subset of your terms for the master field on which 
you base your dictionary.  Another workaround is to use onlyMorePopular, 
although this has its own problems.  See the discussion for SOLR-2585 
(https://issues.apache.org/jira/browse/SOLR-2585), which aims to solve these 
problems.
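
A rough sketch of how the query-side settings mentioned above could be set as request handler defaults in solrconfig.xml. The handler name search_with_spellcheck is hypothetical, and the values are illustrative starting points rather than recommendations. It assumes a searchComponent named spellcheck is defined elsewhere in the same file.

```xml
<requestHandler name="search_with_spellcheck" class="solr.SearchHandler">
 <lst name="defaults">
  <str name="spellcheck">true</str>
  <!-- Collate the per-word suggestions into a whole corrected query and
       test up to 5 candidate collations against the index. -->
  <str name="spellcheck.collate">true</str>
  <str name="spellcheck.maxCollationTries">5</str>
  <!-- Second workaround: only return suggestions that are more frequent
       than the original term. See SOLR-2585 for the drawbacks. -->
  <str name="spellcheck.onlyMorePopular">true</str>
 </lst>
 <arr name="last-components">
  <str>spellcheck</str>
 </arr>
</requestHandler>
```

The same parameters can also be passed on an individual request, which makes it easier to compare results with and without onlyMorePopular before committing to a default.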

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

