Re: IndexBasedSpellChecker on multiple fields
Hi James,

sorry for the noise, but I am not able to get the approach you described working; I'm sure I'm misconfiguring something.

Basically, I have 2 fields, `abstract` and `subject`, and a field `master-dictionary` into which the first two have been copied. Then, in solrconfig.xml, I configured the SpellCheckComponent to run its checks against the `master-dictionary` field.

When I start Solr, it raises an exception:

    Oct 20, 2011 3:51:00 PM org.apache.solr.common.SolrException log
    SEVERE: org.apache.solr.common.SolrException: Specified dictionary does not exist.
        at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:164)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
        at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:54)
        at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1177)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:680)

Can you please help me by checking this schema [1]?

Many thanks in advance, all the best!
Simo

[1] https://gist.github.com/1301194

http://people.apache.org/~simonetripodi/
http://simonetripodi.livejournal.com/
http://twitter.com/simonetripodi
http://www.99soft.org/
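One way to narrow down the "Specified dictionary does not exist" error reported in this message: as far as I can tell, Solr releases of this era raise it when the `spellcheck.dictionary` request parameter (or the name `default`, when the parameter is absent) does not match the `name` of any configured spellchecker. The trace passes through QuerySenderListener, so the failing request may well be a warming query. The sketch below compares the two sets of names in a solrconfig.xml. The function names and the sample configuration are hypothetical and are not taken from the gist; it also does not catch requests that omit `spellcheck.dictionary` entirely.

```python
import xml.etree.ElementTree as ET

# Invented sample: the handler asks for "master", but the only
# spellchecker is registered under the name "master-dictionary".
SAMPLE_CONFIG = """<config>
  <requestHandler name="search" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">master</str>
    </lst>
  </requestHandler>
  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">master-dictionary</str>
      <str name="field">master-dictionary</str>
    </lst>
  </searchComponent>
</config>"""


def configured_spellcheckers(root):
    """Names declared in <lst name="spellchecker"> blocks.

    Assumption: a spellchecker without a name is registered as "default".
    """
    names = set()
    for component in root.iter("searchComponent"):
        for checker in component.findall("lst[@name='spellchecker']"):
            name = checker.find("str[@name='name']")
            if name is not None and name.text:
                names.add(name.text.strip())
            else:
                names.add("default")
    return names


def referenced_dictionaries(root):
    """Every spellcheck.dictionary value found anywhere in the config,
    including handler defaults and warming queries."""
    return {
        el.text.strip()
        for el in root.iter("str")
        if el.get("name") == "spellcheck.dictionary" and el.text
    }


def missing_dictionaries(solrconfig_xml):
    """Dictionary names that are requested but never configured."""
    root = ET.fromstring(solrconfig_xml)
    return referenced_dictionaries(root) - configured_spellcheckers(root)


print(sorted(missing_dictionaries(SAMPLE_CONFIG)))  # -> ['master']
```

An empty result does not prove the configuration is correct; it only rules out this one mismatch.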
RE: IndexBasedSpellChecker on multiple fields
Here's approximately how I've got it set up to do essentially the same thing, in one of our production indexes:

schema.xml has:

    <fieldType name="text_spelling" class="solr.TextField" positionIncrementGap="100">
      { whitespaceanalyzer, stopwordfilter, worddelimiterfilter, lowercasefilter ... or whatever your app needs }
    </fieldType>

    <field name="abstract" ... />
    <field name="subject" ... />
    <field name="spelling_abstract_subject" type="text_spelling" indexed="true" stored="false" multiValued="true" omitNorms="true" />

    <copyField source="abstract" dest="spelling_abstract_subject" />
    <copyField source="subject" dest="spelling_abstract_subject" />

solrconfig.xml has:

    <requestHandler name="search_abstract_and_subject" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="echoParams">explicit</str>
        <float name="tie">0.01</float>
        <str name="qf">abstract subject</str>
        <str name="q.alt">*:*</str>
        <str name="spellcheck">true</str>
        <str name="spellcheck.dictionary">spellchecker_abstract_subject</str>
        <str name="spellcheck.count">10</str>
        <str name="spellcheck.collate">true</str>
        <str name="spellcheck.maxCollationTries">10</str>
        <str name="spellcheck.maxCollations">1</str>
        <str name="spellcheck.collateExtendedResults">true</str>
      </lst>
      <arr name="last-components">
        <str>spellcheck</str>
      </arr>
    </requestHandler>

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <str name="queryAnalyzerFieldType">text_spelling</str>
      <lst name="spellchecker">
        <str name="name">spellchecker_abstract_subject</str>
        <str name="field">spelling_abstract_subject</str>
        <str name="fieldType">text_spelling</str>
        <str name="spellcheckIndexDir">./spellchecker</str>
      </lst>
    </searchComponent>

You can then query across the 2 fields and get spell suggestions like this:

    q=query goes here&qt=search_abstract_and_subject

Of course, if this is the first query since startup/commit, unless you're building automatically somehow, add:

    spellcheck.build=true

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311
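The query described in James's reply can be assembled programmatically. This is a minimal sketch that only builds the URL: the base URL `http://localhost:8983/solr/select` and the helper name `spellcheck_url` are assumptions for illustration, while the handler name and the `spellcheck.build` parameter come from the message.

```python
from urllib.parse import urlencode

# Assumed location of the Solr select endpoint; adjust for your install.
BASE_URL = "http://localhost:8983/solr/select"


def spellcheck_url(query, handler="search_abstract_and_subject",
                   build=False, base_url=BASE_URL):
    """Return a request URL for the given handler.

    Pass build=True for the first query after startup or a commit,
    unless the dictionary is rebuilt automatically some other way.
    """
    params = [("q", query), ("qt", handler)]
    if build:
        params.append(("spellcheck.build", "true"))
    # urlencode escapes the query text (spaces become '+').
    return base_url + "?" + urlencode(params)


print(spellcheck_url("query goes here", build=True))
```

Sending `spellcheck.build=true` on every request would rebuild the dictionary each time, so it is kept as an explicit opt-in here.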
Re: IndexBasedSpellChecker on multiple fields
Hi James!

Terrific suggestion, thanks a lot! And sorry for the delay (due to my timezone ;) ).

I'll let you know how things go. Thanks once again and have a nice day!
Simo
IndexBasedSpellChecker on multiple fields
Hi all,

I need to configure an IndexBasedSpellChecker that uses more than one field as its spelling dictionary. Is that possible?

In the meantime I configured two spellcheckers and let users switch from one checker to the other via parameters on the GET request, but it looks like people are not particularly happy about it...

The main problem is that the fields I need to spell-check contain different information; I mean, the intersection between the two sets of terms could be empty.

Many thanks in advance, all the best!
Simo
RE: IndexBasedSpellChecker on multiple fields
Simone,

You can set up a master dictionary, but with a few caveats. What you'll need to do is copyField all of the fields you want to include in your master dictionary into one field and base your IndexBasedSpellChecker dictionary on that. In addition, I would recommend you use the collate feature and set spellcheck.maxCollationTries to something greater than zero (5-10 is usually good). Otherwise, you will probably get a lot of ridiculous suggestions from it trying to correct words from one field with values from another. See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate for more information.

There is still a big problem with this approach, however. Unless you set onlyMorePopular=true, Solr will never suggest a correction for a word that exists in the dictionary. By creating a huge master dictionary, you will be increasing the chances that Solr will assume your users' misspelled words are in fact correct. One way to work around this is, instead of blindly using copyField, to hand-pick a subset of your terms for the master field on which you base your dictionary. Another workaround is to use onlyMorePopular, although this has its own problems. See the discussion for SOLR-2585 (https://issues.apache.org/jira/browse/SOLR-2585), which aims to solve these problems.

James Dyer
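The master-dictionary caveat can be illustrated with a toy model. This is not Solr code: it only mimics the rule that a word found in the dictionary is never offered a correction, and the vocabularies and function name are invented for the example.

```python
def flagged_as_misspelled(word, dictionary):
    """Toy rule: a word is a correction candidate only if the
    dictionary does not already contain it."""
    return word.lower() not in dictionary


# Invented vocabularies for the two source fields.
abstract_terms = {"break", "time", "signal"}
subject_terms = {"brake", "engine", "wheel"}

# What copying both fields into one master field amounts to.
master_terms = abstract_terms | subject_terms

# A user searching abstracts types "brake" but means "break".
print(flagged_as_misspelled("brake", abstract_terms))  # -> True
print(flagged_as_misspelled("brake", master_terms))    # -> False
```

With the per-field dictionary the typo is caught; with the merged dictionary it passes as a valid word, which is the effect that grows as more fields are copied in.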