RE: Spellchecking and suggesting part numbers
Thanks James, this did help a lot. Is it possible to make DirectSolrSpellChecker prefer suggestions that share the longest run of leading characters with the search term?

Alexander

-Original Message- From: Dyer, James [mailto:james.d...@ingramcontent.com] Sent: Wednesday, 24 September 2014 16:42 To: solr-user@lucene.apache.org Subject: RE: Spellchecking and suggesting part numbers
Re: Spellchecking and suggesting part numbers
I've done something similar using EdgeNGram rather than the spellchecker component; I don't know whether this fits your requirements. The relevant portion of my fieldType config:

class="solr.SpellCheckComponent">
> solr.IndexBasedSpellChecker
> ./spellchecker
> did_you_mean_part
> startup="lazy">
> did_you_mean_part
> on
> spellcheck_part
> positionIncrementGap="100">
> class="solr.PatternReplaceCharFilterFactory" pattern="[\s]+" replacement=""/>
> minGramSize="1" maxGramSize="20" side="front"/>
> class="solr.RemoveDuplicatesTokenFilterFactory"/>
> class="solr.PatternReplaceCharFilterFactory" pattern="[\s]+" replacement=""/>
> minGramSize="1" maxGramSize="20" side="front"/>
>
> Can we tweak the setup such that we should get more relevant part numbers?
>
> Thanks,
> Alexander
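To make the EdgeNGram idea concrete, here is a small Python sketch of what the surviving settings (pattern="[\s]+", minGramSize="1", maxGramSize="20", side="front") appear to describe: whitespace is removed and every leading substring of the part number becomes a token. The function names are made up for illustration, and this imitates the idea only, not Solr's analysis chain.

```python
import re


def strip_whitespace(text):
    # Same effect as the pattern "[\s]+" replaced by "" in the fragments above.
    return re.sub(r"[\s]+", "", text)


def front_edge_ngrams(token, min_gram=1, max_gram=20):
    """Return the leading-edge n-grams of token, shortest first."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def shared_grams(a, b):
    """Count the front n-grams two part numbers have in common."""
    return len(set(front_edge_ngrams(a)) & set(front_edge_ngrams(b)))


query = strip_whitespace("ABCD 1234")
print(front_edge_ngrams(query))
print(shared_grams(query, "ABCD1244"), shared_grams(query, "ABCE1234"))
```

A part number that agrees on more leading characters shares more grams with the search term, which is the property the original question asks for.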
RE: Spellchecking and suggesting part numbers
Alexander,

You could use a higher value for spellcheck.count, maybe 20 or so, then in your application pick out the suggestions that make changes on the right side.

Another option is to use DirectSolrSpellChecker (usually a better choice anyhow) and set the "minPrefix" field. This requires the first n characters on the left side to match before it will make suggestions. From a quick look at the code, it seems it also won't try to correct anything in this prefix region. So perhaps you can set this to 2-4 (default=1). See http://lucene.apache.org/core/4_10_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html#setMinPrefix%28int%29 .

James Dyer
Ingram Content Group
(615) 213-4311

-Original Message- From: Lochschmied, Alexander [mailto:alexander.lochschm...@vishay.com] Sent: Wednesday, September 24, 2014 9:06 AM To: solr-user@lucene.apache.org Subject: Spellchecking and suggesting part numbers
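The first option (ask for more suggestions, then pick in the application) might look roughly like this in Python. It is only a sketch: the helper names are invented here, and the suggestion list is assumed to have been extracted from the Solr response already.

```python
def common_prefix_len(a, b):
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def rank_by_prefix(term, suggestions):
    # Stable sort: suggestions with equal prefix length keep the
    # order the spellchecker returned them in.
    return sorted(suggestions,
                  key=lambda s: common_prefix_len(term, s),
                  reverse=True)


def filter_min_prefix(term, suggestions, min_prefix):
    """Application-side analogue of requiring a matching prefix."""
    return [s for s in suggestions
            if common_prefix_len(term, s) >= min_prefix]


suggestions = ["ABCE1234", "ABCD1244"]
print(rank_by_prefix("ABCD1234", suggestions))
print(filter_min_prefix("ABCD1234", suggestions, 4))
```

Sorting keeps all suggestions and only reorders them, while the filter discards anything that changes the first few characters, which is closer in spirit to minPrefix.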
Spellchecking and suggesting part numbers
Hello Solr Users,

we are trying to get suggestions for part numbers using the spellchecker.

Problem scenario:

ABCD1234 // This is the search term
ABCE1234 // This is what we get from spellchecker
ABCD1244 // This is what we would like to get from spellchecker

Characters towards the left of our part numbers are more relevant.

The setup is: solr.IndexBasedSpellChecker ./spellchecker did_you_mean_part did_you_mean_part on spellcheck_part

Can we tweak the setup so that we get more relevant part numbers?

Thanks,
Alexander
RE: Spellchecking suggestions won't collate
I'm working with business names, which are sometimes even people's names, such as "Wardell F E B Dr". I suspect I need to change my logic so that it does not rely on spellchecking so much, as you suggest.

Thanks.
Corey

-Original Message- From: Dyer, James [mailto:james.d...@ingramcontent.com] Sent: August-20-14 9:37 AM To: solr-user@lucene.apache.org Subject: RE: Spellchecking suggestions won't collate
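James's suggestion in this thread, to scan the user's input and make words shorter than 4 characters optional, could be sketched in Python as below. It assumes the query parser treats a leading "+" as "this clause is required" (standard Lucene query syntax); how that interacts with mm depends on the parser and Solr version, and the function name is invented for this example. Special characters in the input are not escaped here.

```python
def require_long_words(user_input, min_len=4):
    """Mark words of at least min_len characters as required ("+word")
    and leave shorter words as optional clauses."""
    parts = []
    for word in user_input.split():
        if len(word) >= min_len:
            parts.append("+" + word)
        else:
            parts.append(word)
    return " ".join(parts)


print(require_long_words("Mi Next Promo"))
```

With the query from this thread, "Mi" stays optional, so a document can match on "Next" and "Promo" alone and the spellchecker never has to repair the two-letter word.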
RE: Spellchecking suggestions won't collate
Because "my" is the 7th suggestion down the list, it is going to need more than 30 tries to find the combination that gives some hits. You can increase "maxCollationTries" if you're willing to endure the performance penalty of trying so many replacement queries.

This case actually highlights why DirectSpellChecker by default doesn't even bother with short words like this. Rather than letting the spellchecker check words this small, could you scan the user's input and make any words shorter than 4 characters optional? Or even just use an mm below 100% (65%?). I realize this will cost you a little precision, but recall will be better and you'll have to rely less on spellcheck.

James Dyer

-Original Message- From: Corey Gerhardt [mailto:corey.gerha...@directwest.com] Sent: Friday, August 15, 2014 3:21 PM To: Solr User List Subject: Spellchecking suggestions won't collate
Spellchecking suggestions won't collate
It must be Friday. I can't figure out why there is no collation value:

{
  "responseHeader":{
    "status":0,
    "QTime":31,
    "params":{
      "spellcheck":"on",
      "spellcheck.collateParam.qf":"BUS_BUSINESS_NAME",
      "spellcheck.maxResultsForSuggest":"5",
      "spellcheck.maxCollations":"3",
      "spellcheck.maxCollationTries":"30",
      "qf":"BUS_BUSINESS_NAME_PHRASE",
      "q.alt":"*:*",
      "spellcheck.collate":"true",
      "spellcheck.onlyMorePopular":"false",
      "defType":"edismax",
      "debugQuery":"true",
      "echoParams":"all",
      "spellcheck.count":"10",
      "spellcheck.alternativeTermCount":"10",
      "indent":"true",
      "q":"Mi Next Promo",
      "wt":"json"}},
  "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
  },
  "spellcheck":{
    "suggestions":[
      "mi",{
        "numFound":10,
        "startOffset":0,
        "endOffset":2,
        "suggestion":["mr", "mp", "mid", "mix", "mb", "mj", "my", "md", "mc", "ma"]},
      "next",{
        "numFound":3,
        "startOffset":3,
        "endOffset":7,
        "suggestion":["nest", "news", "neil"]},
      "promo",{
        "numFound":4,
        "startOffset":8,
        "endOffset":13,
        "suggestion":["photo", "prime", "pronto", "prof"]}]},

The actual business name is "My Next Promo" which I'm hoping would be the collation value.

Thanks,
Corey
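To get a feel for the search space behind this response, the Python sketch below reads the alternating term/object layout of the "suggestions" array and counts how many replacement combinations exist. It counts only the suggested replacements and makes no claim about the order in which Solr actually tries collations; the helper name is invented for this example.

```python
from math import prod

# Copied from the "suggestions" array in the response above: the list
# alternates between a query term and an object describing its suggestions.
SUGGESTIONS = [
    "mi", {"numFound": 10, "startOffset": 0, "endOffset": 2,
           "suggestion": ["mr", "mp", "mid", "mix", "mb",
                          "mj", "my", "md", "mc", "ma"]},
    "next", {"numFound": 3, "startOffset": 3, "endOffset": 7,
             "suggestion": ["nest", "news", "neil"]},
    "promo", {"numFound": 4, "startOffset": 8, "endOffset": 13,
              "suggestion": ["photo", "prime", "pronto", "prof"]},
]


def suggestions_by_term(flat):
    """Pair each term (even index) with its suggestion list (odd index)."""
    terms = flat[0::2]
    details = flat[1::2]
    return {term: detail["suggestion"] for term, detail in zip(terms, details)}


by_term = suggestions_by_term(SUGGESTIONS)
combinations = prod(len(words) for words in by_term.values())
rank_of_my = by_term["mi"].index("my") + 1
print(combinations, rank_of_my)  # prints "120 7"
```

With 120 possible combinations, "my" in seventh place, and spellcheck.maxCollationTries set to 30, it is plausible that the wanted collation is never reached.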
Re: permissive mm value and efficient spellchecking
OK, thanks a lot, I'll check that out.

2014-05-14 14:20 GMT+02:00 Markus Jelsma:
> Elisabeth, i think you are looking for SOLR-3211 that introduced
> spellcheck.collateParam.* to override e.g. dismax settings.
>
> Markus
permissive mm value and efficient spellchecking
Hello,

I'm using Solr 4.2.1.

I use a very permissive value for mm, to be able to find results even if the request contains non-relevant words.

At the same time, I'd like to do some efficient spellchecking with DirectSolrSpellChecker.

So for instance, if the user searches for "rue de Chraonne Paris", where Chraonne is misspelled, because of my permissive mm value I get more than 100 000 results containing the words "rue" and "Paris" ("de" is a stopword), which are very frequent terms in my index, but no spellcheck correction for Chraonne. If I set mm=3, then I get the expected spellcheck correction value: "rue de Charonne Paris".

Is there a way to achieve my two goals in a single Solr request?

Thanks,
Elisabeth
RE: permissive mm value and efficient spellchecking
Elisabeth, I think you are looking for SOLR-3211, which introduced spellcheck.collateParam.* to override e.g. dismax settings.

Markus

-Original message- From: elisabeth benoit Sent: Wed 14-05-2014 14:01 Subject: permissive mm value and efficient spellchecking To: solr-user@lucene.apache.org;
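As a rough illustration of Markus's pointer, the Python sketch below builds one request that keeps a permissive mm for the main query and a strict mm for the collation test queries via spellcheck.collateParam.mm. The concrete values ("1", "100%", 10 tries) and the function name are assumptions chosen for the example, not settings taken from this thread.

```python
from urllib.parse import parse_qs, urlencode


def build_params(q, permissive_mm, collate_mm="100%"):
    """Request parameters: loose matching for results, strict for collations."""
    return {
        "q": q,
        "mm": permissive_mm,                       # used for the main query
        "spellcheck": "true",
        "spellcheck.collate": "true",
        "spellcheck.maxCollationTries": "10",      # assumed value
        "spellcheck.collateParam.mm": collate_mm,  # overrides mm for collation tests
    }


params = build_params("rue de Chraonne Paris", "1")
query_string = urlencode(params)
print(query_string)
```

The main query still tolerates irrelevant words, while a candidate collation is only accepted if it returns hits with all terms required.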
RE: Spellchecking - looking for general advice
Got it. Are you also considering stemming and phonetic matching here? For example, phonetic matching may catch some of the restaurant variations, stemming may reduce recruiter and recruited to their base words, and spellcheck would then be the catch-all for whatever is left.

This e-mail message may contain confidential or legally privileged information and is intended only for the use of the intended recipient(s). Any unauthorized disclosure, dissemination, distribution, copying or the taking of any action in reliance on the information herein is prohibited. E-mails are not secure and cannot be guaranteed to be error free as they can be intercepted, amended, or contain viruses. Anyone who communicates with us by e-mail is deemed to have accepted these risks. The Digital Group is not responsible for errors or omissions in this message and denies any responsibility for any damage arising from the use of e-mail. Any opinion defamatory or deemed to be defamatory or any material which could be reasonably branded to be a species of plagiarism and other statements contained in this message and any attachment are solely those of the author and do not necessarily represent those of the company.

-Original Message- From: Maciej Dziardziel [mailto:fied...@gmail.com] Sent: Saturday, May 03, 2014 10:15 AM To: solr-user@lucene.apache.org Subject: Re: Spellchecking - looking for general advice
Re: Spellchecking - looking for general advice
Hi,

I've set it to 2, but a Python implementation of Levenshtein says it's 3 for restraunt -> restaurant.

On Sat, May 3, 2014 at 2:44 PM, Susheel Kumar wrote:
> How much is the maxEdits you have set. It should catch restaurant example
> with edit distance set to 2.
>
> Thanks,
> Susheel

--
Maciej Dziardziel
fied...@gmail.com
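The distance Maciej quotes can be reproduced with a textbook Levenshtein implementation such as the Python sketch below. This is the plain dynamic-programming algorithm, not the distance code Lucene uses internally, so it only confirms the arithmetic behind the numbers in this thread.

```python
def levenshtein(a, b):
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


print(levenshtein("restraunt", "restaurant"))  # prints 3
print(levenshtein("recruter", "recruted"))     # prints 1
print(levenshtein("recruter", "recruiter"))    # prints 1
```

A distance of 3 is above a maxEdits of 2, which is consistent with "restaurant" not being suggested. Under this plain measure, "recruted" and "recruiter" from the original question are both one edit away from "recruter", so any preference between them would have to come from something other than the raw edit count.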
RE: Spellchecking - looking for general advice
What value have you set for maxEdits? It should catch the restaurant example with the edit distance set to 2.

Thanks,
Susheel

-Original Message- From: Maciej Dziardziel [mailto:fied...@gmail.com] Sent: Friday, May 02, 2014 7:05 PM To: solr-user@lucene.apache.org Subject: Spellchecking - looking for general advice
Spellchecking - looking for general advice
Hi,

I was looking at spellcheck (Direct and FileBased) and testing what they can do. Direct works fine most of the time, but I'd like to find a solution for a few corner cases:

1) With "recruted" and "recruiter" in the index, "recruter" should suggest the latter. Obviously the distance to the former is smaller, so it may be completely arbitrary, and perhaps must be handled on the application side rather than in Solr.
2) "restraunt" doesn't suggest "restaurant" - I assume that the distance is too big for that.

Those are a few examples of queries that spellcheck gets (according to my requirements) wrong. For now I am just looking at possible solutions, and I need to come up with an initial concept to have something to show to users and get more feedback, likely with more cases to correct.

I'd like to know if there are some tweaks to the spellcheck component I could make (or perhaps other ways of doing this with Solr), or am I forced to hardcode a list of all such corrections that go beyond what spellcheck can do?

One solution I am considering is to put the list of those special cases into FileBasedSpellChecker (it seems to be more relaxed, and handles the restraunt case well) and fall back to Direct if this yields no results... though I am not sure yet how well that would work in practice if the list of misspelled words grew beyond the few I have now. It most likely wouldn't scale.

Another possibility would be to analyze the list of queries our users run that yield few results and check whether there is a spellchecked version that improves on that... but that seems to require a human to review the corrections.

Yet another thing I was thinking about would be to pull the terms into a separate spellchecker (like aspell) and see if it does a better job or is more tweakable.

That's a bit of an open-ended problem, so any advice is welcome.

--
Maciej Dziardziel
fied...@gmail.com
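The "special cases first, Direct as fallback" idea could also live on the application side. The Python sketch below shows one possible shape: a hand-maintained override table is consulted first, and only on a miss is a fallback callable invoked. The fallback stands in for a real spellcheck request to Solr; it, the table contents, and the function names are hypothetical.

```python
def suggest(term, overrides, fallback):
    """Return corrections for term: curated overrides win, otherwise
    delegate to the fallback spellchecker."""
    key = term.lower()
    if key in overrides:
        return [overrides[key]]
    return fallback(term)


# Hand-maintained corrections for cases the spellchecker gets wrong.
OVERRIDES = {
    "restraunt": "restaurant",
    "recruter": "recruiter",
}


def fake_direct_spellchecker(term):
    # Placeholder for a call to Solr; returns canned data for the demo.
    return {"recruter": ["recruted"]}.get(term, [])


print(suggest("restraunt", OVERRIDES, fake_direct_spellchecker))
print(suggest("Recruter", OVERRIDES, fake_direct_spellchecker))
print(suggest("solr", OVERRIDES, fake_direct_spellchecker))
```

The scaling concern raised above still applies: the override table has to be curated by hand, so this only helps while the list of special cases stays small.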
RE: Spellchecking problem
Gastone,

While developing, at least, you may want to specify "spellcheck.collateExtendedResults=true" so you can see for certain how many hits each collation would return. But my guess is that your "mm" parameter makes pretty much anything return some hits. You might want to specify "spellcheck.collateParam.mm=100%" or something like that, to restrict collations to those queries that would return hits if all the terms were required. See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collateParam.XX .

James Dyer

-Original Message- From: Gastone Penzo [mailto:gastone.pe...@gmail.com] Sent: Friday, December 20, 2013 8:38 AM To: solr-user@lucene.apache.org Subject: Re: Spellchecking problem
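To see why the mm value in this thread's query string may let almost any collation return hits, the Python sketch below decodes it and works out how many clauses it requires. The function is a simplified reading of the documented mm syntax (conditional "n<rule" parts, plain integers, percentages); it is not Solr's code, and it assumes every word of the query becomes one optional clause.

```python
from urllib.parse import unquote_plus


def min_should_match(clause_count, spec):
    """How many of clause_count optional clauses must match under spec."""
    spec = spec.strip()
    result = clause_count
    if "<" in spec:
        # Conditional parts such as "2<-1": the rule after "<" applies
        # only when there are more clauses than the number before it.
        for condition in spec.split():
            bound, rule = condition.split("<", 1)
            if clause_count <= int(bound):
                break
            result = min_should_match(clause_count, rule)
        return result
    if spec.endswith("%"):
        raw = clause_count * int(spec[:-1]) / 100
        result = clause_count + int(raw) if raw < 0 else int(raw)
    else:
        value = int(spec)
        result = clause_count + value if value < 0 else value
    return max(0, min(clause_count, result))


mm = unquote_plus("2%3C-1+5%3C80%25")
print(mm)                       # prints "2<-1 5<80%"
print(min_should_match(3, mm))  # prints 2
```

Under these assumptions a three-word collation such as "otto il polpo" needs only two of its three words to match, so a document containing just "otto" and "il" would be enough to make the collation count as producing hits.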
Re: Spellchecking problem
Thank you for your answer. This is the query string:

http://seshat:9000/solr/browse/?q=otto+maialotto&fq=shelf:GIO&qf=ean^0 title^0.0035 authors^0 publisher^0 series^0 contributors^0 characters^0 manufacturer^0 actors^0 directors^0 tags^0 category_label^0 &pf=ean^0 title^0.0035 authors^0 publisher^0 series^0 contributors^0 characters^0 manufacturer^0 actors^0 directors^0 tags^0 category_label^0&spellcheck=true&spellcheck.collate=true&spellcheck.maxCollationTries=10&spellcheck.q=otto+il+maialotto&mm=2%3C-1+5%3C80%25&

shelf is the field that represents the typology of product, and GIO is the typology (games). The problem is that the collation the result gives (Otto il polpo) is the name of a product of another typology (Book). Why?

The result is this:

5 0 17 0 otto il polpo 2 gigetto il maialetto vol.0 2 sotto il mare vol.0 2 sotto il mare 2 otto il rinoceronte 2 true (otto il polpo)

This is the conf:

textSpell default spellcheckdef spellchecker on false true 6 true .001

Thanks

2013/12/20 Dyer, James
> If you are using "spellcheck.maxCollateTries" with a value greater than 0
> the *collatation* section of your spellcheck response will give query
> corrections that are proven to produce hits. Possibly you were looking at
> the first section where it gives individual word suggestions? Or maybe one
> of your query parameters is misspelled (check case and that you have
> "spellcheck." in front of all of them)? If you can't figure it out,
> provide us the entire query string you're using, the spellcheck response
> you get back and also the relevant portions of solrconfig.xml.

--
*Gastone Penzo*
RE: Spellchecking problem
If you are using "spellcheck.maxCollationTries" with a value greater than 0, the *collation* section of your spellcheck response will give query corrections that are proven to produce hits. Possibly you were looking at the first section, where it gives individual word suggestions? Or maybe one of your query parameters is misspelled (check case and that you have "spellcheck." in front of all of them)? If you can't figure it out, provide us the entire query string you're using, the spellcheck response you get back and also the relevant portions of solrconfig.xml.

James Dyer
Ingram Content Group
(615) 213-4311
Spellchecking problem
Hello, I have a problem with spellchecking. I use Solr to index e-commerce products (DVDs, CDs, books, etc.). The collation is only one, but in the index there is a field: typology (of product). When I build the spellchecking indexes, they are built together. How can I get suggestions for only one typology?

I read that if I use spellcheck.collate=true and maxCollationTries > 0, Solr evaluates every suggestion with the fq parameter of the query. In my query I have, for example, fq=typology:book, but it doesn't work. Why?

I also tried collationparameter.fq=typology:book, with the same result.

I use Solr 4.3. Thank you

-- *Gastone Penzo*
Re: Spellchecking
Thank you!!

-- *Gastone Penzo*
RE: Spellchecking
If you're using "spellcheck.collate" you can also set "spellcheck.maxCollationTries" to validate each collation against the index before suggesting it. This validation takes into account any "fq" parameters on your query, so if your original query has "fq=Product:Book", then the collations returned will all be vetted by internally running the query with that filter applied.

If for some reason your main query does not have "fq=Product:Book", but you want it considered when collations are being built, you can include "spellcheck.collateParam.fq=Product:Book".

See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate and following sections.

James Dyer
Ingram Content Group
(615) 213-4311
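The parameters James describes can be assembled into a request as in the rough sketch below. The parameter names (spellcheck.collate, spellcheck.maxCollationTries, spellcheck.collateParam.fq) come from his message; the base URL, the helper name collation_query_url and the sample query are assumptions made only for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def collation_query_url(base_url, user_query, collate_fq, max_tries=10):
    """Build a Solr request URL that asks for collations vetted against a filter.

    base_url and this helper are hypothetical; only the spellcheck.*
    parameter names are taken from the thread.
    """
    params = [
        ("q", user_query),
        ("spellcheck", "true"),
        ("spellcheck.collate", "true"),
        # A value greater than 0 makes Solr test each collation before returning it.
        ("spellcheck.maxCollationTries", str(max_tries)),
        # Applied only while collations are tested, not to the main query.
        ("spellcheck.collateParam.fq", collate_fq),
    ]
    return base_url + "?" + urlencode(params)


# Assumed host and handler path, for illustration only.
url = collation_query_url(
    "http://localhost:8983/solr/select", "otto il maialotto", "Product:Book"
)
print(url)

# Decode the query string again to check what will actually be sent.
sent = parse_qs(urlsplit(url).query)
print(sent["spellcheck.collateParam.fq"])
```

If the main query already carries fq=Product:Book, the collateParam entry can simply be left out, since James notes that existing fq parameters are taken into account.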
Spellchecking
Hi, I'd like to know if it is possible to get suggestions from only a part of the index. For example, in an e-commerce site there are a lot of typologies of products (book, DVD, CD...). If I search inside books, I want suggestions only for book products, not CDs, but the spellchecking indexes are all together.

Is it possible to divide the indexes, or to get suggestions for only one typology?

Thanks

-- Gastone
Re: Synonym Expansion in Spellchecking Field Solr 4.3.1
Hi All,

I didn't have the lucene-solr source compiling cleanly in Eclipse initially, so I created a very quick Maven project to demonstrate this issue: https://github.com/rainkinz/solr_spellcheck_index_out_of_bounds.git

Having said that, I just got everything set up in Eclipse, so I can create a test case if this is actually an issue and not something weird with my configuration.

Thanks
Brendan

-- Brendan Grainger www.kuripai.com
Re: Synonym Expansion in Spellchecking Field Solr 4.3.1
Further to this. If I change:

tpms,service tire monitor,tire monitor,tire pressure monitor,tire pressure monitoring system,tpm,low tire warning,tire pressure monitor system

to

service tire monitor,tire monitor,tire pressure monitor,tire pressure monitoring system,tpm,low tire warning,tire pressure monitor system,tpms

I don't get a crash. I tried it with some other fields too, e.g.:

asdm,airbag system diagnostic module => crash
airbag system diagnostic module,asdm => no crash

Thanks
Brendan

On Thu, Aug 15, 2013 at 1:37 PM, Brendan Grainger <brendan.grain...@gmail.com> wrote:
> Hi All,
>
> I've been debugging an issue where the query 'tpms' would make the spellchecker throw the following exception:
>
> 21021 [qtp91486057-17] ERROR org.apache.solr.servlet.SolrDispatchFilter – null:java.lang.StringIndexOutOfBoundsException: String index out of range: -1
> at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:789)
> at java.lang.StringBuilder.replace(StringBuilder.java:266)
> at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:190)
> at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:75)
>
> I have the following synonyms defined for tpms:
>
> tpms,service tire monitor,tire monitor,tire pressure monitor,tire pressure monitoring system,tpm,low tire warning,tire pressure monitor system
>
> Note that if you query any of the other synonyms there is no issue, only tpms.
>
> Looking at my field definition for my spellchecker I realized I am doing query time synonym expansion:
>
> positionIncrementGap="100" omitNorms="true">
> ignoreCase="true" words="lang/stopwords_en.txt" enablePositionIncrements="true" />
> ignoreCase="true" expand="true"/>
> ignoreCase="true" words="lang/stopwords_en.txt" enablePositionIncrements="true" />
>
> I copied this field definition from: http://wiki.apache.org/solr/SpellCheckingAnalysis. As the issue seemed related to synonyms I removed the SynonymFilterFactory and everything works.
>
> I'm going to try to create a reproducible test case for the crash, but right now I'm wondering what I lose by not having synonym expansion when spell checking?
>
> Thanks
> Brendan

-- Brendan Grainger www.kuripai.com
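The reordering Brendan reports (moving the crashing term from the front of its synonym line to the end) can be mechanized with a small sketch like the one below. The helper move_term_last is hypothetical, and this only reproduces the observed workaround; it does not explain or fix the underlying SpellCheckCollator exception.

```python
def move_term_last(synonym_line, term):
    """Return a comma-separated synonym line with `term` moved to the end.

    Hypothetical helper: it mirrors the manual edit described in the
    message and leaves the line unchanged if the term is absent.
    """
    terms = [t.strip() for t in synonym_line.split(",")]
    if term not in terms:
        return synonym_line
    # Keep the relative order of every other term, then append the moved one.
    reordered = [t for t in terms if t != term] + [term]
    return ",".join(reordered)


crashing = (
    "tpms,service tire monitor,tire monitor,tire pressure monitor,"
    "tire pressure monitoring system,tpm,low tire warning,"
    "tire pressure monitor system"
)
print(move_term_last(crashing, "tpms"))
print(move_term_last("asdm,airbag system diagnostic module", "asdm"))
```

Whether the position of the term is really the trigger is only what the two examples in the message suggest; a reproducible test case would be needed to confirm it.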
Synonym Expansion in Spellchecking Field Solr 4.3.1
Hi All, I've been debugging an issue where the query 'tpms' would make the spellchecker throw the following exception: 21021 [qtp91486057-17] ERROR org.apache.solr.servlet.SolrDispatchFilter – null:java.lang.StringIndexOutOfBoundsException: String index out of range: -1 at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:789) at java.lang.StringBuilder.replace(StringBuilder.java:266) at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:190) at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:75) I have the following synonyms defined for tpms: tpms,service tire monitor,tire monitor,tire pressure monitor,tire pressure monitoring system,tpm,low tire warning,tire pressure monitor system Note that if you query any of the other synonyms there is no issue, only tpms. Looking at my field definition for my spellchecker I realized I am doing query time synonym expansion: I copied this field definition from: http://wiki.apache.org/solr/SpellCheckingAnalysis. As the issue seemed related to synonyms I removed the SynonymFilterFactory and everything works. I'm going to try to create a reproducible test case for the crash, but right now I'm wondering what I lose by not having synonym expansion when spell checking? Thanks Brendan
Two questions on spellchecking
Hi, even though I have read a lot, none of my spellchecker configurations works really well, and I have reached a dead end. Maybe someone can help me solve these challenges:

- How can I get case-sensitive suggestions, independent of the case used in the query?
- How do I configure a 'did you mean' spellchecker, as discussed in https://issues.apache.org/jira/browse/SOLR-2585 (Context-Sensitive Spelling Suggestions & Collations)?

I'm using the following environment:

- Solr 4.0-alpha (downloaded 25 June)
- Java 7
- schema.xml > ...
- solrconfig.xml (suggester) all true suggester true false 20 suggester suggester org.apache.solr.spelling.suggest.Suggester org.apache.solr.spelling.suggest.tst.TSTLookup suggest
- solrconfig.xml (spellcheck) all 10 allfields true false 20 spellcheck > textSpell default suggest solr.DirectSolrSpellChecker internal 0.1 2 1 5 1 0.1 0.001

*Suggester problem*

With this configuration the suggester is case-insensitive, but the hints are all lowercase. Example:

.../hint?q=da&wt=xml&spellcheck=true&spellcheck.build=true

0173truealltruesuggester20falsetruedaxmltruebuild2002

- dat-marktspiegel spezial
- data structures with c++ using stl
- data warehouse
- datan, ingeborg
- datenbanken mit delphi
- datenverschlüsselung
- dauner, gabriele
- dautermann, margit
- david copperfield
- david, horst
- david, leo
- david, nicholas
- davis, charles t.
- davis, edward l
- davis, leslie dorfman
- davis, stanley m.
- davor kommt noch
- davydova, irina n.
- dawidowski, bernd
- dayan, daniel

false

Using just solr.StrField as the field type, the suggestions are true to the original capitalization, but I get no suggestions if the query starts with a lowercase character.

*Spelling problem*

One of the indexed entries in the field 'suggest' is "David Copperfield", and I want this string as an alternative suggestion for the query "David opperfield". Example:

.../select?q="david+opperfield"&rows=0&wt=xml&spellcheck=true

015allfieldsalltrue20false0true"david opperfield"xml0false

.../select?q=david+opperfield&rows=0&wt=xml&spellcheck=true --> true =?8-)

Uwe

Btw. Is there a DirectSolrSuggester corresponding to DirectSolrSpellChecker?
Re: Solr spellchecking fails on sharded query
Hi, it seems the shard suggestions work fine if I set the select RH as follows ( instead of ):

explicit 10 spellcheck

Now the suggestion is populated!

Fabio
Re: Solr spellchecking fails on sharded query
I did as you suggested, enabling the "spellcheck" component in the select RH:

explicit 10 spellcheck

The response contains error 500:

500 29 file true fc:8900/solr/commenti,fc:7500/solr/commenti,fc:8584/solr/commenti,fc:7574/solr/commenti piza piza

java.lang.NullPointerException
 at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:819)
 at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:626)
 at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:605)
 at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:307)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1566)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
 at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
 at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
 at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
 at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
 at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
 at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
 at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
 at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
 at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
 at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
 at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
 at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
 at org.eclipse.jetty.server.Server.handle(Server.java:351)
 at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
 at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
 at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890)
 at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944)
 at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:634)
 at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)
 at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
 at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
 at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
 at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
 at java.lang.Thread.run(Thread.java:679)

500

Fabio
RE: Solr spellchecking fails on sharded query
Hi,

The spellcheck component must be enabled in your default request handler; otherwise your suggestions list is empty.

Cheers,
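For readers following the sharded case, a request of the kind discussed in this thread might be assembled as in the sketch below. It assumes, based on the wiki section linked in the thread, that a distributed spellcheck request names the handler the shards should use via shards.qt; please verify that against your Solr version. The coordinator URL, the handler path and the helper name are illustrative assumptions, while the shard addresses and the query "piza" are taken from Fabio's message.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def sharded_spellcheck_url(coordinator, shards, query, handler="/select"):
    """Build a distributed spellcheck request URL.

    coordinator and handler are assumptions for illustration; the handler
    must be one that has the spellcheck component enabled on every shard.
    """
    params = [
        ("q", query),
        ("spellcheck", "true"),
        ("spellcheck.q", query),
        # Comma-separated list of shard addresses, without the scheme.
        ("shards", ",".join(shards)),
        # Assumed from the wiki: tells each shard which handler to run.
        ("shards.qt", handler),
    ]
    return coordinator.rstrip("/") + handler + "?" + urlencode(params)


shards = ["fc:8900/solr/commenti", "fc:7500/solr/commenti"]
url = sharded_spellcheck_url("http://fc:8900/solr/commenti", shards, "piza")
print(url)

# Decode again to inspect the parameters that would be sent.
sent = parse_qs(urlsplit(url).query)
print(sent["shards"], sent["shards.qt"])
```

The point of passing the same handler to the shards is the one Markus makes: if the handler that answers the shard requests does not include the spellcheck component, the merged suggestion list comes back empty.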
Re: Solr spellchecking fails on sharded query
Hi, I tried the Solr shards configuration (SolrCloud) and the request settings suggested in http://wiki.apache.org/solr/SpellCheckComponent#Distributed_Search_Support for shard spelling. The suggestion list is empty, as Eric said.

Any idea?

Fabio
Re: Solr spellchecking fails on sharded query
Hi, I found this article about your issue: http://wiki.apache.org/solr/SpellCheckComponent#Distributed_Search_Support

Fabio
Solr spellchecking fails on sharded query
I have a Solr application that is distributed into 11 shards, using Solr version 4.0.0.2011.07.26.16.34.16 In the solrconfig.xml for each shard, I have configured a spellcheck component: textSpell cn_spell company_name_spell 0.0001 true ./spellchecker_cn_spell I have built the dictionary for each shard, and verified that each shard will return suggestions for misspellings. Moreover, it is evident that a different dictionary is being used for the various shards. The problem comes when I submit a sharded query. In that case the result comes back with the following: In other words, the list of words for which there are suggestions is empty. Is there a trick to sharded spellchecking? I appreciate any suggestions. Eric
spellchecking in nutch solr
Hello, I have tried to implement an index-based spellchecker in nutch-solr by adding a spell field to schema.xml and making it a copy of the content field. However, this doubled the size of the data folder, and the spell field, being a copy of the content field, appears in the XML feed, which is not necessary. Is it possible to implement the spellchecker without this issue? Thanks. Alex.
SolR : Spellchecking & Autocomplete
Hello, I posted on the Lucene forums, and someone told me to e-mail it here. Instead of writing my question out again, I take the liberty of linking to my post. It's about Solr, autocompletion, spellchecking and case sensitivity: http://lucene.472066.n3.nabble.com/SolR-Spellchecking-amp-Autocomplete-td3243107.html Thanks for everything, Valentin
Re: Problem with spellchecking, dont want multiple request to SOLR
What should the query look like? I can't define two spellcheckers in one query. I want something like this:

Search: Soccerclub (what) Manchester (where)

select/?q=socerclub macnchester&spellcheck=true&spellcheck.dictionary=spell_what&spellcheck.dictionary=spell_where&spell_what=socerclub&spell_where=macnchester

Now I have two spellcheckers in my request handler, but I can't set them correctly in my query. My config looks like this:

spellcheck1 spellcheck2 spell_what spell_search1 true spellchecker1 spell_where spell_search2 true spellchecker2

-- View this message in context: http://lucene.472066.n3.nabble.com/Problem-with-spellchecking-dont-want-multiple-request-to-SOLR-tp2988167p3147545.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Problem with spellchecking, dont want multiple request to SOLR
Hi,

Define two searchComponents with different names. Then refer to both in your Search Request Handler config.

-- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com
Re: Problem with spellchecking, dont want multiple request to SOLR
Hmm, OK. I configured two spellcheckers:

spell_what spell_what true spellchecker_what
spell_where spell_where true spellchecker_where

How can I enable them in my search request handler and search both in one request?

-- View this message in context: http://lucene.472066.n3.nabble.com/Problem-with-spellchecking-dont-want-multiple-request-to-SOLR-tp2988167p2992076.html
Re: Problem with spellchecking, dont want multiple request to SOLR
Yep, it's possible. Set up two spellcheckers, one named "spellwhat" and one named "spellwhere", and enable both on your searchRequestHandler.

-- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com
Problem with spellchecking, dont want multiple request to SOLR
Hello,

First I will explain my situation. I have two fields on my website: What and Where. When a user searches, I want spellchecking on both fields. Right now I have two dictionaries, one for what and one for where. I want to search with one request and spellcheck both fields. Is that possible, and how?

-- View this message in context: http://lucene.472066.n3.nabble.com/Problem-with-spellchecking-dont-want-multiple-request-to-SOLR-tp2988167p2988167.html
auto-completion with suggester and spellchecking
Hi, I would like to get autocomplete suggestions that include spelling corrections when the typed characters contain mistakes. I found a similar post, but it has no answer: http://lucene.472066.n3.nabble.com/Solr-suggester-and-spell-checker-td2326907.html

I have some questions about how to use the Solr suggester component for autocompletion and spellchecking at the same time.

1) Can Solr use the same spellcheck dictionary (based on the main index) for both autocompletion and spellchecking?
2) In solrconfig.xml, should I configure a "suggest" search component AND a "spellcheck" component, or would a single search component be sufficient? Any example configuration would be appreciated.
3) Which parameters should be used in the query?

The following query returns no suggestions when there is a typing mistake: http://localhost:8983/solr/position/suggest?q=ing&qt=/suggest&onlyMorePopular=true

Thank you in advance for your time. Best wishes -- Jean-Claude Dauphin jc.daup...@gmail.com jc.daup...@afus.unesco.org http://kenai.com/projects/j-isis/ http://www.unesco.org/isis/ http://www.unesco.org/idams/ http://www.greenstone.org
Re: Spellchecking in the Chinese Lanugage
Thanks Otis and Luke. Yes it does make sense to spellcheck phrases in Chinese. Looks like the default Solr spellCheck component is already doing some kind of NGram-ing. When examining the spellCheck index, I did see gram1, gram2, gram3, gram4... The problem is no Chinese terms were indexed into the spellChecker index, only English terms. Regards, Alex -- View this message in context: http://lucene.472066.n3.nabble.com/Spellchecking-in-the-Chinese-Lanugage-tp2812726p2813149.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Spellchecking in the Chinese Lanugage
It doesn't make sense to spell check individual character-sized words, but it makes a lot of sense for phrases. Due to the pervasive use of pinyin input methods, it's very easy to write phrases that are totally wrong semantically but "sound" correct. N-grams should work if they don't mangle the characters.

On Tue, Apr 12, 2011 at 12:47 PM, Otis Gospodnetic wrote:
> Hi,
>
> Does spellchecking in Chinese actually make sense? I once asked a native Chinese speaker about that and the person told me it didn't really make sense.
> Anyhow, with n-grams, I don't think this could technically work even if it made sense for Chinese, could it?
>
> Otis
>
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
Re: Spellchecking in the Chinese Lanugage
Hi, Does spellchecking in Chinese actually make sense? I once asked a native Chinese speaker about that and the person told me it didn't really make sense. Anyhow, with n-grams, I don't think this could technically work even if it made sense for Chinese, could it? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: alexw > To: solr-user@lucene.apache.org > Sent: Tue, April 12, 2011 3:07:48 PM > Subject: Spellchecking in the Chinese Lanugage > > Hi, > > I have been trying to get spellcheck to work in the Chinese language. So far > I have not had any luck. Can someone shed some light here as a general guide > line in terms of what need to happen? > > I am using the CJKAnalyzer in the text field type and searching works fine, > but spelling does not work. Here are the things I have tried: > > 1. Put CJKAnalyzer in the "textSpell" field type. > 2. Set the characterEncoding param to "utf-8" in the spellcheck search > component. > 3. Using Luke, I can see the Chinese characters in the "spell" field in the > main index. > 4. After building the spelling index, I don't see Chinese characters in the > "spellchecker" index, only terms in English. > 5. Tried adding the NGramFilterFactory to the CJKAnalyzer with no luck > either. > > Thanks! > > > -- > View this message in context: >http://lucene.472066.n3.nabble.com/Spellchecking-in-the-Chinese-Lanugage-tp2812726p2812726.html > > Sent from the Solr - User mailing list archive at Nabble.com. >
Spellchecking in the Chinese Lanugage
Hi, I have been trying to get spellcheck to work in the Chinese language. So far I have not had any luck. Can someone shed some light here as a general guide line in terms of what need to happen? I am using the CJKAnalyzer in the text field type and searching works fine, but spelling does not work. Here are the things I have tried: 1. Put CJKAnalyzer in the "textSpell" field type. 2. Set the characterEncoding param to "utf-8" in the spellcheck search component. 3. Using Luke, I can see the Chinese characters in the "spell" field in the main index. 4. After building the spelling index, I don't see Chinese characters in the "spellchecker" index, only terms in English. 5. Tried adding the NGramFilterFactory to the CJKAnalyzer with no luck either. Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Spellchecking-in-the-Chinese-Lanugage-tp2812726p2812726.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Spellchecking Escaped Queries
Apologies for the duplicate post. I'm having Evolution problems.

> Thanks Chris,
>
> The field used for indexing and spellcheck is the same and is configured like this:
>
> class="solr.TextField"
> ignoreCase="true" expand="true"/>
> pattern="^([^!]+)\!([^!]+)$" replacement="$1i$2" replace="all"/>
> generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="0" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
>
> I use the pattern replace filter to swap all instances of "!" within a word to "i". I know this part is working correctly because searching works correctly.
>
> The spellcheck is initialized like this:
>
> title default searchfield ./spellchecker false
>
> It is attached as a component to my search handler.
>
> Thanks,
> Colin
>
> > : I'm having an issue performing a spellcheck on some information and
> > : search of the archive isn't helping.
> >
> > For this type of question, there's not much feedback anyone can offer w/o knowing exactly what analyzers you have configured for the various fieldtypes (both the field you index/search and the fieldtype used for spellchecking)
> >
> > it's also fairly critical to know how you have the spellcheck component configured.
> >
> > off the cuff: i'd guess that maybe WordDelimiterFilter is being used in a wonky way given your usecase -- but like i said: would need to see the configs to make a guess.
> >
> > -Hoss

-- Colin Vipurs Server Team Lead Shazam Entertainment Ltd 26-28 Hammersmith Grove, London W6 7HA m: +44 (0) 000 000 t: +44 (0) 20 8742 6820 w:www.shazam.com Please consider the environment before printing this document This e-mail and its contents are strictly private and confidential. It must not be disclosed, distributed or copied without our prior consent. If you have received this transmission in error, please notify Shazam Entertainment immediately on: +44 (0) 020 8742 6820 and then delete it from your system. Please note that the information contained herein shall additionally constitute Confidential Information for the purposes of any NDA between the recipient/s and Shazam Entertainment. Shazam Entertainment Limited is incorporated in England and Wales under company number 3998831 and its registered office is at 26-28 Hammersmith Grove, London W6 7HA. __ This email has been scanned by the MessageLabs Email Security System. For more information please visit http://www.messagelabs.com/email __
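For readers who want to check what the quoted pattern/replacement pair does to a single token, here is a minimal Python sketch. It assumes the filter is applied to one token at a time and translates the Java-style replacement "$1i$2" into Python group syntax; the function name normalize_token is made up for illustration. It only shows the regex behaviour and says nothing about how the spellcheck component assembles its suggestion string.

```python
import re

# Pattern taken from the quoted filter settings. Because of the ^...$ anchors
# and the [^!]+ groups, it only matches tokens with exactly one "!" that is
# neither the first nor the last character.
BANG_PATTERN = re.compile(r"^([^!]+)!([^!]+)$")


def normalize_token(token: str) -> str:
    """Replace a single inner "!" with "i"; return other tokens unchanged."""
    return BANG_PATTERN.sub(r"\g<1>i\g<2>", token)


if __name__ == "__main__":
    for token in ["p!nk", "r!nk", "pink", "!nk", "a!b!c"]:
        print(token, "->", normalize_token(token))
```

Note that a token with two bangs, or a leading bang, is left as it is, so "swap all instances" holds only for the single-bang case.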
Re: Spellchecking Escaped Queries
Thanks Chris, The field used for indexing and spellcheck is the same and is configured like this:.. I use the pattern replace filter to swap all instances of "!" within a word to "i". I know this part is working correctly because searching works correctly. The spellcheck is initialized like this: title default searchfield ./spellchecker false This is attached as a component to my search handler and spellchecking is done inline with the queries. Thanks, Colin

> : I'm having an issue performing a spellcheck on some information and
> : search of the archive isn't helping.
>
> For this type of question, there's not much feedback anyone can offer w/o knowing exactly what analyzers you have configured for the various fieldtypes (both the field you index/search and the fieldtype used for spellchecking)
>
> it's also fairly critical to know how you have the spellcheck component configured.
>
> off the cuff: i'd guess that maybe WordDelimiterFilter is being used in a wonky way given your usecase -- but like i said: would need to see the configs to make a guess.
>
> -Hoss

-- Colin Vipurs Server Team Lead Shazam Entertainment Ltd
Re: Spellchecking Escaped Queries
Thanks Chris, The field used for indexing and spellcheck is the same and is configured like this:.. I use the pattern replace filter to swap all instances of "!" within a word to "i". I know this part is working correctly because searching works correctly. The spellcheck is initialized like this: title default searchfield ./spellchecker false It is attached as a component to my search handler. Thanks, Colin

> : I'm having an issue performing a spellcheck on some information and
> : search of the archive isn't helping.
>
> For this type of question, there's not much feedback anyone can offer w/o knowing exactly what analyzers you have configured for the various fieldtypes (both the field you index/search and the fieldtype used for spellchecking)
>
> it's also fairly critical to know how you have the spellcheck component configured.
>
> off the cuff: i'd guess that maybe WordDelimiterFilter is being used in a wonky way given your usecase -- but like i said: would need to see the configs to make a guess.
>
> -Hoss

-- Colin Vipurs Server Team Lead Shazam Entertainment Ltd
Re: Spellchecking Escaped Queries
: I'm having an issue performing a spellcheck on some information and : search of the archive isn't helping. For this type of quesiton, there's not much feedback anyone can offer w/o knowing exactly what analyzers you have configured for hte various fieldtypes (both the field you index/search and the fieldtype used for spellchecking) it's also fairly critical to know how you have the spellcheck component configured. off the cuff: i'd guess that maybe WordDelimiterFilter is being used in a wonky way given your usecase -- but like i said: would need to see the configs to make a guess. -Hoss
Spellchecking Escaped Queries
I'm having an issue performing a spellcheck on some information, and searching the archive isn't helping. I'm indexing the word "p!nk" (yes, that's a bang in there), and have a replacement filter set up so that the ! becomes i. Looking at the analyzer, the right thing is happening, with both the index and query analysis mapping to "pink". When I switch on spelling suggestions I get a suggestion of "p!pink", which just seems odd. When I make a request for something like "rink", I get the correct suggestion of "pink", but asking for "r!nk", I get a suggestion of "r! pink". It seems like the spellcheck component isn't quite doing the right thing somewhere. I'm running 1.4.1 with the https://issues.apache.org/jira/browse/SOLR-1553 patch applied for the edismax query parser. Thanks, Colin -- Colin Vipurs Server Team Lead Shazam Entertainment Ltd
Re: Spellchecking with some misspelled words in index source
You have to correct the misspelled terms in your content for this to work properly, because the spell checker will find the term in its dictionary and treat it as correctly spelled. The spell checker only returns suggestions when a word is not found in its dictionary. - Thanx: Grijesh http://lucidimagination.com -- View this message in context: http://lucene.472066.n3.nabble.com/Spellchecking-with-some-misspelled-words-in-index-source-tp2505722p2507110.html Sent from the Solr - User mailing list archive at Nabble.com.
Spellchecking with some misspelled words in index source
I'm building my spellcheck index from my content and it seems to be working, but my problem is that there are a few misspelled words in my content. For example, the word Sheriff is misspelled as Sherrif in my content a couple dozen times (but spelled correctly a couple thousand times). At first glance, the spellcheck results indicate that the word is spelled correctly, because it is found in the spellcheck dictionary and has valid search results. Adding spellcheck.onlyMorePopular=true to the query makes the spellcheck return additional suggestions, but none of them is the correct spelling of the word:

sherriff 10
sherri 2319
sherril 155
sherif 19
sherric 4

Is this just a strange glitch in my spellcheck dictionary based on my content? What is strange is that sending the spellcheck sherriff (another misspelling that has results in the index) returns the correct spelling as the top result.
RE: spellchecking even the key is true....
Add spellcheck.onlyMorePopular=true to your query and I think it'll do what you want. See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.onlyMorePopular for more info. One caveat is if you use spellcheck.collate, this will likely result in useless, nonsensical collations most of the time. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: satya swaroop [mailto:satya.yada...@gmail.com] Sent: Monday, January 17, 2011 10:32 AM To: solr-user@lucene.apache.org Subject: spellchecking even the key is true Hi All, can we get the spellchecking results even when the keyword is true. As for spellchecking will give only to the wrong keywords, cant we get similar and near words of the keyword though the spellcheck.q is true.. as an example http://localhost:8080/solr/spellcheck?q=java&spellcheck=true&spellcheck.count=5 the result will be 1)- - can we get the result as 2) - javax javac javabean javascript NOTE:: all the keywords in the 2nd result is are in index... Regards, satya
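As a small illustration of the request James describes, the following Python sketch builds the query string with spellcheck.onlyMorePopular added to the parameters from the original question. The helper name spellcheck_url is invented here, and the host, port and handler path are simply those from the example URL in the thread; adjust them to your own setup.

```python
from urllib.parse import urlencode


def spellcheck_url(base_url: str, term: str, count: int = 5,
                   only_more_popular: bool = True) -> str:
    """Build a spellcheck request URL; parameter names are Solr's spellcheck.* params."""
    params = [
        ("q", term),
        ("spellcheck", "true"),
        ("spellcheck.count", str(count)),
    ]
    if only_more_popular:
        # Ask for suggestions even when the term itself exists in the index.
        params.append(("spellcheck.onlyMorePopular", "true"))
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(spellcheck_url("http://localhost:8080/solr/spellcheck", "java"))
```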
spellchecking even the key is true....
Hi All, can we get spellchecking results even when the keyword is spelled correctly? Spellchecking only gives suggestions for misspelled keywords; can't we get similar and nearby words for a keyword even though the spellcheck.q term is correct? As an example: http://localhost:8080/solr/spellcheck?q=java&spellcheck=true&spellcheck.count=5 the result will be 1)- - can we get the result as 2) - javax javac javabean javascript NOTE: all the keywords in the 2nd result are in the index... Regards, satya
Re: Spellchecking and frequency
> I therefore wrote an implementation of SolrSpellChecker that wraps jazzy, the java aspell library. I also extended the SpellCheckComponent to take the matrix of suggested words and query the corpus to find the first combination of suggestions which returned a match. This works well for my use case, where term frequency is irrelevant to spelling or scoring.

This is interesting to me. I also have not been that happy with standard Solr spellcheck. In addition to possibly filing a JIRA for a future fix to Solr itself, another option would be to make your 'alternate' SpellCheck component available as a separate .jar, so anyone could use it just by installing it and specifying it in their solrconfig.xml. I would encourage you to consider that, not as a replacement for suggesting a patch to Solr itself, but so people can use your improved spellchecker immediately, without waiting for possible Solr patches. Jonathan
Re: Spellchecking and frequency
Hi Mark, Thanks for that info looks very interesting, would be great to see your code. Out of interest did you use the dictionary and the phonetic file? Did you see better results with both? In regards to the secondary part to check the corpus for matching suggestions, would another way to do this is to have an event listener to listen for commits, and then build the dictionary for matching corpus words that way, then you avoid the performance hit at query time. Cheers, Dan On Tue, Jul 27, 2010 at 7:04 PM, Mark Holland wrote: > Hi, > > I found the suggestions returned from the standard solr spellcheck not to > be > that relevant. By contrast, aspell, given the same dictionary and mispelled > words, gives much more accurate suggestions. > > I therefore wrote an implementation of SolrSpellChecker that wraps jazzy, > the java aspell library. I also extended the SpellCheckComponent to take > the > matrix of suggested words and query the corpus to find the first > combination > of suggestions which returned a match. This works well for my use case, > where term frequency is irrelevant to spelling or scoring. > > I'd like to publish the code in case someone finds it useful (although it's > a bit crude at the moment and will need a decent tidy up). Would it be > appropriate to open up a Jira issue for this? > > Cheers, > ~mark > > On 27 July 2010 09:33, dan sutton wrote: > > > Hi, > > > > I've recently been looking into Spellchecking in solr, and was struck by > > how > > limited the usefulness of the tool was. > > > > Like most corpora , ours contains lots of different spelling mistakes for > > the same word, so the 'spellcheck.onlyMorePopular' is not really that > > useful > > unless you click on it numerous times. > > > > I was thinking that since most of the time people spell words correctly > why > > was there no other frequency parameter that could enter into the score? > > i.e. 
> > something like: > > > > spell_score ~ edit_dist * freq > > > > I'm sure others have come across this issue and was wonding what > > steps/algorithms they have used to overcome these limitations? > > > > Cheers, > > Dan > > >
Re: Spellchecking and frequency
"Yonik's Law of Patches" reads: "A half-baked patch in Jira, with no documentation, no tests and no backwards compatibilty is better than no patch at all." It'd be perfectly appropriate, IMO, for you to post an outline of what your enhancements do over on the SOLR dev list and get a reaction from the folks over there as to whether it should be a Jira or not... see solr-...@lucene.apache.org Best Erick On Tue, Jul 27, 2010 at 2:04 PM, Mark Holland wrote: > Hi, > > I found the suggestions returned from the standard solr spellcheck not to > be > that relevant. By contrast, aspell, given the same dictionary and mispelled > words, gives much more accurate suggestions. > > I therefore wrote an implementation of SolrSpellChecker that wraps jazzy, > the java aspell library. I also extended the SpellCheckComponent to take > the > matrix of suggested words and query the corpus to find the first > combination > of suggestions which returned a match. This works well for my use case, > where term frequency is irrelevant to spelling or scoring. > > I'd like to publish the code in case someone finds it useful (although it's > a bit crude at the moment and will need a decent tidy up). Would it be > appropriate to open up a Jira issue for this? > > Cheers, > ~mark > > On 27 July 2010 09:33, dan sutton wrote: > > > Hi, > > > > I've recently been looking into Spellchecking in solr, and was struck by > > how > > limited the usefulness of the tool was. > > > > Like most corpora , ours contains lots of different spelling mistakes for > > the same word, so the 'spellcheck.onlyMorePopular' is not really that > > useful > > unless you click on it numerous times. > > > > I was thinking that since most of the time people spell words correctly > why > > was there no other frequency parameter that could enter into the score? > > i.e. 
> > something like: > > > > spell_score ~ edit_dist * freq > > > > I'm sure others have come across this issue and was wonding what > > steps/algorithms they have used to overcome these limitations? > > > > Cheers, > > Dan > > >
RE: Spellchecking and frequency
Mark, I'd like to see your code if you open a JIRA for this. I recently opened SOLR-2010 with a patch that does something similar to the second part only of what you describe (find combinations that actually return a match). But I'm not sure if my approach is the best one so I would like to see yours to compare. James Dyer E-Commerce Systems Ingram Book Company (615) 213-4311 -Original Message- From: Mark Holland [mailto:mark.holl...@zoopla.co.uk] Sent: Tuesday, July 27, 2010 1:04 PM To: solr-user@lucene.apache.org Subject: Re: Spellchecking and frequency Hi, I found the suggestions returned from the standard solr spellcheck not to be that relevant. By contrast, aspell, given the same dictionary and mispelled words, gives much more accurate suggestions. I therefore wrote an implementation of SolrSpellChecker that wraps jazzy, the java aspell library. I also extended the SpellCheckComponent to take the matrix of suggested words and query the corpus to find the first combination of suggestions which returned a match. This works well for my use case, where term frequency is irrelevant to spelling or scoring. I'd like to publish the code in case someone finds it useful (although it's a bit crude at the moment and will need a decent tidy up). Would it be appropriate to open up a Jira issue for this? Cheers, ~mark On 27 July 2010 09:33, dan sutton wrote: > Hi, > > I've recently been looking into Spellchecking in solr, and was struck by > how > limited the usefulness of the tool was. > > Like most corpora , ours contains lots of different spelling mistakes for > the same word, so the 'spellcheck.onlyMorePopular' is not really that > useful > unless you click on it numerous times. > > I was thinking that since most of the time people spell words correctly why > was there no other frequency parameter that could enter into the score? > i.e. 
> something like: > > spell_score ~ edit_dist * freq > > I'm sure others have come across this issue and was wonding what > steps/algorithms they have used to overcome these limitations? > > Cheers, > Dan >
Re: Spellchecking and frequency
Hi, I found the suggestions returned from the standard solr spellcheck not to be that relevant. By contrast, aspell, given the same dictionary and mispelled words, gives much more accurate suggestions. I therefore wrote an implementation of SolrSpellChecker that wraps jazzy, the java aspell library. I also extended the SpellCheckComponent to take the matrix of suggested words and query the corpus to find the first combination of suggestions which returned a match. This works well for my use case, where term frequency is irrelevant to spelling or scoring. I'd like to publish the code in case someone finds it useful (although it's a bit crude at the moment and will need a decent tidy up). Would it be appropriate to open up a Jira issue for this? Cheers, ~mark On 27 July 2010 09:33, dan sutton wrote: > Hi, > > I've recently been looking into Spellchecking in solr, and was struck by > how > limited the usefulness of the tool was. > > Like most corpora , ours contains lots of different spelling mistakes for > the same word, so the 'spellcheck.onlyMorePopular' is not really that > useful > unless you click on it numerous times. > > I was thinking that since most of the time people spell words correctly why > was there no other frequency parameter that could enter into the score? > i.e. > something like: > > spell_score ~ edit_dist * freq > > I'm sure others have come across this issue and was wonding what > steps/algorithms they have used to overcome these limitations? > > Cheers, > Dan >
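The second idea in Mark's message (walk the matrix of per-term suggestions and keep the first combination that actually matches something) can be sketched in Python as below. This is not Mark's code, which was not posted, and not the SOLR-2010 patch; the function name and the has_match callback, which stands in for a query against the corpus, are assumptions made for illustration.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence


def first_matching_combination(
    terms: Sequence[str],
    suggestions: Dict[str, List[str]],
    has_match: Callable[[Sequence[str]], bool],
) -> Optional[List[str]]:
    """Return the first combination of suggestions for which has_match is true.

    Terms without suggestions are kept as typed. Combinations are tried in
    suggestion order, with the last term varying fastest.
    """
    candidates = [suggestions.get(term) or [term] for term in terms]
    for combo in product(*candidates):
        if has_match(combo):
            return list(combo)
    return None


if __name__ == "__main__":
    # Toy "corpus": the set of phrases that would return at least one hit.
    corpus = {("newt", "yolk")}
    found = first_matching_combination(
        ["nwet", "yokl"],
        {"nwet": ["new", "newt"], "yokl": ["york", "yolk"]},
        lambda combo: tuple(combo) in corpus,
    )
    print(found)
```

In a real deployment each has_match call would be a query, so the number of combinations tried would need a cap.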
Spellchecking and frequency
Hi, I've recently been looking into spellchecking in Solr, and was struck by how limited the usefulness of the tool was. Like most corpora, ours contains lots of different spelling mistakes for the same word, so 'spellcheck.onlyMorePopular' is not really that useful unless you click on it numerous times. I was thinking that, since most of the time people spell words correctly, why is there no frequency parameter that could enter into the score? i.e. something like: spell_score ~ edit_dist * freq I'm sure others have come across this issue, and I was wondering what steps/algorithms they have used to overcome these limitations? Cheers, Dan
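To make Dan's proposal concrete, here is one possible reading of "spell_score ~ edit_dist * freq" as a Python sketch. It is an interpretation, not anything Solr does: the edit distance is turned into a similarity so that closer candidates score higher, and the frequency is damped with a logarithm so that very common terms do not swamp everything else. The term frequencies in the usage example are invented.

```python
import math
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def spell_score(query: str, candidate: str, freq: int) -> float:
    """Similarity in [0, 1] multiplied by a log-damped term frequency."""
    dist = edit_distance(query, candidate)
    similarity = 1.0 - dist / max(len(query), len(candidate), 1)
    return similarity * math.log1p(freq)


def rank(query: str, term_freqs: Dict[str, int]) -> List[Tuple[str, float]]:
    """Candidates sorted by descending score."""
    scored = [(term, spell_score(query, term, freq)) for term, freq in term_freqs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Invented frequencies, for illustration only.
    for term, score in rank("recieve", {"receive": 5000, "recieved": 3, "relieve": 40}):
        print(f"{term}: {score:.2f}")
```

How to weight similarity against frequency is the real design question; a near miss with a very high frequency can still outrank the intended word.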
Re: PECL and Spellchecking
Hi Peter,

A full list of spell check parameters is available here: http://wiki.apache.org/solr/SpellCheckComponent

With the PECL extension, there is currently no special method that handles the spell check component, so you would have to use the SolrParams::set() or SolrParams::setParam() method available from the SolrQuery class (a child of the SolrParams class).

Below is the code snippet:

    // Options for the Solr client (name => value pairs); see SolrClient::__construct().
    // The values here are examples only; use your own server settings.
    $options = array('hostname' => 'localhost', 'port' => 8983);
    $client = new SolrClient($options);

    // Send search requests to the handler that has the spellcheck component attached.
    $spellcheck_component_name = 'spell';
    $client->setServlet(SolrClient::SEARCH_SERVLET_TYPE, $spellcheck_component_name);

    // General form: $q->set($param_name, $param_value) or $q->setParam($param_name, $param_value)
    $q = new SolrQuery();
    $q->set('spellcheck', 'true');
    $q->set('spellcheck.q', 'pecl');
    $q->set('spellcheck.build', 'true');

    $response = $client->query($q);

That should do it. I hope this helps.

On Wed, May 5, 2010 at 4:56 AM, Peter Gabriel wrote:
> Hi there,
>
> I'm working with the solr-pecl extension and wondering how to permanently activate spellchecking.
> I couldn't find a command in the PECL library to activate it from the client, like $solrQuery->enableFacet(true) for facets.
>
> Or is it possible to keep spellchecking permanently activated via solrconfig, without using the "&spellcheck=true" parameter?
>
> It would be nice if someone could help me.
>
> Thanks and greetings,
> Peter

-- http://www.israelekpo.com/
PECL and Spellchecking
Hi there, I'm working with the solr-pecl extension and wondering how to permanently activate spellchecking. I couldn't find a command in the PECL library to activate it from the client, like $solrQuery->enableFacet(true) for facets. Or is it possible to keep spellchecking permanently activated via solrconfig, without using the "&spellcheck=true" parameter? It would be nice if someone could help me. Thanks and greetings, Peter
Re: SpellChecking
Hi,

thanks, that is exactly what I forgot. Now it works fine. :-)

On 03.05.2010 at 16:50, Michael Kuhlmann wrote:
> On 03.05.2010 at 16:43, Jan Kammer wrote:
>> Hi,
>>
>> It worked fine with a normal field. There must be something wrong with
>> copyField, or why else does the DataImportHandler no longer add/update
>> documents?
>
> Did you define your destination field as multiValued?
>
> -Michael
Re: SpellChecking
On 03.05.2010 at 16:43, Jan Kammer wrote:
> Hi,
>
> It worked fine with a normal field. There must be something wrong with
> copyField, or why else does the DataImportHandler no longer add/update
> documents?

Did you define your destination field as multiValued?

-Michael
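To illustrate why the multiValued question matters: once several source fields are copied into one destination, that destination receives several values per document. The following is a toy model only, not Solr's code; the function name, the field names and the error text are made up for illustration.

```python
def apply_copy_fields(doc, copy_rules, multi_valued):
    """Copy source field values into destination fields, copyField-style.

    doc:          field name -> value (destination fields are assumed absent)
    copy_rules:   list of (source, dest) pairs
    multi_valued: set of destination fields allowed to hold several values
    """
    result = dict(doc)
    for source, dest in copy_rules:
        if source not in doc:
            continue
        values = result.setdefault(dest, [])
        values.append(doc[source])
        # A single-valued destination cannot take a second value.
        if len(values) > 1 and dest not in multi_valued:
            raise ValueError(
                f"multiple values for non-multiValued field {dest!r}: {values}"
            )
    return result


doc = {"title": "Solr", "author": "Jan", "body": "spellchecking"}
rules = [("title", "spell"), ("author", "spell"), ("body", "spell")]

# With a multi-valued destination all three values are kept.
print(apply_copy_fields(doc, rules, multi_valued={"spell"})["spell"])

# With a single-valued destination the document is rejected.
try:
    apply_copy_fields(doc, rules, multi_valued=set())
except ValueError as err:
    print("rejected:", err)
```

In this model a single copy rule works even with a single-valued destination, and the document only fails once a second source is copied in.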
Re: SpellChecking
Hi,

I build the index with ...&spellcheck.build=true

It worked fine with a normal field. There must be something wrong with copyField, or why else does the DataImportHandler no longer add/update documents?

Can somebody paste the config for copyField with many fields?

Greetz, Jan

On 03.05.2010 at 16:36, Villemos, Gert wrote:
> We are using copy fields for 40+ fields to do spelling, and it works fine.
>
> Are you sure that you actually build the spell index before you try to do
> spelling? You need to either configure Solr to build the spell index on
> commit, or manually issue a spell index build request.
>
> Regards,
> Gert.
RE: SpellChecking
We are using copy fields for 40+ fields to do spelling, and it works fine.

Are you sure that you actually build the spell index before you try to do spelling? You need to either configure Solr to build the spell index on commit, or manually issue a spell index build request.

Regards,
Gert.

-Original Message-
From: Jan Kammer [mailto:jan.kam...@mni.fh-giessen.de]
Sent: Monday, 3 May 2010 16:26
To: solr-user@lucene.apache.org
Subject: Re: SpellChecking

Hi,

if I define one of my normal fields from schema.xml in solrconfig.xml for spellchecking, everything works fine: ...

That didn't work, because nothing was in "spell" after that. My next try was to copy each field to "spell" on its own line: ...

This works for up to 3 documents; if I define more, the count of failed documents in the DataImportHandler gets higher and higher the more I copy into "spell". 16444

So my question is whether this is the right way to use the spellchecker with many fields, or whether there is another, better way.

Thanks.

Greetz, Jan
Re: SpellChecking
Hi,

if I define one of my normal fields from schema.xml in solrconfig.xml for spellchecking, everything works fine: ...

That didn't work, because nothing was in "spell" after that. My next try was to copy each field to "spell" on its own line: ...

This works for up to 3 documents; if I define more, the count of failed documents in the DataImportHandler gets higher and higher the more I copy into "spell". 16444

So my question is whether this is the right way to use the spellchecker with many fields, or whether there is another, better way.

Thanks.

Greetz, Jan

On 03.05.2010 at 16:08, Erick Erickson wrote:
> It would help a lot to see your actual config file, and if you provided a
> bit more detail about what the failure looks like.
>
> Best
> Erick
Re: SpellChecking
It would help a lot to see your actual config file, and if you provided a bit more detail about what the failure looks like.

Best
Erick

On Mon, May 3, 2010 at 9:43 AM, Jan Kammer wrote:
> Hi there,
>
> I want to enable spellchecking, but I have many fields.
>
> I tried using copyField to copy everything with "*" into one field, but
> that didn't work.
> My next try was to copy some fields, each specified by name, into one field
> named "spell", but that worked only for 2 or 3 fields, not for 10 or more...
>
> My question is: what is the best practice for enabling spellchecking on
> many fields?
>
> Thanks.
>
> Greetz, Jan
SpellChecking
Hi there,

I want to enable spellchecking, but I have many fields.

I tried using copyField to copy everything with "*" into one field, but that didn't work. My next try was to copy some fields, each specified by name, into one field named "spell", but that worked only for 2 or 3 fields, not for 10 or more...

My question is: what is the best practice for enabling spellchecking on many fields?

Thanks.

Greetz, Jan
Re: Spellchecking - Is there a way to do this?
Another thing you might check into is stemming. The Porter stemmer included in Solr is "aggressive", meaning that it will tend to do weird things with misspellings. A different, less aggressive stemmer called KStem is available from www.lucidimagination.com/Downloads. Porter turns "changes" and "changing" into "chang", while KStem does not go this far.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem

On Thu, Dec 17, 2009 at 12:59 PM, Lance Norskog wrote:
> Character-based NGrams are a good tool for this problem. MLT is a
> document-wide numerical analysis.
>
> If the common types of OCR mistakes are different than what NGrams
> create, you might tune the ngram generator. For example, swapping
> letters might not happen very often. Single- and multi-word errors
> must happen a lot.
>
> If you do a facet query on your indexed terms, you will get a lot of
> facets with only one appearance in the index. These are often
> misspellings. It is possible to automate pulling these and creating a
> matching set of synonyms for words that appear in the spelling index.

--
Lance Norskog
goks...@gmail.com
Re: Spellchecking - Is there a way to do this?
Character-based NGrams are a good tool for this problem. MLT is a document-wide numerical analysis.

If the common types of OCR mistakes are different than what NGrams create, you might tune the ngram generator. For example, swapping letters might not happen very often. Single- and multi-word errors must happen a lot.

If you do a facet query on your indexed terms, you will get a lot of facets with only one appearance in the index. These are often misspellings. It is possible to automate pulling these and creating a matching set of synonyms for words that appear in the spelling index.

On Tue, Dec 15, 2009 at 12:57 PM, Chris Hostetter wrote:
>
> : My first problem appears because I need suggestions inclusive when the
> : expression has returned results. It's seems that only appear
> : suggestions when there are no results. Is there a way to do so?
>
> can you give us an example of what your queries look like? with the
> example configs, i can get matches, as well as suggestions...
>
> http://localhost:8983/solr/spell?q=ide&spellcheck=true
>
> : The second question is: For the purposes that I've mentioned, is the
> : best way to use spellchecker or mlt component? Or some other (as a
> : fuzzy query)?
>
> there's no clear cut answer to that -- i don't remember anyone else ever
> asking about anything particularly similar to what you're doing, so i
> don't know that there is any precedent for a "best" way to go about it.
>
> -Hoss

--
Lance Norskog
goks...@gmail.com
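The two ideas above (character n-grams, and terms that occur only once as misspelling candidates) can be sketched outside Solr. This is a rough illustration, not Solr's NGram analysis: the padding character, the Jaccard measure and both function names are choices made here, and the token list stands in for the terms a facet query would return.

```python
from collections import Counter


def char_ngrams(word, n=2):
    """Return the set of character n-grams of word, padded at both ends."""
    padded = f"_{word}_"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=2):
    """Jaccard overlap of the two n-gram sets, between 0.0 and 1.0."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def singleton_terms(tokens):
    """Terms that occur exactly once: candidates for being misspellings."""
    counts = Counter(tokens)
    return sorted(term for term, count in counts.items() if count == 1)


tokens = ["state", "state", "stete", "democracy", "democracy", "demcraci"]
for rare in singleton_terms(tokens):
    # Pair each rare term with the most similar frequent term.
    frequent = [t for t in set(tokens) if t != rare and tokens.count(t) > 1]
    best = max(frequent, key=lambda t: ngram_similarity(rare, t))
    print(rare, "->", best)
```

The resulting pairs are the raw material for the synonym list mentioned above; in practice a similarity threshold would be needed so that rare but correct words are not mapped to something unrelated.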
Re: Spellchecking - Is there a way to do this?
: My first problem appears because I need suggestions inclusive when the
: expression has returned results. It's seems that only appear
: suggestions when there are no results. Is there a way to do so?

Can you give us an example of what your queries look like? With the example configs, I can get matches as well as suggestions:

http://localhost:8983/solr/spell?q=ide&spellcheck=true

: The second question is: For the purposes that I've mentioned, is the
: best way to use spellchecker or mlt component? Or some other (as a
: fuzzy query)?

There's no clear-cut answer to that -- I don't remember anyone else ever asking about anything particularly similar to what you're doing, so I don't know that there is any precedent for a "best" way to go about it.

-Hoss
Re: Re: Solr Cell and Spellchecking.
I just resolved the issue (fresh coffee == good) ! In my schema, I had added: but missed the copyField definition. Adding these: and a restart and everything is working properly. Thanks for the reply and for LucidImagination -- the only reason I have been able to get Solr integrated into our ruby app. -Mike
Re: Solr Cell and Spellchecking.
What do your schema and your config look like for the various relevant pieces?

On Dec 8, 2009, at 8:04 PM, Michael Boyle wrote:
> Following Erik Hatcher's post about using SolrCell and acts_as_solr {
> http://www.lucidimagination.com/blog/2009/02/17/acts_as_solr_cell/ }, I have
> been able to index a rich document stream and retrieve its id. No worries.
>
> However, I have the SpellCheckComponent set up to build on commit
> (buildOnCommit=true). Alas, the rich document text is not being added to the
> spellchecker dictionary.
>
> Is there something special I need to do within the SolrConfig.xml or within
> the acts_as_solr ruby classes?
>
> - thanks in advance for any ideas -
>
> Mike Boyle

--
Grant Ingersoll
http://www.lucidimagination.com/
Solr Cell and Spellchecking.
Following Erik Hatcher's post about using SolrCell and acts_as_solr { http://www.lucidimagination.com/blog/2009/02/17/acts_as_solr_cell/ }, I have been able to index a rich document stream and retrieve its id. No worries.

However, I have the SpellCheckComponent set up to build on commit (buildOnCommit=true). Alas, the rich document text is not being added to the spellchecker dictionary.

Is there something special I need to do within the SolrConfig.xml or within the acts_as_solr ruby classes?

- thanks in advance for any ideas -

Mike Boyle
Spellchecking - Is there a way to do this?
Hello everybody,

1. I have tons of digitized text containing the usual OCR errors.
2. I have indexed it with Solr, and that is working OK.
3. I have added an index-based spellchecker for words and phrases, hoping to offer suggestions of "suspicious" alternative query expressions related to the actual one. The intention is to find documents that contain the original expression, but with OCR errors (for example, the user originally searched for "state and democracy" and the interface offers "stete and demcraci" as an alternate query expression).

My first problem is that I need suggestions even when the expression has returned results. It seems that suggestions only appear when there are no results. Is there a way to do this?

The second question is: for the purpose I've described, is it best to use the spellchecker or the MLT component? Or something else (such as a fuzzy query)?

Thanks a lot,
German
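As a rough illustration of the "fuzzy" option mentioned in the question, the sketch below looks up indexed terms within a small edit distance of each query word, whether or not the word itself is in the index. It is plain Python rather than anything Solr does internally; the vocabulary, the distance limit of 2 and both function names are assumptions made for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def ocr_variants(term, vocabulary, max_distance=2):
    """Indexed terms close to term, offered even if term itself is indexed."""
    return sorted(
        word for word in vocabulary
        if word != term and edit_distance(term, word) <= max_distance
    )


vocabulary = {"state", "stete", "and", "democracy", "demcraci"}
query = "state and democracy"
for word in query.split():
    print(word, "->", ocr_variants(word, vocabulary))
```

A short word such as "and" would match many unrelated terms at distance 2 in a real index, so a limit that scales with word length is probably needed.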
Re: spellchecking multiple fields?
With the caveat that all fields would need to be declared in solrconfig.xml (or get used for both fields), this could work... We would also need to augment the response with the name of the dictionary, or assert that something will always be written (so you would know that the 2nd entry is for the 2nd configured dictionary).

On Jul 16, 2008, at 8:06 AM, Grant Ingersoll wrote:
> Another thought that might work:
>
> Declare two separate components, one for each field, and then implement a
> QueryConverter that takes in the field and only extracts the tokens for the
> field of choice. This is a definite workaround, but I think it might work.
> Hmm, except we only have one QueryConverter.
>
> -Grant
Re: spellchecking multiple fields?
Another thought that might work:

Declare two separate components, one for each field, and then implement a QueryConverter that takes in the field and only extracts the tokens for the field of choice. This is a definite workaround, but I think it might work. Hmm, except we only have one QueryConverter.

-Grant

On Jul 15, 2008, at 8:56 PM, Ryan McKinley wrote:
> I have a use case where I want to spellcheck the input query across
> multiple fields:
> Did you mean: location = washington
> vs
> Did you mean: person = washington
>
> The current parameter / response structure for the spellcheck component
> does not support this kind of thing. Any thoughts on how/if the component
> should handle this? Perhaps it could be in a requestHandler where the
> params are passed in as json?
>
> spelling={ dictionary="location", onlyMorePopular=true}&spelling={
> dictionary="person", onlyMorePopular=false }
>
> Thoughts?
> ryan
Re: spellchecking multiple fields?
One way would be to create a copyField containing both the fields and use it as the dictionary's source. If you do want to keep separate dictionaries for both the fields then I guess we can introduce per-dictionary overridable parameters like the per-field overridden facet parameters. That would be cleaner than json params. What do you think? On Wed, Jul 16, 2008 at 6:26 AM, Ryan McKinley <[EMAIL PROTECTED]> wrote: > I have a use case where I want to spellcheck the input query across > multiple fields: > Did you mean: location = washington > vs > Did you mean: person = washington > > The current parameter / response structure for the spellcheck component > does not support this kind of thing. Any thoughts on how/if the component > should handle this? Perhaps it could be in a requestHandler where the > params are passed in as json? > > spelling={ dictionary="location", onlyMorePopular=true}&spelling={ > dictionary="person", onlyMorePopular=false } > > Thoughts? > ryan > -- Regards, Shalin Shekhar Mangar.
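The per-dictionary override idea could resolve parameters much like the per-field facet overrides do: look for a dictionary-specific key first, then fall back to the general one. The sketch below is purely hypothetical; the key layout spellcheck.<dictionary>.<name> is an invented naming scheme for illustration, not an existing Solr parameter.

```python
def dictionary_param(params, dictionary, name, default=None):
    """Look up a spellcheck parameter, preferring a per-dictionary override.

    The key layout "spellcheck.<dictionary>.<name>" is hypothetical.
    """
    specific = f"spellcheck.{dictionary}.{name}"
    general = f"spellcheck.{name}"
    if specific in params:
        return params[specific]
    return params.get(general, default)


params = {
    "spellcheck.onlyMorePopular": "false",          # general setting
    "spellcheck.location.onlyMorePopular": "true",  # override for one dictionary
}

for dictionary in ("location", "person"):
    print(dictionary, dictionary_param(params, dictionary, "onlyMorePopular"))
```

This keeps the request a flat list of key/value pairs, which is the advantage over the JSON-valued parameters proposed in the original question.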
spellchecking multiple fields?
I have a use case where I want to spellcheck the input query across multiple fields: Did you mean: location = washington vs Did you mean: person = washington The current parameter / response structure for the spellcheck component does not support this kind of thing. Any thoughts on how/if the component should handle this? Perhaps it could be in a requestHandler where the params are passed in as json? spelling={ dictionary="location", onlyMorePopular=true}&spelling={ dictionary="person", onlyMorePopular=false } Thoughts? ryan
Re: Integrated Spellchecking
Posted our patches if anyone wants to take a look:

https://issues.apache.org/jira/browse/SOLR-433

Small change to core.RunExecutableListener and all the changes to the shell scripts. All these scripts seem to run fine on RHEL-3 and RHEL-5.1 servers.

doug

Doug Steigerwald wrote:
> Sure. I'll try to post it today or tomorrow.
>
> Otis Gospodnetic wrote:
>> Hey Doug,
>>
>> You have multicore/spellcheck replication going already? We have been
>> working on the replication for multicore. Sounds like we are replicating
>> each other's work. When will you be able to attach your stuff to the JIRA
>> issue?
>>
>> https://issues.apache.org/jira/browse/SOLR-433
Re: Integrated Spellchecking
Allocating some time to this next week. Need to try and remember what issues I was having when I stopped working on it.

doug

Matthew Runo wrote:
> I'd have to agree with this. I'd probably be able to put a bit of work
> into it as well, as it's something we'd use for sure if it were
> available.
>
> Thanks!
>
> Matthew Runo
> Software Developer
> Zappos.com
Re: Integrated Spellchecking
Sure. I'll try to post it today or tomorrow.

Doug Steigerwald
Software Developer
McClatchy Interactive
[EMAIL PROTECTED]
919.861.1287

Otis Gospodnetic wrote:
> Hey Doug,
>
> You have multicore/spellcheck replication going already? We have been
> working on the replication for multicore. Sounds like we are replicating
> each other's work. When will you be able to attach your stuff to the JIRA
> issue?
>
> https://issues.apache.org/jira/browse/SOLR-433
>
> Thanks,
> Otis
Re: Integrated Spellchecking
Hey Doug,

You have multicore/spellcheck replication going already? We have been working on the replication for multicore. Sounds like we are replicating each other's work. When will you be able to attach your stuff to the JIRA issue?

https://issues.apache.org/jira/browse/SOLR-433

Thanks,
Otis

--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
> From: Doug Steigerwald <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Friday, February 15, 2008 12:45:08 PM
> Subject: Re: Integrated Spellchecking
>
> That unfortunately got pushed aside to work on some of our higher priority
> solr work since we already had it working one way.
>
> Hoping to revisit this after we push to production and start working on new
> features and share what I've done for this and multicore/spellcheck
> replication (which we have working quite well in QA right now).
>
> Doug Steigerwald
> Software Developer
> McClatchy Interactive
Re: Integrated Spellchecking
I'd have to agree with this. I'd probably be able to put a bit of work into it as well, as it's something we'd use for sure if it were available.

Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833

On Feb 18, 2008, at 6:09 AM, Grant Ingersoll wrote:
> Hey Doug,
>
> If you have permission to donate, perhaps you can just post the patch
> anyway and state that it isn't quite ready to go. This is something I
> could use too, and so may have some cycles to work on it. I hate to
> replicate the work if you already have something that is more or less
> working. A half baked patch is better than no patch.
>
> -Grant
Re: Integrated Spellchecking
Hey Doug,

If you have permission to donate, perhaps you can just post the patch anyway and state that it isn't quite ready to go. This is something I could use too, and so may have some cycles to work on it. I hate to replicate the work if you already have something that is more or less working. A half baked patch is better than no patch.

-Grant

On Feb 15, 2008, at 12:45 PM, Doug Steigerwald wrote:
> That unfortunately got pushed aside to work on some of our higher priority
> solr work since we already had it working one way.
>
> Hoping to revisit this after we push to production and start working on new
> features and share what I've done for this and multicore/spellcheck
> replication (which we have working quite well in QA right now).
>
> Doug Steigerwald
> Software Developer
> McClatchy Interactive
Re: Integrated Spellchecking
That unfortunately got pushed aside to work on some of our higher priority solr work since we already had it working one way. Hoping to revisit this after we push to production and start working on new features and share what I've done for this and multicore/spellcheck replication (which we have working quite well in QA right now).

Doug Steigerwald
Software Developer
McClatchy Interactive
[EMAIL PROTECTED]
919.861.1287

oleg_gnatovskiy wrote:

dsteiger wrote:

I've got a couple search components for automatic spell correction that I've been working on.

I've converted most of the SpellCheckerRequestHandler to a search component (hopefully will throw a patch out soon for this). Then another search component that will do auto correction for a query if the search returns zero results.

We're hoping to see some performance improvements out of handling this in Solr instead of our Rails service.

doug

Ryan McKinley wrote:

Yes -- this is what search components are for!

Depending on where you put it in the chain, it could only return spell checked results if there are too few results (or the top score is below some threshold)

ryan

Grant Ingersoll wrote:

Is it feasible to submit a query to any of the various handlers and have it bring back results and spelling suggestions all in one response? Is this something the query components piece would handle, assuming one exists for the spell checker?

Thanks,
Grant

So have you succeeded in implementing this patch? I'd definitely like to use this functionality as a search suggestion.
Re: Integrated Spellchecking
dsteiger wrote:
>
> I've got a couple search components for automatic spell correction that I've been working on.
>
> I've converted most of the SpellCheckerRequestHandler to a search component (hopefully will throw a patch out soon for this). Then another search component that will do auto correction for a query if the search returns zero results.
>
> We're hoping to see some performance improvements out of handling this in Solr instead of our Rails service.
>
> doug
>
> Ryan McKinley wrote:
>> Yes -- this is what search components are for!
>>
>> Depending on where you put it in the chain, it could only return spell checked results if there are too few results (or the top score is below some threshold)
>>
>> ryan
>>
>> Grant Ingersoll wrote:
>>> Is it feasible to submit a query to any of the various handlers and have it bring back results and spelling suggestions all in one response? Is this something the query components piece would handle, assuming one exists for the spell checker?
>>>
>>> Thanks,
>>> Grant

So have you succeeded in implementing this patch? I'd definitely like to use this functionality as a search suggestion.
Re: Integrated Spellchecking
On Jan 17, 2008, at 3:01 PM, Doug Steigerwald wrote:

I've got a couple search components for automatic spell correction that I've been working on.

I've converted most of the SpellCheckerRequestHandler to a search component (hopefully will throw a patch out soon for this). Then another search component that will do auto correction for a query if the search returns zero results.

If you need somebody to test, throw it up on a JIRA, as I would be happy to test.

-Grant
Re: Integrated Spellchecking
I've got a couple search components for automatic spell correction that I've been working on.

I've converted most of the SpellCheckerRequestHandler to a search component (hopefully will throw a patch out soon for this). Then another search component that will do auto correction for a query if the search returns zero results.

We're hoping to see some performance improvements out of handling this in Solr instead of our Rails service.

doug

Ryan McKinley wrote:

Yes -- this is what search components are for!

Depending on where you put it in the chain, it could only return spell checked results if there are too few results (or the top score is below some threshold)

ryan

Grant Ingersoll wrote:

Is it feasible to submit a query to any of the various handlers and have it bring back results and spelling suggestions all in one response? Is this something the query components piece would handle, assuming one exists for the spell checker?

Thanks,
Grant
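For readers following the thread, the "auto correction if the search returns zero results" idea can be sketched in a few lines. This is only a toy model, not Doug's component (his patch is not shown in this thread): the in-memory `index`, the `search` and `correct` helpers, and the use of Python's `difflib` in place of a real spelling index are all assumptions made for illustration.

```python
import difflib


def search(index, query):
    """Toy matcher: return documents that contain every query term."""
    terms = query.lower().split()
    return [doc for doc in index if all(t in doc.lower().split() for t in terms)]


def correct(query, vocabulary):
    """Replace each unknown term with its closest vocabulary entry, if one exists."""
    fixed = []
    for term in query.lower().split():
        if term in vocabulary:
            fixed.append(term)
        else:
            close = difflib.get_close_matches(term, vocabulary, n=1)
            fixed.append(close[0] if close else term)
    return " ".join(fixed)


def search_with_autocorrect(index, query):
    """Run the query; only if it returns nothing, retry with a corrected query."""
    hits = search(index, query)
    if hits:
        return {"query": query, "corrected": False, "docs": hits}
    # Hypothetical stand-in for a spelling index: every word seen in the corpus.
    vocabulary = sorted({w for doc in index for w in doc.lower().split()})
    retry = correct(query, vocabulary)
    return {"query": retry, "corrected": retry != query.lower(), "docs": search(index, retry)}


index = ["apple ipod nano", "solr search server"]
result = search_with_autocorrect(index, "ipod nanno")
```

The second search only happens on the zero-result path, which is presumably where the hoped-for saving over a round trip through an external service would come from.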
Re: Integrated Spellchecking
On Jan 17, 2008 2:33 PM, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> Yes -- this is what search components are for!
>
> Depending on where you put it in the chain, it could only return spell checked results if there are too few results (or the top score is below some threshold)

Score thresholds are tricky in Lucene since scores across different queries aren't that meaningful. But a number-of-results threshold sounds like it might be a good idea.

Perhaps there could even be options to
- test if the suggestion actually matches any documents
- replace the original query with the suggestion before running the query
- add an additional DocList to the response for documents matching the suggestion

Thinking a little more on the threshold idea, it seems to have some issues.

One problem: In general, you want spell suggestions to be corpus wide... so you might be under a threshold just because the query is heavily filtered (restrictive fqs) and the suggestion may not match anything under those restrictions.

Getting the DocSet of the query only to check the number of hits adds expense to the request. But
- if not sorting by score, the cache would re-use the query DocSet instead of going to the Lucene index
- one could add a call to Solr to retrieve the number of hits in the base query, before filtering (but that could limit or complicate future optimizations that move some of the filters into the base query...)

Another issue is how big the spelling index is: if it's big enough, best practice might be to have a separate spelling index that the front-end client hits concurrently with the main index. This also sort of applies to distributed search (one may want a single separate spelling index that isn't distributed).

-Yonik
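The filtering problem described above can be made concrete with a small sketch. It is not Solr code: `count_hits` and `vet_suggestion` are hypothetical helpers, documents are plain dicts, and filters are Python predicates standing in for fq clauses. It shows a hit-count threshold plus the "does the suggestion actually match anything" check, reported both corpus wide and under the filters.

```python
def count_hits(docs, query, filters=()):
    """Count documents whose text contains the query and that pass every filter."""
    return sum(1 for d in docs if query in d["text"] and all(f(d) for f in filters))


def vet_suggestion(docs, query, suggestion, filters, min_hits=1):
    """Return hit counts for a suggestion, or None if the original query has enough hits."""
    if count_hits(docs, query, filters) >= min_hits:
        return None  # at or above the threshold, so no suggestion is needed
    return {
        "suggestion": suggestion,
        # Corpus-wide count: ignores the filters, like a global spelling index.
        "corpus_hits": count_hits(docs, suggestion),
        # Filtered count: what the user would actually see after applying the filters.
        "filtered_hits": count_hits(docs, suggestion, filters),
    }


docs = [
    {"text": "ipod nano", "cat": "audio"},
    {"text": "nano sim", "cat": "phone"},
]
phone_only = [lambda d: d["cat"] == "phone"]
report = vet_suggestion(docs, "ipdo", "ipod", phone_only)
```

Here the suggestion matches one document corpus wide but none under the filter, which is the case where offering it blindly would lead the user to another empty result page.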
Re: Integrated Spellchecking
Yes -- this is what search components are for!

Depending on where you put it in the chain, it could only return spell checked results if there are too few results (or the top score is below some threshold)

ryan

Grant Ingersoll wrote:

Is it feasible to submit a query to any of the various handlers and have it bring back results and spelling suggestions all in one response? Is this something the query components piece would handle, assuming one exists for the spell checker?

Thanks,
Grant
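As a rough illustration of why the position in the chain matters, here is a toy component chain. It does not use Solr's SearchComponent API: the `ctx` dict, the component functions, and `difflib` as the suggester are assumptions for the sketch. Because the spell check step runs after the query step, it can look at the result count and stay silent when there are enough hits.

```python
import difflib


def query_component(ctx):
    """Fill ctx['docs'] with documents containing the query term."""
    ctx["docs"] = [d for d in ctx["index"] if ctx["q"] in d.split()]


def spellcheck_component(min_hits):
    """Build a component that adds suggestions only when there are too few results."""
    def run(ctx):
        if len(ctx["docs"]) < min_hits:
            vocab = sorted({w for d in ctx["index"] for w in d.split()})
            ctx["suggestions"] = difflib.get_close_matches(ctx["q"], vocab, n=3)
    return run


def handle(q, index, chain):
    """Pass one shared context through each component in order."""
    ctx = {"q": q, "index": index}
    for component in chain:
        component(ctx)
    return ctx


index = ["solr search", "lucene index"]
chain = [query_component, spellcheck_component(min_hits=1)]
misspelled = handle("serch", index, chain)
exact = handle("solr", index, chain)
```

Both the documents and any suggestions end up in the same context, so a single response can carry results and spelling suggestions together.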
Integrated Spellchecking
Is it feasible to submit a query to any of the various handlers and have it bring back results and spelling suggestions all in one response? Is this something the query components piece would handle, assuming one exists for the spell checker? Thanks, Grant