Re: Suggesting broken words with solr.WordBreakSolrSpellChecker
Nice! It works indeed! Sorry I didn't notice that before.

But what if I want the same for the iPhone? I mean suggesting "I phone" for users who searched for "iphone". A minBreakLength of 1 is just too small, isn't it?

On Saturday, January 31, 2015, Dyer, James-2 [via Lucene] <ml-node+s472066n4183176...@n3.nabble.com> wrote:

> You need to decrease this to at least 2 because the length of "go" is <3.
>
> 3
>
> James Dyer
> Ingram Content Group
>
> -----Original Message-----
> From: fabio.bozzo [mailto:[hidden email]]
> Sent: Wednesday, January 28, 2015 4:55 PM
> To: [hidden email]
> Subject: RE: Suggesting broken words with solr.WordBreakSolrSpellChecker
>
> [...]
RE: Suggesting broken words with solr.WordBreakSolrSpellChecker
I tried increasing my alternativeTermCount to 5 and enabling extended results.
I also added an fq filter parameter to clarify what I mean:

*Querying for "go pro" is good:*

{
  "responseHeader": {
    "status": 0,
    "QTime": 2,
    "params": {
      "q": "go pro",
      "indent": "true",
      "fq": "marchio:\"GO PRO\"",
      "rows": "1",
      "wt": "json",
      "spellcheck.extendedResults": "true",
      "_": "1422485581792"
    }
  },
  "response": {
    "numFound": 27,
    "start": 0,
    "docs": [
      {
        "codice_produttore_s": "DK00150020",
        "codice_s": "5.BAT.27407",
        "id": "27407",
        "marchio": "GO PRO",
        "barcode_interno_s": "185323000958",
        "prezzo_acquisto_d": 16.12,
        "data_aggiornamento_dt": "2012-06-21T00:00:00Z",
        "descrizione": "BATTERIA GO PRO HERO ",
        "prezzo_vendita_d": 39.9,
        "categoria": "Batterie",
        "_version_": 1491583424191791000
      },
    ]
  },
  "spellcheck": {
    "suggestions": [
      "go pro",
      {
        "numFound": 1,
        "startOffset": 0,
        "endOffset": 6,
        "origFreq": 433,
        "suggestion": [
          {
            "word": "gopro",
            "freq": 2
          }
        ]
      },
      "correctlySpelled",
      false,
      "collation",
      [
        "collationQuery",
        "gopro",
        "hits",
        3,
        "misspellingsAndCorrections",
        [
          "go pro",
          "gopro"
        ]
      ]
    ]
  }
}

While querying for "gopro" is not:

{
  "responseHeader": {
    "status": 0,
    "QTime": 6,
    "params": {
      "q": "gopro",
      "indent": "true",
      "fq": "marchio:\"GO PRO\"",
      "rows": "1",
      "wt": "json",
      "spellcheck.extendedResults": "true",
      "_": "1422485629480"
    }
  },
  "response": {
    "numFound": 3,
    "start": 0,
    "docs": [
      {
        "codice_produttore_s": "DK0030010",
        "codice_s": "5.VID.39163",
        "id": "38814",
        "marchio": "GO PRO",
        "barcode_interno_s": "818279012477",
        "prezzo_acquisto_d": 150.84,
        "data_aggiornamento_dt": "2014-12-24T00:00:00Z",
        "descrizione": "VIDEOCAMERA GO-PRO HERO 3 WHITE NUOVO SLIM",
        "prezzo_vendita_d": 219,
        "categoria": "Fotografia",
        "_version_": 1491583425479442400
      },
    ]
  },
  "spellcheck": {
    "suggestions": [
      "gopro",
      {
        "numFound": 1,
        "startOffset": 0,
        "endOffset": 5,
        "origFreq": 2,
        "suggestion": [
          {
            "word": "giro",
            "freq": 6
          }
        ]
      },
      "correctlySpelled",
      false
    ]
  }
}

---

I'd like "go pro" as a suggestion for "gopro" too.
-- View this message in context: http://lucene.472066.n3.nabble.com/Suggesting-broken-words-with-solr-WordBreakSolrSpellChecker-tp4182172p4182735.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Suggesting broken words with solr.WordBreakSolrSpellChecker
I have this in my solrconfig:

explicit 10 catch_all on default wordbreak false 5 2 100 true true 5 3 spellcheck

Although my spellchecker does work for misspelled terms, it doesn't work for the example above, where both terms are valid ("gopro" = 100 docs; "go pro" = 150 other docs).
I want to suggest "gopro" for the "go pro" search term and vice versa, even if they're both perfectly valid terms in the index.

Thank you

--
View this message in context: http://lucene.472066.n3.nabble.com/Suggesting-broken-words-with-solr-WordBreakSolrSpellChecker-tp4182172p4182398.html
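The XML tags of the configuration posted in this message did not survive in the archive, so only the values remain. One possible reading is sketched below, on the assumption that the values follow the order of the stock Solr example solrconfig.xml. The handler name "/select" and every parameter name here are assumptions, not part of the post; only the values come from it.

```xml
<!-- Hypothetical reconstruction: names are assumed, values are from the post. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">catch_all</str>
    <str name="spellcheck">on</str>
    <!-- Two dictionaries: the direct spellchecker and the word-break one -->
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">5</str>
    <!-- Suggestions to return for terms that already exist in the index -->
    <str name="spellcheck.alternativeTermCount">2</str>
    <!-- Suggest only when the query returns at most this many hits -->
    <str name="spellcheck.maxResultsForSuggest">100</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">5</str>
    <str name="spellcheck.maxCollations">3</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

If this reading is right, spellcheck.alternativeTermCount and spellcheck.maxResultsForSuggest are the two settings most relevant to the case described here, since both "gopro" and "go pro" already occur in the index and a query for either one returns hits.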
Re: Suggesting broken words with solr.WordBreakSolrSpellChecker
Good, I'll try. But imagine I have 100 documents containing "go pro" and 150 documents containing "gopro". The "other" term is not suggested in either case.

2015-01-27 16:21 GMT+01:00 Dyer, James-2 [via Lucene] <ml-node+s472066n4182254...@n3.nabble.com>:

> I think the word break spellchecker will do what you want. But, if I were
> you, I'd dial back "maxChanges" to 1 or 2. You don't want it slicing a
> word into 10 parts or trying to combine 10 adjacent words. You also need
> the "minBreakLength" to be no more than 2, if you want it to break "go"
> (length=2) off of "gopro".
>
> James Dyer
> Ingram Content Group
>
> -----Original Message-----
> From: fabio.bozzo [mailto:[hidden email]]
> Sent: Tuesday, January 27, 2015 2:58 AM
> To: [hidden email]
> Subject: Suggesting broken words with solr.WordBreakSolrSpellChecker
>
> [...]
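As a rough illustration of James's advice, a word-break spellchecker entry with the two settings dialed back might look like the sketch below. The field name catch_all_original is the one from the original post; the parameter names are the documented WordBreakSolrSpellChecker options, and the exact values are only a starting point, not a tested configuration.

```xml
<!-- Sketch only: goes inside the spellcheck searchComponent in solrconfig.xml. -->
<lst name="spellchecker">
  <str name="name">wordbreak</str>
  <str name="classname">solr.WordBreakSolrSpellChecker</str>
  <str name="field">catch_all_original</str>
  <!-- Join adjacent query terms ("go pro" to "gopro") -->
  <str name="combineWords">true</str>
  <!-- Split a single query term ("gopro" to "go pro") -->
  <str name="breakWords">true</str>
  <!-- Dialed back from 10, as suggested: 1 or 2 -->
  <int name="maxChanges">2</int>
  <!-- Must be no more than 2 so that "go" (length 2) can be split off -->
  <int name="minBreakLength">2</int>
</lst>
```

After a change like this the core has to be reloaded for the new settings to take effect.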
Suggesting broken words with solr.WordBreakSolrSpellChecker
I indexed an electronics e-commerce product catalog.

This is a typical document from my collection:

"docs": [
  {
    "prezzo_vendita_d": 39.9,
    "codice_produttore_s": "DK00150020",
    "codice_s": "5.BAT.27407",
    "descrizione": "BATTERIA GO PRO HERO ",
    "barcode_interno_s": "185323000958",
    "categoria": "Batterie",
    "prezzo_acquisto_d": 16.12,
    "marchio": "GO PRO",
    "data_aggiornamento_dt": "2012-06-21T00:00:00Z",
    "id": "27407",
    "_version_": 1491274123542790100
  },
  {
    "codice_produttore_s": "DK0052043",
    "codice_s": "05.SP.42760",
    "id": "42760",
    "marchio": "SP GADGETS",
    "barcode_interno_s": "4028017520430",
    "prezzo_acquisto_d": 34.4,
    "data_aggiornamento_dt": "2014-11-04T00:00:00Z",
    "descrizione": "SP POS CASE GOPRO OLIVE LARGE",
    "prezzo_vendita_d": 59.95,
    "_version_": 1491274406746390500
  }
...]

I want my spellchecker to suggest "go pro" to users searching for "gopro" (without whitespace).

I also want users searching for "go pro" to find "gopro" products.

Here's a little bit of my configuration:

*schema.xml*

...

*solr-config.xml*

explicit 10 catch_all on default wordbreak false 5 2 5 true true 5 3 spellcheck
...
text_general
default catch_all_original solr.DirectSolrSpellChecker internal 0.5 2 1 5 4 0.01
wordbreak solr.WordBreakSolrSpellChecker catch_all_original true true 10 3

*Is the spellchecker the right solution, or is this a case for something else, like the "more like this" feature?*

Thank you

--
View this message in context: http://lucene.472066.n3.nabble.com/Suggesting-broken-words-with-solr-WordBreakSolrSpellChecker-tp4182172.html