RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

2015-02-02 Thread Dyer, James
A minBreakLength of 1 is not too small; in fact, it's the default value. Of
course, the more combinations it has to try, the slower it will run, but the
penalty is small enough that you're not going to notice. The only problem you
might have is if you use a lot of 1-character stop words: you might get these
stop words back as nonsense suggestions (assuming you do not filter stop words
in your spelling dictionary field, but do remove them in the query field). But
I'd try it if I were you. It's probably the best option in your case.

James Dyer
Ingram Content Group
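The trade-off described above can be made concrete with a toy count of two-part breaks. This is a simplified sketch, not Lucene's WordBreakSpellChecker: `two_part_breaks` is a hypothetical helper that ignores term frequencies and the dictionary lookup the real checker performs. It shows that lowering the minimum break length adds candidate split points, and that for fabio's "iphone" question in this thread only a minimum of 1 allows "i" + "phone".

```python
from typing import List, Tuple


def two_part_breaks(word: str, min_break_length: int) -> List[Tuple[str, str]]:
    """Return every way to split `word` into two parts, each at least
    `min_break_length` characters long (toy model, hypothetical helper)."""
    return [
        (word[:i], word[i:])
        for i in range(min_break_length, len(word) - min_break_length + 1)
    ]


for min_len in (1, 2, 3):
    breaks = two_part_breaks("iphone", min_len)
    # Prints: 1 5 True / 2 3 False / 3 1 False
    print(min_len, len(breaks), ("i", "phone") in breaks)
```

With a minimum of 1, any one-letter term present in the spelling field becomes a possible part, which is where the nonsense suggestions from 1-character stop words would come from.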


Re: Suggesting broken words with solr.WordBreakSolrSpellChecker

2015-01-30 Thread fabio.bozzo
Nice! It works indeed!
Sorry I didn't notice that before.

But what if I want the same for the iPhone?
I mean suggesting "I phone" for users who searched "iphone". Isn't a
minBreakLength of 1 just too small?


RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

2015-01-30 Thread Dyer, James
You need to decrease minBreakLength to 2 or less, because "go" is only 2
characters long (shorter than 3). This is the setting to change:

3

James Dyer
Ingram Content Group



RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

2015-01-28 Thread fabio.bozzo
I tried increasing my alternativeTermCount to 5 and enabling extended results.
I also added an fq filter parameter to clarify what I mean:

*Querying for "go pro" is good:*

{
  "responseHeader": {
    "status": 0,
    "QTime": 2,
    "params": {
      "q": "go pro",
      "indent": "true",
      "fq": "marchio:\"GO PRO\"",
      "rows": "1",
      "wt": "json",
      "spellcheck.extendedResults": "true",
      "_": "1422485581792"
    }
  },
  "response": {
    "numFound": 27,
    "start": 0,
    "docs": [
      {
        "codice_produttore_s": "DK00150020",
        "codice_s": "5.BAT.27407",
        "id": "27407",
        "marchio": "GO PRO",
        "barcode_interno_s": "185323000958",
        "prezzo_acquisto_d": 16.12,
        "data_aggiornamento_dt": "2012-06-21T00:00:00Z",
        "descrizione": "BATTERIA GO PRO HERO ",
        "prezzo_vendita_d": 39.9,
        "categoria": "Batterie",
        "_version_": 1491583424191791000
      }
    ]
  },
  "spellcheck": {
    "suggestions": [
      "go pro",
      {
        "numFound": 1,
        "startOffset": 0,
        "endOffset": 6,
        "origFreq": 433,
        "suggestion": [
          {
            "word": "gopro",
            "freq": 2
          }
        ]
      },
      "correctlySpelled",
      false,
      "collation",
      [
        "collationQuery",
        "gopro",
        "hits",
        3,
        "misspellingsAndCorrections",
        [
          "go pro",
          "gopro"
        ]
      ]
    ]
  }
}

While querying for "gopro" is not:

{
  "responseHeader": {
    "status": 0,
    "QTime": 6,
    "params": {
      "q": "gopro",
      "indent": "true",
      "fq": "marchio:\"GO PRO\"",
      "rows": "1",
      "wt": "json",
      "spellcheck.extendedResults": "true",
      "_": "1422485629480"
    }
  },
  "response": {
    "numFound": 3,
    "start": 0,
    "docs": [
      {
        "codice_produttore_s": "DK0030010",
        "codice_s": "5.VID.39163",
        "id": "38814",
        "marchio": "GO PRO",
        "barcode_interno_s": "818279012477",
        "prezzo_acquisto_d": 150.84,
        "data_aggiornamento_dt": "2014-12-24T00:00:00Z",
        "descrizione": "VIDEOCAMERA GO-PRO HERO 3 WHITE NUOVO SLIM",
        "prezzo_vendita_d": 219,
        "categoria": "Fotografia",
        "_version_": 1491583425479442400
      }
    ]
  },
  "spellcheck": {
    "suggestions": [
      "gopro",
      {
        "numFound": 1,
        "startOffset": 0,
        "endOffset": 5,
        "origFreq": 2,
        "suggestion": [
          {
            "word": "giro",
            "freq": 6
          }
        ]
      },
      "correctlySpelled",
      false
    ]
  }
}

---

I'd like "go pro" as a suggestion for "gopro" too.
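When comparing responses like the two above, note that the `spellcheck.suggestions` section is a flat list alternating between keys and values (Solr's JSON writer controls this serialization with its `json.nl` parameter, which defaults to `flat`). A minimal sketch for pairing those entries up in Python; `flat_to_dict` is a hypothetical helper, and in this simplified version a repeated key (for example several "collation" entries) keeps only its last value.

```python
from typing import Any, Dict, List


def flat_to_dict(flat: List[Any]) -> Dict[str, Any]:
    """Pair a flat [key1, value1, key2, value2, ...] list into a dict.
    A repeated key keeps only its last value in this simplified version."""
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of elements")
    return {flat[i]: flat[i + 1] for i in range(0, len(flat), 2)}


# "suggestions" section of the response for q=gopro, copied from above.
suggestions = [
    "gopro",
    {
        "numFound": 1,
        "startOffset": 0,
        "endOffset": 5,
        "origFreq": 2,
        "suggestion": [{"word": "giro", "freq": 6}],
    },
    "correctlySpelled",
    False,
]

parsed = flat_to_dict(suggestions)
print([s["word"] for s in parsed["gopro"]["suggestion"]])  # ['giro']
print(parsed["correctlySpelled"], "collation" in parsed)   # False False
```

For "gopro" the only candidate is "giro" and there is no collation, which is consistent with the word-break dictionary contributing nothing for this query.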



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Suggesting-broken-words-with-solr-WordBreakSolrSpellChecker-tp4182172p4182735.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

2015-01-28 Thread Dyer, James
Try using something larger than 2 for alternativeTermCount.  5 is probably ok 
here.  If that doesn't work, then post the exact query you are using and the 
full extended spellcheck results.

James Dyer
Ingram Content Group
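To post the exact query with every spellcheck parameter spelled out, it can help to build the request programmatically. A sketch using only Python's standard library; `spellcheck_query` is a hypothetical helper, and the host and core in the example URL (`localhost:8983`, `collection1`) are placeholders, not taken from this thread.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode


def spellcheck_query(q: str, fq: Optional[str] = None,
                     alternative_term_count: int = 5) -> str:
    """Build a query string asking for extended spellcheck results,
    including suggestions for terms that already exist in the index."""
    params = [
        ("q", q),
        ("wt", "json"),
        ("rows", "1"),
        ("spellcheck", "true"),
        ("spellcheck.extendedResults", "true"),
        ("spellcheck.alternativeTermCount", str(alternative_term_count)),
    ]
    if fq is not None:
        params.append(("fq", fq))
    return urlencode(params)


qs = spellcheck_query("gopro", fq='marchio:"GO PRO"')
url = "http://localhost:8983/solr/collection1/select?" + qs  # placeholder host/core
print(url)
print(parse_qs(qs)["spellcheck.alternativeTermCount"])  # ['5']
```

`urlencode` takes care of escaping the quotes and the space in the `fq` value, so the filter reaches Solr exactly as written.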



RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

2015-01-27 Thread fabio.bozzo
I have this in my solrconfig:

explicit
10
catch_all

on
default
wordbreak
false
5
2
100
true
true
5
3

spellcheck

Although my spellchecker does work (it makes suggestions for misspelled terms),
it doesn't work for the example above, i.e. for terms that are both valid
("gopro" = 100 docs; "go pro" = 150 other docs).
I want to suggest "gopro" for the search term "go pro" and vice versa, even
though both are perfectly valid terms in the index. Thank you



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Suggesting-broken-words-with-solr-WordBreakSolrSpellChecker-tp4182172p4182398.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

2015-01-27 Thread Dyer, James
You need to set "spellcheck.alternativeTermCount" to a value greater than zero.
Without it, the spellchecker will never make suggestions for a term that is
already in the index.

See 
https://cwiki.apache.org/confluence/display/solr/Spell+Checking#SpellChecking-Thespellcheck.alternativeTermCountParameter

James Dyer
Ingram Content Group
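The rule above can be summarized as a small decision function. This is a simplified model written for illustration, not code from Solr's SpellCheckComponent: `suggestion_limit` is hypothetical, and it assumes that `spellcheck.count` caps suggestions for terms missing from the index while `spellcheck.alternativeTermCount` caps them for terms that are present.

```python
def suggestion_limit(term_doc_freq: int, count: int,
                     alternative_term_count: int = 0) -> int:
    """Simplified model: how many suggestions may be returned for one query term.

    term_doc_freq: number of documents containing the term (0 = not in the index)
    count: value of spellcheck.count
    alternative_term_count: value of spellcheck.alternativeTermCount (0 = unset)
    """
    if term_doc_freq == 0:
        # Unknown term: treated as a misspelling, up to spellcheck.count suggestions.
        return count
    # Term already in the index: suggestions only when alternativeTermCount > 0.
    return max(alternative_term_count, 0)


# Using fabio's figure of 150 documents containing "gopro".
print(suggestion_limit(term_doc_freq=150, count=5))                            # 0
print(suggestion_limit(term_doc_freq=150, count=5, alternative_term_count=5))  # 5
print(suggestion_limit(term_doc_freq=0, count=5))                              # 5
```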



Re: Suggesting broken words with solr.WordBreakSolrSpellChecker

2015-01-27 Thread fabio.bozzo
Good, I'll try.
But imagine I have 100 documents containing "go pro" and 150 documents
containing "gopro".
Suggestions for the "other" term do not come up in either case.


RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

2015-01-27 Thread Dyer, James
I think the word break spellchecker will do what you want. But if I were you,
I'd dial back "maxChanges" to 1 or 2. You don't want it slicing a word into 10
parts or trying to combine 10 adjacent words. You also need "minBreakLength"
to be no more than 2 if you want it to break "go" (length 2) off of "gopro".

James Dyer
Ingram Content Group
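The way these two settings constrain word breaking can be illustrated with a toy model. This sketch is not Lucene's WordBreakSpellChecker: `break_candidates` is a hypothetical helper that splits a term into parts that all exist in a given vocabulary, requiring every part to be at least `min_break_length` characters and making at most `max_changes` breaks. The real implementation also weighs term frequencies and limits how many candidates it evaluates, which this sketch ignores.

```python
from typing import List, Set


def break_candidates(word: str, vocabulary: Set[str],
                     min_break_length: int, max_changes: int) -> List[List[str]]:
    """Return every split of `word` into 2..max_changes+1 vocabulary terms,
    each at least `min_break_length` characters long (toy model)."""
    results: List[List[str]] = []

    def split(rest: str, prefix: List[str], breaks_left: int) -> None:
        # Try each break position that leaves both sides long enough.
        for i in range(min_break_length, len(rest) - min_break_length + 1):
            head, tail = rest[:i], rest[i:]
            if head not in vocabulary:
                continue
            if tail in vocabulary:
                results.append(prefix + [head, tail])
            if breaks_left > 1:
                # Spend another break on the remainder.
                split(tail, prefix + [head], breaks_left - 1)

    if max_changes > 0:
        split(word, [], max_changes)
    return results


vocab = {"go", "pro", "hero"}
print(break_candidates("gopro", vocab, min_break_length=3, max_changes=1))  # []
print(break_candidates("gopro", vocab, min_break_length=2, max_changes=1))  # [['go', 'pro']]
print(break_candidates("goprohero", vocab, min_break_length=2, max_changes=2))
# [['go', 'pro', 'hero']]
```

Each extra allowed change lets the search recurse one level deeper, which is why a small maxChanges keeps the number of candidates under control.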


-Original Message-
From: fabio.bozzo [mailto:f.bo...@3-w.it] 
Sent: Tuesday, January 27, 2015 2:58 AM
To: solr-user@lucene.apache.org
Subject: Suggesting broken words with solr.WordBreakSolrSpellChecker

I indexed an electronics e-commerce product catalog.

This is a typical document from my collection:


"docs": [
  {
    "prezzo_vendita_d": 39.9,
    "codice_produttore_s": "DK00150020",
    "codice_s": "5.BAT.27407",
    "descrizione": "BATTERIA GO PRO HERO ",
    "barcode_interno_s": "185323000958",
    "categoria": "Batterie",
    "prezzo_acquisto_d": 16.12,
    "marchio": "GO PRO",
    "data_aggiornamento_dt": "2012-06-21T00:00:00Z",
    "id": "27407",
    "_version_": 1491274123542790100
  },
  {
    "codice_produttore_s": "DK0052043",
    "codice_s": "05.SP.42760",
    "id": "42760",
    "marchio": "SP GADGETS",
    "barcode_interno_s": "4028017520430",
    "prezzo_acquisto_d": 34.4,
    "data_aggiornamento_dt": "2014-11-04T00:00:00Z",
    "descrizione": "SP POS CASE GOPRO OLIVE LARGE",
    "prezzo_vendita_d": 59.95,
    "_version_": 1491274406746390500
  }
  ...
]
I want my spellchecker to suggest "go pro" to users searching for "gopro"
(without whitespace).

I also want users searching for "go pro" to find "gopro" products.

Here's a little bit of my configuration:

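*schema.xml*

 stored="true"/>
 stored="true"/>
 stored="true"/>

 indexed="true"
stored="false" multiValued="true" />
 stored="false"
multiValued="true" />

...

 positionIncrementGap="100">
 generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
preserveOriginal="1" />
 ignoreCase="true"
articles="lang/contractions_it.txt"/>
 words="lang/stopwords_it.txt" format="snowball" />

 generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
preserveOriginal="1" />
 ignoreCase="true"
articles="lang/contractions_it.txt"/>
 words="lang/stopwords_it.txt" format="snowball" />
 synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
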
*solr-config.xml*

explicit
10
catch_all

on
default
wordbreak
false
5
2
5
true
true
5
3

spellcheck

...

text_general

default
catch_all_original
solr.DirectSolrSpellChecker
internal
0.5
2
1
5
4
0.01

wordbreak
solr.WordBreakSolrSpellChecker
catch_all_original
true
true
10
3

*Is the spellchecker the right solution, or is this a case for something
else, like the "more like this" feature?*

Thank you



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Suggesting-broken-words-with-solr-WordBreakSolrSpellChecker-tp4182172.html
Sent from the Solr - User mailing list archive at Nabble.com.