Re: Suggesting broken words with solr.WordBreakSolrSpellChecker

2015-01-30 Thread fabio.bozzo
Nice! It works indeed!
Sorry I didn't notice that before.

But what if I want the same for the iPhone?
I mean suggesting "I phone" for users who searched for "iphone". A minBreakLength
of 1 is just too small, isn't it?
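
For readers following along, the change that made "gopro" break into "go pro" might look like the sketch below. The element and parameter names are an assumption based on the stock Solr example solrconfig.xml, since the archive stripped the XML from the posts in this thread; only the values, the class name, the field name, and the names minBreakLength and maxChanges come from the thread itself.

```xml
<!-- Sketch only: element and parameter names are assumed from the stock Solr example config -->
<lst name="spellchecker">
  <str name="name">wordbreak</str>
  <str name="classname">solr.WordBreakSolrSpellChecker</str>
  <str name="field">catch_all_original</str>
  <str name="combineWords">true</str>
  <str name="breakWords">true</str>
  <!-- The thread suggests 1 or 2 here instead of the original 10 -->
  <int name="maxChanges">2</int>
  <!-- 2 allows "gopro" to break into "go" + "pro"; breaking "iphone" into "i" + "phone" would need 1 -->
  <int name="minBreakLength">2</int>
</lst>
```

Whether a value of 1 produces too many unwanted one-letter breaks depends on which terms actually occur in the indexed field, so it is probably worth testing against real queries before relying on it.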

On Saturday, 31 January 2015, Dyer, James-2 [via Lucene] <
ml-node+s472066n4183176...@n3.nabble.com> wrote:

> You need to decrease this to 2 or less because the length of "go" is <3.
>
> <int name="minBreakLength">3</int>
>
> James Dyer
> Ingram Content Group
>
>
> -----Original Message-----
> From: fabio.bozzo [mailto:[hidden email]]
> Sent: Wednesday, January 28, 2015 4:55 PM
> To: [hidden email]
> Subject: RE: Suggesting broken words with solr.WordBreakSolrSpellChecker
>
> [...]

RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

2015-01-28 Thread fabio.bozzo
I tried increasing my alternativeTermCount to 5 and enabling extended results.
I also added an fq filter parameter to clarify what I mean:

*Querying for "go pro" is good:*

{
  "responseHeader": {
    "status": 0,
    "QTime": 2,
    "params": {
      "q": "go pro",
      "indent": "true",
      "fq": "marchio:\"GO PRO\"",
      "rows": "1",
      "wt": "json",
      "spellcheck.extendedResults": "true",
      "_": "1422485581792"
    }
  },
  "response": {
    "numFound": 27,
    "start": 0,
    "docs": [
      {
        "codice_produttore_s": "DK00150020",
        "codice_s": "5.BAT.27407",
        "id": "27407",
        "marchio": "GO PRO",
        "barcode_interno_s": "185323000958",
        "prezzo_acquisto_d": 16.12,
        "data_aggiornamento_dt": "2012-06-21T00:00:00Z",
        "descrizione": "BATTERIA GO PRO HERO ",
        "prezzo_vendita_d": 39.9,
        "categoria": "Batterie",
        "_version_": 1491583424191791000
      }
    ]
  },
  "spellcheck": {
    "suggestions": [
      "go pro",
      {
        "numFound": 1,
        "startOffset": 0,
        "endOffset": 6,
        "origFreq": 433,
        "suggestion": [
          {
            "word": "gopro",
            "freq": 2
          }
        ]
      },
      "correctlySpelled",
      false,
      "collation",
      [
        "collationQuery",
        "gopro",
        "hits",
        3,
        "misspellingsAndCorrections",
        [
          "go pro",
          "gopro"
        ]
      ]
    ]
  }
}

While querying for "gopro" is not:

{
  "responseHeader": {
    "status": 0,
    "QTime": 6,
    "params": {
      "q": "gopro",
      "indent": "true",
      "fq": "marchio:\"GO PRO\"",
      "rows": "1",
      "wt": "json",
      "spellcheck.extendedResults": "true",
      "_": "1422485629480"
    }
  },
  "response": {
    "numFound": 3,
    "start": 0,
    "docs": [
      {
        "codice_produttore_s": "DK0030010",
        "codice_s": "5.VID.39163",
        "id": "38814",
        "marchio": "GO PRO",
        "barcode_interno_s": "818279012477",
        "prezzo_acquisto_d": 150.84,
        "data_aggiornamento_dt": "2014-12-24T00:00:00Z",
        "descrizione": "VIDEOCAMERA GO-PRO HERO 3 WHITE NUOVO SLIM",
        "prezzo_vendita_d": 219,
        "categoria": "Fotografia",
        "_version_": 1491583425479442400
      }
    ]
  },
  "spellcheck": {
    "suggestions": [
      "gopro",
      {
        "numFound": 1,
        "startOffset": 0,
        "endOffset": 5,
        "origFreq": 2,
        "suggestion": [
          {
            "word": "giro",
            "freq": 6
          }
        ]
      },
      "correctlySpelled",
      false
    ]
  }
}

---

I'd like "go pro" as a suggestion for "gopro" too.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Suggesting-broken-words-with-solr-WordBreakSolrSpellChecker-tp4182172p4182735.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

2015-01-27 Thread fabio.bozzo
I have this in my solrconfig:




explicit
10
catch_all

on
default
wordbreak
false
5
2
100
true
true
5
3



spellcheck




Although my spellchecker does work for misspelled terms, it doesn't work for
the example above, that is, for terms that are both valid ("gopro" = 100 docs;
"go pro" = 150 other docs).
I want to suggest "gopro" for the search term "go pro" and vice versa, even if
they're both perfectly valid terms in the index. Thank you



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Suggesting-broken-words-with-solr-WordBreakSolrSpellChecker-tp4182172p4182398.html


Re: Suggesting broken words with solr.WordBreakSolrSpellChecker

2015-01-27 Thread fabio.bozzo
Good, I'll try.
But imagine I have 100 documents containing "go pro" and 150 documents
containing "gopro".
Suggestions of the "other" term do not come up in any case.

2015-01-27 16:21 GMT+01:00 Dyer, James-2 [via Lucene] <
ml-node+s472066n4182254...@n3.nabble.com>:

> I think the word break spellchecker will do what you want.  But, if I were
> you, I'd dial back "maxChanges" to 1 or 2.  You don't want it slicing a
> word into 10 parts or trying to combine 10 adjacent words.  You also need
> the "minBreakLength" to be no more than 2, if you want it to break "go"
> (length=2) off of "gopro".
>
> James Dyer
> Ingram Content Group
>
>
> -----Original Message-----
> From: fabio.bozzo [mailto:[hidden email]]
> Sent: Tuesday, January 27, 2015 2:58 AM
> To: [hidden email]
> Subject: Suggesting broken words with solr.WordBreakSolrSpellChecker
>
> [...]
>
> Here's a little bit of my configuration:
>
> *schema.xml*
> 
>  stored="true"/>
>  stored="true"/>
>  stored="true"/>
>
>  indexed="true"
> stored="false" multiValued="true" />
>  stored="false"
> multiValued="true" />
>
> 
> 
> 
> 
>
> 
> 
> 
> 
> ...
>
>  positionIncrementGap="100">
> 
> 
>  generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
> preserveOriginal="1" />
>
>  ignoreCase="true"
> articles="lang/contractions_it.txt"/>
> 
> 
>  words="lang/stopwords_it.txt" format="snowball" />
> 
> 
> 
> 
>  generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
> preserveOriginal="1" />
>
>  ignoreCase="true"
> articles="lang/contractions_it.txt"/>
> 
> 
>  words="lang/stopwords_it.txt" format="snowball" />
>
> 
>  synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> 
> 
>
> 
>
> *solr-config.xml*
> 
>
> 
> explicit
> 10
> catch_all
>
> on
> default
> wordbreak
> false
> 5
> 2
> 5
> true
>  

Suggesting broken words with solr.WordBreakSolrSpellChecker

2015-01-27 Thread fabio.bozzo
I indexed an electronics e-commerce product catalog.

This is a typical document from my collection:


"docs": [
  {
    "prezzo_vendita_d": 39.9,
    "codice_produttore_s": "DK00150020",
    "codice_s": "5.BAT.27407",
    "descrizione": "BATTERIA GO PRO HERO ",
    "barcode_interno_s": "185323000958",
    "categoria": "Batterie",
    "prezzo_acquisto_d": 16.12,
    "marchio": "GO PRO",
    "data_aggiornamento_dt": "2012-06-21T00:00:00Z",
    "id": "27407",
    "_version_": 1491274123542790100
  },
  {
    "codice_produttore_s": "DK0052043",
    "codice_s": "05.SP.42760",
    "id": "42760",
    "marchio": "SP GADGETS",
    "barcode_interno_s": "4028017520430",
    "prezzo_acquisto_d": 34.4,
    "data_aggiornamento_dt": "2014-11-04T00:00:00Z",
    "descrizione": "SP POS CASE GOPRO OLIVE LARGE",
    "prezzo_vendita_d": 59.95,
    "_version_": 1491274406746390500
  }
...]
I want my spellchecker to suggest "go pro" to users searching for "gopro"
(without whitespace).

I also want users searching for "go pro" to find "gopro" products.

Here's a little bit of my configuration:

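*schema.xml*

[The XML in this section was stripped by the list archive. Partial attribute fragments survive in the quoted copy of this message in the reply above.]

*solr-config.xml*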



explicit
10
catch_all

on
default
wordbreak
false
5
2
5
true
true
5
3



spellcheck



...


text_general


default
catch_all_original
solr.DirectSolrSpellChecker
internal
0.5
2  
1
5
4
0.01



wordbreak
solr.WordBreakSolrSpellChecker  
catch_all_original
true
true
10
3
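
The archive stripped the XML tags from the configuration above, leaving only the values. As a reading aid, here is one way those values could map back onto solrconfig.xml. Every element and parameter name below is an assumption taken from the stock Solr example configuration and matched by position; the handler name /select is purely hypothetical, and the part elided with "..." in the original post is not filled in.

```xml
<!-- Hypothetical reconstruction: the handler name and all element and parameter names are assumed -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">catch_all</str>

    <str name="spellcheck">on</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.alternativeTermCount">2</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">5</str>
    <str name="spellcheck.maxCollations">3</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

<!-- ... (elided in the original post) -->

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_general</str>

  <!-- Edit-distance suggestions for misspelled terms -->
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">catch_all_original</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">4</int>
    <float name="maxQueryFrequency">0.01</float>
  </lst>

  <!-- Combines adjacent words and breaks single words into parts -->
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">catch_all_original</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
    <int name="minBreakLength">3</int>
  </lst>
</searchComponent>
```

Read this way, the last value (3) would be the minimum break length, which would explain why "gopro" is never split into "go" and "pro": the fragment "go" has only two characters.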





*Is the spellchecker the right solution, or is this a case for something
else, like the "more like this" feature?*

Thank you



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Suggesting-broken-words-with-solr-WordBreakSolrSpellChecker-tp4182172.html