Re: Auto-suggest in Solr

2015-07-12 Thread Zheng Lin Edwin Yeo
Thank you so much.

I'll read up on that and try that out.

Regards,
Edwin



Re: Auto-suggest in Solr

2015-07-11 Thread Walter Underwood
Thanks, this is very helpful.

Suggester config is quite under-documented. It took me longer than I expected
to get it working.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Auto-suggest in Solr

2015-07-11 Thread Erick Erickson
Cool! I've bookmarked it, much more thorough

Erick


Re: Auto-suggest in Solr

2015-07-10 Thread Alessandro Benedetti
Hi guys,
I just wrote a blog post that builds on Erick's post and explains in detail,
with practical examples, all the main Lookup implementations:

http://alexbenedetti.blogspot.co.uk/2015/07/solr-you-complete-me.html

I think this can be useful for Edwin to finally fix the config for the
FreeTextSuggester (which I finally clarified, Erick, thanks to Mike's answer
on the dev list, plus deep code analysis and testing :) )

Cheers


Re: Auto-suggest in Solr

2015-06-27 Thread Erick Erickson
Alessandro:

Going to have to defer to Mike McCandless et al.; they're the
authorities here. I don't quite know whether they monitor this list.
Maybe consider the dev list?

Best,
Erick

On Fri, Jun 26, 2015 at 4:53 AM, Alessandro Benedetti
benedetti.ale...@gmail.com wrote:
 Up. Could anyone kindly take a look at my considerations regarding the
 FreeText Suggester?
 I am curious to get more insight.
 Eventually I will analyse the code in depth to understand my errors.

 Cheers

 2015-06-19 11:53 GMT+01:00 Alessandro Benedetti benedetti.ale...@gmail.com:

 Actually, the documentation is not clear enough.
 Let's try to understand this suggester.

 *Building*
 This suggester builds an FST that it uses to provide the autocomplete
 feature by running prefix searches on it.
 The terms it uses to generate the FST are the tokens produced by the
 suggestFreeTextAnalyzerFieldType.

 This much should be correct.
 So, to keep it simple, if we have a shingle token filter [1-3] (producing
 unigrams as well) in our analysis, then from these original field values:
 mp3 ipod
 mp3 player
 mp3 player ipod
 player of Real

 we produce this list of possible suggestions in our FST:

 mp3
 player
 ipod
 real
 of

 mp3 ipod
 mp3 player
 player ipod
 player of
 of real

 mp3 player ipod
 player of real
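
The list above can be reproduced with a small sketch. This only models the mental picture described in this message (whitespace tokens, lowercasing, shingles of size 1 to 3); it is not the FreeTextSuggester implementation, and the names `shingles`, `build_terms`, `FIELD_VALUES` and `TERMS` are hypothetical helpers made up for illustration.

```python
def shingles(tokens, min_size=1, max_size=3):
    """Return all contiguous token runs of length min_size..max_size."""
    out = []
    for size in range(min_size, max_size + 1):
        for start in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[start:start + size]))
    return out


def build_terms(values, min_size=1, max_size=3):
    """Collect the distinct shingles over all field values (assumed model)."""
    terms = set()
    for value in values:
        # Assumption: whitespace tokenization plus lowercasing.
        tokens = value.lower().split()
        terms.update(shingles(tokens, min_size, max_size))
    return terms


FIELD_VALUES = ["mp3 ipod", "mp3 player", "mp3 player ipod", "player of Real"]
TERMS = build_terms(FIELD_VALUES)

# Print shortest shingles first, alphabetically within each size.
for term in sorted(TERMS, key=lambda t: (len(t.split()), t)):
    print(term)
```

Under these assumptions the four field values yield 12 distinct terms: 5 unigrams, 5 bigrams and 2 trigrams, matching the list above.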

 From the documentation I read:

 "ngrams: The max number of tokens out of which singles will be make the
 dictionary. The default value is 2. Increasing this would mean you want
 more than the previous 2 tokens to be taken into consideration when making
 the suggestions."


 This confuses me, as I was not expecting this parameter to affect the
 suggestion dictionary.
 So I would like a clarification here from our masters :)
 At this point, let's see what happens at query time.

 *Query Time*
 As I understand it, the ngrams parameter makes the suggester consider only
 the last N-1 tokens the user typed, separated by spaces.

 "Builds an ngram model from the text sent to {@link #build} and predicts
 based on the last grams-1 tokens in the request sent to {@link #lookup}.
 This tries to handle the long tail of suggestions for when the incoming
 query is a never before seen query string."


 For example, grams=3 should consider only the last 2 tokens:

 special mp3 p -> mp3 p

 Then this query is analysed using the suggestFreeTextAnalyzerFieldType.
 We produce 3 tokens:
 mp3
 p
 mp3 p

 And we run the prefix matching on the FST.
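
The query-time steps described here can be sketched as follows. Again, this is only an illustration of the understanding laid out in this message, which its author says does not match the observed behaviour; `last_context`, `prefix_matches` and `TERMS` are hypothetical names, and a plain set stands in for the FST.

```python
# Assumed dictionary: the shingles listed earlier in this message.
TERMS = {
    "mp3", "player", "ipod", "real", "of",
    "mp3 ipod", "mp3 player", "player ipod", "player of", "of real",
    "mp3 player ipod", "player of real",
}


def last_context(query, grams):
    """Keep only the last grams-1 whitespace-separated tokens of the query."""
    tokens = query.lower().split()
    # Assumption: always keep at least one token, even when grams is 1.
    keep = max(grams - 1, 1)
    return tokens[-keep:]


def prefix_matches(terms, prefix):
    """Naive stand-in for an FST prefix lookup: scan every term."""
    return sorted(t for t in terms if t.startswith(prefix))


context = last_context("special mp3 p", grams=3)
print(" ".join(context))
print(prefix_matches(TERMS, " ".join(context)))
```

With grams=3 the leading token "special" is dropped, and the remaining prefix "mp3 p" matches only the terms that begin with "mp3 p". A bare "p" prefix, by contrast, would match every term starting with "p", which resembles the behaviour reported in the original question.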

 *Conclusion*
 My understanding must be wrong at some point, as the behaviour I get
 is different.
 Can we discuss and clarify this, and eventually put it in the official
 documentation?

 Cheers

 2015-06-19 6:40 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:

 I'm implementing an auto-suggest feature in Solr, and I'd like to achieve
 the following:

 For example, if the user enters "mp3", Solr might suggest "mp3 player",
 "mp3 nano" and "mp3 music".
 When the user enters "mp3 p", the suggestions should narrow down to "mp3
 player".

 Currently, when I type "mp3 p", the suggester returns only words that
 start with the letter "p", and I'm getting results like "plan",
 "production", etc.; it does not take the "mp3" token into
 consideration.

 I'm using Solr 5.1 and below is my configuration:

 In solrconfig.xml:

 <searchComponent name="suggest" class="solr.SuggestComponent">
   <lst name="suggester">
     <str name="lookupImpl">FreeTextLookupFactory</str>
     <str name="indexPath">suggester_freetext_dir</str>
     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
     <str name="field">Suggestion</str>
     <str name="weightField">Project</str>
     <str name="suggestFreeTextAnalyzerFieldType">suggestType</str>
     <int name="ngrams">5</int>
     <str name="buildOnStartup">false</str>
     <str name="buildOnCommit">false</str>
   </lst>
 </searchComponent>


 In schema.xml

 <fieldType name="suggestType" class="solr.TextField"
            positionIncrementGap="100">
   <analyzer type="index">
     <charFilter class="solr.PatternReplaceCharFilterFactory"
                 pattern="[^a-zA-Z0-9]" replacement=" " />
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.ShingleFilterFactory" minShingleSize="2"
             maxShingleSize="6" outputUnigrams="false"/>
   </analyzer>
   <analyzer type="query">
     <charFilter class="solr.PatternReplaceCharFilterFactory"
                 pattern="[^a-zA-Z0-9]" replacement=" " />
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.ShingleFilterFactory" minShingleSize="2"
             maxShingleSize="6" outputUnigrams="true"/>
   </analyzer>
 </fieldType>


 Is there anything that I configured wrongly?


 Regards,
 Edwin




 --
 --

 Benedetti Alessandro
 Visiting card : http://about.me/alessandro_benedetti

 Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?

 William Blake - Songs of Experience -1794 England





Re: Auto-suggest in Solr

2015-06-27 Thread Alessandro Benedetti
Thanks, Erick. I didn't have time to go through the code again,
but I will forward this to the dev list.
Thank you for your time!

Cheers


Re: Auto-suggest in Solr

2015-06-26 Thread Alessandro Benedetti
Up. Could anyone kindly take a look at my considerations regarding the
FreeText Suggester?
I am curious to get more insight.
Eventually I will analyse the code in depth to understand my errors.

Cheers


 2015-06-19 6:40 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:

 I'm implementing an auto-suggest feature in Solr, and I'll like to achieve
 the follwing:

 For example, if the user enters mp3, Solr might suggest mp3 player,
 mp3 nano and mp3 music.
 When the user enters mp3 p, the suggestion should narrow down to mp3
 player.

 Currently, when I type mp3 p, the suggester is returning words that
 starts with the letter p only, and I'm getting results like plan,
 production, etc, and it does not take the mp3 token into
 consideration.

 I'm using Solr 5.1 and below is my configuration:

 In solrconfig.xml:

 searchComponent name=suggest class=solr.SuggestComponent
   lst name=suggester

  str name=lookupImplFreeTextLookupFactory/str
  str name=indexPathsuggester_freetext_dir/str

 str name=dictionaryImplDocumentDictionaryFactory/str
 str name=fieldSuggestion/str
 str name=weightFieldProject/str
 str name=suggestFreeTextAnalyzerFieldTypesuggestType/str
 int name=ngrams5/int
 str name=buildOnStartupfalse/str
 str name=buildOnCommitfalse/str
   /lst
 /searchComponent


 In schema.xml

 fieldType name=suggestType class=solr.TextField
 positionIncrementGap=100
 analyzer type=index
 charFilter class=solr.PatternReplaceCharFilterFactory
 pattern=[^a-zA-Z0-9] replacement=  /
 tokenizer class=solr.WhitespaceTokenizerFactory/
 filter class=solr.ShingleFilterFactory minShingleSize=2
 maxShingleSize=6 outputUnigrams=false/
 /analyzer
 analyzer type=query
 charFilter class=solr.PatternReplaceCharFilterFactory
 pattern=[^a-zA-Z0-9] replacement=  /
 tokenizer class=solr.WhitespaceTokenizerFactory/
 filter class=solr.ShingleFilterFactory minShingleSize=2
 maxShingleSize=6 outputUnigrams=true/
 /analyzer
 /fieldType


 Is there anything that I configured wrongly?


 Regards,
 Edwin




 --
 --

 Benedetti Alessandro
 Visiting card : http://about.me/alessandro_benedetti

 Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?

 William Blake - Songs of Experience -1794 England




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: Auto-suggest in Solr

2015-06-22 Thread Alessandro Benedetti
Can any of our beloved super gurus take a look at my mail?
It could help Edwin as well :)

Cheers


Re: Auto-suggest in Solr

2015-06-19 Thread Zheng Lin Edwin Yeo
Ok sure.

"ngrams: The max number of tokens out of which singles will be make the
dictionary. The default value is 2. Increasing this would mean you want
more than the previous 2 tokens to be taken into consideration when making
the suggestions."

I got confused by this, as I could not reproduce that behaviour with the
suggester. Since the default value is 2, a search for "mp3 p" should return
only suggestions that contain "mp3 ...", not ones that merely start with the
letter "p". But I have only been getting suggestions that start with "p".
Even when I try a bigger ngrams value with a longer search, I get the same
results: the suggester only considers the last token when giving the
suggestions.

I still have not managed to get suggestions that take 2 or more tokens into
account.

So am I actually heading in the right direction with this?

Regards,
Edwin




Re: Auto-suggest in Solr

2015-06-19 Thread Alessandro Benedetti
Actually the documentation is not clear enough.
Let's try to understand this suggester.

*Building*
This suggester builds an FST that it uses to provide the autocomplete
feature by running prefix searches on it.
The terms it uses to generate the FST are the tokens produced by the
suggestFreeTextAnalyzerFieldType.

And this should be correct.
So, to keep it simple, if we have a shingle token filter [1-3] in our
analysis (so we produce unigrams as well), from these original field values:
mp3 ipod
mp3 player
mp3 player ipod
player of Real

-> we produce this list of possible suggestions in our FST:

mp3
player
ipod
real
of

mp3 ipod
mp3 player
player ipod
player of
of real

mp3 player ipod
player of real
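
As a rough check of the list above, here is a minimal Python sketch that lowercases each field value, splits it on whitespace, and emits shingles of 1 to 3 tokens. It only imitates what a whitespace tokenizer, lowercasing, and a shingle filter would emit; it is not Lucene code, and the function name `shingles` is made up for illustration.

```python
def shingles(text, min_size=1, max_size=3):
    """Return all word n-grams of min_size..max_size tokens, grouped by size."""
    tokens = text.lower().split()
    out = []
    for size in range(min_size, max_size + 1):
        for start in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[start:start + size]))
    return out


# The four field values from the example in this message.
field_values = ["mp3 ipod", "mp3 player", "mp3 player ipod", "player of Real"]

# Distinct shingles over all values, ordered by length and then alphabetically.
dictionary = sorted(
    {s for value in field_values for s in shingles(value)},
    key=lambda s: (s.count(" "), s),
)

for entry in dictionary:
    print(entry)
```

Under these assumptions the sketch yields the same 12 entries as listed: 5 unigrams, 5 bigrams and 2 trigrams.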

From the documentation I read:

"ngrams: The max number of tokens out of which singles will be make the
dictionary. The default value is 2. Increasing this would mean you want
more than the previous 2 tokens to be taken into consideration when making
the suggestions."


This confuses me, as I was not expecting this param to affect the
suggestion dictionary.
So I would like a clarification here from our masters :)
At this point, let's see what happens at query time.

*Query Time*
As I understand it, the ngrams param makes the suggester consider only the
last N-1 tokens the user typed, separated by spaces. The Javadoc says:

"Builds an ngram model from the text sent to {@link #build} and predicts
based on the last grams-1 tokens in the request sent to {@link #lookup}.
This tries to handle the long tail of suggestions for when the incoming
query is a never before seen query string."


Example: grams=3 should consider only the last 2 tokens:

special mp3 p -> mp3 p

Then this query is analysed using the suggestFreeTextAnalyzerFieldType.
We produce 3 tokens:
mp3
p
mp3 p

And we run the prefix matching on the FST.
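
The query-time model described above can be sketched in Python as follows. This is only an illustration of the understanding laid out in this message, which may not match what FreeTextSuggester really does; a sorted list with binary search stands in for the FST, and all function names are hypothetical.

```python
from bisect import bisect_left

# The 12 dictionary entries from the building example, kept sorted so that
# all terms sharing a prefix sit next to each other.
TERMS = sorted([
    "mp3", "player", "ipod", "real", "of",
    "mp3 ipod", "mp3 player", "player ipod", "player of", "of real",
    "mp3 player ipod", "player of real",
])


def last_tokens(query, grams):
    """Keep only the last grams-1 whitespace-separated tokens (at least one)."""
    tokens = query.lower().split()
    keep = max(grams - 1, 1)
    return tokens[-keep:]


def shingles(tokens, max_size):
    """Word n-grams of 1..max_size tokens, grouped by size."""
    out = []
    for size in range(1, max_size + 1):
        for start in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[start:start + size]))
    return out


def prefix_matches(sorted_terms, prefix):
    """All terms starting with prefix, located by binary search."""
    i = bisect_left(sorted_terms, prefix)
    out = []
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        out.append(sorted_terms[i])
        i += 1
    return out


kept = last_tokens("special mp3 p", grams=3)   # ['mp3', 'p']
query_terms = shingles(kept, max_size=3)       # ['mp3', 'p', 'mp3 p']

for term in query_terms:
    print(repr(term), "->", prefix_matches(TERMS, term))
```

In this sketch the query term "mp3 p" matches "mp3 player" and "mp3 player ipod", while the bare "p" also matches every entry starting with "player".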

*Conclusion*
My understanding is certainly wrong at some point, as the behaviour I get is
different.
Can we discuss and clarify this, and possibly put it in the official
documentation?

Cheers


Auto-suggest in Solr

2015-06-18 Thread Zheng Lin Edwin Yeo
I'm implementing an auto-suggest feature in Solr, and I'd like to achieve
the following:

For example, if the user enters "mp3", Solr might suggest "mp3 player",
"mp3 nano" and "mp3 music".
When the user enters "mp3 p", the suggestions should narrow down to
"mp3 player".

Currently, when I type "mp3 p", the suggester returns words that start with
the letter "p" only. I'm getting results like "plan", "production", etc.,
and it does not take the "mp3" token into consideration.
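
To make the difference concrete, here is a small Python sketch that contrasts the desired behaviour with the observed one. The phrase list is a made-up sample built from the examples above, and both functions are hypothetical illustrations, not Solr code.

```python
# Hypothetical sample of indexed phrases, taken from the examples above.
phrases = ["mp3 player", "mp3 nano", "mp3 music", "plan", "production"]


def suggest_whole_input(phrases, typed):
    """Desired: the whole typed string must be a prefix of the phrase."""
    typed = typed.lower()
    return [p for p in phrases if p.startswith(typed)]


def suggest_last_token_only(phrases, typed):
    """Observed: only the last typed token is used as the prefix."""
    last = typed.lower().split()[-1]
    return [p for p in phrases if p.startswith(last)]


print(suggest_whole_input(phrases, "mp3"))        # the three "mp3 ..." phrases
print(suggest_whole_input(phrases, "mp3 p"))      # narrows down to "mp3 player"
print(suggest_last_token_only(phrases, "mp3 p"))  # "plan", "production"
```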

I'm using Solr 5.1 and below is my configuration:

In solrconfig.xml:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">

    <str name="lookupImpl">FreeTextLookupFactory</str>
    <str name="indexPath">suggester_freetext_dir</str>

    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">Suggestion</str>
    <str name="weightField">Project</str>
    <str name="suggestFreeTextAnalyzerFieldType">suggestType</str>
    <int name="ngrams">5</int>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>


In schema.xml

<fieldType name="suggestType" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[^a-zA-Z0-9]" replacement=" " />
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2"
            maxShingleSize="6" outputUnigrams="false"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[^a-zA-Z0-9]" replacement=" " />
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2"
            maxShingleSize="6" outputUnigrams="true"/>
  </analyzer>
</fieldType>


Is there anything that I configured wrongly?


Regards,
Edwin