Re: Auto-suggest in Solr
Thank you so much. I'll read up on that and try that out.

Regards,
Edwin

On 12 July 2015 at 00:41, Erick Erickson erickerick...@gmail.com wrote:

Cool! I've bookmarked it, much more thorough.

Erick

On Sat, Jul 11, 2015 at 8:13 AM, Walter Underwood wun...@wunderwood.org wrote:

Thanks, this is very helpful. Suggester config is quite under-documented. It took me longer than I expected to get it working.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

On Jul 10, 2015, at 6:30 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote:

Hi guys, I just wrote a blog post to complement Erick's post and to explain in detail, with practical examples, all the main Lookup implementations:

http://alexbenedetti.blogspot.co.uk/2015/07/solr-you-complete-me.html

I think this can be useful for Edwin to finally fix the config for the FreeTextSuggester (which I have finally clarified, Erick, thanks to Mike's answer on the dev list and some deep code analysis and testing :) )

Cheers

2015-06-27 23:51 GMT+01:00 Alessandro Benedetti benedetti.ale...@gmail.com:

Thanks, Erick, I didn't have time to go through the code again, but I will forward this to the dev list. Thank you for your time!

Cheers

2015-06-27 16:19 GMT+01:00 Erick Erickson erickerick...@gmail.com:

Alessandro: Going to have to defer to Mike McCandless et al., they're the authorities here. Don't quite know whether they monitor this list, consider the dev list?

Best,
Erick

On Fri, Jun 26, 2015 at 4:53 AM, Alessandro Benedetti benedetti.ale...@gmail.com wrote:

Up. Can anyone kindly take a look at my considerations regarding the FreeText Suggester? I am curious to have more insight. Eventually I will analyse the code in depth to understand my errors.

Cheers

2015-06-19 11:53 GMT+01:00 Alessandro Benedetti benedetti.ale...@gmail.com:

Actually the documentation is not clear enough. Let's try to understand this suggester.

*Building*

This suggester builds an FST that it uses to provide the autocomplete feature by running prefix searches on it. The terms it uses to generate the FST are the tokens produced by the suggestFreeTextAnalyzerFieldType. And this should be correct.

So, to keep it simple, if we have a shingle token filter [1-3] in our analysis (so we produce unigrams as well), then from these original field values:

mp3 ipod
mp3 player
mp3 player ipod
player of Real

we produce this list of possible suggestions in our FST:

mp3
player
ipod
real
of
mp3 ipod
mp3 player
player ipod
player of
of real
mp3 player ipod
player of real

From the documentation I read:

ngrams: The max number of tokens out of which singles will be make the dictionary. The default value is 2. Increasing this would mean you want more than the previous 2 tokens to be taken into consideration when making the suggestions.

This confuses me, as I was not expecting this param to affect the suggestion dictionary. So I would like a clarification here from our masters :)

At this point let's see what happens at query time.

*Query Time*

As I understand it, the ngrams param makes the suggester consider only the last N-1 tokens the user typed, split on the space separator. The Javadoc says:

Builds an ngram model from the text sent to {@link #build} and predicts based on the last grams-1 tokens in the request sent to {@link #lookup}. This tries to handle the long tail of suggestions for when the incoming query is a never before seen query string.

For example, grams=3 should consider only the last 2 tokens:

special mp3 p -> mp3 p

Then this query is analysed using the suggestFreeTextAnalyzerFieldType. We produce 3 tokens:

mp3
p
mp3 p

And we run the prefix matching on the FST.

*Conclusion*

My understanding is surely wrong at some point, as the behaviour I get is different. Can we discuss this, clarify it and eventually put it in the official documentation?
Cheers
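The shingle expansion described under *Building* in Alessandro's 2015-06-19 message can be checked with a few lines of plain Python. This is only a sketch of the idea, not the Lucene ShingleFilter itself: the function names are invented for illustration, and lowercasing is assumed because the message lists "real" for the field value "Real".

```python
def shingles(tokens, max_size=3, output_unigrams=True):
    """Return word n-grams of 2..max_size tokens, plus unigrams if requested."""
    min_size = 1 if output_unigrams else 2
    out = []
    for size in range(min_size, max_size + 1):
        # Slide a window of `size` tokens over the token list.
        for start in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[start:start + size]))
    return out


def suggestion_terms(field_values, max_size=3):
    """Collect the distinct shingles produced by all field values."""
    terms = set()
    for value in field_values:
        # Assumption: the analysis chain lowercases and splits on whitespace.
        terms.update(shingles(value.lower().split(), max_size))
    return terms


# Field values taken from the message; everything else is illustrative.
FIELD_VALUES = ["mp3 ipod", "mp3 player", "mp3 player ipod", "player of Real"]

if __name__ == "__main__":
    # Print unigrams first, then bigrams, then trigrams.
    for term in sorted(suggestion_terms(FIELD_VALUES),
                       key=lambda t: (t.count(" "), t)):
        print(term)
```

With the four field values from the message this yields the same twelve terms that the message lists. Passing output_unigrams=False mimics an analyzer that emits only multi-word shingles, as in the index-time analyzer of Edwin's schema.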
Re: Auto-suggest in Solr
Thanks, this is very helpful. Suggester config is quite under documented. It took me longer than I expected to get it working.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
Re: Auto-suggest in Solr
Cool! I've bookmarked it, much more thorough

Erick
Re: Auto-suggest in Solr
Hi guys, I just wrote a blog post to complement Erick's post and to explain in detail, with practical examples, all the main Lookup implementations:

http://alexbenedetti.blogspot.co.uk/2015/07/solr-you-complete-me.html

I think this can be useful for Edwin to finally fix the config for the FreeTextSuggester (which I have finally clarified, Erick, thanks to Mike's answer on the dev list and some deep code analysis and testing :) )

Cheers
Re: Auto-suggest in Solr
Alessandro: Going to have to defer to Mike McCandless et al., they're the authorities here. Don't quite know whether they monitor this list, consider the dev list?

Best,
Erick

2015-06-19 6:40 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:

I'm implementing an auto-suggest feature in Solr, and I'd like to achieve the following:

For example, if the user enters mp3, Solr might suggest mp3 player, mp3 nano and mp3 music. When the user enters mp3 p, the suggestions should narrow down to mp3 player.

Currently, when I type mp3 p, the suggester returns words that start with the letter p only, and I'm getting results like plan, production, etc.; it does not take the mp3 token into consideration.

I'm using Solr 5.1 and below is my configuration:

In solrconfig.xml:

  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="lookupImpl">FreeTextLookupFactory</str>
      <str name="indexPath">suggester_freetext_dir</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">Suggestion</str>
      <str name="weightField">Project</str>
      <str name="suggestFreeTextAnalyzerFieldType">suggestType</str>
      <int name="ngrams">5</int>
      <str name="buildOnStartup">false</str>
      <str name="buildOnCommit">false</str>
    </lst>
  </searchComponent>

In schema.xml:

  <fieldType name="suggestType" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" " />
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="6" outputUnigrams="false"/>
    </analyzer>
    <analyzer type="query">
      <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" " />
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="6" outputUnigrams="true"/>
    </analyzer>
  </fieldType>

Is there anything that I configured wrongly?

Regards,
Edwin
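Since buildOnStartup and buildOnCommit are both false in the configuration above, the suggester presumably has to be built explicitly before it can return anything. A minimal sketch of building the request URLs follows. The host, the core name and the /suggest handler path are assumptions (the configuration shown does not include a request handler), and suggest_url is a hypothetical helper. The parameter names suggest, suggest.q, suggest.count, suggest.dictionary and suggest.build are the usual SuggestComponent request parameters.

```python
from urllib.parse import urlencode


def suggest_url(base_url, query, dictionary=None, build=False, count=5):
    """Build a SuggestComponent request URL (hypothetical helper)."""
    params = {
        "suggest": "true",
        "suggest.q": query,
        "suggest.count": str(count),
        "wt": "json",
    }
    if dictionary:
        # Only needed when the suggester was given a name in solrconfig.xml.
        params["suggest.dictionary"] = dictionary
    if build:
        # Triggers an explicit build; relevant when the automatic
        # build options are switched off.
        params["suggest.build"] = "true"
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Assumed location of the handler; adjust to the real core and path.
    base = "http://localhost:8983/solr/collection1/suggest"
    print(suggest_url(base, "mp3 p", build=True))   # build once
    print(suggest_url(base, "mp3 p"))               # then query
```

The first URL would be requested once after indexing, and the second on every keystroke. Nothing here contacts a server, so the URLs can be inspected before they are used.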
Re: Auto-suggest in Solr
Thanks, Erick, I didn't have time to go through the code again, but I will forward this to the dev list. Thank you for your time!

Cheers
Re: Auto-suggest in Solr
Up. Can anyone kindly take a look at my considerations regarding the FreeText Suggester? I am curious to have more insight. Eventually I will analyse the code in depth to understand my errors.

Cheers

--
Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England
Re: Auto-suggest in Solr
Can any of our beloved super gurus take a look at my mail? It could help Edwin as well :)

Cheers

2015-06-19 11:53 GMT+01:00 Alessandro Benedetti benedetti.ale...@gmail.com:
Actually the documentation is not clear enough. Let's try to understand this suggester. [...]
Re: Auto-suggest in Solr
Ok sure.

"ngrams: The max number of tokens out of which singles will be make the dictionary. The default value is 2. Increasing this would mean you want more than the previous 2 tokens to be taken into consideration when making the suggestions."

This confused me, as I could not get that behaviour when I use the suggester. Since the default value is 2, I would expect a search for "mp3 p" to return only suggestions that contain "mp3 ...", and not just anything starting with the letter "p". But I have only been getting suggestions that start with "p".

Even when I try a bigger ngrams value with a longer search, I get the same results: the suggester only considers the last token when giving suggestions. I still could not get it to consider 2 or more tokens when returning suggestions.

So am I actually heading in the right direction with this?

Regards,
Edwin

On 19 June 2015 at 18:53, Alessandro Benedetti benedetti.ale...@gmail.com wrote:
Actually the documentation is not clear enough. Let's try to understand this suggester. [...]
Re: Auto-suggest in Solr
Actually, the documentation is not clear enough. Let's try to understand this suggester.

*Building*

This suggester builds an FST, which it uses to provide the autocomplete feature by running prefix searches on it. The terms used to generate the FST are the tokens produced by the suggestFreeTextAnalyzerFieldType. This much should be correct.

To keep it simple, suppose our analysis has a shingle token filter [1-3] (so we produce unigrams as well). From these original field values:

  mp3 ipod
  mp3 player
  mp3 player ipod
  player of Real

we produce this list of possible suggestions in our FST:

  mp3
  player
  ipod
  real
  of
  mp3 ipod
  mp3 player
  player ipod
  player of
  of real
  mp3 player ipod
  player of real

From the documentation I read:

  "ngrams: The max number of tokens out of which singles will be make the dictionary. The default value is 2. Increasing this would mean you want more than the previous 2 tokens to be taken into consideration when making the suggestions."

This confuses me, as I was not expecting this parameter to affect the suggestion dictionary. So I would like a clarification here from our masters :)

Now let's see what happens at query time.

*Query Time*

As I understand it, the ngrams parameter makes the suggester consider only the last N-1 tokens the user typed, separated by spaces. From the Javadoc:

  "Builds an ngram model from the text sent to {@link #build} and predicts based on the last grams-1 tokens in the request sent to {@link #lookup}. This tries to handle the long tail of suggestions for when the incoming query is a never before seen query string."

For example, grams=3 should consider only the last 2 tokens:

  special mp3 p -> mp3 p

This query is then analysed using the suggestFreeTextAnalyzerFieldType. We produce 3 tokens:

  mp3
  p
  mp3 p

and we run the prefix matching on the FST.

*Conclusion*

My understanding is surely wrong at some point, as the behaviour I get is different. Can we discuss this, clarify it, and eventually put it in the official documentation?
Cheers

2015-06-19 6:40 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:
I'm implementing an auto-suggest feature in Solr, and I'd like to achieve the following: [...]
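The building and query-time steps described in the message above can be sketched in a few lines of Python. This is only a model of that description (shingles of 1 to 3 tokens, keep the last grams-1 query tokens, then prefix match). It is not the actual FreeTextSuggester code, whose behaviour the message itself reports as different. The function names are made up for illustration, and a plain set stands in for the FST.

```python
def shingles(tokens, min_size=1, max_size=3):
    """Return every run of min_size..max_size consecutive tokens, joined by spaces."""
    out = []
    for size in range(min_size, max_size + 1):
        for start in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[start:start + size]))
    return out


def build_dictionary(field_values, max_size=3):
    """Collect the distinct lowercased shingles of all field values (stand-in for the FST)."""
    terms = set()
    for value in field_values:
        terms.update(shingles(value.lower().split(), 1, max_size))
    return terms


def last_tokens(query, grams):
    """Keep only the last grams-1 whitespace-separated tokens of the query."""
    keep = max(grams - 1, 1)  # assumption: always keep at least one token
    return " ".join(query.split()[-keep:])


def prefix_lookup(dictionary, query, grams):
    """Return the dictionary terms that start with the truncated query."""
    prefix = last_tokens(query, grams)
    return sorted(term for term in dictionary if term.startswith(prefix))


values = ["mp3 ipod", "mp3 player", "mp3 player ipod", "player of Real"]
dictionary = build_dictionary(values)

print(sorted(dictionary))                                   # the 12 terms listed in the message
print(last_tokens("special mp3 p", grams=3))                # mp3 p
print(prefix_lookup(dictionary, "special mp3 p", grams=3))  # ['mp3 player', 'mp3 player ipod']
```

Under this model "mp3 p" narrows down to the "mp3 player" entries, which is the behaviour Edwin expects but does not observe.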
Auto-suggest in Solr
I'm implementing an auto-suggest feature in Solr, and I'd like to achieve the following:

For example, if the user enters "mp3", Solr might suggest "mp3 player", "mp3 nano" and "mp3 music". When the user enters "mp3 p", the suggestions should narrow down to "mp3 player".

Currently, when I type "mp3 p", the suggester returns only words that start with the letter "p". I'm getting results like "plan", "production", etc., and it does not take the "mp3" token into consideration.

I'm using Solr 5.1, and below is my configuration.

In solrconfig.xml:

  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="lookupImpl">FreeTextLookupFactory</str>
      <str name="indexPath">suggester_freetext_dir</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">Suggestion</str>
      <str name="weightField">Project</str>
      <str name="suggestFreeTextAnalyzerFieldType">suggestType</str>
      <int name="ngrams">5</int>
      <str name="buildOnStartup">false</str>
      <str name="buildOnCommit">false</str>
    </lst>
  </searchComponent>

In schema.xml:

  <fieldType name="suggestType" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" " />
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="6" outputUnigrams="false"/>
    </analyzer>
    <analyzer type="query">
      <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" " />
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="6" outputUnigrams="true"/>
    </analyzer>
  </fieldType>

Is there anything that I configured wrongly?

Regards,
Edwin
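Since the configuration above sets both buildOnStartup and buildOnCommit to false, the suggester presumably has to be built explicitly before it returns anything. A small Python sketch of how such a request URL could be assembled is below. The host, the core name "mycore" and the "/suggest" handler path are assumptions (the message does not show a request handler), and the helper name is made up. The suggest, suggest.q, suggest.count and suggest.build parameters are the ones the SuggestComponent accepts.

```python
from urllib.parse import urlencode


def suggest_url(base_url, query, build=False, count=5):
    """Assemble a suggest request URL; base_url must point at a handler
    that has the suggest component attached (assumed, not shown in the thread)."""
    params = {
        "suggest": "true",
        "suggest.q": query,
        "suggest.count": count,
        "wt": "json",
    }
    if build:
        # Needed at least once when buildOnStartup and buildOnCommit are false.
        params["suggest.build"] = "true"
    return f"{base_url}?{urlencode(params)}"


# Hypothetical host, core and handler path.
url = suggest_url("http://localhost:8983/solr/mycore/suggest", "mp3 p", build=True)
print(url)
```

Sending the first request with build=True and later ones without it avoids rebuilding the suggester on every keystroke.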