Re: distinct on my result
Okay. We have a lot of products, and I just imported the name of each product into a core. I put an EdgeNGram filter on it, and my autocompletion works. But what I want is auto-suggestion. Example of the autocompletion: input "harry" gives output "harry potter ...", but input "potter" gives no output at all. What I want is to get "harry potter ..." when I type "potter" into my search field. Any ideas? I think the solution is a mix of TermsComponent and EdgeNGram, or not? I am getting a little desperate, and there is too much information about this in the forum =(

-- View this message in context: http://old.nabble.com/distinct-on-my-result-tp27849951p27864088.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: distinct on my result
Hey, okay, I'll show you my settings ;) I use an extra core with the standard request handler.

schema.xml:

  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="name" type="text" indexed="true" stored="true" required="true" />
  <field name="suggest" type="autocomplete" indexed="true" stored="true" multiValued="true" />
  <copyField source="name" dest="suggest" />

So I copy my names to the field suggest and use the EdgeNGramFilter and some others:

  <fieldType name="autocomplete" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory" />
      <filter class="solr.LowerCaseFilterFactory" />
      <filter class="solr.EdgeNGramFilterFactory" maxGramSize="100" minGramSize="1" />
      <filter class="solr.StandardFilterFactory" />
      <filter class="solr.TrimFilterFactory" />
      <filter class="solr.SnowballPorterFilterFactory" language="German2" protected="protwords.txt" />
      <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt" />
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory" />
      <filter class="solr.LowerCaseFilterFactory" />
      <filter class="solr.EdgeNGramFilterFactory" maxGramSize="100" minGramSize="1" />
      <filter class="solr.StandardFilterFactory" />
      <filter class="solr.TrimFilterFactory" />
      <filter class="solr.SnowballPorterFilterFactory" language="German2" protected="protwords.txt" />
      <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt" />
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    </analyzer>
  </fieldType>

With this config I get the results I described ... maybe I have too many filters ;) ?!

gwk-4 wrote:
> Hi, I'm no expert on the full-text search features of Solr, but I guess this has something to do with your field type or your query. Are you using the standard request handler or dismax for your queries? And what analyzers are you using on your product name field? Regards, gwk
>
> On 3/11/2010 3:24 PM, stocki wrote:
> > [...] so what I want is to get "harry potter ..." when I type "potter" into my search field! [...]

-- View this message in context: http://old.nabble.com/distinct-on-my-result-tp27849951p27865058.html
Re: distinct on my result
Hi, Try replacing KeywordTokenizerFactory with WhitespaceTokenizerFactory so that it creates a separate term for each word. After a reindex it should work. Regards, gwk

On 3/11/2010 4:33 PM, stocki wrote:
> [...] so with this config I get the results above ... maybe I have too many filters ;) ?!
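
To illustrate why the tokenizer matters here, the following is a minimal Python sketch, not Solr code. It models only lowercasing, tokenization, and edge n-gram expansion (the other filters in the posted field type are ignored), and it assumes, as a simplification, that the typed query is compared as one exact term. All function names are made up for this illustration.

```python
def edge_ngrams(token, min_gram=1, max_gram=100):
    """Return the leading-edge n-grams of a single token."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def keyword_tokenizer(text):
    # Keyword-style tokenization: the whole input is one token.
    return [text]


def whitespace_tokenizer(text):
    # Whitespace-style tokenization: one token per word.
    return text.split()


def index_terms(text, tokenizer):
    """Lowercase, tokenize, then expand every token into edge n-grams."""
    terms = set()
    for token in tokenizer(text.lower()):
        terms.update(edge_ngrams(token))
    return terms


keyword_terms = index_terms("Harry Potter", keyword_tokenizer)
whitespace_terms = index_terms("Harry Potter", whitespace_tokenizer)

print("potter" in keyword_terms)     # False: every term starts with "h"
print("potter" in whitespace_terms)  # True: "potter" is its own token
print("harry p" in keyword_terms)    # True: prefix of the whole name
```

With one token per name, only prefixes of the full name are indexed, which matches the behaviour reported for "potter". The trade-off in this simplified model is that a multi-word prefix such as "harry p" is no longer a single indexed term once the name is split on whitespace.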
distinct on my result
Hello. I implemented my suggest function with EdgeNGramFilter. Now when I get my results, they are not distinct: the same name often appears two or more times. Is it possible for Solr to return only distinct results?

  "response":{"numFound":172,"start":0,"docs":[
    {"name":"Halloween"},
    {"name":"Hallo Taxi"},
    {"name":"Halloween"},
    {"name":"Hallstatt"},
    {"name":"Hallo Mary"},
    {"name":"Halloween"},
    {"name":"Halloween"},
    {"name":"Halloween"},
    {"name":"Halleluja"},
    {"name":"Halloween"}]}

So how can I remove the duplicate Halloween entries in Solr? I don't want to filter them out on the client side. Thanks.

-- View this message in context: http://old.nabble.com/distinct-on-my-result-tp27849951p27849951.html
Re: distinct on my result
Hi, I ran into the same issue, and what I did (at http://www.mysecondhome.co.uk/) was to create a separate core just for autosuggest. It is fully updated once an hour and contains the distinct values of the items I want to search for, including the count, so I can display the approximate number of results in the suggest dropdown. This might not be a good solution when your data is updated frequently, but for us it has worked very well so far. Maybe you can also use clustering so you won't have to create a separate core, but I think my solution performs better (although I haven't tested it, so I could be horribly, horribly wrong). Regards, gwk

On 3/10/2010 2:55 PM, stocki wrote:
> [...] Is it possible for Solr to give me only distinct results? [...]
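
As a rough illustration of the "distinct values including the count" idea, here is a small Python sketch. It assumes the product names have already been fetched into a list (the sample list is the one from the response in the original question). The function name and the `count` field are hypothetical, and this is not the script gwk describes.

```python
from collections import Counter


def distinct_suggestions(names):
    """Collapse duplicate names into one entry each, keeping a count."""
    counts = Counter(names)
    # One document per distinct name, sorted by name for stable output.
    return [{"name": name, "count": count}
            for name, count in sorted(counts.items())]


names = [
    "Halloween", "Hallo Taxi", "Halloween", "Hallstatt", "Hallo Mary",
    "Halloween", "Halloween", "Halloween", "Halleluja", "Halloween",
]

suggestions = distinct_suggestions(names)
for doc in suggestions:
    print(doc["name"], doc["count"])
```

Each resulting entry would become one document in the separate autosuggest core, so a prefix query against that core cannot return the same name twice.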
Re: distinct on my result
Hey. Okay, thanks. My suggestions run in another core ;) Do you deduplicate (distinct) during the import with DIH?

-- View this message in context: http://old.nabble.com/distinct-on-my-result-tp27849951p27850157.html
Re: distinct on my result
Hi, The autosuggest core is filled by a simple script (written in PHP) which requests facet values for all the possible strings one can search for and adds them one by one as documents. Our case has some special issues because we search in multiple languages (typing España will suggest Spain, and the other way around when on the Spanish site). We have about 97500 documents yielding approximately 12500 different documents in our autosuggest core, and the autosuggest update script takes about 5 minutes to do a full re-index (all this is done on a separate server and replicated, so the indexing has no impact on the performance of the site). Regards, gwk

On 3/10/2010 3:09 PM, stocki wrote:
> Okay, thanks. My suggestions run in another core ;) Do you deduplicate (distinct) during the import with DIH?
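
The approach described here (request facet values, then add each value as a document) could look roughly like this in Python. This is a sketch under assumptions, not gwk's PHP script: the field name `name`, the `count` field, and the use of the facet value as `id` are illustrative choices. The query parameters (`facet`, `facet.field`, `facet.limit`, `facet.mincount`, `rows`, `wt`) are standard Solr parameters, and the code assumes Solr's default JSON facet layout, a flat list that alternates value and count. No HTTP request is made; the functions only build the query string and convert a response fragment.

```python
from urllib.parse import urlencode


def facet_query(field):
    """Build a query string asking Solr for all distinct values of a field."""
    return urlencode({
        "q": "*:*",
        "rows": 0,            # no documents needed, only the facet counts
        "wt": "json",
        "facet": "true",
        "facet.field": field,
        "facet.limit": -1,    # return every distinct value
        "facet.mincount": 1,  # skip values that match no document
    })


def facet_to_docs(flat_facets):
    """Turn [value, count, value, count, ...] into suggest-core documents."""
    values = flat_facets[0::2]
    counts = flat_facets[1::2]
    # "id" and "count" are hypothetical fields of the suggest core.
    return [{"id": value, "name": value, "count": count}
            for value, count in zip(values, counts)]


query = facet_query("name")

# Example fragment shaped like facet_counts.facet_fields.name (sample data).
sample = ["Halloween", 6, "Hallo Taxi", 1, "Hallstatt", 1]
docs = facet_to_docs(sample)
for doc in docs:
    print(doc)
```

Because faceting already returns each value once, the documents written to the suggest core are distinct by construction, and the count comes along for display in the dropdown.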