Re: Autosuggest on PART of cityname
On 8/20/2010 7:04 PM, PeterKerk wrote:
> @Markus: thanks, will try to work with that.
> @Gijs: I've looked at the site and the search function on your homepage is EXACTLY what I need! Do you have some Solr code samples for me to study perhaps? (I just need the relevant fields in the schema.xml and the query url) It would help me a lot! :)
> Thanks to you both!

The fields in our schema are:

<field name="id" type="string" indexed="true" stored="true" required="true" />
  Just an id based on type, depth and a number; not important.

<field name="type" type="string" indexed="true" stored="true" required="true" />
  This is either buy or rent, as our sections have separate autocompleters.

<field name="depth" type="string" indexed="true" stored="true" />
  Since you can search by country, region or city, this stores the type of this document (well, since we use geonames.org geographical data we actually have 4 regions).

<field name="name" type="text" indexed="true" stored="true" />
  The canonical name of the country/region/city.

<dynamicField name="name_*" type="text" indexed="true" stored="true" />
  The name of the country/region/city in various languages.

<field name="parent" type="text" indexed="true" stored="true" />
  The name of the country/region/city with any of its parents, comma separated. This is used for phrase searches, so if you enter "Amsterdam, Netherlands" the Dutch Amsterdam will match before any of the Amsterdams in other countries.

<dynamicField name="parent_*" type="text" indexed="true" stored="true" />
  The same as parent but in different languages.

<field name="data" type="string" indexed="false" stored="true" />
  This is some internal data used to create the correct filters when this particular suggestion is selected.

<dynamicField name="data_*" type="text" indexed="true" stored="true" />
  The same as parent but in different languages, as our filters are on the actual name of countries/regions/cities.

<field name="count" type="tint" indexed="true" stored="true" />
  The number of documents, i.e. the number on the right of the suggestions.

<field name="names" type="text" indexed="true" multiValued="true" />
  Multivalued field which is copyField-ed from name and name_*.

<field name="parents" type="text" indexed="true" multiValued="true" />
  Multivalued field which is copyField-ed from parent and parent_*.

Where text is:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.ASCIIFoldingFilterFactory" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="30"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.ASCIIFoldingFilterFactory" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Our autocompletion requests are dismax requests where the most important parameters are:

- q=the text the user has entered into the searchbox so far
- fq=type:sale (or rent)
- qf=name_lang^4 name^4 names (where lang is the currently selected language on the website)
- pf=name_lang^4 name^4 names parents

Honestly, I basically just tweaked those parameters, without quite understanding their meaning, until I got something that worked adequately.

Hope this helps.

Regards,
gwk
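For anyone wanting to try the dismax parameters listed above, here is a minimal sketch of how such a request URL could be assembled. It is not gwk's actual code: the core URL, the `defType=dismax` switch, the `rows` value and the function name are assumptions made for illustration, while `q`, `fq`, `qf` and `pf` follow the message.

```python
from urllib.parse import urlencode

# Hypothetical core URL; the message does not say where the
# autocompletion core lives.
SOLR_SELECT = "http://localhost:8983/solr/autocomplete/select"

def autocomplete_url(typed, section="sale", lang="en", rows=10):
    """Build a dismax autocompletion request for the text typed so far."""
    name_lang = f"name_{lang}"  # e.g. name_en, matched by dynamicField name_*
    params = [
        ("defType", "dismax"),          # assumption: dismax selected via defType
        ("q", typed),                   # text entered into the searchbox so far
        ("fq", f"type:{section}"),      # sale or rent
        ("qf", f"{name_lang}^4 name^4 names"),
        ("pf", f"{name_lang}^4 name^4 names parents"),
        ("rows", rows),                 # assumption: number of suggestions shown
        ("wt", "json"),
    ]
    # urlencode percent-escapes ':' and '^' and joins the pairs with '&'
    return SOLR_SELECT + "?" + urlencode(params)

print(autocomplete_url("amst", section="rent", lang="nl"))
```

Because the index-time analyzer already stores every prefix of each word (EdgeNGramFilterFactory), the text the user typed can be sent as an ordinary query term; no wildcard is needed.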
RE: Autosuggest on PART of cityname
Ok, I now do this (searching for utr in cityname):

http://localhost:8983/solr/db/select/?wt=json&indent=on&q=*:*&rows=0&facet=true&facet.field=city&facet.prefix=utr

In the DB there's 1 location with cityname 'Utrecht' and the other 1 is with 'Utrecht Overvecht'. So in my dropdown I would like:

Utrecht (1)
Utrecht Overvecht (1)

But I get this:

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "facet":"true",
      "indent":"on",
      "q":"*:*",
      "facet.prefix":"utr",
      "facet.field":"city",
      "wt":"json",
      "rows":"0"}},
  "response":{"numFound":6,"start":0,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "city":[
        "utrecht",2,
        "utrechtovervecht",1]},
    "facet_dates":{}}}

As you can see it looks at field city, where the tokenizer looks at each individual word. I also tried city_raw, but that was without any results.

How can I fix that my dropdown will show the correct values?

--
View this message in context: http://lucene.472066.n3.nabble.com/Autosuggest-on-PART-of-cityname-tp1226088p1241444.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: Autosuggest on PART of cityname
You can't: the city field is analyzed, so the facet values are the indexed tokens rather than the original names. And if you use facet.prefix on a non-analyzed field, the match is case-sensitive, so you cannot match upper- and lowercase input alike.

If you want that, you must create a new field with an EdgeNGramTokenizer, search on it, and then facet on a non-analyzed field. Your query will then be a bit different:

q=new_ngram_field:utr
rows=0
facet=true
facet.field=non_analyzed_city_field

-Original message-
From: PeterKerk vettepa...@hotmail.com
Sent: Fri 20-08-2010 12:36
To: solr-user@lucene.apache.org
Subject: RE: Autosuggest on PART of cityname
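The suggestion in this reply (search the n-gram field, facet on the non-analyzed field) could be sketched roughly as follows. The field names are the placeholders from the reply, the sample response is made up for illustration, and `facet.mincount=1` is an added assumption so that cities without a matching document are left out.

```python
from urllib.parse import urlencode

# Same select URL as used in the thread.
BASE = "http://localhost:8983/solr/db/select/"

def ngram_facet_url(typed, ngram_field="new_ngram_field",
                    facet_field="non_analyzed_city_field"):
    """Search the n-gram field, but facet on the untouched city names."""
    params = [
        ("wt", "json"),
        ("q", f"{ngram_field}:{typed}"),  # assumes a single word was typed
        ("rows", 0),                      # only the facet counts are needed
        ("facet", "true"),
        ("facet.field", facet_field),
        ("facet.mincount", 1),            # assumption: hide cities with 0 hits
    ]
    return BASE + "?" + urlencode(params)

def facet_pairs(response, field):
    """Turn Solr's flat [value, count, value, count, ...] list into pairs."""
    flat = response["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))

# Made-up response, shaped like the JSON output shown in the thread.
sample = {
    "facet_counts": {
        "facet_fields": {
            "non_analyzed_city_field": ["Utrecht", 1, "Utrecht Overvecht", 1]
        }
    }
}

labels = [f"{city} ({count})"
          for city, count in facet_pairs(sample, "non_analyzed_city_field")]
print(labels)  # ['Utrecht (1)', 'Utrecht Overvecht (1)']
```

Since the facet values come from a field that is not analyzed, the dropdown labels keep their original capitalisation and spacing.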
Re: Autosuggest on PART of cityname
On 8/19/2010 4:45 PM, PeterKerk wrote:
> I want to have a Google-like autosuggest function on citynames. So when the user types some characters I want to show cities that match those characters but ALSO the number of locations that are in that city.
>
> With Solr I now have the parameter: fq=title:Bost
>
> But the result doesn't show the city Boston. So the fq parameter now seems to be an exact match, where I want it to be a partial match as well, more like this in SQL: WHERE title LIKE 'value%'
>
> How can I do this?

Hi,

We do something similar (http://www.mysecondhome.co.uk). Our solution is quite similar to the one proposed by Markus; however, we use a separate core for the auto-completion data, which is updated hourly. This is because you can complete on multiple levels of geography, which would be quite hard to do with faceting.

Regards,
gwk
Re: Autosuggest on PART of cityname
@Markus: thanks, will try to work with that. @Gijs: I've looked at the site and the search function on your homepage is EXACTLY what I need! Do you have some Solr code samples for me to study perhaps? (I just need the relevant fields in the schema.xml and the query url) It would help me a lot! :) Thanks to you both!
RE: Autosuggest on PART of cityname
For this to work you need either a new analyzed field with the EdgeNGramTokenizer, or you can try facet.prefix. To retrieve the number of locations for that city, just use the results from the faceting engine as usual.

I'm unsure which approach is actually faster. I'd guess using the EdgeNGramTokenizer is faster, but it also takes up more disk space. Using the faceting engine will not take more disk space.

-Original message-
From: PeterKerk vettepa...@hotmail.com
Sent: Thu 19-08-2010 16:46
To: solr-user@lucene.apache.org
Subject: Autosuggest on PART of cityname
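To make the disk-space remark concrete, here is a small illustration of what edge n-gramming does to a single term. This is a plain Python sketch of the idea, not Solr's implementation; the function name is made up, and the default sizes are the minGramSize=1 and maxGramSize=30 from the schema posted elsewhere in this thread.

```python
def edge_ngrams(token, min_gram=1, max_gram=30):
    """Return every prefix of token whose length is in [min_gram, max_gram]."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]

grams = edge_ngrams("boston")
print(grams)  # ['b', 'bo', 'bos', 'bost', 'bosto', 'boston']

# A typed prefix matches if it equals one of the indexed grams.
print("bost" in grams)  # True
```

One term of six letters becomes six indexed terms, which is where the extra index size comes from. In exchange, a prefix lookup turns into an exact term match at query time.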
RE: Autosuggest on PART of cityname
Ok, I now tried this:

http://localhost:8983/solr/db/select/?wt=json&indent=on&q=*:*&fl=city&facet.field=city&facet.prefix=Bost

Then I get:

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "fl":"city",
      "indent":"on",
      "q":"*:*",
      "facet.prefix":"Bost",
      "facet.field":"city",
      "wt":"json"}},
  "response":{"numFound":4,"start":0,"docs":[
      {},
      {},
      {},
      {}]
  }}

So 4 total results, but I would have expected 1.

What am I doing wrong?
RE: Autosuggest on PART of cityname
Hmm, you have only four documents in your index, I guess? That would make sense because you query for *:*. This technique doesn't rely on the found documents but on the faceting engine, so you should include rows=0 in your query, and the fl parameter is not required anymore. Also, add facet=true to enable the faceting engine.

http://localhost:8983/solr/db/select/?wt=json&q=*:*&rows=0&facet=true&facet.field=city&facet.prefix=bost

-Original message-
From: PeterKerk vettepa...@hotmail.com
Sent: Thu 19-08-2010 17:11
To: solr-user@lucene.apache.org
Subject: RE: Autosuggest on PART of cityname
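A small sketch of building the corrected request programmatically, so the `&` separators and escaping cannot go missing. The function name is made up. Lowercasing the typed text is an assumption that only holds if the faceted field is lowercased at index time; it mirrors the change from Bost to bost in the URL above.

```python
from urllib.parse import urlencode

# Same select URL as used in the thread.
BASE = "http://localhost:8983/solr/db/select/"

def facet_prefix_url(field, typed):
    """Ask the faceting engine for all indexed terms starting with typed."""
    params = [
        ("wt", "json"),
        ("q", "*:*"),
        ("rows", 0),          # no documents, only facet counts
        ("facet", "true"),    # enables the faceting engine
        ("facet.field", field),
        # facet.prefix is compared with the indexed terms as-is, so it is
        # lowercased here on the assumption that the field is lowercased.
        ("facet.prefix", typed.lower()),
    ]
    return BASE + "?" + urlencode(params)

print(facet_prefix_url("city", "Bost"))
```

The printed URL has `*:*` percent-encoded, which Solr decodes back to the same query.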