Re: Autosuggest on PART of cityname

2010-08-23 Thread gwk

 On 8/20/2010 7:04 PM, PeterKerk wrote:

@Markus: thanks, will try to work with that.

@Gijs: I've looked at the site and the search function on your homepage is
EXACTLY what I need! Do you have some Solr code samples for me to study
perhaps? (I just need the relevant fields in the schema.xml and the query
url) It would help me a lot! :)

Thanks to you both!

The fields in our schema are:

<field name="id" type="string" indexed="true" stored="true" required="true" />
- Just an id based on type, depth and a number, not important

<field name="type" type="string" indexed="true" stored="true" required="true" />
- This is either buy or rent, as our sections have separate
autocompleters

<field name="depth" type="string" indexed="true" stored="true" />
- Since you can search by country, region or city, this stores
the type of this document (well, since we use geonames.org geographical
data we actually have 4 regions)

<field name="name" type="text" indexed="true" stored="true" />
- The canonical name of the country/region/city

<dynamicField name="name_*" type="text" indexed="true" stored="true" />
- The name of the country/region/city in various languages

<field name="parent" type="text" indexed="true" stored="true" />
- The name of the country/region/city with any of its parents, comma
separated. This is used for phrase searches, so if you enter
"Amsterdam, Netherlands" the Dutch Amsterdam will match before any of
the Amsterdams in other countries.

<dynamicField name="parent_*" type="text" indexed="true" stored="true" />
- The same as parent but in different languages

<field name="data" type="string" indexed="false" stored="true" />
- This is some internal data used to create the correct filters
when this particular suggestion is selected

<dynamicField name="data_*" type="text" indexed="true" stored="true" />
- The same as data but in different languages, as our filters
are on the actual name of countries/regions/cities

<field name="count" type="tint" indexed="true" stored="true" />
- The number of documents, i.e. the number on the right of the
suggestions

<field name="names" type="text" indexed="true" multiValued="true" />
- Multivalued field which is copyfield-ed from name and name_*

<field name="parents" type="text" indexed="true" multiValued="true" />
- Multivalued field which is copyfield-ed from parent and parent_*

Where text is:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
    />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
            maxGramSize="30"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
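
As a rough illustration of what the EdgeNGramFilterFactory step in the index analyzer contributes, here is a minimal Python sketch. It only mimics whitespace splitting, lowercasing and edge n-grams with minGramSize=1 and maxGramSize=30; the stop word, word delimiter and ASCII folding filters are left out, and the function names are made up for this example rather than taken from Solr.

```python
def edge_ngrams(token, min_gram=1, max_gram=30):
    """Return the leading-edge n-grams of one token, shortest first."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(name):
    """Simplified stand-in for the index analyzer:
    split on whitespace, lowercase, then edge n-gram each word."""
    terms = []
    for word in name.split():
        terms.extend(edge_ngrams(word.lower()))
    return terms


print(index_terms("Utrecht"))
# ['u', 'ut', 'utr', 'utre', 'utrec', 'utrech', 'utrecht']
```

Since the query analyzer has no n-gram filter, a typed "utr" is looked up as a single term and can match the stored gram "utr".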


Our autocompletion requests are dismax requests, where the most important
parameters are:

- q=the text the user has entered into the search box so far
- fq=type:sale (or rent)
- qf=name_lang^4 name^4 names (where lang is the currently selected
language on the website)
- pf=name_lang^4 name^4 names parents

Honestly, those parameters are basically just tweaked without quite 
understanding their meaning until I got something that worked 
adequately. Hope this helps.
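
For readers who want to try these parameters, here is a hedged Python sketch that assembles such a request URL. The host, the core name "autocomplete" and the function name are assumptions for illustration only, and defType=dismax is used here to select the dismax parser; the original setup may select it differently (for example through a request handler).

```python
from urllib.parse import urlencode


def autocomplete_url(user_text, section, lang,
                     base="http://localhost:8983/solr/autocomplete/select"):
    """Build a dismax autocomplete request.

    base is a hypothetical endpoint; section is the value used in the
    type filter and lang is the website language code.
    """
    params = {
        "defType": "dismax",
        "q": user_text,                               # text typed so far
        "fq": "type:" + section,                      # restrict to one section
        "qf": f"name_{lang}^4 name^4 names",          # fields searched
        "pf": f"name_{lang}^4 name^4 names parents",  # phrase boost fields
        "wt": "json",
    }
    return base + "?" + urlencode(params)


print(autocomplete_url("amst", "rent", "nl"))
```

urlencode takes care of escaping the spaces and the ^ boost markers, so the field lists can be written as plain strings.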


Regards,

gwk


RE: Autosuggest on PART of cityname

2010-08-20 Thread PeterKerk

Ok, I now do this (searching for utr in cityname):
http://localhost:8983/solr/db/select/?wt=json&indent=on&q=*:*&rows=0&facet=true&facet.field=city&facet.prefix=utr

In the DB there's 1 location with cityname 'Utrecht' and the other 1 is with
'Utrecht Overvecht'

So in my dropdown I would like:
Utrecht (1)
Utrecht Overvecht (1)

But I get this:
{
 "responseHeader":{
  "status":0,
  "QTime":0,
  "params":{
    "facet":"true",
    "indent":"on",
    "q":"*:*",
    "facet.prefix":"utr",
    "facet.field":"city",
    "wt":"json",
    "rows":"0"}},
 "response":{"numFound":6,"start":0,"docs":[]
 },
 "facet_counts":{
  "facet_queries":{},
  "facet_fields":{
    "city":[
     "utrecht",2,
     "utrechtovervecht",1]},
  "facet_dates":{}}}

As you can see it looks at field city, where the tokenizer looks at each
individual word. I also tried city_raw, but that was without any results.

How can I fix that my dropdown will show the correct values?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Autosuggest-on-PART-of-cityname-tp1226088p1241444.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Autosuggest on PART of cityname

2010-08-20 Thread Markus Jelsma
You can't, because the city field is analyzed: faceting returns the indexed
tokens, not the original values. And if you facet on a non-analyzed field,
facet.prefix cannot treat upper- and lowercase alike, so utr will not match
Utrecht. If you want that, you must create a new field with an
EdgeNGramTokenizer, search on it, and then you can facet on a non-analyzed
field. Your query will be a bit different then:

 

q=new_ngram_field:utr
rows=0
facet=true
facet.field=non_analyzed_city_field
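
To see why this combination produces the dropdown Peter asked for, here is a small in-memory Python sketch of the idea. It is not Solr code: the function names are invented, and it assumes the n-gram field lowercases its input while the facet field keeps the city name untouched.

```python
from collections import Counter


def ngram_terms(city, max_gram=30):
    """Edge n-grams of the whole lowercased value, roughly what an
    edge n-gram tokenizer followed by lowercasing would index."""
    lowered = city.lower()
    return {lowered[:n] for n in range(1, min(len(lowered), max_gram) + 1)}


def suggest(cities, typed):
    """Search on the n-gram terms, then count hits by the raw city value
    (the role played by faceting on the non-analyzed field)."""
    wanted = typed.lower()
    hits = [city for city in cities if wanted in ngram_terms(city)]
    return sorted(Counter(hits).items())


locations = ["Utrecht", "Utrecht Overvecht", "Amsterdam"]
print(suggest(locations, "utr"))
# [('Utrecht', 1), ('Utrecht Overvecht', 1)]
```

The search side decides which documents match the typed prefix, and the untouched field supplies the labels and counts, so multi-word names stay intact.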

 

 


Re: Autosuggest on PART of cityname

2010-08-20 Thread gwk

 On 8/19/2010 4:45 PM, PeterKerk wrote:

I want to have a Google-like autosuggest function on citynames. So when user
types some characters I want to show cities that match those characters but
ALSO the amount of locations that are in that city.

With Solr I now have the parameter:
fq=title:Bost

But the result doesn't show the city Boston. So the fq parameter seems to
be an exact match, where I want it to be a partial match as well, more like
this in SQL: WHERE title LIKE 'value%'

How can I do this?




Hi,

We do something similar (http://www.mysecondhome.co.uk). Our solution is
quite similar to the one proposed by Markus; however, we use a separate
core for the auto-completion data, which is updated hourly. This is
because you can complete on multiple levels of geography, which would
be quite hard to do with faceting.


Regards,

gwk


Re: Autosuggest on PART of cityname

2010-08-20 Thread PeterKerk

@Markus: thanks, will try to work with that.

@Gijs: I've looked at the site and the search function on your homepage is
EXACTLY what I need! Do you have some Solr code samples for me to study
perhaps? (I just need the relevant fields in the schema.xml and the query
url) It would help me a lot! :)

Thanks to you both!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Autosuggest-on-PART-of-cityname-tp1226088p1249313.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Autosuggest on PART of cityname

2010-08-19 Thread Markus Jelsma
You need a new analyzed field with the EdgeNGramTokenizer or you can try 
facet.prefix for this to work. To retrieve the number of locations for that 
city, just use the results from the faceting engine as usual.
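
As a hedged sketch of reading those counts from a wt=json response, the Python below pairs up the flat term/count list that the faceting engine returns per field. The function name is made up, and the sample string mirrors the facet output quoted elsewhere in this thread.

```python
import json


def facet_pairs(response_text, field):
    """Turn the flat [term, count, term, count, ...] list for one facet
    field into a list of (term, count) tuples."""
    data = json.loads(response_text)
    flat = data["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))


sample = '{"facet_counts":{"facet_fields":{"city":["utrecht",2,"utrechtovervecht",1]}}}'
print(facet_pairs(sample, "city"))
# [('utrecht', 2), ('utrechtovervecht', 1)]
```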

 

I'm unsure which approach is actually faster. I'd guess the
EdgeNGramTokenizer is faster, but it also takes up more disk space. Using the
faceting engine will not take more disk space.
 
-Original message-
From: PeterKerk vettepa...@hotmail.com
Sent: Thu 19-08-2010 16:46
To: solr-user@lucene.apache.org; 
Subject: Autosuggest on PART of cityname


I want to have a Google-like autosuggest function on citynames. So when user
types some characters I want to show cities that match those characters but
ALSO the amount of locations that are in that city.

With Solr I now have the parameter:
fq=title:Bost

But the result doesn't show the city Boston. So the fq parameter seems to
be an exact match, where I want it to be a partial match as well, more like
this in SQL: WHERE title LIKE 'value%'

How can I do this?



-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Autosuggest-on-PART-of-cityname-tp1226088p1226088.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Autosuggest on PART of cityname

2010-08-19 Thread PeterKerk

Ok, I now tried this:
http://localhost:8983/solr/db/select/?wt=json&indent=on&q=*:*&fl=city&facet.field=city&facet.prefix=Bost

Then I get:
{
 "responseHeader":{
  "status":0,
  "QTime":0,
  "params":{
    "fl":"city",
    "indent":"on",
    "q":"*:*",
    "facet.prefix":"Bost",
    "facet.field":"city",
    "wt":"json"}},
 "response":{"numFound":4,"start":0,"docs":[
    {},
    {},
    {},
    {}]
 }}


So 4 total results, but I would have expected 1

What am I doing wrong?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Autosuggest-on-PART-of-cityname-tp1226088p1226571.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Autosuggest on PART of cityname

2010-08-19 Thread Markus Jelsma
Hmm, I guess you have only four documents in your index? That would make sense,
because you query for *:*. This technique relies on the faceting engine rather
than on the documents found, so you should include rows=0 in your query; the fl
parameter is then no longer required. Also, add facet=true to enable the
faceting engine.

 

http://localhost:8983/solr/db/select/?wt=json&q=*:*&rows=0&facet=true&facet.field=city&facet.prefix=bost
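
A minimal Python sketch of building such a request from whatever the user has typed follows. The function name is invented for this example, and lowercasing the prefix is an assumption that the city field's analyzer lowercases its tokens, which is what the facet output in this thread suggests.

```python
from urllib.parse import urlencode


def facet_prefix_url(typed, field="city",
                     base="http://localhost:8983/solr/db/select/"):
    """Build a facet.prefix request that returns no documents (rows=0),
    only facet counts for terms starting with the typed text."""
    params = [
        ("wt", "json"),
        ("q", "*:*"),
        ("rows", 0),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.prefix", typed.lower()),  # assumes lowercased index terms
    ]
    # safe="*:" keeps the match-all query readable in the URL
    return base + "?" + urlencode(params, safe="*:")


print(facet_prefix_url("Bost"))
```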


 