Re: Postcode/zipcode search

Grant Ingersoll Tue, 06 May 2008 09:39:01 -0700

You might have a look at using a phrase query when you have more than one term in the query, in addition to your term query, but giving the phrase query more weight (i.e. give an exact match more weight), and keep your original tokenization process.


Something like:
"NW10 7NY"^5 OR NW10 OR 7NY

or even downweighting the individual terms. Thus, exact matches on the full phrase will weigh much higher, and you can still do individual term matching for the single term case (NW10).
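
To make that concrete, here is a rough sketch of how the query string could be assembled from whatever the user typed. It is plain Java with no Lucene dependency; the class name PostcodeQueryBuilder, the method build and the boost parameter are made up for illustration, and it assumes the resulting string is then handed to your existing query parser against the postcode field. It does not escape query-parser special characters, so real input would need that first.

```java
// Hypothetical helper (not part of Lucene): builds a query-parser string that
// boosts the full phrase and still ORs in the individual terms.
class PostcodeQueryBuilder {

    // userInput:   raw text from the search box, e.g. "NW10 7NY"
    // phraseBoost: weight given to the exact phrase, e.g. 5
    static String build(String userInput, int phraseBoost) {
        String trimmed = userInput.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        String[] terms = trimmed.split("\\s+");
        if (terms.length == 1) {
            // Single term case (e.g. just the area code): plain term query.
            return terms[0];
        }
        StringBuilder sb = new StringBuilder();
        // Exact phrase, weighted higher than the individual terms.
        sb.append('"').append(String.join(" ", terms)).append('"');
        sb.append('^').append(phraseBoost);
        // Individual terms keep their default weight.
        for (String term : terms) {
            sb.append(" OR ").append(term);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("NW10 7NY", 5));
        System.out.println(build("NW10", 5));
    }
}
```

The boost of 5 is only a starting point; you would probably want to tune it against your own data.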


-Grant

On May 6, 2008, at 12:28 PM, Chris Mannion wrote:

Hi all
I've got a bit of a niggling problem with how one of my searches is working as opposed to how my users would like it to work. We're indexing on UK postcodes, which are in the format of a 3 or 4 character area code followed by a 3 or 4 character street specific code, e.g. "NW10 7NY" or "M11 1LQ". We originally had the values being indexed as tokenized and used a very simple search string in the format "postcode:xxx xxx", with no grouping or boosting or fuzzy searching, just a straight search on whatever the user entered. This had the benefit of finding exact matches to searches and allowing us to search just on the area part of the code to return all records with that area code, e.g. a search on "NW2" returning anything starting NW2, like "NW2 6TB", "NW2 1ER" etc.
However, the downside to that was that searches could also return records only tenuously related to what was searched for, e.g. a search for "NW10 7NY" would also return a record with a postcode "SE9 6NY" because of the slight match of the "NY". Obviously this was technically correct, but users complained because their searches were returning records from completely different areas. Our first step to put this right was to take off the tokenization of the field, which we also weren't happy with, so we have continued to fiddle.
The current status is as follows: we index the values by stripping out spaces and tokenizing them, and use a keywordAnalyzer. In searching we also strip spaces from the search term entered and search with a keywordAnalyzer. Searches for full postcodes, e.g. "NW10 7NY", find all exact matches but also any full values that are partial matches (e.g. some records just have "NW10" as their postcode field and the "NW10 7NY" search pulls them back too), but searches for partial postcodes, e.g. "NW10", still only find exact matches, i.e. they only pull back those records that have just "NW10" as their postcode, rather than anything *starting* with NW10 as we'd like.

Can anyone help me get this working in the way we need it to, please?

--
Chris Mannion
iCasework and LocalAlert implementation team
0208 144 4416


--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ






