I have a database of company names and their addresses. The user will have 
already matched at the city and state level. I want to match the free-form, 
user-entered company name and address to one of the known mailing addresses. My 
thought was to create something that normalizes the various ways of writing 
"1st St", "First St", and "First Street" to help with matching. Eventually I 
might look into geocoding, but my initial thought was to get things as 
self-correcting as possible using Lucene without adding another layer of 
potentially slow matching (the geocoding stuff).
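
To make the normalization idea concrete, here is a minimal sketch in plain Java. 
It does not use any Lucene API. The class name AddressNormalizer and the tiny 
lookup table are hypothetical and only for illustration; a real table would be 
built from the full USPS abbreviation list. The same mapping would have to be 
applied to both the indexed addresses and the user's input (for example from 
inside a custom analyzer) so that both sides reduce to the same tokens.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

final class AddressNormalizer {
    // Hypothetical, deliberately tiny table: abbreviation -> canonical form.
    // Assumption: each token has exactly one expansion. That is not always
    // true ("St" can also mean "Saint"), so a real table needs more care.
    private static final Map<String, String> CANONICAL = new HashMap<>();
    static {
        CANONICAL.put("st", "street");
        CANONICAL.put("ave", "avenue");
        CANONICAL.put("ct", "court");
        CANONICAL.put("n", "north");
        CANONICAL.put("s", "south");
        CANONICAL.put("e", "east");
        CANONICAL.put("w", "west");
        CANONICAL.put("1st", "first");
        CANONICAL.put("2nd", "second");
        CANONICAL.put("3rd", "third");
        CANONICAL.put("4th", "fourth");
    }

    // Lowercase, replace punctuation with spaces, split on whitespace,
    // then swap each known abbreviation for its canonical form.
    static String normalize(String address) {
        String cleaned = address.toLowerCase(Locale.ROOT)
                                .replaceAll("[^a-z0-9\\s]", " ");
        StringJoiner out = new StringJoiner(" ");
        for (String token : cleaned.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // an empty or all-punctuation input yields one empty token
            }
            out.add(CANONICAL.getOrDefault(token, token));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("163 N 4th St")); // 163 north fourth street
        System.out.println(normalize("1st St."));      // first street
        System.out.println(normalize("First Street")); // first street
    }
}
```

Expanding to the long form rather than collapsing to the abbreviation is an 
arbitrary choice in this sketch; either direction works as long as index time 
and query time agree.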



----- Original Message ----
From: Robert Taintor <robert.c.tain...@gmail.com>
To: lucene-net-user@incubator.apache.org
Sent: Mon, November 2, 2009 8:18:52 PM
Subject: Re: I want to index mailing addresses...how can I map Ave to Avenue,  
St to Street, Ct to Court, etc.?

It might be better to use geocoding, depending on your use case. I know
there is a Lucene spatial indexer that will let you search "nearby", but you
could also just index the geocode and use that to retrieve the record.

On Mon, Nov 2, 2009 at 8:13 PM, Ron Grabowski <rongrabow...@yahoo.com> wrote:

> I'm looking to index mailing addresses. I'd like to take into account these
> common abbreviations:
>
>  http://www.usps.com/ncsc/lookups/usps_abbreviations.html
>
> Would those be considered synonyms? I'm not exactly sure if I should use
> the WordNet modules or extend a built in analyzer and append my own filter.
>
> Has someone (in Java or .NET) already written a mailing address analyzer
> that handles normalizing things like "163 N 4th St" into "163 North Fourth
> Street"...if that's even a good thing to do?
>
>
