You may find this thread useful: http://www.gossamer-threads.com/lists/lucene/java-user/47824?search_string=record%20linkage;#47824
although it doesn't answer all of your questions.

I think in the end you will need to do some post-processing on the results, but maybe not.

On Jun 29, 2007, at 11:41 AM, Darren Hartford wrote:

Hey all,
As you can tell by the subject, I'm interested in 'name searching' and
'nearby name' searching.  Scenarios include genealogy and
similar-person-from-different-datasources matching.  Assuming
Java-based Lucene, and more than likely the Solr project.

*nickname: would it be feasible to create an Analyzer that ties into
an external/internal nickname datasource (the datasource would vary
dramatically based on nationality)?  Use case:  Jon, John, Johnny,
Jonathan would have 'weight' in the relevance.  Similarly 'Dick',
'Chuck', and 'Charles'.

Maybe you could inject these as synonyms?
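
To make the synonym idea a bit more concrete, here is a rough sketch in plain Java (no Lucene classes involved). The class name NicknameExpander and the tiny nickname table are made up for illustration; a real table would be loaded from whatever per-nationality datasource you end up with, and the expanded forms would then be fed in as synonyms at index or query time.

```java
import java.util.*;

// Hypothetical helper, not part of Lucene or Solr: maps a name to its nickname group.
class NicknameExpander {
    // Illustrative data only; a real table would come from an external datasource.
    private static final List<List<String>> GROUPS = Arrays.asList(
        Arrays.asList("jon", "john", "johnny", "jonathan"),
        Arrays.asList("chuck", "charles"));

    // Lookup from any single form to the full (read-only) set of forms in its group.
    private static final Map<String, Set<String>> INDEX = new HashMap<>();
    static {
        for (List<String> group : GROUPS) {
            Set<String> forms = Collections.unmodifiableSet(new TreeSet<>(group));
            for (String name : group) {
                INDEX.put(name, forms);
            }
        }
    }

    // Returns the lower-cased name plus any known variants; unknown names map to themselves.
    static Set<String> expand(String name) {
        String key = name.trim().toLowerCase(Locale.ROOT);
        Set<String> forms = INDEX.get(key);
        return forms != null ? forms : Collections.singleton(key);
    }
}
```

With this table, NicknameExpander.expand("Johnny") gives back all four Jon/John/Johnny/Jonathan forms, while a name that is not in the table comes back unchanged.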



*Levenshtein distance: This is why I'm looking at Lucene and the related
Solr project - Levenshtein already exists, but seems to be a separate
query/relevance metric.  Is there a way to add to the overall weight
across multiple Analyzers? Sorry, I'm very ignorant of the capabilities and
am curious.

Not sure what it means to add the weight across multiple Analyzers. What are you thinking of doing?
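
If the goal is one relevance number that reflects both a nickname hit and spelling closeness, one possible reading is sketched below in plain Java. This is only an assumption about what you mean, not how Lucene scores documents: the class name NameSimilarity, the combinedScore method, and the 0.6 / 0.4 weights are all invented for illustration and would need tuning.

```java
// Hypothetical helper, not Lucene's scoring: plain edit distance plus a blended score.
class NameSimilarity {
    // Classic dynamic-programming Levenshtein distance
    // (insert, delete, and substitute each cost 1), using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]: 1.0 for identical strings, lower as more edits
    // are needed relative to the longer string.
    static double similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / max;
    }

    // Blends a nickname hit with the edit-distance similarity.
    // The 0.6 / 0.4 weights are arbitrary placeholders.
    static double combinedScore(String query, String candidate, boolean nicknameMatch) {
        return 0.6 * (nicknameMatch ? 1.0 : 0.0) + 0.4 * similarity(query, candidate);
    }
}
```

In Lucene itself, the closest equivalent is probably combining a fuzzy clause and a synonym/exact clause in a single boolean query with different boosts, so that both contribute to the one score; the sketch above just shows the arithmetic of that idea.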





