Hi All,

 

I am using Jaro-Winkler scoring in my current project to match names. The
database of names against which the input value has to be matched is huge,
so we are running into performance issues.

 

We would now like Lucene to help us here: we want Lucene's speed for handling
large volumes of data, but still want to rely on Jaro-Winkler for the scoring.

 

Some examples of pairs that we expect to match:

*       'Lowery, Betty' and 'Lowery, Betty Sue'
*       'Malik, Mohammad Saleem'  and 'Malik, Mohammad Salim'
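
To make the scoring concrete, here is a minimal sketch of a textbook
Jaro-Winkler comparison with a cut-off, written in plain Java. It is
illustrative only and not our production code: the class name, the method
names, the lower-casing step, and the 0.9 cut-off are all assumptions made
for this example.

```java
class JaroWinklerSketch {

    // Jaro similarity in [0, 1]; 1.0 means the strings are identical.
    static double jaro(String s1, String s2) {
        if (s1.equals(s2)) return 1.0;
        int len1 = s1.length(), len2 = s2.length();
        if (len1 == 0 || len2 == 0) return 0.0;

        // Characters only count as matching within this distance of each other.
        int window = Math.max(0, Math.max(len1, len2) / 2 - 1);
        boolean[] matched1 = new boolean[len1];
        boolean[] matched2 = new boolean[len2];
        int matches = 0;
        for (int i = 0; i < len1; i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(len2 - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!matched2[j] && s1.charAt(i) == s2.charAt(j)) {
                    matched1[i] = true;
                    matched2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;

        // Count matched characters that appear in a different order.
        int halfTranspositions = 0;
        int k = 0;
        for (int i = 0; i < len1; i++) {
            if (!matched1[i]) continue;
            while (!matched2[k]) k++;
            if (s1.charAt(i) != s2.charAt(k)) halfTranspositions++;
            k++;
        }
        double m = matches;
        return (m / len1 + m / len2 + (m - halfTranspositions / 2.0) / m) / 3.0;
    }

    // Jaro-Winkler: boosts the Jaro score for a shared prefix of up to 4 characters.
    static double jaroWinkler(String s1, String s2) {
        double j = jaro(s1, s2);
        int max = Math.min(4, Math.min(s1.length(), s2.length()));
        int prefix = 0;
        while (prefix < max && s1.charAt(prefix) == s2.charAt(prefix)) prefix++;
        return j + prefix * 0.1 * (1.0 - j);
    }

    // Case-insensitive match decision against a cut-off score (assumed convention).
    static boolean isMatch(String a, String b, double cutOff) {
        return jaroWinkler(a.toLowerCase(), b.toLowerCase()) >= cutOff;
    }

    public static void main(String[] args) {
        System.out.println(isMatch("Lowery, Betty", "Lowery, Betty Sue", 0.9));
        System.out.println(isMatch("Malik, Mohammad Saleem", "Malik, Mohammad Salim", 0.9));
    }
}
```

With the assumed cut-off of 0.9, both sample pairs above are accepted as
matches by this sketch. The cost is that every input has to be compared
against every name in the database, which is where the performance problem
comes from.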

 

The questions I have are as follows:

*       What should be used in Lucene to match such strings when a cut-off
score must also be applied? I tried a fuzzy query in the following way, but
I don't know how to make it work when the string to be matched contains more
than one word.

 

        QueryParser queryParser = new QueryParser(Constants.INDEX_KEY, new StandardAnalyzer());

        query = queryParser.parse(strSearchString + Constants.LUCENE_FUZZY_QUERY_SYMBOL + Constants.MATCH_CUT_OFF);

 

*       Should Lucene's own scoring be used here, or just the fuzzy query?
*       Is there any way to override Lucene's string-matching logic and use
Jaro-Winkler there instead?
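
On the multi-word case in the first question: in the classic query parser
syntax the fuzzy operator applies to a single term, so appending it once to
the whole string only affects the last word. A possible workaround, sketched
below, is to append the operator to every word before parsing. The helper
name, the splitting rule, and the 0.8 value are assumptions for illustration;
how the number after `~` is interpreted depends on the Lucene version, and
whether the resulting terms are combined with OR or AND depends on the
parser's default operator.

```java
class FuzzyQueryStringSketch {

    // Hypothetical helper: turns "Lowery, Betty" into "Lowery~0.8 Betty~0.8".
    // Splits on anything that is not a letter or digit, so punctuation such
    // as the comma is dropped before the fuzzy operator is appended.
    static String perTermFuzzy(String name, double minSimilarity) {
        StringBuilder sb = new StringBuilder();
        for (String word : name.trim().split("[^\\p{L}\\p{N}]+")) {
            if (word.isEmpty()) continue;          // skip leading-delimiter artifacts
            if (sb.length() > 0) sb.append(' ');
            sb.append(word).append('~').append(minSimilarity);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The resulting string would then be handed to queryParser.parse(...).
        System.out.println(perTermFuzzy("Lowery, Betty Sue", 0.8));
    }
}
```

This only builds the query string; it does not make Lucene score with
Jaro-Winkler, so the last question still stands.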


Any help will be appreciated.

 

Thanks

Shivani Sawhney  

 
