Hi All,
I am using Jarowinkler scoring in my current project for matching names. The database of names against which the inputted value has to be matched is huge and thus we are faced with performance issues. We now want lucene to help us here; we want lucene's speed for handling huge data but still want to depend on Jarowinkler for its scoring. Some sample data that we expect a match between are, such as: * 'Lowery, Betty' and 'Lowery, Betty Sue' * 'Malik, Mohammad Saleem' and 'Malik, Mohammad Salim' The questions I have are as follows: * What, in lucene, should be used for doing a match between such Strings where a cut-off score is also to be provided while matching them? I tried using fuzzy query in the following way but didn't know how to work with it when the String to be matched has more than one word. QueryParser queryParser = new QueryParser(Constants.INDEX_KEY, new StandardAnalyzer()); query = queryParser.parse(strSearchString + Constants.LUCENE_FUZZY_QUERY_SYMBOL + Constants.MATCH_CUT_OFF); * Is Lucene scoring to be used here or simply the fuzzy query? * Is there any way I can override the way Lucene's logic for String match and use Jarowinkler there instead. Any help will be appreciated. Thanks Shivani Sawhney