[ 
https://issues.apache.org/jira/browse/LUCENE-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661314#action_12661314
 ] 

Robert Muir commented on LUCENE-1513:
-------------------------------------

Otis, the discussion was on java-user.

Again, I apologize for the messy code. As mentioned there, my setup is very 
specific to exactly what I am doing, and this code is in no way ready. But since 
I'm currently pretty busy with other things at work, I just wanted to put 
something up here for anyone else interested.

There are the issues you mentioned, and also some I mentioned on java-user: for 
example, how to handle index updates that introduce new terms (they must be 
added to the auxiliary index), or even whether an auxiliary index is the best 
approach.

The general idea is that instead of enumerating all terms to find matches, the 
deletion neighborhood described in the paper is used. This way search time is 
not linear in the number of terms. Yes, you are correct that it can only 
guarantee edit distances up to k, which is fixed at index time. Perhaps this 
should be configurable, but I hardcoded k=1 for simplicity. I think that covers 
something like 80% of typos...
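
To make the deletion-neighborhood idea concrete, here is a minimal sketch for 
k=1. It is not the code from the attachment: the class name, the method names 
and the in-memory map standing in for the auxiliary index are all made up for 
illustration, and it works on Java chars rather than code points.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory stand-in for the auxiliary index, hardcoded to k = 1. */
class DeletionNeighborhoodIndex {
    // Maps each neighborhood entry (a term, or a term minus one char)
    // to the set of indexed terms that produce that entry.
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Returns the term itself plus every string obtained by deleting exactly one char. */
    static Set<String> neighborhood(String term) {
        Set<String> result = new LinkedHashSet<>();
        result.add(term);
        for (int i = 0; i < term.length(); i++) {
            result.add(term.substring(0, i) + term.substring(i + 1));
        }
        return result;
    }

    /** Adds a term; terms introduced by later index updates would take the same path. */
    void addTerm(String term) {
        for (String key : neighborhood(term)) {
            postings.computeIfAbsent(key, unused -> new TreeSet<>()).add(term);
        }
    }

    /**
     * Collects candidate terms for a query. The number of lookups depends on
     * the query length, not on how many terms are indexed. Candidates can
     * still be more than one edit away, so they need to be verified.
     */
    Set<String> candidates(String query) {
        Set<String> found = new TreeSet<>();
        for (String key : neighborhood(query)) {
            Set<String> terms = postings.get(key);
            if (terms != null) {
                found.addAll(terms);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        DeletionNeighborhoodIndex index = new DeletionNeighborhoodIndex();
        index.addTerm("lucene");
        index.addTerm("lucent");
        index.addTerm("search");
        System.out.println(index.candidates("lucen"));
        // "lucnee" is two edits from "lucene" but still comes back as a candidate.
        System.out.println(index.candidates("lucnee"));
    }
}
```

Every term within one edit of the query shares at least one neighborhood entry 
with it, so nothing at distance 1 is missed. The reverse does not hold, which 
is why the candidate list is verified afterwards.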

As I mentioned on the list, another idea is that you could implement FastSS 
(not the wC variant) with deletion positions, maybe by using payloads. This 
would require more space but eliminate the candidate verification step. Maybe 
it would be nice to have some of their other algorithms, such as block-based, 
etc., available also.
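
For reference, the candidate verification step that this would eliminate can 
be sketched as a plain Levenshtein filter over the candidate list. Again, this 
is only an illustration and not the code from the attachment; the class and 
method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical verifier: keeps only candidates within k edits of the query. */
class CandidateVerifier {
    /** Classic dynamic-programming Levenshtein distance using two rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows; prev now holds the row just computed
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Filters the candidate list down to terms whose distance to the query is at most k. */
    static List<String> verify(String query, List<String> candidates, int k) {
        List<String> verified = new ArrayList<>();
        for (String candidate : candidates) {
            if (levenshtein(query, candidate) <= k) {
                verified.add(candidate);
            }
        }
        return verified;
    }

    public static void main(String[] args) {
        System.out.println(verify("lucen", Arrays.asList("lucene", "lucent", "search"), 1));
        System.out.println(verify("lucnee", Arrays.asList("lucene"), 1));
    }
}
```

The cost of this step grows with the number of candidates and their length, 
which is the trade-off against the extra space that stored deletion positions 
would need.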



> fastss fuzzyquery
> -----------------
>
>                 Key: LUCENE-1513
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1513
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>            Reporter: Robert Muir
>            Priority: Minor
>         Attachments: fastSSfuzzy.zip
>
>
> Code for doing fuzzy queries with the FastSSwC algorithm.
> FuzzyIndexer: given a Lucene field, it enumerates all terms and creates an 
> auxiliary offline index for fuzzy queries.
> FastFuzzyQuery: similar to FuzzyQuery, except it queries the auxiliary index 
> to retrieve a candidate list. This list is then verified with the Levenshtein 
> algorithm.
> Sorry, but the code is a bit messy... what I'm actually using is very 
> different from this, so it's pretty much untested. But at least you can see 
> what's going on or fix it up.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

