[jira] Updated: (LUCENE-1536) if a filter can support random access API, we should use it
[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1536:
-------------------------------------
    Attachment: LUCENE-1536.patch

* Filter has a parameter for including deletes. IndexSearcher uses it for setting the RandomAccessDocIdSet on the Scorers.
* MatchAllDocsScorer, in addition to TermScorer, properly supports setting a RandomAccessDocIdSet.
* Added more test cases.
* AndRandomAccessDocIdSet.iterator() and BitVector.iterator() need to be implemented.

if a filter can support random access API, we should use it
-----------------------------------------------------------

                 Key: LUCENE-1536
                 URL: https://issues.apache.org/jira/browse/LUCENE-1536
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Search
    Affects Versions: 2.4
            Reporter: Michael McCandless
            Assignee: Michael McCandless
            Priority: Minor
             Fix For: 2.9
         Attachments: LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch

I ran some performance tests comparing a filter applied via a random-access API against current trunk's iterator API. This was inspired by LUCENE-1476, where we realized deletions should really be implemented just like a filter, but then found in testing that switching deletions to an iterator was a very sizable performance hit.

Some notes on the test:

* The index is the first 2M docs of Wikipedia. The test machine is Mac OS X 10.5.6, quad-core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
* I test across multiple queries. "1-X" means an OR query, e.g. "1-4" means 1 OR 2 OR 3 OR 4, whereas "+1-4" is an AND query, i.e. 1 AND 2 AND 3 AND 4. "u s" means "united states" (phrase search).
* I test with multiple filter densities: 0, 1, 2, 5, 10, 25, 75, 90, 95, 98, 99, 99.9 (filter is non-null but all bits are set), and 100 (filter=null, the control).
* Method "high" means I use the random-access filter API in IndexSearcher's main loop. Method "low" means I use the random-access filter API down in SegmentTermDocs (just like deleted docs today).
* The baseline (QPS) is current trunk, where the filter is applied as an iterator up high (i.e. in IndexSearcher's search loop).

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
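The two styles being compared can be sketched outside Lucene. This is a toy illustration (the class and method names are mine, not Lucene's): a random-access filter answers "is doc N allowed?" directly, while an iterator-style filter must be advanced to the next allowed doc, the way a DocIdSetIterator behaves.

```java
import java.util.BitSet;

public class FilterAccessDemo {
    // Random-access style: ask the filter about one specific doc (O(1) per check).
    static boolean acceptRandomAccess(BitSet filter, int doc) {
        return filter.get(doc);
    }

    // Iterator style: advance to the first allowed doc at or after 'doc',
    // analogous to an iterator's advance(); MAX_VALUE stands in for "no more docs".
    static int advanceIterator(BitSet filter, int doc) {
        int next = filter.nextSetBit(doc);
        return next < 0 ? Integer.MAX_VALUE : next;
    }

    public static void main(String[] args) {
        BitSet filter = new BitSet();
        filter.set(3);
        filter.set(7);
        System.out.println(acceptRandomAccess(filter, 3)); // true
        System.out.println(advanceIterator(filter, 4));    // 7
    }
}
```

The performance question in this issue is which of these shapes the searcher's inner loop should use, and at which level (IndexSearcher vs. SegmentTermDocs) the check should happen.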
[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701261#action_12701261 ]

Robert Muir commented on LUCENE-1606:
-------------------------------------
found this interesting article applicable to this query: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.652

"We show how to compute, for any fixed bound n and any input word W, a deterministic Levenshtein-automaton of degree n for W in time linear in the length of W."

Automaton Query/Filter (scalable regex)
---------------------------------------

                 Key: LUCENE-1606
                 URL: https://issues.apache.org/jira/browse/LUCENE-1606
             Project: Lucene - Java
          Issue Type: New Feature
          Components: contrib/*
            Reporter: Robert Muir
            Priority: Minor
             Fix For: 2.9
         Attachments: automaton.patch, automatonMultiQuery.patch, automatonmultiqueryfuzzy.patch, automatonMultiQuerySmart.patch, automatonWithWildCard.patch, automatonWithWildCard2.patch

Attached is a patch for an AutomatonQuery/Filter (the name can change if it's not suitable). Whereas the out-of-box contrib RegexQuery is nice, I have some very large indexes (100M+ unique tokens) where queries are quite slow (2 minutes, etc.). Additionally, all of the existing RegexQuery implementations in Lucene are really slow if there is no constant prefix. This implementation does not depend upon a constant prefix, and runs the same query in 640ms.

Some use cases I envision:
1. lexicography/etc. on large text corpora
2. looking for things such as URLs where the prefix is not constant (http:// or ftp://)

The Filter uses the BRICS package (http://www.brics.dk/automaton/) to convert regular expressions into a DFA. Then, the filter enumerates terms in a special way, by using the underlying state machine. Here is my short description from the comments: The algorithm here is pretty basic. Enumerate terms, but instead of a binary accept/reject do:
1. Look at the portion that is OK (did not enter a reject state in the DFA)
2. Generate the next possible String and seek to that.

The Query simply wraps the filter with ConstantScoreQuery. I did not include the automaton.jar inside the patch, but it can be downloaded from http://www.brics.dk/automaton/ and is BSD-licensed.
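The "seek instead of scan" idea above can be sketched with a plain sorted set standing in for the term dictionary. This is a simplification (the names are hypothetical, and the real patch walks an arbitrary DFA rather than a fixed list of alternatives): a pattern with a small number of literal expansions, like (a|b)cdefg, needs one seek per alternative instead of one comparison per term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class SeekingTermEnum {
    /** Number of seeks performed by the last call, for illustration. */
    static int seeks;

    // Match a toy pattern given as literal alternatives (e.g. (a|b)cdefg ->
    // ["acdefg", "bcdefg"]) by seeking into the sorted dictionary once per
    // candidate. TreeSet.ceiling() plays the role of a TermEnum positioned
    // at the first term >= the requested string.
    static List<String> match(TreeSet<String> dictionary, List<String> alternatives) {
        List<String> hits = new ArrayList<>();
        seeks = 0;
        for (String alt : alternatives) {
            String t = dictionary.ceiling(alt); // one seek
            seeks++;
            if (alt.equals(t)) hits.add(t);
        }
        return hits;
    }
}
```

On a dictionary of 100M+ unique tokens, the win comes from the seek count depending on the automaton's structure rather than on the dictionary size.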
[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701279#action_12701279 ]

Eks Dev commented on LUCENE-1606:
---------------------------------
Robert, in order for Levenshtein automata to work, you need to have the complete dictionary as a DFA. Once you have the dictionary as a DFA (or any sort of trie), computing simple regexes or a simple fixed or weighted Levenshtein distance becomes a snap. A Levenshtein automaton is particularly fast at it; a much simpler and only slightly slower method (one page of code) is K. Oflazer's: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.136.3862

As said, you cannot really walk the current term dictionary as an automaton/trie (or do you have an idea on how to do that?). I guess there are enough applications where storing the complete term dictionary in a RAM DFA is not a problem. Even making some smart (heavily cached) persistent trie/DFA should not be all that complex. Or did you intend just to iterate all terms and compute the distance faster, breaking off the LD matrix computation as soon as you see you hit the boundary? But this requires iteration over all terms?

I have done something similar, in memory, but unfortunately someone else paid me for it and is not willing to share...
[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701285#action_12701285 ]

Robert Muir commented on LUCENE-1606:
-------------------------------------
eks: the AutomatonTermEnumerator in this patch does walk the term dictionary according to the transitions present in the DFA. That's what this JIRA issue is all about to me: not iterating all the terms! So you do not need the complete dictionary as a DFA.

For example: a regexp query of (a|b)cdefg with this patch seeks to 'acdefg', then 'bcdefg', as opposed to the current regex support, which exhaustively enumerates all terms.

A slightly more complex example: a query of (a|b)cd*efg first seeks to 'acd' (because of the Kleene star operator). Suppose it then encounters the term 'acda'; it will next seek to 'acdd', etc. If it encounters 'acdf', then next it seeks to 'bcd'.

This patch implements regex, wildcard, and fuzzy with n=1 in terms of this enumeration. What it doesn't do is fuzzy with arbitrary n! I used the simplistic quadratic method to compute a DFA for fuzzy with n=1 for the FuzzyAutomatonQuery present in this patch; the paper has a more complicated but linear method to compute the DFA.
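The "look at the portion that is OK" step from the algorithm description can be illustrated without the BRICS package. This sketch (my own names; the patch uses a real DFA, not java.util.regex) finds the longest prefix of a term that has not yet entered a reject state, using Matcher.hitEnd(): after a failed match, hitEnd() == true means the prefix could still be extended to a full match, i.e. the DFA is not in a dead state yet.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ViablePrefix {
    // Length of the longest prefix of 'term' that did not enter a reject
    // state for pattern 'p'. For (a|b)cdefg and the term "acdf" this is 3:
    // "acd" is still viable, but "acdf" can never be extended to a match.
    static int okPrefixLength(Pattern p, String term) {
        for (int len = term.length(); len >= 0; len--) {
            Matcher m = p.matcher(term.substring(0, len));
            if (m.matches() || m.hitEnd()) return len;
        }
        return 0; // unreachable: the empty prefix always hits end of input
    }
}
```

From that OK prefix, the enumerator's second step generates the next possible string (e.g. after rejecting 'acdf' against (a|b)cdefg it would seek to the 'b' branch) and asks the term dictionary to seek there.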
[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701298#action_12701298 ]

Eks Dev commented on LUCENE-1606:
---------------------------------
Hmmm, sounds like a good idea, but I am still not convinced it would work for fuzzy.

Take a simple dictionary: one, two, three, four. The query term is, e.g., 'ana', right? With n=1, your DFA would be (in expanded form): {.na, a.a, an., an, na, ana, .ana, ana., a.na, an.a, ana.}, where the dot represents any character in your alphabet. For the first element of the DFA you need to visit all terms, no matter how you walk the DFA... or am I missing something?

Where you could save time is the actual calculation of the LD matrix for terms that do not pass the automaton.
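The expanded form of the n=1 automaton in the comment above is just the one-edit neighborhood of the query term written as wildcard patterns. A small sketch (hypothetical class name) that generates it: one substitution, one deletion, or one insertion, with '.' standing for any character.

```java
import java.util.TreeSet;

public class Neighborhood {
    // The n=1 Levenshtein neighborhood of a word as wildcard patterns,
    // '.' meaning any single character, as in {.na, a.a, an., ...} for "ana".
    static TreeSet<String> patterns(String w) {
        TreeSet<String> out = new TreeSet<>();
        out.add(w); // exact match (0 edits)
        for (int i = 0; i < w.length(); i++) {
            out.add(w.substring(0, i) + '.' + w.substring(i + 1)); // substitution
            out.add(w.substring(0, i) + w.substring(i + 1));       // deletion
        }
        for (int i = 0; i <= w.length(); i++) {
            out.add(w.substring(0, i) + '.' + w.substring(i));     // insertion
        }
        return out;
    }
}
```

For "ana" this yields 11 patterns (the comment's set, plus the middle deletion "aa"). The debate in this thread is whether enumerating terms against such an automaton requires visiting every term, or whether seeking by sorted order prunes most of them.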
[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701303#action_12701303 ]

Robert Muir commented on LUCENE-1606:
-------------------------------------
eks, well it does work well for fuzzy n=1 (I have tested against my huge index). For your simple dictionary it will do 3 comparisons instead of 4, because your simple dictionary is sorted in the index as: four, one, three, two. When it encounters 'three' it will next ask for a TermEnum(una), which will return null.

Give it a try on a big dictionary, you might be surprised :)

--
Robert Muir
rcm...@gmail.com
[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701304#action_12701304 ]

Robert Muir commented on LUCENE-1606:
-------------------------------------
eks, in your example it does three comparisons instead of four (not much of a gain for this example, but a big gain on a real index). This is because it doesn't need to compare 'two': after encountering 'three' it requests TermEnum(uana), which returns null.

I hope you can see how this helps for a large index... (or I can try to construct a more realistic example)
[jira] Created: (LUCENE-1608) CustomScoreQuery should support arbitrary Queries
CustomScoreQuery should support arbitrary Queries
-------------------------------------------------

                 Key: LUCENE-1608
                 URL: https://issues.apache.org/jira/browse/LUCENE-1608
             Project: Lucene - Java
          Issue Type: New Feature
          Components: Query/Scoring
            Reporter: Steven Bethard
            Priority: Minor

CustomScoreQuery only allows the secondary queries to be of type ValueSourceQuery instead of allowing them to be any type of Query. As a result, what you can do with CustomScoreQuery is pretty limited. It would be nice to extend CustomScoreQuery to allow arbitrary Query objects. Most of the code should stay about the same, though a little more care would need to be taken in CustomScorer.score() to use 0.0 when the sub-scorer does not produce a score for the current document.
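The extra care in CustomScorer.score() can be sketched as follows. This is not Lucene's implementation; the class, the match flags, and the additive combination are all illustrative assumptions. The one point it shows is the issue's proposal: a sub-scorer that is not positioned on the current document contributes 0.0f instead of being assumed to match.

```java
public class CustomCombine {
    // Combine a main query's score with sub-query scores for one document.
    // subMatches[i] stands in for the real check subScorer.docID() == doc;
    // a non-matching sub-query contributes 0.0f, per the issue. The additive
    // combination here is just one plausible custom scoring function.
    static float score(float mainScore, float[] subScores, boolean[] subMatches) {
        float result = mainScore;
        for (int i = 0; i < subScores.length; i++) {
            result += subMatches[i] ? subScores[i] : 0.0f;
        }
        return result;
    }

    public static void main(String[] args) {
        // main score 1.0; first sub-query matches (0.5), second does not
        System.out.println(score(1.0f,
                new float[] {0.5f, 2.0f},
                new boolean[] {true, false})); // 1.5
    }
}
```

With ValueSourceQuery every document has a value, so this case never arises; with arbitrary Queries, the sub-scorers become sparse and the default must be made explicit.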
[jira] Updated: (LUCENE-1550) Add N-Gram String Matching for Spell Checking
[ https://issues.apache.org/jira/browse/LUCENE-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Morton updated LUCENE-1550:
----------------------------------
    Attachment: LUCENE-1550.patch

Fixes the empty string case. Adds some additional unit tests.

Add N-Gram String Matching for Spell Checking
---------------------------------------------

                 Key: LUCENE-1550
                 URL: https://issues.apache.org/jira/browse/LUCENE-1550
             Project: Lucene - Java
          Issue Type: New Feature
          Components: contrib/spellchecker
    Affects Versions: 2.9
            Reporter: Thomas Morton
            Assignee: Grant Ingersoll
            Priority: Minor
             Fix For: 2.9
         Attachments: LUCENE-1550.patch, LUCENE-1550.patch, LUCENE-1550.patch

N-Gram version of edit distance based on the paper by Grzegorz Kondrak, "N-gram similarity and distance," Proceedings of the Twelfth International Conference on String Processing and Information Retrieval (SPIRE 2005), pp. 115-126, Buenos Aires, Argentina, November 2005. http://www.cs.ualberta.ca/~kondrak/papers/spire05.pdf
[jira] Commented: (LUCENE-1550) Add N-Gram String Matching for Spell Checking
[ https://issues.apache.org/jira/browse/LUCENE-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701362#action_12701362 ]

Thomas Morton commented on LUCENE-1550:
---------------------------------------
The implementation returns a normalized edit distance (normalized by string length): specifically 1 if the strings are the same and 0 if they are maximally different. 0 makes sense in that case because the number of edits is equal to the number of characters in the longer string, so: 1 - (2 edits / 2 length) = 0.
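The normalization described above can be shown with a small sketch. Note the simplification: this uses plain character-level Levenshtein distance, not the patch's n-gram variant, and the class name is mine; only the 1 - (edits / length) normalization matches the comment.

```java
public class NormalizedDistance {
    // Normalized similarity: 1 for identical strings, 0 for maximally
    // different ones. E.g. "ab" vs "cd": 1 - (2 edits / 2 length) = 0.
    static float similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        if (max == 0) return 1.0f; // the empty-string case: two empty strings are identical
        return 1.0f - (float) levenshtein(a, b) / max;
    }

    // Standard dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++)
                d[i][j] = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1));
        return d[a.length()][b.length()];
    }
}
```

The max == 0 guard corresponds to the "empty string case" fixed in the attached patch: without it, the normalization would divide by zero.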