Thanks for investigating this, Ryan! Could you open a JIRA bug and maybe provide a patch? A test case reproducing the problem would be great too.

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server

On 11/14/06, Ryan Heinen <[EMAIL PROTECTED]> wrote:
Hello,

I believe that I may have discovered a bug in the spellchecker contrib, specifically in the LuceneDictionary class (or SpellChecker, depending on how you look at it). While testing my own code, I noticed that the indexDictionary method of the SpellChecker class was always missing the first term (alphabetically) of the field that I specified. I did some investigating, and I believe that I have determined the cause.

When its getWordsIterator() method is invoked, LuceneDictionary instantiates a TermEnum by calling terms(new Term(field, "")) on the IndexReader that it is provided (field = the name of the field supplied to the LuceneDictionary). The LuceneDictionary.hasNext() method then calls termEnum.next() to determine whether there are more terms left in the TermEnum.

Unfortunately, because terms(Term) returns a TermEnum starting at the supplied term, the first term of the field is already the current term of the TermEnum before next() is ever called. Since LuceneDictionary.hasNext() calls TermEnum.next() regardless of whether the first term has been read, that first term is skipped. Loops with the following structure, which is what the SpellChecker uses, therefore do not have the expected results:

    while (iterator.hasNext()) {
        // obtain and do something with iterator.next();
    }

With data "abc", "def", "ghi", "jkl" in the specified index and field, the loop executes only 3 times, and "def", "ghi", "jkl" are the only values retrieved. One would expect the loop to execute 4 times, with all four values ("abc", "def", "ghi", "jkl") showing up.

Has anyone encountered this problem before? Am I missing something, or should I report this as a bug? As far as I can see, the LuceneIterator should not call the next() method of its underlying TermEnum unless the next() method of the LuceneIterator class itself is called.

Any advice would be appreciated. I've appended some code below.

Thanks,
Ryan

--------

Here are a few lines from SpellChecker.java showing how it uses LuceneDictionary's iterator:

    Iterator iter=dict.getWordsIterator();
    while (iter.hasNext()) {
        String word=(String) iter.next();
        ...
    }

Below are the next() and hasNext() methods from LuceneDictionary.java:

    public Object next() {
        if (!has_next_called) {
            hasNext();
        }
        has_next_called = false;
        return (actualTerm != null) ? actualTerm.text() : null;
    }

    public boolean hasNext() {
        has_next_called = true;
        try {
            // if there is still words
            if (!termEnum.next()) {
                actualTerm = null;
                return false;
            }
            // if the next word are in the field
            actualTerm = termEnum.term();
            String fieldt = actualTerm.field();
            if (fieldt != field) {
                actualTerm = null;
                return false;
            }
            return true;
        } catch (IOException ex) {
            ex.printStackTrace();
            return false;
        }
    }
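
The skipped-first-term behavior described above can be reproduced without an index. The following is only a sketch under stated assumptions, not the actual Lucene code or a proposed patch: PositionedCursor is a hypothetical stand-in for TermEnum (it starts out already positioned on its first entry, as the TermEnum returned by terms(...) is reported to be), AdvanceFirstIterator mirrors the advance-then-read logic of the quoted hasNext(), and ReadFirstIterator shows one possible alternative that reads the current entry before advancing. The field-name check from the original is left out to keep the example short; a real fix would still need it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class TermIteratorSketch {

    /** Hypothetical stand-in for TermEnum: already positioned on the first entry. */
    static final class PositionedCursor {
        private final List<String> terms;
        private int pos = 0;

        PositionedCursor(List<String> terms) { this.terms = terms; }

        /** Current entry, or null once the cursor is exhausted. */
        String term() { return pos < terms.size() ? terms.get(pos) : null; }

        /** Advances by one; returns true if a current entry exists afterwards. */
        boolean next() { pos++; return pos < terms.size(); }
    }

    /** Mirrors the quoted logic: hasNext() always advances before reading. */
    static final class AdvanceFirstIterator implements Iterator<String> {
        private final PositionedCursor cursor;
        private String actual;
        private boolean hasNextCalled;

        AdvanceFirstIterator(PositionedCursor cursor) { this.cursor = cursor; }

        public boolean hasNext() {
            hasNextCalled = true;
            if (!cursor.next()) {   // skips the entry the cursor started on
                actual = null;
                return false;
            }
            actual = cursor.term();
            return true;
        }

        public String next() {
            if (!hasNextCalled) hasNext();
            hasNextCalled = false;
            return actual;
        }
    }

    /** One possible alternative: read the current entry, advance only in next(). */
    static final class ReadFirstIterator implements Iterator<String> {
        private final PositionedCursor cursor;

        ReadFirstIterator(PositionedCursor cursor) { this.cursor = cursor; }

        public boolean hasNext() { return cursor.term() != null; }

        public String next() {
            String current = cursor.term();
            if (current == null) throw new NoSuchElementException();
            cursor.next();
            return current;
        }
    }

    /** Runs the same while (hasNext()) { next(); } loop that SpellChecker uses. */
    static List<String> drain(Iterator<String> it) {
        List<String> out = new ArrayList<>();
        while (it.hasNext()) out.add(it.next());
        return out;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("abc", "def", "ghi", "jkl");
        // prints [def, ghi, jkl]
        System.out.println(drain(new AdvanceFirstIterator(new PositionedCursor(terms))));
        // prints [abc, def, ghi, jkl]
        System.out.println(drain(new ReadFirstIterator(new PositionedCursor(terms))));
    }
}
```

A side effect of the read-first variant is that hasNext() becomes safe to call more than once between calls to next(), whereas the advance-first version moves the cursor on every hasNext() call.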