[jira] Commented: (LUCENE-763) LuceneDictionary skips first word in enumeration

Steven Parkes (JIRA) Wed, 03 Jan 2007 11:50:51 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462042 ]


Steven Parkes commented on LUCENE-763:
--------------------------------------

I was wondering about something very similar just recently: to call 
TermEnum.next() or not to call TermEnum.next() to get the first term. However, 
in my case I use terms() rather than terms( Term ) and there's the rub.

After looking through the code, there appears to be an inconsistency between the 
two cases. terms( Term ) seeks so that the new TermEnum object is already 
positioned on its first term. terms(), on the other hand, leaves the enum 
"before" the first term: you need to call next() first, and calling term() 
before that returns null.
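To make the two conventions concrete, here is a minimal, Lucene-free sketch. ModelTermEnum is a hypothetical stand-in that only mimics the next()/term() contract described above: the unpositioned factory behaves the way terms() was observed to (no current term until the first next()), while seekedTo behaves like terms( Term ) (already on the first term >= the target). All class and method names here are illustrative assumptions, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for TermEnum; models only next()/term() positioning. */
class ModelTermEnum {
    private final List<String> terms; // sorted term dictionary
    private int pos; // -1 = before the first term; terms.size() = exhausted

    private ModelTermEnum(List<String> terms, int pos) {
        this.terms = terms;
        this.pos = pos;
    }

    /** Like terms(): term() is null until next() has been called once. */
    static ModelTermEnum unpositioned(List<String> sortedTerms) {
        return new ModelTermEnum(sortedTerms, -1);
    }

    /** Like terms( Term ): already positioned on the first term >= target, if any. */
    static ModelTermEnum seekedTo(List<String> sortedTerms, String target) {
        int i = 0;
        while (i < sortedTerms.size() && sortedTerms.get(i).compareTo(target) < 0) {
            i++;
        }
        return new ModelTermEnum(sortedTerms, i);
    }

    /** Advances to the next term; returns false once the enumeration is exhausted. */
    boolean next() {
        if (pos < terms.size()) {
            pos++;
        }
        return pos < terms.size();
    }

    /** The current term, or null when not positioned on a term. */
    String term() {
        return (pos >= 0 && pos < terms.size()) ? terms.get(pos) : null;
    }
}

class TermLoops {
    /** Idiom for an unpositioned enum: advance first, then read. */
    static List<String> collectUnpositioned(ModelTermEnum te) {
        List<String> out = new ArrayList<>();
        while (te.next()) {
            out.add(te.term());
        }
        return out;
    }

    /** Idiom for a seeked enum: read the current term first, then advance. */
    static List<String> collectSeeked(ModelTermEnum te) {
        List<String> out = new ArrayList<>();
        if (te.term() == null) {
            return out; // the seek ran past the end of the dictionary
        }
        do {
            out.add(te.term());
        } while (te.next());
        return out;
    }
}
```

Mixing the idioms reproduces the off-by-one in this issue: running collectUnpositioned over an enum obtained from seekedTo(dict, "") silently drops the first term.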

I've only tried this against SegmentReader#terms(...).

This difference in behaviour isn't mentioned in the documentation.

It would be nice for the two calls to behave the same way, but I'm a little 
worried that half the existing code would break. Should we just document the 
existing behaviour?

In that case, the spell checker just needs to get rid of the extra next() 
call.
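Assuming the spell checker obtains its enum from terms( Term ), so that it is already positioned on the first term of the field, the fix amounts to reading the current term before advancing rather than after. The sketch below illustrates that pattern as a java.util.Iterator<String> without depending on Lucene; PositionedEnum is a hypothetical minimal interface standing in for TermEnum, ArrayEnum is a hypothetical array-backed test double, and none of this is the actual LuceneDictionary source.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical minimal view of an enum already positioned on its first term. */
interface PositionedEnum {
    String field(); // field of the current term, or null when exhausted
    String text();  // text of the current term, or null when exhausted
    boolean next(); // advance; false once exhausted
}

/** Yields the words of one field, consuming the current term before advancing. */
class FieldWordIterator implements Iterator<String> {
    private final PositionedEnum termEnum;
    private final String field;
    private boolean hasCurrent; // true while termEnum sits on an unconsumed term of 'field'

    FieldWordIterator(PositionedEnum termEnum, String field) {
        this.termEnum = termEnum;
        this.field = field;
        // No initial next(): the enum already sits on the first candidate term.
        this.hasCurrent = onField();
    }

    private boolean onField() {
        return field.equals(termEnum.field());
    }

    @Override
    public boolean hasNext() {
        return hasCurrent;
    }

    @Override
    public String next() {
        if (!hasCurrent) {
            throw new NoSuchElementException();
        }
        String word = termEnum.text();             // read the current term first...
        hasCurrent = termEnum.next() && onField(); // ...then advance and re-check the field
        return word;
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException();
    }
}

/** Hypothetical array-backed PositionedEnum; entries are sorted {field, text} pairs. */
class ArrayEnum implements PositionedEnum {
    private final String[][] terms;
    private int pos;

    /** Positions on the first entry >= (field, ""), mimicking a seek to the field's start. */
    ArrayEnum(String[][] terms, String field) {
        this.terms = terms;
        int i = 0;
        while (i < terms.length && terms[i][0].compareTo(field) < 0) {
            i++;
        }
        this.pos = i;
    }

    public String field() { return pos < terms.length ? terms[pos][0] : null; }

    public String text() { return pos < terms.length ? terms[pos][1] : null; }

    public boolean next() {
        if (pos < terms.length) {
            pos++;
        }
        return pos < terms.length;
    }
}
```

Iterating field "body" over {{"body", "eight"}, {"body", "five"}, {"title", "nine"}} yields "eight" and then "five"; an extra next() in the constructor would have started at "five", which is the reported symptom.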

While investigating, I noticed several other issues in the spell checker, in 
both the functional code and the test code. It plays a bit fast and loose with 
when index readers and writers are opened. Perhaps it used to work, depending 
on when things got flushed to disk, but it doesn't work for me on the current 
trunk.

> LuceneDictionary skips first word in enumeration
> ------------------------------------------------
>
>                 Key: LUCENE-763
>                 URL: https://issues.apache.org/jira/browse/LUCENE-763
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Other
>    Affects Versions: 2.0.0
>         Environment: Windows Sun JRE 1.4.2_10_b03
>            Reporter: Dan Ertman
>
> The current code for LuceneDictionary will always skip the first word of the 
> TermEnum. The reason is that it never reads TermEnum.term() first: its first 
> call is to TermEnum.next(), which moves it past the first term (line 76).
> To see this problem cause a failure, add this test to TestSpellChecker:
>     similar = spellChecker.suggestSimilar("eihgt", 2);
>     assertEquals(1, similar.length);
>     assertEquals("eight", similar[0]);
> Because "eight" is the first word in the index, it will fail.

