[ 
https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211921#comment-13211921
 ] 

Robert Muir commented on LUCENE-3807:
-------------------------------------

I like the patch, with one reservation (it's fine to commit it as-is though; we 
can solve this in another issue, I just couldn't help but notice).

I don't think we should have the BufferedTermFreqIteratorWrapper/etc, and the 
SortedTermFreqIterator marker interface needs to be fixed.

Here are the problems:
* The marker interface SortedTermFreqIterator doesn't tell you whether the order 
is UTF-8 or UTF-16. It's implemented by two classes: 
SortedTermFreqIteratorWrapper, which sorts in UTF-16 order, and 
HighFrequencyDictionary, which returns terms from the index (so UTF-8 order). 
The consequence is that classes that rely on sorted order, like JaSpell/TST, 
are likely broken already. Fortunately FST/WFST always do their own sort.
* Buffering in RAM is not ideal. Instead, I think all of these classes should be 
using our Sort anyway, which can spill to disk.
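
To make the first point concrete, here is a minimal standalone sketch (plain 
JDK, no Lucene classes; SortOrderDemo and compareUtf8 are names made up for 
this illustration) showing that UTF-16 code unit order, which is what 
String.compareTo uses, and unsigned UTF-8 byte order can disagree. The two 
orders differ only when a supplementary character is compared against a BMP 
character at or above U+E000, so the bug is easy to miss in tests.

```java
import java.nio.charset.StandardCharsets;

final class SortOrderDemo {
    // Compare two strings by their UTF-8 encoding, byte by byte, treating
    // each byte as unsigned. This is an illustrative helper, not Lucene code.
    static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return x.length - y.length;
    }

    public static void main(String[] args) {
        String bmp = "\uFF21";         // U+FF21, a single UTF-16 code unit
        String supp = "\uD835\uDC00";  // U+1D400, encoded as a surrogate pair

        // UTF-16 order: 0xFF21 sorts after the high surrogate 0xD835.
        System.out.println("UTF-16: bmp > supp is " + (bmp.compareTo(supp) > 0));
        // UTF-8 order: lead byte 0xEF sorts before lead byte 0xF0.
        System.out.println("UTF-8:  bmp < supp is " + (compareUtf8(bmp, supp) < 0));
    }
}
```

A consumer that assumes one order while its input was sorted in the other 
would see these two terms out of sequence.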

For now, could we put the BytesRefList in the suggest package, since it's only 
used there? We might not need it after we clean up this sorting stuff in some 
future issue.

Also, I don't think we should factor out the BytesRefIterator. I seriously think 
it's a bad idea to tie our core index Terms enumeration API to the spellcheck 
API at this time; it would make things hard to change in the future if we need 
to, especially with spellcheck being... needing work :)

                
> Cleanup suggester API
> ---------------------
>
>                 Key: LUCENE-3807
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3807
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/other
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3807.patch
>
>
> Currently the suggester API, and especially TermFreqIterator, doesn't play 
> nicely with BytesRef and other paradigms we use in Lucene. Furthermore, the 
> Java iterator pattern isn't that useful when it has to work with TermsEnum, 
> BytesRef etc. We should try to clean up this API step by step, moving over to 
> BytesRef, including the Lookup class and its interface...

