[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716577#comment-13716577 ]
Michael McCandless commented on LUCENE-3069:
--------------------------------------------
Patch looks great! Wonderful how you were able to share some code in
BaseTermsEnum...
It looks like you impl'd seekCeil in general for the IntersectEnum? Wild :)
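For reference, the seekCeil contract on an intersecting enum is: position on the smallest term that is >= the target *and* accepted by the automaton. Below is a minimal sketch of just those semantics. It assumes a flat sorted String[] in place of the FST and a Predicate<String> as a hypothetical stand-in for the CompiledAutomaton. The local SeekStatus enum only mirrors the names of Lucene's TermsEnum.SeekStatus. A real IntersectEnum walks the FST and the automaton together rather than scanning an array.

```java
import java.util.Arrays;
import java.util.function.Predicate;

// Hypothetical sketch: the "terms dictionary" is a sorted array and the
// automaton is a Predicate; only the seekCeil semantics are illustrated.
final class IntersectSeekCeilSketch {
    // Mirrors the names of Lucene's TermsEnum.SeekStatus.
    enum SeekStatus { FOUND, NOT_FOUND, END }

    private final String[] terms;            // sorted, unique terms
    private final Predicate<String> accepts; // stand-in for the CompiledAutomaton
    private int pos = -1;                    // current position in terms

    IntersectSeekCeilSketch(String[] sortedTerms, Predicate<String> accepts) {
        this.terms = sortedTerms;
        this.accepts = accepts;
    }

    // Positions on the smallest accepted term >= target.
    SeekStatus seekCeil(String target) {
        int i = Arrays.binarySearch(terms, target);
        if (i < 0) {
            i = -i - 1; // insertion point: index of the first term > target
        }
        // Skip forward over terms the automaton rejects.
        while (i < terms.length && !accepts.test(terms[i])) {
            i++;
        }
        pos = i;
        if (i == terms.length) {
            return SeekStatus.END;
        }
        return terms[i].equals(target) ? SeekStatus.FOUND : SeekStatus.NOT_FOUND;
    }

    // Current term; only valid after seekCeil returned FOUND or NOT_FOUND.
    String term() {
        return terms[pos];
    }
}
```

Note that String.compareTo orders by UTF-16 code units, not by the UTF-8 bytes Lucene compares. The two orders agree for ASCII terms like the ones a quick test would use.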
You should not need to call .getPosition / .setPosition on the fstReader:
the FST APIs do this under the hood.
bq. Currently, CompiledAutomaton provides a commonSuffixRef, but how can we
make use of it in the FST?
I think we can't really make use of it, which is fine (it's an
optional optimization).
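For context, a terms dictionary typically uses commonSuffixRef as a cheap pre-filter. Every term the automaton accepts must end with that suffix, so any candidate that doesn't can be rejected before the automaton runs. Below is a minimal sketch of that filter. It assumes String terms and a Predicate<String> as a hypothetical stand-in for the automaton. An FST enum builds each term prefix-first while walking arcs, so the suffix is only known at the very end. That is likely why the optimization buys little here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of a common-suffix pre-filter in front of an automaton check.
final class CommonSuffixFilterSketch {
    // Returns the terms the automaton accepts, running the (possibly expensive)
    // automaton check only on terms that end with the required common suffix.
    static List<String> acceptedTerms(List<String> terms, String commonSuffix,
                                      Predicate<String> automaton) {
        List<String> accepted = new ArrayList<>();
        for (String term : terms) {
            if (!term.endsWith(commonSuffix)) {
                continue; // cheap reject: the automaton could never accept this term
            }
            if (automaton.test(term)) {
                accepted.add(term);
            }
        }
        return accepted;
    }
}
```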
{quote}
When the FST is large enough, the next() operation will take much time
doing the linear arc read; maybe we should make use of
CompiledAutomaton.sortedTransition[] when a node has many leaving arcs.
{quote}
Interesting ... you mean, e.g., if the Automaton is very restrictive
compared to the FST, then we can do a binary search? But this can
only be done if that FST node's arcs are array'd, right?
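To make the binary-search idea concrete: when a node's arcs are stored as a label-sorted array, a lower-bound binary search can find the first arc at or after a transition's min label, with no linear scan. Arcs are then read only while their label is <= the transition's max. Below is a minimal sketch. It assumes, as hypothetical simplifications of the real FST and Transition classes, that the node's arc labels are a plain sorted int[] and the automaton state's transitions are sorted, non-overlapping [min, max] ranges.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: arcs are modeled only by their sorted labels.
final class ArrayedArcSearchSketch {
    // Linear scan: index of the first label >= min, starting at 'from'.
    static int linearCeil(int[] labels, int from, int min) {
        int i = from;
        while (i < labels.length && labels[i] < min) {
            i++;
        }
        return i;
    }

    // Lower-bound binary search over labels[from, length): same result as
    // linearCeil in O(log n); needs random access, i.e. array'd arcs.
    static int binaryCeil(int[] labels, int from, int min) {
        int lo = from, hi = labels.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (labels[mid] < min) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Indices of arcs whose label falls inside any range [mins[k], maxs[k]].
    // Ranges must be sorted and non-overlapping, like an automaton state's
    // sorted transitions.
    static List<Integer> matchingArcs(int[] labels, int[] mins, int[] maxs) {
        List<Integer> matches = new ArrayList<>();
        int start = 0;
        for (int k = 0; k < mins.length; k++) {
            int i = binaryCeil(labels, start, mins[k]);
            while (i < labels.length && labels[i] <= maxs[k]) {
                matches.add(i++);
            }
            start = i; // later ranges have higher labels, so never look back
        }
        return matches;
    }
}
```

The payoff is largest in the case described above: a restrictive automaton with a few narrow ranges against a node with many arcs.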
Separately, supporting ord with the FST terms dict should in theory not
be so hard; you'd need to use getByOutput to seek by ord. Maybe (later,
eventually) we can make this a write-time option. We should open a
separate issue ...
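getByOutput can seek by ord because of how the outputs are laid out. When terms are added in sorted order with their ord as the output, the outputs summed along a path grow with label order. Each step can therefore follow the last arc whose running sum does not exceed the target. Below is a minimal sketch of that inversion. It assumes a plain, unminimized trie of chars in place of a real FST and long ords as outputs. The class and method names are hypothetical and only echo the name of the FST Util method.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: an unminimized trie whose arc outputs sum to each term's ord.
final class OrdOutputTrieSketch {
    private static final class Node {
        final long minOrd;    // ord of the smallest term in this subtree
        final long arcOutput; // output on the arc leading into this node
        boolean isFinal;      // true if a term ends here
        final TreeMap<Character, Node> children = new TreeMap<>();

        Node(long minOrd, long arcOutput) {
            this.minOrd = minOrd;
            this.arcOutput = arcOutput;
        }
    }

    private final Node root = new Node(0, 0);
    private long nextOrd = 0;

    // Terms must be added in sorted order and without duplicates, so ords increase.
    void add(String term) {
        long ord = nextOrd++;
        Node node = root;
        for (char c : term.toCharArray()) {
            Node child = node.children.get(c);
            if (child == null) {
                // First (smallest) term through this arc: the arc carries the ord delta.
                child = new Node(ord, ord - node.minOrd);
                node.children.put(c, child);
            }
            node = child;
        }
        node.isFinal = true;
    }

    // term -> ord: sum the outputs along the path (-1 if the term is absent).
    long getByInput(String term) {
        Node node = root;
        long sum = 0;
        for (char c : term.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return -1;
            }
            sum += node.arcOutput;
        }
        return node.isFinal ? sum : -1;
    }

    // ord -> term: at each node follow the last arc whose running sum stays <= target.
    String getByOutput(long targetOrd) {
        Node node = root;
        long sum = 0;
        StringBuilder term = new StringBuilder();
        while (true) {
            if (node.isFinal && sum == targetOrd) {
                return term.toString();
            }
            Map.Entry<Character, Node> best = null;
            for (Map.Entry<Character, Node> e : node.children.entrySet()) {
                if (sum + e.getValue().arcOutput > targetOrd) {
                    break; // arc outputs grow with the label, so no later arc fits
                }
                best = e;
            }
            if (best == null) {
                return null; // ord out of range
            }
            term.append(best.getKey());
            sum += best.getValue().arcOutput;
            node = best.getValue();
        }
    }
}
```

A real FST also shares suffixes, which this sketch ignores.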
> Lucene should have an entirely memory resident term dictionary
> --------------------------------------------------------------
>
> Key: LUCENE-3069
> URL: https://issues.apache.org/jira/browse/LUCENE-3069
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/index, core/search
> Affects Versions: 4.0-ALPHA
> Reporter: Simon Willnauer
> Assignee: Han Jiang
> Labels: gsoc2013
> Fix For: 4.4
>
> Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch,
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch
>
>
> FST-based TermDictionary has been a great improvement, yet it still uses a
> delta codec file for scanning to terms. Some environments have enough memory
> available to keep the entire FST-based term dict in memory. We should add a
> TermDictionary implementation that encodes all needed information for each
> term into the FST (custom fst.Output) and builds an FST from the entire term,
> not just the delta.
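Some context on what a "custom fst.Output" has to provide: the type must be able to combine the outputs of terms that share a prefix, so part of each term's output can live on the shared arcs. Lucene's Outputs class expresses this with common, subtract and add; PositiveIntOutputs, for instance, uses min, minus and plus on longs. Below is a minimal sketch for hypothetical per-term metadata: a postings file pointer plus docFreq and totalTermFreq. The TermMeta type, and the choice to share only the file pointer, are assumptions for illustration, not the design the patch uses.

```java
// Hypothetical per-term metadata with an output algebra in the style of
// Lucene's Outputs (common / subtract / add); this is not Lucene's actual API.
final class TermMeta {
    final long filePointer;   // where this term's postings start
    final long docFreq;
    final long totalTermFreq;

    TermMeta(long filePointer, long docFreq, long totalTermFreq) {
        this.filePointer = filePointer;
        this.docFreq = docFreq;
        this.totalTermFreq = totalTermFreq;
    }

    // Part two outputs can share on a common prefix: the smaller file
    // pointer; the per-term statistics are not shared in this sketch.
    static TermMeta common(TermMeta a, TermMeta b) {
        return new TermMeta(Math.min(a.filePointer, b.filePointer), 0, 0);
    }

    // What remains for the suffix once 'prefix' has moved onto shared arcs.
    static TermMeta subtract(TermMeta output, TermMeta prefix) {
        return new TermMeta(output.filePointer - prefix.filePointer,
                output.docFreq - prefix.docFreq,
                output.totalTermFreq - prefix.totalTermFreq);
    }

    // Reassemble a full output from a prefix output and the remainder.
    static TermMeta add(TermMeta prefix, TermMeta output) {
        return new TermMeta(prefix.filePointer + output.filePointer,
                prefix.docFreq + output.docFreq,
                prefix.totalTermFreq + output.totalTermFreq);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TermMeta)) {
            return false;
        }
        TermMeta m = (TermMeta) o;
        return filePointer == m.filePointer && docFreq == m.docFreq
                && totalTermFreq == m.totalTermFreq;
    }

    @Override
    public int hashCode() {
        return (Long.hashCode(filePointer) * 31 + Long.hashCode(docFreq)) * 31
                + Long.hashCode(totalTermFreq);
    }
}
```

The invariant that matters is add(common(a, b), subtract(a, common(a, b))) == a. Whether statistics such as docFreq should also be shared along prefixes is a design choice this sketch leaves open.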
--
This message is automatically generated by JIRA.