[ https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990123#comment-12990123 ]
Michael McCandless commented on LUCENE-2843:
--------------------------------------------

bq. Thank you. I will use the FixedGap-version myself, but that only works when I'm the one controlling the index build, right?

Right, but isn't that fair? I mean, it's easy (in Lucene 4.0) to pick the appropriate codec per field. So, if people want to use your faceting package, and you explain that it requires using a certain Codec, that seems OK?

{quote}
As for the faceting system, the principle is really simple: instead of holding terms (BytesRefs) in memory, I just hold their ordinals. As the terms themselves only need to be resolved when the final faceting result is to be returned, seeking for a few hundred or thousand terms by their ordinal has worked very well so far (no guarantees for old hardware such as spinning disks though).
{quote}

OK, that makes sense... impressive that seeking up to a few thousand terms is giving you good perf.

You could also load DocTermsIndex in FieldCache, but of course then all terms data & ords are RAM resident (and the point of LUCENE-2369 is to have low memory overhead).

> Add variable-gap terms index impl.
> ----------------------------------
>
>                 Key: LUCENE-2843
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2843
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.0
>
>         Attachments: LUCENE-2843.patch, LUCENE-2843.patch
>
>
> PrefixCodedTermsReader/Writer (used by all "real" core codecs) already
> supports pluggable terms index impls.
>
> The only impl we have now is FixedGapTermsIndexReader/Writer, which
> picks every Nth (default 32) term and holds it in efficient packed
> int/byte arrays in RAM. This is already an enormous improvement (RAM
> reduction, init time) over 3.x.
>
> This patch adds another impl, VariableGapTermsIndexReader/Writer,
> which lets you specify an arbitrary IndexTermSelector to pick which
> terms are indexed, and then uses an FST to hold the indexed terms.
> This is typically even more memory efficient than packed int/byte
> arrays, though it does not support ord(), so it's not quite a fair
> comparison.
>
> I had to relax the terms index plugin API for
> PrefixCodedTermsReader/Writer to not assume that the terms index impl
> supports ord.
>
> I also did some cleanup of the FST/FSTEnum APIs and impls, and broke
> out separate seekCeil and seekFloor in FSTEnum. E.g., we need seekFloor
> when the FST is used as a terms index, but seekCeil when it's holding
> all terms in the index (i.e., what SimpleText uses FSTs for).
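
To make the fixed-gap idea and the seekFloor/seekCeil split a bit more concrete, here is a rough sketch in plain Java. It is not Lucene code and does not use the patch's APIs. The class FixedGapTermsIndexSketch and its methods are hypothetical names made up for illustration. A TreeMap stands in for the packed int/byte arrays, an in-memory list stands in for the on-disk terms dictionary, and terms compare as Java Strings rather than as UTF-8 bytes. The index holds every Nth term. A lookup first does a floor seek on the small index, then scans forward in the full term list to find the ceiling term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names are Lucene classes.
class FixedGapTermsIndexSketch {
    // Full sorted term dictionary; stands in for the terms stored on disk.
    private final List<String> terms;
    // In-RAM index: every interval-th term mapped to its ordinal.
    private final TreeMap<String, Integer> index = new TreeMap<>();

    FixedGapTermsIndexSketch(List<String> sortedTerms, int interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be >= 1");
        }
        this.terms = new ArrayList<>(sortedTerms);
        for (int ord = 0; ord < terms.size(); ord += interval) {
            index.put(terms.get(ord), ord);
        }
    }

    // Returns the ordinal of the smallest term >= target, or -1 if none.
    int seekCeil(String target) {
        // Floor seek on the index: largest indexed term <= target.
        Map.Entry<String, Integer> floor = index.floorEntry(target);
        int ord = (floor == null) ? 0 : floor.getValue();
        // Scan forward through the full dictionary from that point.
        while (ord < terms.size() && terms.get(ord).compareTo(target) < 0) {
            ord++;
        }
        return ord < terms.size() ? ord : -1;
    }

    // Resolves an ordinal back to its term.
    String termForOrd(int ord) {
        return terms.get(ord);
    }

    // Number of terms held in the in-RAM index.
    int indexedTermCount() {
        return index.size();
    }

    public static void main(String[] args) {
        List<String> sorted = Arrays.asList(
            "apple", "banana", "cherry", "date", "elder", "fig", "grape");
        FixedGapTermsIndexSketch idx = new FixedGapTermsIndexSketch(sorted, 3);
        int ord = idx.seekCeil("coconut");
        System.out.println(ord + " -> " + idx.termForOrd(ord));
    }
}
```

In this sketch every indexed term carries its ordinal, which is what lets a caller keep only ords in memory and resolve them to terms at the end, as in the faceting approach quoted above. An index built from an arbitrary term selector has no such fixed relationship between index entries and ordinals, which is one way to read why the variable-gap impl does not support ord().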