On Tuesday 11 November 2008 21:55:45, Michael McCandless wrote:
> Also, one nice optimization we could do with the "term number
> column-stride array" is to do bit packing (borrowing from the PFOR
> code) dynamically.
>
> Ie since we know there are X unique terms in this segment, when
> populating the array that maps docID to term number we could use
> exactly the right number of bits. Enumerated fields with not many
> unique values (eg, country, state) would take relatively little RAM.
> With LUCENE-1231, where the fields are stored column stride on disk,
> we could do this packing during indexing such that loading at search
> time is very fast.
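The bit-packing idea quoted above can be sketched roughly as follows. This is only an illustration, written in Python for brevity rather than in Lucene's Java, and it is not the PFOR code or anything from LUCENE-1231. The class name `PackedTermNumbers` and its methods are hypothetical; the 64-bit block size is an assumption chosen to mirror a Java `long[]`.

```python
MASK64 = (1 << 64) - 1  # emulate 64-bit blocks, as a Java long[] would provide


class PackedTermNumbers:
    """Hypothetical fixed-width packed array mapping docID -> term number."""

    def __init__(self, term_nums, num_unique):
        # Bits needed for term numbers 0 .. num_unique - 1 (at least 1 bit).
        self.bits = max(1, (num_unique - 1).bit_length())
        self.mask = (1 << self.bits) - 1
        self.size = len(term_nums)
        # Round the total bit count up to whole 64-bit blocks.
        self.blocks = [0] * ((self.size * self.bits + 63) // 64)
        for doc_id, term_num in enumerate(term_nums):
            self._set(doc_id, term_num)

    def _set(self, doc_id, term_num):
        if not 0 <= term_num <= self.mask:
            raise ValueError("term number does not fit in %d bits" % self.bits)
        block, shift = divmod(doc_id * self.bits, 64)
        # Low part goes into the current block.
        self.blocks[block] |= (term_num << shift) & MASK64
        # If the value straddles a block boundary, the rest spills over.
        spill = shift + self.bits - 64
        if spill > 0:
            self.blocks[block + 1] |= term_num >> (self.bits - spill)

    def get(self, doc_id):
        block, shift = divmod(doc_id * self.bits, 64)
        value = self.blocks[block] >> shift
        spill = shift + self.bits - 64
        if spill > 0:
            # Fetch the high bits that spilled into the next block.
            value |= self.blocks[block + 1] << (self.bits - spill)
        return value & self.mask


# Example: a "country" field with 3 unique terms needs 2 bits per document.
doc_to_term = [2, 0, 1, 1, 2, 0, 2]
packed = PackedTermNumbers(doc_to_term, num_unique=3)
restored = [packed.get(i) for i in range(len(doc_to_term))]
```

With 3 unique terms each document costs 2 bits instead of a full 32-bit int, which is where the RAM saving for enumerated fields would come from. Lookups cost a shift and a mask, plus one extra block read when a value straddles a block boundary.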
Perhaps we'd better continue this at LUCENE-1231 or LUCENE-1410.
I think what you're referring to is PDICT, which has frame
exceptions for values that occur infrequently.

Regards,
Paul Elschot

>
> Mike
>
> Paul Elschot wrote:
> > On Tuesday 11 November 2008 11:29:27, Michael McCandless wrote:
> >> The other part of your proposal was to somehow "number" term text
> >> such that term range comparisons can be implemented as fast int
> >> comparisons.
> >
> > ...
> >
> >> http://fontoura.org/papers/paramsearch.pdf
> >>
> >> However that'd be quite a bit deeper change to Lucene.
> >
> > The cheap version is hierarchical prefixing here:
> >
> > http://wiki.apache.org/jakarta-lucene/DateRangeQueries
> >
> > Regards,
> > Paul Elschot

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]