On Tuesday 11 November 2008 21:55:45, Michael McCandless wrote:
> Also, one nice optimization we could do with the "term number
> column-stride array" is to do bit packing (borrowing from the PFOR
> code) dynamically.
>
> Ie since we know there are X unique terms in this segment, when
> populating the array that maps docID to term number we could use
> exactly the right number of bits.  Enumerated fields with not many
> unique values (eg, country, state) would take relatively little RAM.
> With LUCENE-1231, where the fields are stored column stride on disk,
> we could do this packing during indexing such that loading at search
> time is very fast.
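
A minimal sketch of the bit packing idea quoted above, assuming term
numbers are 0-based ints and are packed into a long[]. The class name
PackedTermOrds and its methods are hypothetical; they come neither from
the PFOR code nor from any Lucene API.

```java
// Hypothetical illustration only: maps docID -> term number using exactly
// as many bits per entry as the number of unique terms requires.
final class PackedTermOrds {
    private final long[] blocks;
    private final int bitsPerValue;
    private final long mask;

    PackedTermOrds(int docCount, int uniqueTerms) {
        // Bits needed for values 0..uniqueTerms-1, with a minimum of 1 bit.
        int maxOrd = Math.max(1, uniqueTerms) - 1;
        this.bitsPerValue = Math.max(1, 32 - Integer.numberOfLeadingZeros(maxOrd));
        this.mask = (1L << bitsPerValue) - 1;
        long totalBits = (long) docCount * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) >>> 6)];
    }

    int bitsPerValue() {
        return bitsPerValue;
    }

    void set(int docID, int termOrd) {
        long bitPos = (long) docID * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = termOrd & mask;
        // Clear the target bits in this block, then write the low part.
        blocks[block] = (blocks[block] & ~(mask << shift)) | (value << shift);
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            // The value straddles two longs: write the high part to the next one.
            int kept = bitsPerValue - spill;
            long highMask = (1L << spill) - 1;
            blocks[block + 1] = (blocks[block + 1] & ~highMask) | (value >>> kept);
        }
    }

    int get(int docID) {
        long bitPos = (long) docID * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            // Pull in the high part stored at the start of the next long.
            value |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return (int) (value & mask);
    }
}
```

Under these assumptions a field with 5 unique values costs 3 bits per
document, and one with 200 unique values costs 8 bits per document,
instead of 32 bits for a plain int[].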

Perhaps we'd better continue this at LUCENE-1231 or LUCENE-1410.
I think what you're referring to is PDICT, which has frame exceptions
for values that occur infrequently.

Regards,
Paul Elschot


>
> Mike
>
> Paul Elschot wrote:
> > On Tuesday 11 November 2008 11:29:27, Michael McCandless wrote:
> >> The other part of your proposal was to somehow "number" term text
> >> such that term range comparisons can be implemented as fast int
> >> comparisons.
> >
> > ...
> >
> >>   http://fontoura.org/papers/paramsearch.pdf
> >>
> >> However that'd be quite a bit deeper change to Lucene.
> >
> > The cheap version is hierarchical prefixing here:
> >
> > http://wiki.apache.org/jakarta-lucene/DateRangeQueries
> >
> > Regards,
> > Paul Elschot
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
>


