On Mon, Nov 15, 2021 at 1:14 PM Robert Muir <rcm...@gmail.com> wrote:

> On Mon, Nov 15, 2021 at 12:57 PM Michael McCandless
> <luc...@mikemccandless.com> wrote:
> >
> > I think for PR 420 (https://github.com/apache/lucene/pull/420) we are
> > (confusingly!) not really seeing performance benefits -- the taxonomy index got
> > a bit bigger, and loading the parent arrays was no faster?  So Patrick closed
> > that one.
>
> I'm confused about this (sorry, I am not up to speed), but are we not
> able to offload today's very large arrays to docvalues (e.g. mmap)
> with the change? Wasn't that the original motivation, that the memory
> usage was somewhat trappy? I wouldn't expect to see performance
> benefits over today's on-heap arrays that are read from payloads or
> whatever, instead it would be a memory benefit?
>

Yeah I love that idea, but that's not what Patrick's PR explored (yet?).

His PR explored switching away from custom token positions to NumericDocValues
to store the same data (ordinal -> parent mapping), but it still loaded all of
those values into a massive heap-resident int[].
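
Roughly, the shape of that approach is something like the sketch below (just
an illustration, not the PR's actual code; the "$parents$" field name and the
EagerParents class are made up):

    import java.io.IOException;
    import org.apache.lucene.index.DocValues;
    import org.apache.lucene.index.LeafReader;
    import org.apache.lucene.index.NumericDocValues;

    class EagerParents {
      // Eagerly materialize the ordinal -> parent mapping into a heap int[];
      // the taxonomy index has one doc per ordinal, so maxDoc() bounds them.
      static int[] loadParents(LeafReader taxoLeaf) throws IOException {
        int[] parents = new int[taxoLeaf.maxDoc()];
        NumericDocValues dv = DocValues.getNumeric(taxoLeaf, "$parents$");
        for (int ord = 0; ord < parents.length; ord++) {
          if (dv.advanceExact(ord)) {
            parents[ord] = (int) dv.longValue();
          }
        }
        return parents;
      }
    }

So the storage format changes, but we still pay the same up-front heap cost.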

I agree it would be awesome to try avoiding those big int[] and reading
live from NumericDocValues during faceting!  It would require some re-work
of the faceting code to e.g. sort the ordinals so we (efficiently) visit
them in forward, iterator-friendly order.
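
What I have in mind is more like the sketch below (again hypothetical names,
and the real facet code would need more care than this):

    import java.io.IOException;
    import java.util.Arrays;
    import org.apache.lucene.index.DocValues;
    import org.apache.lucene.index.LeafReader;
    import org.apache.lucene.index.NumericDocValues;

    class LazyParents {
      // Look up parents on the fly instead of from a cached int[].  Because
      // NumericDocValues is a forward-only iterator, sort the ordinals first
      // so we only ever advance forward.
      static int[] parentsOf(LeafReader taxoLeaf, int[] ordinals) throws IOException {
        int[] sorted = ordinals.clone();
        Arrays.sort(sorted);
        NumericDocValues dv = DocValues.getNumeric(taxoLeaf, "$parents$");
        int[] parents = new int[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
          if (dv.advanceExact(sorted[i])) {
            parents[i] = (int) dv.longValue();
          }
        }
        return parents;
      }
    }

The sort costs a little per query, but it would let the parent data stay in
doc values (e.g. mmap'd) instead of being duplicated into a giant int[] on heap.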

But that is a different change, and we probably should not hold 9.0 for it?

Mike McCandless

http://blog.mikemccandless.com
