Toke:

I think part of it is locality. By that I mean two docValues fields in
the same document have no relation to each other in terms of their
location on disk. So _assuming_ all your DocValues can't be contained
in memory, you may be doing a bunch of disk seeks.

Contrast that with stored fields, where fetching all the fields for a
given doc takes one disk seek/decompression (assuming the 16K block
that is read and decompressed holds all of them).
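
As a back-of-the-envelope sketch of that difference (a toy cost model
only, not how Lucene actually reads data; the function names and the
"one seek per field per doc when nothing is cached" assumption are
hypothetical, chosen just to illustrate the worst case):

```python
def docvalues_seeks(num_docs, num_fields):
    """Worst-case seeks for a column-oriented (docValues-like) layout.

    Assumption: each field's values live in an unrelated region on disk
    and nothing is cached, so every (doc, field) pair costs one seek.
    """
    return num_docs * num_fields


def stored_seeks(num_docs, blocks_per_doc=1):
    """Seeks for a row-oriented (stored-fields-like) layout.

    Assumption: all fields of a doc sit in the same compressed block,
    so one seek/decompression per doc covers every field.
    """
    return num_docs * blocks_per_doc


if __name__ == "__main__":
    docs, fields = 10, 20  # e.g. one page of results with 20 fields each
    print("docValues-style seeks:", docvalues_seeks(docs, fields))
    print("stored-style seeks:   ", stored_seeks(docs))
```

If the DocValues data is already in memory (or in the OS cache), the
first number stops mattering, which is the _assuming_ part above.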

And maybe part of it is the notion that stuffing large text fields into
a DocValues field just to return them seems like abusing DV.

That said, the Streaming code uses DV fields exclusively and I got
200K rows/second returned without tuning a single thing, which I doubt
you're going to get with stored fields!

So I think, as usual, "it depends".

On Mon, Sep 24, 2018 at 10:25 AM Toke Eskildsen <t...@kb.dk> wrote:
>
> David Smiley <david.w.smi...@gmail.com> wrote:
> > I don't think it makes a difference if some people think docValues should
> > never be used for value-retrieval.  When that performance drop occurred
> > due to those changes, I'm sure it would have affected sorting & faceting
> > as well as value-retrieval. Some more than others perhaps.
>
> Yes. The iterative API is fine for relatively small jumps, so it works 
> perfectly for sorting on medium- to large result sets. Depending on the type 
> of faceting it's the same. Grouping and faceting on small result sets is 
> (probably) relatively affected, but as the amount of needed data is small in 
> those cases, the (assumed) impact is not that high.
>
> Retrieving documents is different as there are typically more fields involved 
> and the amount of documents itself is nearly always small, which means large 
> jumps repeated for all the fields.
>
> > I don't see any disagreement about improving docValues in the ways
> > you suggest.
>
> You are right about that. I apologize if I was being unclear: It is not the 
> concrete patch I am asking about, that's just how this started. I am asking 
> for background on why it is considered misuse to use Doc Values for document 
> retrieval.
>
> - Toke Eskildsen
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>

