DocValues, retrieval performance and policy

Toke Eskildsen Mon, 24 Sep 2018 05:40:31 -0700

The Solr 7 switch to an iterative API for doc values
https://issues.apache.org/jira/browse/LUCENE-7407
caused a severe performance regression for Solr export and document
retrieval with our web archive index. That index is distinguished by having
quite large segments (300M docs / 900GB) and by using primarily doc values
to hold field content.
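To illustrate why an iterator-based API can hurt on very large segments, here is a deliberately simplified model. It is not Lucene code. Assumptions: for a field where not every document has a value, the docID space is split into blocks of 65,536 docIDs, and a forward-only iterator has to read the header of every block between its current position and the target. A random-access lookup, standing in for the pre-7 get(docID) style, goes straight to the target. SparseColumnModel and its members are hypothetical; only the name advanceExact mirrors the Lucene 7 iterator method.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical toy model of a sparse doc-values column (not Lucene code).
 * The docID space is split into fixed-size blocks. A forward-only iterator
 * must step through every block between its position and the target;
 * random access goes straight to the target.
 */
class SparseColumnModel {
    static final int BLOCK_SIZE = 1 << 16; // 65,536 docIDs per block (assumption)

    private final Set<Integer> docsWithValue;
    private int currentBlock = 0;
    private int lastTarget = -1;
    long blockHeadersRead = 0; // cost counter for the iterator path

    SparseColumnModel(Set<Integer> docsWithValue) {
        this.docsWithValue = docsWithValue;
    }

    /** Random-access style (stand-in for pre-7 get(docID)): no blocks are walked. */
    boolean hasValueRandomAccess(int docID) {
        return docsWithValue.contains(docID);
    }

    /** Iterator style: targets must not decrease; every block entered costs one header read. */
    boolean advanceExact(int target) {
        if (target < lastTarget) {
            throw new IllegalArgumentException("a forward-only iterator cannot go backwards");
        }
        lastTarget = target;
        int targetBlock = target / BLOCK_SIZE;
        while (currentBlock < targetBlock) {
            currentBlock++;
            blockHeadersRead++;
        }
        return docsWithValue.contains(target);
    }

    public static void main(String[] args) {
        int maxDoc = 300_000_000; // segment size mentioned in the post
        Set<Integer> docs = new HashSet<>();
        docs.add(maxDoc - 1); // one document near the end of the segment has a value
        SparseColumnModel column = new SparseColumnModel(docs);
        boolean found = column.advanceExact(maxDoc - 1);
        System.out.println(found + " after " + column.blockHeadersRead + " block headers");
    }
}
```

Running main prints "true after 4577 block headers": in this model, a single lookup near the end of a 300M-document segment walks about 4,577 blocks. Visiting documents in ascending docID order with one iterator spreads that walk over all lookups. A fresh iterator per lookup, or a far jump ahead, pays it in full.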


Technically there is a working patch
https://issues.apache.org/jira/browse/LUCENE-8374
but during a discussion of performance measurements elsewhere
https://github.com/mikemccand/luceneutil/issues/23
it came up that doc values are not intended for document retrieval, and
that Lucene should therefore not be optimized for that use.
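One way to cut such reads, and one of the ideas discussed for LUCENE-8374, is to precompute small lookup structures. The sketch below shows only a rank index for a dense block, under assumptions: a dense block is a 65,536-bit bitmap stored as 1,024 longs, and a cumulative set-bit count is kept for every 8 longs (512 docIDs). Finding where a document's value sits then starts from the nearest rank entry instead of counting bits from the start of the block. DenseBlockRank and its methods are hypothetical; this is not the patch's code.

```java
import java.util.Random;

/**
 * Hypothetical sketch of a rank index over one dense bitmap block (not Lucene code).
 * Bit d is set if document d of the block has a value; the value's position is
 * the number of set bits before d.
 */
class DenseBlockRank {
    static final int WORDS = 1024;       // 65,536 bits per block, stored as 64-bit words
    static final int WORDS_PER_RANK = 8; // one rank entry per 512 docIDs (assumption)

    private final long[] bits;
    private final int[] rank; // rank[i] = set bits in words [0, i * WORDS_PER_RANK)
    long wordsCounted = 0;    // cost counter: whole words passed to Long.bitCount

    DenseBlockRank(long[] bits) {
        if (bits.length != WORDS) {
            throw new IllegalArgumentException("expected " + WORDS + " words");
        }
        this.bits = bits;
        this.rank = new int[WORDS / WORDS_PER_RANK];
        int count = 0;
        for (int w = 0; w < WORDS; w++) {
            if (w % WORDS_PER_RANK == 0) {
                rank[w / WORDS_PER_RANK] = count;
            }
            count += Long.bitCount(bits[w]);
        }
    }

    /** Set bits in the target's own word, below the target bit. */
    private int partial(int docInBlock) {
        long below = (1L << (docInBlock & 63)) - 1;
        return Long.bitCount(bits[docInBlock >>> 6] & below);
    }

    /** Without rank: count every word from the start of the block. */
    int indexLinear(int docInBlock) {
        int word = docInBlock >>> 6;
        int count = 0;
        for (int w = 0; w < word; w++) {
            count += Long.bitCount(bits[w]);
            wordsCounted++;
        }
        return count + partial(docInBlock);
    }

    /** With rank: start from the nearest preceding rank entry. */
    int indexWithRank(int docInBlock) {
        int word = docInBlock >>> 6;
        int entry = word / WORDS_PER_RANK;
        int count = rank[entry];
        for (int w = entry * WORDS_PER_RANK; w < word; w++) {
            count += Long.bitCount(bits[w]);
            wordsCounted++;
        }
        return count + partial(docInBlock);
    }

    public static void main(String[] args) {
        long[] bits = new long[WORDS];
        Random random = new Random(42);
        for (int w = 0; w < WORDS; w++) {
            bits[w] = random.nextLong(); // roughly half the documents have a value
        }
        DenseBlockRank linear = new DenseBlockRank(bits);
        DenseBlockRank ranked = new DenseBlockRank(bits);
        int doc = 65_535; // last document in the block
        System.out.println(linear.indexLinear(doc) + " == " + ranked.indexWithRank(doc));
        System.out.println("words counted: " + linear.wordsCounted + " vs " + ranked.wordsCounted);
    }
}
```

For the last document in the block, both methods return the same index, but the linear version counts 1,023 whole words while the rank version counts 7. The price is one int per 512 docIDs of extra data.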


From my point of view, using doc values to build retrieval documents is
quite natural: the data are already there, so also keeping them as stored
fields would duplicate them and seems a waste of space.

If this is somehow a misuse of doc values, could someone explain what
the problem is, or point me towards more information?

- Toke Eskildsen, Royal Danish Library


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
