The switch to an iterative API for doc values in Solr 7 (https://issues.apache.org/jira/browse/LUCENE-7407) caused a severe performance regression for Solr export and document retrieval on our web archive index, which is distinguished by quite large segments (300M docs / 900GB) and by primarily using doc values to hold field content.

Technically there is a working patch (https://issues.apache.org/jira/browse/LUCENE-8374), but during a discussion of performance measurements elsewhere (https://github.com/mikemccand/luceneutil/issues/23) it came up that doc values are not intended for document retrieval, and that Lucene should therefore not be optimized for that use.

From my point of view, using doc values to build retrieval documents is quite natural: the data are already there, so creating a second representation by also making the fields stored seems a waste of space. If this is somehow a misuse of doc values, perhaps someone could explain what the problem is or direct me to more information?

- Toke Eskildsen, Royal Danish Library