I think two issues are being conflated here.

1> _Within_ a document: for a multi-valued field, the values are stored, as Dominik says, as a SORTED_SET. Not only will they be returned in lexical order (if you return them from docValues rather than from stored), but identical values will also be collapsed.
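
To make the within-a-document case concrete, here is a toy sketch in plain Java. It does not use any Lucene classes; the class and method names are made up for illustration, and a TreeSet simply stands in for the "sorted, duplicates collapsed" behaviour described above. (It models SORTED_SET only; as far as I know, SORTED_NUMERIC also sorts per document but keeps duplicates.)

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Toy model only: mimics what one document's multi-valued field looks like
// when read back from a SORTED_SET-style structure. Not Lucene code.
class SortedSetSketch {

    // Hypothetical helper: returns the values in ascending String order
    // with identical values collapsed, which is what a TreeSet gives us.
    static List<String> asDocValues(List<String> indexedValues) {
        return new ArrayList<>(new TreeSet<>(indexedValues));
    }

    public static void main(String[] args) {
        // Values in the order they were added to the document.
        List<String> indexed = Arrays.asList("pear", "apple", "pear", "banana");
        System.out.println("as indexed:    " + indexed);
        System.out.println("as doc values: " + asDocValues(indexed));
    }
}
```

The insertion order and the second "pear" survive only in the original list, which is the role the stored field plays.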

2> Across multiple documents: the question about "...persisted with order of values, not document id..." really makes no sense. The point of DocValues is to answer the question "for document X, what is the value of field Y?" X here is the _internal_ document ID.

Now consider a search. There are two documents that are hits, doc 35 and doc 198 (internal Lucene doc IDs). To sort them by field Y you have to know what the value of that field is for those two docs. How would "pre-ordering" the values help here? If I have the _values_ in order, I have no clue what docs are associated with them. That question is what the "inverted index" is there to answer.

So I have docs 35 and 198. Think of DocValues as a large array indexed by internal doc ID. To know how these two docs sort, all I have to do is index into the array at positions 35 and 198 and compare the two values. It's slightly more complicated than that, but conceptually that's what happens.

Best,
Erick

On Mon, Mar 5, 2018 at 11:29 AM, Dominik Safaric <dominiksafa...@gmail.com> wrote:
>> So, can doc values be persisted with order of values, not document id? This
>> should be fast in sort scenario that the values are pre-ordered instead of
>> scan/sort at runtime.
>
> No, unfortunately doc values cannot be persisted in order. Lucene stores these
> values internally as a DocValuesType.SORTED_SET, where the values are being
> stored using for example Long.compareTo().
>
> If you'd like to retrieve the values in insertion order, use stored instead
> of doc values. Then you might access the values in order using the
> LeafReader's document function. However, beware that this may induce performance
> issues because it requires loading the document from disk.
>
> If you need to store and retrieve multiple numeric values per document in
> order, you might consider using PointValues. PointValues are internally
> indexed with KD-trees. But beware that PointValues have a limited
> dimensionality, in that you can for example store values in 8
> dimensions, each of max 16 bytes.
>
>> On 5 Mar 2018, at 15:33, Tony Ma <t...@opentext.com> wrote:
>>
>> Per my understanding, doc values (binary doc values / numeric doc values)
>> are stored with sequence of document id. Sorted numeric doc values just
>> means if a document has multiple values, the values will be sorted for same
>> document, but for different documents, the value is still ordered by
>> document id. Is that true?
>> So, can doc values be persisted with order of values, not document id? This
>> should be fast in sort scenario that the values are pre-ordered instead of
>> scan/sort at runtime.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
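
P.S. A rough sketch of the "large array indexed by internal doc id" picture from point 2, in plain Java. This is a conceptual model only, not how Lucene lays doc values out on disk (real doc values are per segment and compressed). The class name, method name, and field values are all made up for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

// Toy model only: a plain long[] stands in for a single-valued numeric
// doc values field. Index = internal doc id, element = value of field Y.
class DocValuesArraySketch {

    // Hypothetical helper: orders the hit doc ids by field value. Each
    // comparison is just two array lookups by doc id, with no scan over
    // the other documents.
    static Integer[] sortHits(long[] column, Integer... hits) {
        Integer[] sorted = hits.clone();
        Arrays.sort(sorted, Comparator.comparingLong(docId -> column[docId]));
        return sorted;
    }

    public static void main(String[] args) {
        long[] column = new long[200];   // one slot per doc id, 0..199
        column[35] = 42L;                // made-up value of field Y for doc 35
        column[198] = 7L;                // made-up value of field Y for doc 198

        Integer[] sorted = sortHits(column, 35, 198);
        System.out.println(Arrays.toString(sorted));
    }
}
```

If the array were ordered by value instead, there would be no way to jump straight to the entries for docs 35 and 198, which is why storing the values "pre-ordered" would not help the sort.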