I think there are two issues being conflated here:

1> _within_ a document, i.e. for a multi-valued field, the values are
stored, as Dominik says, as a SORTED_SET. Not only will they be returned
in lexical order (if you return them from docValues rather than from
stored fields), but identical values will also be collapsed.

2> across multiple documents, the question about "...persisted with
order of values, not document id..." really makes no sense. The point
of DocValues is to answer the question "for document X, what is the
value of field Y?". X here is the _internal_ document ID. Now consider
a search. There are two documents that are hits, doc 35 and doc 198
(internal Lucene doc IDs). To sort them by field Y you have to know
what the value of that field is for each of those two docs. How would
"pre-ordering" the values help here? If I have the _values_ in order,
I have no clue which docs are associated with them. That question is
what the "inverted index" is there to answer.

So I have doc 35 and 198. Think of DocValues as a large array indexed
by internal doc id. To know how these two docs sort all I have to do
is index into the array. It's slightly more complicated than that, but
conceptually that's what happens.
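
Both points can be sketched in plain Java. This is only a conceptual
model, not Lucene API: the class name, the method names and the field
values are made up for illustration, a TreeSet stands in for the
per-document SORTED_SET behaviour, and a plain long[] stands in for the
"large array indexed by internal doc id".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Plain-Java model only; none of these names exist in Lucene.
final class DocValuesSketch {

    // Point 1: within one document, a SORTED_SET-style field keeps its
    // values in sorted order and collapses duplicates. A TreeSet of
    // Strings behaves the same way (natural String ordering).
    static List<String> valuesWithinDoc(String... indexedValues) {
        return new ArrayList<>(new TreeSet<>(Arrays.asList(indexedValues)));
    }

    // Point 2: across documents, values sit in a column indexed by the
    // internal doc ID, so sorting hits costs one array lookup per hit.
    static int[] sortHitsByField(int[] hitDocIds, long[] fieldYByDocId) {
        return Arrays.stream(hitDocIds)
                .boxed()
                .sorted(Comparator.comparingLong(docId -> fieldYByDocId[docId]))
                .mapToInt(Integer::intValue)
                .toArray();
    }

    public static void main(String[] args) {
        // One document indexed with a repeated value, in insertion order.
        System.out.println(valuesWithinDoc("pear", "apple", "pear"));

        long[] fieldY = new long[200];   // hypothetical segment of 200 docs
        fieldY[35] = 70L;                // made-up value of field Y for doc 35
        fieldY[198] = 12L;               // made-up value of field Y for doc 198
        int[] sorted = sortHitsByField(new int[] {35, 198}, fieldY);
        System.out.println(Arrays.toString(sorted));
    }
}
```

Note that sortHitsByField never needs the values to be globally
pre-ordered. It only needs to go from a doc ID to that doc's value,
which is exactly the lookup a per-document column provides.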

Best,
Erick

On Mon, Mar 5, 2018 at 11:29 AM, Dominik Safaric
<dominiksafa...@gmail.com> wrote:
>> So, can doc values be persisted with order of values, not document id? This
>> should make sorting fast, because the values would be pre-ordered instead of
>> scanned and sorted at runtime.
>
>
> No, unfortunately doc values cannot be persisted in order. Lucene stores these
> values internally as a DocValuesType.SORTED_SET, where the values are
> ordered using, for example, Long.compareTo().
>
> If you'd like to retrieve the values in insertion order, use stored fields
> instead of doc values. You can then access the values in order using the
> LeafReader's document function. However, beware that this may cause
> performance issues because it requires loading the document from disk.
>
> If you need to store and retrieve multiple numeric values per document in
> order, you might consider using PointValues. PointValues are internally
> indexed with KD-trees. But beware that PointValues have limited
> dimensionality: you can, for example, store values in 8 dimensions, each
> of at most 16 bytes.
>
>> On 5 Mar 2018, at 15:33, Tony Ma <t...@opentext.com> wrote:
>>
>> Per my understanding, doc values (binary doc values / numeric doc values)
>> are stored in document id order. Sorted numeric doc values just means that
>> if a document has multiple values, the values are sorted within that
>> document, but across different documents the values are still ordered by
>> document id. Is that true?
>> So, can doc values be persisted with order of values, not document id? This
>> should make sorting fast, because the values would be pre-ordered instead of
>> scanned and sorted at runtime.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>