We did this monotonic detection/compression in the past, but had to
remove it because it caused too many slowdowns.

I think it easily causes too much type pollution. For example, in a
typical large index with an unsorted docvalues field, big segments
won't be sorted, tiny segments with a few values might happen to be
sorted (purely by chance), and the tiniest ones with e.g. a single
document are always sorted. Now we have a mix of monotonic and
non-monotonic encodings over the same field.
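
To make the mix concrete, here is a minimal sketch of a per-segment
sortedness check. It is an illustration only, not the code we removed:
the class MonotonicSketch, the Encoding enum and chooseEncoding() are
hypothetical names, and the segment values are made up.

```java
public class MonotonicSketch {
    /** Hypothetical per-segment choice; not an actual Lucene type. */
    enum Encoding { MONOTONIC, PACKED }

    /** One pass over a segment's values: non-decreasing means MONOTONIC. */
    static Encoding chooseEncoding(long[] values) {
        for (int i = 1; i < values.length; i++) {
            if (values[i] < values[i - 1]) {
                return Encoding.PACKED; // found a descent, so not sorted
            }
        }
        // Empty and single-value segments end up here: trivially sorted.
        return Encoding.MONOTONIC;
    }

    public static void main(String[] args) {
        // Made-up values for three segments of the same unsorted field.
        long[][] segments = {
            {42, 7, 19, 3, 88, 61}, // big segment: unsorted
            {5, 9, 12},             // tiny segment: sorted by luck
            {17},                   // single document: always sorted
        };
        for (long[] segment : segments) {
            System.out.println(segment.length + " values -> " + chooseEncoding(segment));
        }
    }
}
```

With these inputs the first segment is classified as PACKED and the
other two as MONOTONIC, so a reader of this one field has to handle
both encodings.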

At the same time, the optimization is fragile and rarely applies: even
for log users who actually sort on that field at index time, it will
apply to just one field out of the dozens or hundreds that they
typically have. But it may destroy performance for all the other
fields, so overall it causes more harm than good.

On Tue, Jun 15, 2021 at 5:49 AM LuXugang <[email protected]> wrote:
>
> Hi,
>
> In class Lucene80DocValuesConsumer#writeValues(FieldInfo field, 
> DocValuesProducer valuesProducer), all numericDocValues will be visited to 
> calculate gcd, in the meantime,  we can check if all values were sorted. if 
> so, maybe we could use DirectMonotonicWriter to store them.  
> DirectMonotonicWriter can get impressive compression.
>
> In addition, when i use Elasticsearch to store numeric field types, in Lucene 
> level,  the data always at least stored by 
> NumericDocValues/SortedNumericDocValues. So when indexing some sorted values 
> like ID, TIMESTAMP, maybe the upon optimization is applicable.
>
> Could I have some suggestions?
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
