Tony Xu created LUCENE-8878:
-------------------------------

             Summary: Provide alternative sorting utility from SortField other 
than FieldComparator
                 Key: LUCENE-8878
                 URL: https://issues.apache.org/jira/browse/LUCENE-8878
             Project: Lucene - Core
          Issue Type: Improvement
          Components: core/search
    Affects Versions: 8.1.1
            Reporter: Tony Xu


`FieldComparator` has many responsibilities, and users get all of them at 
once. At a high level, its main responsibilities are:
* Manage LeafFieldComparator
* Allocate storage for requested number of hits
* Read the values from DocValues/Custom source etc.
* Compare two values 

There are two major areas for improvement:
# 1. The logic for reading values is coupled with the logic for storing them.
# 2. One can't reason about thread-safety from `FieldComparator`'s API, so it 
is not suitable for concurrent search. 
E.g. can two concurrent threads use the same `FieldComparator` to call 
`getLeafComparator` for the two different segments they are working on? In 
fact, almost all existing implementations of `FieldComparator` are not 
thread-safe.


The proposal is to enhance `SortField` with two APIs:
# 1. int compare(Object v1, Object v2) -- compares two values of this field 
that come from different docs.
# 2. ValueAccessor newValueAccessor(LeafReaderContext leaf) -- encapsulates 
the logic for obtaining the right implementation for reading the field 
values.
`ValueAccessor` should be used in a similar way to `DocValues`: it provides 
the sort value for a document in an advance-and-read fashion.
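
To make the shape of the two proposed methods concrete, here is a minimal, self-contained sketch. It does not use Lucene classes: `Leaf` stands in for `LeafReaderContext`, `SortFieldLike` stands in for the enhanced `SortField`, and the `advanceExact`/`value` method names on `ValueAccessor` are assumptions made for illustration, since the issue only says the accessor works in an advance-and-read fashion.

```java
// Hypothetical sketch: none of these types exist in Lucene under these names.
final class SortFieldApiSketch {

  /** Stand-in for one segment (plays the role of LeafReaderContext). */
  static final class Leaf {
    final long[] values; // one sort value per document in this segment
    Leaf(long... values) { this.values = values; }
  }

  /** Reads sort values in an advance-then-read fashion (method names assumed). */
  interface ValueAccessor {
    /** Positions on doc; returns false if the document has no value. */
    boolean advanceExact(int doc);
    /** Returns the value of the document last passed to advanceExact. */
    Object value();
  }

  /** The two methods the proposal would add to SortField. */
  interface SortFieldLike {
    int compare(Object v1, Object v2);
    ValueAccessor newValueAccessor(Leaf leaf);
  }

  /** Example implementation for a long-valued field. */
  static final class LongSortField implements SortFieldLike {
    private final boolean reverse;
    LongSortField(boolean reverse) { this.reverse = reverse; }

    @Override public int compare(Object v1, Object v2) {
      int c = Long.compare((Long) v1, (Long) v2);
      return reverse ? -c : c;
    }

    @Override public ValueAccessor newValueAccessor(Leaf leaf) {
      // Each call returns a fresh accessor with its own cursor state.
      return new ValueAccessor() {
        private int current = -1;
        @Override public boolean advanceExact(int doc) {
          current = doc;
          return doc >= 0 && doc < leaf.values.length;
        }
        @Override public Object value() { return leaf.values[current]; }
      };
    }
  }

  public static void main(String[] args) {
    LongSortField field = new LongSortField(false);
    ValueAccessor accessor = field.newValueAccessor(new Leaf(5L, 2L));
    accessor.advanceExact(0);
    Object first = accessor.value();
    accessor.advanceExact(1);
    Object second = accessor.value();
    System.out.println(field.compare(first, second) > 0);
  }
}
```

Note that comparison lives on the field while reading lives on the per-leaf accessor, so neither has to allocate slots for hits.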


With this API, hopefully we can reduce memory usage. Today, users of 
`FieldComparator` either store the sort values or at least the slot number 
themselves, in addition to the storage allocated by `FieldComparator` itself. 
Ideally, only one copy of the values should be stored.
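
As an illustration of the "one copy" goal, the sketch below collects the top N hits of a single segment while keeping each sort value exactly once, inside the caller's own heap entry, with no separate slot array. Everything here is hypothetical: `ValueAccessor`, `Hit`, `longAccessor` and `topN` are made-up names, and a plain `long[]` stands in for a segment.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not Lucene code.
final class SingleCopyTopNSketch {

  interface ValueAccessor {
    boolean advanceExact(int doc);
    Object value();
  }

  /** The caller's own record: the only place a sort value is stored. */
  static final class Hit {
    final int doc;
    final Object sortValue;
    Hit(int doc, Object sortValue) { this.doc = doc; this.sortValue = sortValue; }
  }

  static ValueAccessor longAccessor(long[] segmentValues) {
    return new ValueAccessor() {
      private int current = -1;
      @Override public boolean advanceExact(int doc) {
        current = doc;
        return doc >= 0 && doc < segmentValues.length;
      }
      @Override public Object value() { return segmentValues[current]; }
    };
  }

  /** Returns the n best hits according to cmp, best first. */
  static List<Hit> topN(long[] segmentValues, int n, Comparator<Object> cmp) {
    // Reversed ordering keeps the worst retained hit at the head of the heap.
    PriorityQueue<Hit> heap =
        new PriorityQueue<>((a, b) -> cmp.compare(b.sortValue, a.sortValue));
    ValueAccessor accessor = longAccessor(segmentValues);
    for (int doc = 0; doc < segmentValues.length; doc++) {
      if (!accessor.advanceExact(doc)) continue;
      Object value = accessor.value();
      if (heap.size() < n) {
        heap.add(new Hit(doc, value));
      } else if (n > 0 && cmp.compare(value, heap.peek().sortValue) < 0) {
        heap.poll();                 // evict the current worst hit
        heap.add(new Hit(doc, value));
      }
    }
    List<Hit> result = new ArrayList<>(heap);
    result.sort((a, b) -> cmp.compare(a.sortValue, b.sortValue));
    return result;
  }

  public static void main(String[] args) {
    Comparator<Object> ascending = (a, b) -> Long.compare((Long) a, (Long) b);
    for (Hit hit : topN(new long[] {7, 3, 9, 1, 5}, 2, ascending)) {
      System.out.println("doc=" + hit.doc + " value=" + hit.sortValue);
    }
  }
}
```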

The proposed API is also friendlier to concurrent search since it provides 
a `ValueAccessor` per leaf. Although the same `ValueAccessor` can't be shared 
when more than one thread is working on the same leaf, each thread can at 
least initialize its own `ValueAccessor`.
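
The per-leaf pattern could look roughly like the following sketch, in which every task creates its own accessor for the leaf it is working on, so no accessor state is shared between threads. The types are again stand-ins rather than Lucene classes: a `long[]` plays the role of a leaf, and `newValueAccessor` and `minAcrossLeaves` are hypothetical helpers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not Lucene code.
final class PerLeafConcurrencySketch {

  interface ValueAccessor {
    boolean advanceExact(int doc);
    Object value();
  }

  /** Creates a fresh, unshared accessor over one leaf. */
  static ValueAccessor newValueAccessor(long[] leaf) {
    return new ValueAccessor() {
      private int current = -1; // mutable cursor: the reason accessors are not shared
      @Override public boolean advanceExact(int doc) {
        current = doc;
        return doc >= 0 && doc < leaf.length;
      }
      @Override public Object value() { return leaf[current]; }
    };
  }

  /** Finds the smallest value across all leaves, one task per leaf. */
  static long minAcrossLeaves(List<long[]> leaves) {
    ExecutorService pool =
        Executors.newFixedThreadPool(Math.max(1, Math.min(leaves.size(), 4)));
    try {
      List<Future<Long>> futures = new ArrayList<>();
      for (long[] leaf : leaves) {
        Callable<Long> task = () -> {
          ValueAccessor accessor = newValueAccessor(leaf); // private to this task
          long best = Long.MAX_VALUE;
          for (int doc = 0; doc < leaf.length; doc++) {
            if (accessor.advanceExact(doc)) {
              best = Math.min(best, (Long) accessor.value());
            }
          }
          return best;
        };
        futures.add(pool.submit(task));
      }
      long best = Long.MAX_VALUE;
      for (Future<Long> future : futures) {
        best = Math.min(best, future.get()); // merge the per-leaf results
      }
      return best;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    List<long[]> leaves = new ArrayList<>();
    leaves.add(new long[] {4, 8});
    leaves.add(new long[] {2, 9});
    leaves.add(new long[] {7});
    System.out.println(minAcrossLeaves(leaves));
  }
}
```

Two threads working on the same leaf would follow the same rule: each calls `newValueAccessor` for itself rather than sharing one instance.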


