Re: Custom SliceExecutor and slices computation in IndexSearcher

Luca Cavanna Tue, 06 Jun 2023 00:51:08 -0700

Thanks Sorabh,
I will have a look in the coming days.

On Tue, Jun 6, 2023 at 12:04 AM SorabhApache <sor...@apache.org> wrote:


> Hi Luca,
> I looked into moving the slice computation to SliceExecutor and using that
> in the default case as well. This way the package private constructor with
> SliceExecutor can be exposed and utilized by different extensions to
> customize the slice computation and execution as well. I have created a
> GitHub issue[1] and PR[2] to share the changes and will look forward to the
> feedback.
>
> [1]: https://github.com/apache/lucene/issues/12347
> [2]: https://github.com/apache/lucene/pull/12348
>
> Thanks,
> Sorabh
>
> On Sat, May 27, 2023 at 1:09 AM SorabhApache <sor...@apache.org> wrote:
>
>> Hi Luca,
>> Thanks for the suggestion. Let me explore more on it and I will get back.
>> But I agree making the slices non-final will require consumers to ensure
>> slices are not mutated after the construction time and is not preferred.
>>
>> Thanks,
>> Sorabh
>>
>>
>>
>> On Tue, May 23, 2023 at 3:14 AM Luca Cavanna <java...@apache.org> wrote:
>>
>>> Hi Sorabh,
>>> thanks for explaining. I see what you mean, it does get awkward to
>>> customize how slices are created. If the plan is to make SliceExecutor
>>> public and extensible, would it make sense to figure out what its public
>>> methods should be, and include the slice creation in there so it is
>>> detached from the IndexSearcher? I think that in practice there is a
>>> correlation between how slices are created and how they are executed (e.g.
>>> you may want to limit the number of slices created and execute each one on
>>> a separate thread, or possibly have more slices than threads, and reuse the
>>> same thread to execute multiple slices). Moving the slice creation to the
>>> component responsible for their execution would give the desired
>>> flexibility for users to customize their logic without having to poke with
>>> IndexSearcher constructors? What do you think? I personally prefer this
>>> option over adding a function argument to the existing searcher
>>> constructor, or making the slices non-final which affects the inter-segment
>>> concurrency design (slices should not be mutable?).
>>>
>>> Cheers
>>> Luca
>>>
>>> On Fri, May 19, 2023 at 9:02 AM SorabhApache <sor...@apache.org> wrote:
>>>
>>>> Hi Luca,
>>>> Thanks for your reply. Sharing an example below for clarity.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *public class CustomIndexSearcher extends IndexSearcher {
>>>>  public CustomIndexSearcher(IndexReader reader, Executor executor, int
>>>> maxSliceCount) {                super(reader, executor);         }
>>>>  @Override          protected LeafSlice[] slices(List<LeafReaderContext>
>>>> leaves) {*
>>>> *                 // cannot use maxSliceCount here to control custom
>>>> logic as this is called from constructor of super class*
>>>> *                 // I want to use parameter[s] in the constructor
>>>> input to control this slice computation*
>>>> *          }*
>>>> *}*
>>>>
>>>> Yes, the SliceExecutor class will become public. Will also need a
>>>> constructor in IndexSearcher which can take an implementation from the
>>>> extensions.
>>>>
>>>> Thanks,
>>>> Sorabh
>>>>
>>>>
>>>> On Thu, May 18, 2023 at 11:40 PM Luca Cavanna <l...@elastic.co.invalid>
>>>> wrote:
>>>>
>>>>> Hi Sorabh,
>>>>> You'll want to override the protected slices method to include your
>>>>> custom logic for creating the leaf slices. Your IndexSearcher extension 
>>>>> can
>>>>> also retrieve the slices through the getSlices public method. I don't
>>>>> understand what makes the additional constructor necessary, could you
>>>>> clarify that for me?
>>>>>
>>>>> One thing that may make sense to do is making the SliceExecutor
>>>>> extensible. Currently it is package private, and I can see how users may
>>>>> want to provide their own implementation when it comes to handling
>>>>> rejections, executing on the caller thread in certain scenarios. Possibly
>>>>> even the task creation, and the coordination of their execution could be
>>>>> moved to the SliceExecutor too.
>>>>>
>>>>> Cheers
>>>>> Luca
>>>>>
>>>>> On Fri, May 19, 2023, 03:27 SorabhApache <sor...@apache.org> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> For concurrent segment search, lucene uses the *slices* method to
>>>>>> compute the number of work units which can be processed concurrently.
>>>>>>
>>>>>> a) It calculates *slices* in the constructor of *IndexSearcher*
>>>>>> <https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L239>
>>>>>> with default thresholds for document count and segment counts.
>>>>>> b) Provides an implementation of *SliceExecutor* (i.e.
>>>>>> QueueSizeBasedExecutor)
>>>>>> <https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L1008>
>>>>>> based on executor type which applies the backpressure in concurrent
>>>>>> execution based on a limiting factor of 1.5 times the passed in 
>>>>>> threadpool
>>>>>> maxPoolSize.
>>>>>>
>>>>>> In OpenSearch, we have a search threadpool which serves the search
>>>>>> request to all the lucene indices (or OpenSearch shards) assigned to a
>>>>>> node. Each node can get the requests to some or all the indices on that
>>>>>> node.
>>>>>> I am exploring a mechanism such that I can dynamically control the
>>>>>> max slices for each lucene index search request. For example: search
>>>>>> requests to some indices on that node to have max 4 slices each and 
>>>>>> others
>>>>>> to have 2 slices each. Then the threadpool shared to execute these slices
>>>>>> does not have any limiting factor. In this model the top level search
>>>>>> threadpool will limit the number of active search requests which will 
>>>>>> limit
>>>>>> the number of work units in the SliceExecutor threadpool.
>>>>>>
>>>>>> For this the derived implementation of IndexSearcher can get an input
>>>>>> value in the constructor to control the slice count computation. Even
>>>>>> though the slice method is protected it gets called from the constructor 
>>>>>> of
>>>>>> base IndexSearcher class which prevents the derived class from using the
>>>>>> passed in input.
>>>>>>
>>>>>> To achieve this I can think of the following ways (in order of
>>>>>> preference) and would like to submit a pull request for it. But I wanted 
>>>>>> to
>>>>>> get some feedback if option 1 looks fine or take some other approach.
>>>>>>
>>>>>> 1. Provide another constructor in IndexSearcher which takes in 4
>>>>>> input parameters:
>>>>>>   protected IndexSearcher(IndexReaderContext context, Executor
>>>>>> executor, SliceExecutor sliceExecutor, Function<List<LeafReaderContext>,
>>>>>> LeafSlice[]> sliceProvider)
>>>>>>
>>>>>> 2. Make the `leafSlices` member protected and non final. After it is
>>>>>> initialized by the IndexSearcher (using default mechanism in lucene), the
>>>>>> derived implementation can again update it if need be (like based on some
>>>>>> input parameter to its own constructor). Also make the constructor with
>>>>>> SliceExecutor input protected such that derived implementation can 
>>>>>> provide
>>>>>> its own implementation of SliceExecutor. This mechanism will have 
>>>>>> redundant
>>>>>> computation of leafSlices.
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Sorabh
>>>>>>
>>>>>

Re: Custom SliceExecutor and slices computation in IndexSearcher

Reply via email to