Thanks Sorabh, I will have a look in the coming days. On Tue, Jun 6, 2023 at 12:04 AM SorabhApache <sor...@apache.org> wrote:
> Hi Luca, > I looked into moving the slice computation to SliceExecutor and using that > in the default case as well. This way the package private constructor with > SliceExecutor can be exposed and utilized by different extensions to > customize the slice computation and execution as well. I have created a > GitHub issue[1] and PR[2] to share the changes and will look forward to the > feedback. > > [1]: https://github.com/apache/lucene/issues/12347 > [2]: https://github.com/apache/lucene/pull/12348 > > Thanks, > Sorabh > > On Sat, May 27, 2023 at 1:09 AM SorabhApache <sor...@apache.org> wrote: > >> Hi Luca, >> Thanks for the suggestion. Let me explore more on it and I will get back. >> But I agree making the slices non-final will require consumers to ensure >> slices are not mutated after the construction time and is not preferred. >> >> Thanks, >> Sorabh >> >> >> >> On Tue, May 23, 2023 at 3:14 AM Luca Cavanna <java...@apache.org> wrote: >> >>> Hi Sorabh, >>> thanks for explaining. I see what you mean, it does get awkward to >>> customize how slices are created. If the plan is to make SliceExecutor >>> public and extensible, would it make sense to figure out what its public >>> methods should be, and include the slice creation in there so it is >>> detached from the IndexSearcher? I think that in practice there is a >>> correlation between how slices are created and how they are executed (e.g. >>> you may want to limit the number of slices created and execute each one on >>> a separate thread, or possibly have more slices than threads, and reuse the >>> same thread to execute multiple slices). Moving the slice creation to the >>> component responsible for their execution would give the desired >>> flexibility for users to customize their logic without having to poke with >>> IndexSearcher constructors? What do you think? I personally prefer this >>> option over adding a function argument to the existing searcher >>> constructor, or making the slices non-final which affects the inter-segment >>> concurrency design (slices should not be mutable?). >>> >>> Cheers >>> Luca >>> >>> On Fri, May 19, 2023 at 9:02 AM SorabhApache <sor...@apache.org> wrote: >>> >>>> Hi Luca, >>>> Thanks for your reply. Sharing an example below for clarity. >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> *public class CustomIndexSearcher extends IndexSearcher { >>>> public CustomIndexSearcher(IndexReader reader, Executor executor, int >>>> maxSliceCount) { super(reader, executor); } >>>> @Override protected LeafSlice[] slices(List<LeafReaderContext> >>>> leaves) {* >>>> * // cannot use maxSliceCount here to control custom >>>> logic as this is called from constructor of super class* >>>> * // I want to use parameter[s] in the constructor >>>> input to control this slice computation* >>>> * }* >>>> *}* >>>> >>>> Yes, the SliceExecutor class will become public. Will also need a >>>> constructor in IndexSearcher which can take an implementation from the >>>> extensions. >>>> >>>> Thanks, >>>> Sorabh >>>> >>>> >>>> On Thu, May 18, 2023 at 11:40 PM Luca Cavanna <l...@elastic.co.invalid> >>>> wrote: >>>> >>>>> Hi Sorabh, >>>>> You'll want to override the protected slices method to include your >>>>> custom logic for creating the leaf slices. Your IndexSearcher extension >>>>> can >>>>> also retrieve the slices through the getSlices public method. I don't >>>>> understand what makes the additional constructor necessary, could you >>>>> clarify that for me? >>>>> >>>>> One thing that may make sense to do is making the SliceExecutor >>>>> extensible. Currently it is package private, and I can see how users may >>>>> want to provide their own implementation when it comes to handling >>>>> rejections, executing on the caller thread in certain scenarios. Possibly >>>>> even the task creation, and the coordination of their execution could be >>>>> moved to the SliceExecutor too. >>>>> >>>>> Cheers >>>>> Luca >>>>> >>>>> On Fri, May 19, 2023, 03:27 SorabhApache <sor...@apache.org> wrote: >>>>> >>>>>> Hi All, >>>>>> >>>>>> For concurrent segment search, lucene uses the *slices* method to >>>>>> compute the number of work units which can be processed concurrently. >>>>>> >>>>>> a) It calculates *slices* in the constructor of *IndexSearcher* >>>>>> <https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L239> >>>>>> with default thresholds for document count and segment counts. >>>>>> b) Provides an implementation of *SliceExecutor* (i.e. >>>>>> QueueSizeBasedExecutor) >>>>>> <https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L1008> >>>>>> based on executor type which applies the backpressure in concurrent >>>>>> execution based on a limiting factor of 1.5 times the passed in >>>>>> threadpool >>>>>> maxPoolSize. >>>>>> >>>>>> In OpenSearch, we have a search threadpool which serves the search >>>>>> request to all the lucene indices (or OpenSearch shards) assigned to a >>>>>> node. Each node can get the requests to some or all the indices on that >>>>>> node. >>>>>> I am exploring a mechanism such that I can dynamically control the >>>>>> max slices for each lucene index search request. For example: search >>>>>> requests to some indices on that node to have max 4 slices each and >>>>>> others >>>>>> to have 2 slices each. Then the threadpool shared to execute these slices >>>>>> does not have any limiting factor. In this model the top level search >>>>>> threadpool will limit the number of active search requests which will >>>>>> limit >>>>>> the number of work units in the SliceExecutor threadpool. >>>>>> >>>>>> For this the derived implementation of IndexSearcher can get an input >>>>>> value in the constructor to control the slice count computation. Even >>>>>> though the slice method is protected it gets called from the constructor >>>>>> of >>>>>> base IndexSearcher class which prevents the derived class from using the >>>>>> passed in input. >>>>>> >>>>>> To achieve this I can think of the following ways (in order of >>>>>> preference) and would like to submit a pull request for it. But I wanted >>>>>> to >>>>>> get some feedback if option 1 looks fine or take some other approach. >>>>>> >>>>>> 1. Provide another constructor in IndexSearcher which takes in 4 >>>>>> input parameters: >>>>>> protected IndexSearcher(IndexReaderContext context, Executor >>>>>> executor, SliceExecutor sliceExecutor, Function<List<LeafReaderContext>, >>>>>> LeafSlice[]> sliceProvider) >>>>>> >>>>>> 2. Make the `leafSlices` member protected and non final. After it is >>>>>> initialized by the IndexSearcher (using default mechanism in lucene), the >>>>>> derived implementation can again update it if need be (like based on some >>>>>> input parameter to its own constructor). Also make the constructor with >>>>>> SliceExecutor input protected such that derived implementation can >>>>>> provide >>>>>> its own implementation of SliceExecutor. This mechanism will have >>>>>> redundant >>>>>> computation of leafSlices. >>>>>> >>>>>> >>>>>> Thanks, >>>>>> Sorabh >>>>>> >>>>>