Hi Chris, I looked into this and Elasticsearch seems to only need access to the rate limiter for logging purposes, without adding any information that Lucene doesn't have. Maybe another option would consist of moving the logging to Lucene? Having information in the IndexWriter's InfoStream about rate limiting for each completed merge sounds like something that would generally be useful.
On Wed, Sep 22, 2021 at 9:03 AM Chris Hegarty <christopher.hega...@elastic.co.invalid> wrote: > In an effort to prepare Elasticsearch for modularization, we are > investigating and eliminating split packages. The situation has improved > through recent refactoring in Lucene 9.0, but a number of split > packages still remain. This message identifies one such, so that it can > be discussed in isolation, with a view to a potential solution either in > Lucene or possibly within Elasticsearch itself. > > Elasticsearch defines an `ElasticsearchConcurrentMergeScheduler` [1] > (subclass of lucene's `ConcurrentMergeScheduler`), that provides > tracking of merge times for all and current merges. It does so by > extracting the relevant information from the `MergeThread.rateLimiter` > to report, a) the total-bytes-written, via > `MergeRateLimiter::getTotalBytesWritten`, and b) the MB-per-second > throttle, via `MergeRateLimiter::getMBPerSec`. Currently access to the > package-private `rateLimiter` is gained by effectively (but not > literally) injecting into a lucene package [2]. > > The elasticsearch ECMS overrides the factory for creating merge threads, > so is in control of the thread creation, but still cannot gain access to > the merge tracking information. > > There are several ways that we could resolve this, e.g. > > 1) In Lucene, declare `MergeThread.rateLimiter` as a public field. > Elasticsearch can access the rate limiter directly. > > 2) Same as #1, but instead declare `MergeThread.rateLimiter` as > protected field. Elasticsearch can then subclass MergeThread and > access the rate limiter. > > 3) In Lucene, provide an overloaded MergeThread constructor that accepts > a `MergeRateLimiter`. Elasticsearch can then create the merge rate > limiter itself and pass it to the merge thread on construction. This > would work, but would require ES to retain a map from thread to rate > limiter - not ideal. > > 4) Add public accessors directly to MergeThread to expose the necessary > information, e.g. > class MergeThread { > ... > /** Returns the current mb per second rate limit. */ > public double getMBPerSec() { return rateLimiter.getMBPerSec(); } > > /** Returns the total bytes written by this merge. */ > public long getTotalBytesWritten() { return > rateLimiter.getTotalBytesWritten(); } > } > > I'm sure there are probably a few other ideas that are not listed above, > but so far no.2 above seems the least intrusive. No.4 is also reasonable > if we consider this functionality more broadly desirable. > > -Chris. > > [1] > https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/index/engine/ElasticsearchConcurrentMergeScheduler.java#L39 > [2] > https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/apache/lucene/index/OneMergeHelper.java#L16 > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > > -- Adrien