In an effort to prepare Elasticsearch for modularization, we are
investigating and eliminating split packages. The situation has improved
through recent refactoring in Lucene 9.0, but a number of split
packages still remain. This message identifies one such, so that it can
be discussed in isolation, with a view to a potential solution either in
Lucene or possibly within Elasticsearch itself.

Elasticsearch defines an `ElasticsearchConcurrentMergeScheduler` [1]
(subclass of lucene's `ConcurrentMergeScheduler`), that provides
tracking of merge times for all and current merges. It does so by
extracting the relevant information from the `MergeThread.rateLimiter`
to report, a) the total-bytes-written, via
`MergeRateLimiter::getTotalBytesWritten`, and b) the MB-per-second
throttle, via `MergeRateLimiter::getMBPerSec`. Currently access to the
package-private `rateLimiter` is gained by effectively (but not
literally) injecting into a lucene package [2].

The elasticsearch ECMS overrides the factory for creating merge threads,
so is in control of the thread creation, but still cannot gain access to
the merge tracking information.

There are several ways that we could resolve this, e.g.

1) In Lucene, declare `MergeThread.rateLimiter` as a public field.
   Elasticsearch can access the rate limiter directly.

2) Same as #1, but instead declare `MergeThread.rateLimiter` as
   protected field. Elasticsearch can then subclass MergeThread and
   access the rate limiter.

3) In Lucene, provide an overloaded MergeThread constructor that accepts
   a `MergeRateLimiter`. Elasticsearch can then create the merge rate
   limiter itself and pass it to the merge thread on construction. This
   would work, but would require ES to retain a map from thread to rate
   limiter - not ideal.

4) Add public accessors directly to MergeThread to expose the necessary
   information, e.g.
     class MergeThread {
       ...
       /** Returns the current mb per second rate limit. */
      public double getMBPerSec() { return rateLimiter.getMBPerSec(); }

      /** Returns the total bytes written by this merge. */
      public long getTotalBytesWritten() { return 
rateLimiter.getTotalBytesWritten(); }   
    }

I'm sure there are probably a few other ideas that are not listed above,
but so far no.2 above seems the least intrusive. No.4 is also reasonable
if we consider this functionality more broadly desirable.

-Chris.

[1] 
https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/index/engine/ElasticsearchConcurrentMergeScheduler.java#L39
[2] 
https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/apache/lucene/index/OneMergeHelper.java#L16


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to