I'd +1 a change that makes setDiagnostics public. Longer term I wonder if we should have a more locked down API that _only_ allows setting diagnostics. There are lots of things in SegmentCommitInfo that merges should never override like the segment ID, and I can't think of anything else than diagnostics that a merge policy should ever need to set.
On Fri, Sep 24, 2021 at 10:57 AM Chris Hegarty <[email protected]> wrote: > In an effort to prepare Elasticsearch for modularization, we are > investigating and eliminating split packages. The situation has improved > through recent refactoring in Lucene 9.0 [1], but a number of split > packages still remain. This message identifies one such so that it can > be discussed in isolation, with a view to a potential solution either in > Lucene or possibly within Elasticsearch itself. > > Elasticsearch has a custom merge policy, ShuffleForcedMergePolicy, that > interleaves eldest and newest segments. SFMP has but a single > package-private dependency on lucene, namely `SegmentInfo::setDiagnostics`. > The `setDiagnostics` method is used by SFMP to add a single additional > ES-specific diagnostic key. > > It is possible to recreate the whole SegmentCommitInfo (and the SI > within, along with the additional ES-specific diagnostic key), by using > just the accessible parts of the lucene API. This has been prototyped[2], > but shows that something is not quite right [3]. Resolving this issue is > likely better done in lucene. > > The comment in `MergePolicy::setMergeInfo` - "Sets the SegmentCommitInfo > of the merged segment. Allows sub-classes to e.g. set diagnostics > properties"[4] - is telling. To better support the use-cases referred to > in this comment and to allow for other-package subclasses (which seems > completely reasonable given other parts of the API), it would be better > for lucene's SI::setDiagnostics method to be public (rather than > package-private). Such a change is a general improvement in the lucene > API, which can be leveraged by any custom merge policy. > > -Chris. > > [1] https://issues.apache.org/jira/browse/LUCENE-9319 > [2] https://github.com/elastic/elasticsearch/pull/78253 > [3] > https://github.com/elastic/elasticsearch/pull/78253#issuecomment-925861893 > [4] > https://github.com/apache/lucene/blob/d5d6dc079395c47cd6d12dcce3bcfdd2c7d9dc63/lucene/core/src/java/org/apache/lucene/index/MergePolicy.java#L271 > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] > > -- Adrien
