Bryan Beaudreault created HBASE-27666: -----------------------------------------
Summary: Allow preCompact hooks to return scanners whose cells can be shipped Key: HBASE-27666 URL: https://issues.apache.org/jira/browse/HBASE-27666 Project: HBase Issue Type: Improvement Reporter: Bryan Beaudreault In [https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java#L492,] when the compaction scanner extends KeyValueScanner, we call shipped() periodically. This releases shared buffers which makes compactions less memory hungry. When a preCompact hook returns a custom scanner, that scanner is unlikely to extend KeyValueScanner. So installing a preCompact hook can cause a regression in terms of memory pressure. KeyValueScanner is a large interface with many methods, but the method being called (shipped()) is actually on the Shipper interface (which KeyValueScanner extends). It would be nice to allow coprocessors to more easily integrate here. I can think of two options: # Simply change the cast on [line 436|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java#L436] from KeyValueScanner to Shipper. A coprocessor can easily return an InternalScanner impl which also extends Shipper since it's just one method. A common pattern will likely be to delegate to the underlying real scanner. # Add a new DelegateInternalScanner which coprocessors can return. In that case, call scanner.getDelegate() and check if _that_ extends KeyValueScanner. I prefer option 1 as it's very simple and makes it really easy to reason about – a coprocessor may not _want_ to release cells, in which case they don't need to extend Shipper. The downside of this is we'd have to promote Shipper interface to IA.LimitedPrivate(COPROC) Option 2 could open us to other possibilities in the future, and could be configurable. Like maybe it has a scanner.supportsShipped() or something. If true, the underlying scanner's shipped method will be called. If not, it won't. Barring other opinions, I will go with option 1 as it's trivial. cc [~anoopsamjohn] since you were involved in the linked change and the creation of Shipper interface. In the linked code, we don't use any other method on KeyValueScanner interface so it seems totally fine to use Shipper instead. -- This message was sent by Atlassian Jira (v8.20.10#820010)