Bryan Beaudreault created HBASE-27666:
-----------------------------------------

             Summary: Allow preCompact hooks to return scanners whose cells can 
be shipped
                 Key: HBASE-27666
                 URL: https://issues.apache.org/jira/browse/HBASE-27666
             Project: HBase
          Issue Type: Improvement
            Reporter: Bryan Beaudreault


In 
[https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java#L492,]
 when the compaction scanner extends KeyValueScanner, we call shipped() 
periodically. This releases shared buffers which makes compactions less memory 
hungry.

When a preCompact hook returns a custom scanner, that scanner is unlikely to 
extend KeyValueScanner. So installing a preCompact hook can cause a regression 
in terms of memory pressure.

KeyValueScanner is a large interface with many methods, but the method being 
called (shipped()) is actually on the Shipper interface (which KeyValueScanner 
extends). It would be nice to allow coprocessors to more easily integrate here.

I can think of two options:
 # Simply change the cast on [line 
436|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java#L436]
 from KeyValueScanner to Shipper. A coprocessor can easily return an 
InternalScanner impl which also extends Shipper since it's just one method. A 
common pattern will likely be to delegate to the underlying real scanner.
 # Add a new DelegateInternalScanner which coprocessors can return. In that 
case, call scanner.getDelegate() and check if _that_ extends KeyValueScanner.

I prefer option 1 as it's very simple and makes it really easy to reason about 
– a coprocessor may not _want_ to release cells, in which case they don't need 
to extend Shipper. The downside of this is we'd have to promote Shipper 
interface to IA.LimitedPrivate(COPROC)

Option 2 could open us to other possibilities in the future, and could be 
configurable. Like maybe it has a scanner.supportsShipped() or something. If 
true, the underlying scanner's shipped method will be called. If not, it won't.

Barring other opinions, I will go with option 1 as it's trivial.

cc [~anoopsamjohn] since you were involved in the linked change and the 
creation of Shipper interface. In the linked code, we don't use any other 
method on KeyValueScanner interface so it seems totally fine to use Shipper 
instead.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to