Ishan Chhabra created HBASE-11558:
-------------------------------------

             Summary: Caching set on Scan object gets lost when using 
TableMapReduceUtil in 0.95+
                 Key: HBASE-11558
                 URL: https://issues.apache.org/jira/browse/HBASE-11558
             Project: HBase
          Issue Type: Bug
          Components: Scanners
    Affects Versions: 0.95.0
            Reporter: Ishan Chhabra


In 0.94 and earlier, if one sets caching on the Scan object by calling
scan.setCaching(int) and passes the Scan to TableMapReduceUtil, the value is
correctly read and used by the mappers during a MapReduce job. This works
because TableMapReduceUtil internally uses Scan.write to serialize the Scan and
transfer it to the mappers, and Scan.write serializes the caching value.

In 0.95+, after the move to protobuf, ProtobufUtil.toScan no longer preserves
caching, because ClientProtos.Scan has no caching field. Caching is passed to
the server via the ScanRequest object, so it is not needed in the Scan message
itself. However, this breaks application code that relies on the earlier
behavior: users who set caching only on the Scan object will see a sudden
degradation in scan performance in 0.96+.
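
The difference between the two serialization paths can be illustrated with a
small self-contained sketch. Nothing below is HBase code: ScanSpec, the two
wire formats, and the UNSET_CACHING marker are hypothetical stand-ins chosen
for illustration. The "legacy" format writes the caching value, as the issue
says Scan.write did; the "no caching field" format mimics a message schema
that has no place for it, so the value is lost on the receiving side.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical model of the problem; these are NOT the HBase classes.
class ScanCachingLossSketch {
    // Assumed "not set" marker, used only in this sketch.
    static final int UNSET_CACHING = -1;

    static final class ScanSpec {
        final String startRow;
        final int caching;

        ScanSpec(String startRow, int caching) {
            this.startRow = startRow;
            this.caching = caching;
        }
    }

    // Legacy-style format: the caching value is written with the scan.
    static byte[] writeLegacy(ScanSpec s) {
        byte[] row = s.startRow.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + row.length + 4);
        buf.putInt(row.length).put(row).putInt(s.caching);
        return buf.array();
    }

    static ScanSpec readLegacy(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        byte[] row = new byte[buf.getInt()];
        buf.get(row);
        int caching = buf.getInt();
        return new ScanSpec(new String(row, StandardCharsets.UTF_8), caching);
    }

    // Format without a caching field: only the start row is written.
    static byte[] writeWithoutCachingField(ScanSpec s) {
        byte[] row = s.startRow.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + row.length);
        buf.putInt(row.length).put(row);
        return buf.array();
    }

    static ScanSpec readWithoutCachingField(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        byte[] row = new byte[buf.getInt()];
        buf.get(row);
        // The reader has nothing to restore, so caching comes back unset.
        return new ScanSpec(new String(row, StandardCharsets.UTF_8), UNSET_CACHING);
    }

    public static void main(String[] args) {
        ScanSpec original = new ScanSpec("row-0", 500);
        ScanSpec viaLegacy = readLegacy(writeLegacy(original));
        ScanSpec viaNewFormat = readWithoutCachingField(writeWithoutCachingField(original));
        System.out.println("legacy round trip caching: " + viaLegacy.caching);
        System.out.println("no-field round trip caching: " + viaNewFormat.caching);
    }
}
```

In this model the job driver sets caching to 500 in both cases, but only the
legacy round trip hands 500 to the reader. The other path falls back to
whatever the reader treats as unset, without any error being raised.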

There are two options here:
1. Add caching to the serialized Scan object, which adds an extra int to the
Scan payload that is not needed in the general case.
2. Document and advertise that the client must call
TableMapReduceUtil.setScannerCaching.
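
As a rough illustration of option 2, the sketch below models carrying the
caching value in the job configuration instead of in the serialized scan. It
does not use the HBase or Hadoop APIs: java.util.Properties stands in for the
job configuration, the setScannerCaching and effectiveCaching helpers are
hypothetical, and FALLBACK_CACHING is an arbitrary value picked for the
example. The key name "hbase.client.scanner.caching" is assumed here to be
the property that the real TableMapReduceUtil.setScannerCaching writes.

```java
import java.util.Properties;

// Hypothetical model of option 2; Properties stands in for the job configuration.
class ScannerCachingConfigSketch {
    // Assumed to match the HBase client property name.
    static final String CACHING_KEY = "hbase.client.scanner.caching";
    // Arbitrary fallback for this sketch, not a documented HBase default.
    static final int FALLBACK_CACHING = 1;

    // Driver side: record the caching value in the job configuration.
    static void setScannerCaching(Properties jobConf, int caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be positive: " + caching);
        }
        jobConf.setProperty(CACHING_KEY, Integer.toString(caching));
    }

    // Mapper side: read the value back, falling back when the driver never set it.
    static int effectiveCaching(Properties jobConf) {
        String raw = jobConf.getProperty(CACHING_KEY, Integer.toString(FALLBACK_CACHING));
        return Integer.parseInt(raw);
    }

    public static void main(String[] args) {
        Properties forgotten = new Properties();
        System.out.println("caching when not configured: " + effectiveCaching(forgotten));

        Properties configured = new Properties();
        setScannerCaching(configured, 500);
        System.out.println("caching when configured: " + effectiveCaching(configured));
    }
}
```

The point of the model is that the value survives regardless of how the scan
itself is serialized, at the cost of requiring every job author to remember
the extra call.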




--
This message was sent by Atlassian JIRA
(v6.2#6252)
