[ https://issues.apache.org/jira/browse/HBASE-11558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079944#comment-14079944 ]
Nick Dimiduk commented on HBASE-11558:
--------------------------------------

[~ishanc] as a follow-on, what do you think about deprecating TableMapReduceUtil.setScannerCaching in favor of setScanner? Is there any sense in having two ways to specify this? We should also look at what happens when a user specifies both. What's the effective behavior? Mind updating the release note appropriately?

> Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-11558
>                 URL: https://issues.apache.org/jira/browse/HBASE-11558
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce, Scanners
>            Reporter: Ishan Chhabra
>            Assignee: Ishan Chhabra
>             Fix For: 0.99.0, 0.96.3, 0.98.5, 2.0.0
>
>         Attachments: HBASE_11558-0.96.patch, HBASE_11558-0.96_v2.patch,
> HBASE_11558-0.98.patch, HBASE_11558-0.98_v2.patch, HBASE_11558.patch,
> HBASE_11558_v2.patch, HBASE_11558_v2.patch
>
>
> In 0.94 and earlier, if one sets caching on the Scan object in the Job by
> calling scan.setCaching(int) and passes it to TableMapReduceUtil, it is
> correctly read and used by the mappers during a MapReduce job. This is
> because Scan.write, which TableMapReduceUtil uses internally to serialize the
> scan object and transfer it to the mappers, respects and serializes caching.
> In 0.95+, after the move to protobuf, ProtobufUtil.toScan no longer respects
> caching, as ClientProtos.Scan has no caching field. Caching is passed to the
> server via the ScanRequest object and so is not needed in the Scan object.
> However, this breaks application code that relies on the earlier behavior,
> and users relying on it will see a sudden degradation in Scan performance in
> 0.96+.
> There are 2 options here:
> 1. Add caching to the Scan object, adding an extra int to the Scan payload
>    that is not really needed in the general case.
> 2. Document and preach that TableMapReduceUtil.setScannerCaching must be
>    called by the client.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
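The caching loss described in the quoted issue can be modeled without an HBase dependency. The sketch below is hypothetical: `SketchScan`, the map-based "wire format" and the method names are illustrative stand-ins, not HBase's `Scan`, `ProtobufUtil` or `ClientProtos` classes. It shows a caching value set on the client disappearing when the serialized form has no caching field (the 0.95+ situation) and surviving when one extra int is carried (option 1).

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the HBASE-11558 problem: a scan is serialized on the
 * client and deserialized in each mapper. Names and the map-based "wire
 * format" are illustrative only, not HBase's real classes.
 */
public class ScanCachingRoundTrip {

    /** Stand-in for a Scan; caching of -1 means "not set". */
    static final class SketchScan {
        int caching = -1;
        String startRow = "";
    }

    /** 0.95+ style: the serialized form has no caching field, so it is dropped. */
    static Map<String, String> serializeWithoutCaching(SketchScan scan) {
        Map<String, String> wire = new HashMap<>();
        wire.put("startRow", scan.startRow);
        return wire;
    }

    /** Option 1: carry caching in the serialized form as one extra int. */
    static Map<String, String> serializeWithCaching(SketchScan scan) {
        Map<String, String> wire = serializeWithoutCaching(scan);
        wire.put("caching", Integer.toString(scan.caching));
        return wire;
    }

    /** Mapper side: rebuild the scan; a missing caching entry stays at -1. */
    static SketchScan deserialize(Map<String, String> wire) {
        SketchScan scan = new SketchScan();
        scan.startRow = wire.getOrDefault("startRow", "");
        scan.caching = Integer.parseInt(wire.getOrDefault("caching", "-1"));
        return scan;
    }

    public static void main(String[] args) {
        SketchScan clientScan = new SketchScan();
        clientScan.caching = 500; // what the user asked for on the client

        SketchScan lost = deserialize(serializeWithoutCaching(clientScan));
        SketchScan kept = deserialize(serializeWithCaching(clientScan));

        System.out.println("without caching field: " + lost.caching); // -1
        System.out.println("with caching field: " + kept.caching);    // 500
    }
}
```

The other scan attributes (here just `startRow`) survive either way, which is why the loss is easy to miss: the job still runs, only with whatever caching the mapper falls back to.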
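The comment asks what the effective behavior is when a user specifies both. The sketch below is not taken from the patch; it encodes one assumed precedence rule: a positive scan-level caching value wins, and otherwise the mapper falls back to the job-level `hbase.client.scanner.caching` setting, which is assumed to be the key `TableMapReduceUtil.setScannerCaching` writes. `java.util.Properties` stands in for the Hadoop job configuration, and `effectiveCaching` and its default parameter are hypothetical.

```java
import java.util.Properties;

/**
 * Hypothetical sketch of resolving scanner caching in a mapper when both a
 * scan-level value and a job-level value may be present. The precedence rule
 * here is an assumption, not confirmed HBase behavior.
 */
public class EffectiveCaching {

    // Assumed job-level key (the one setScannerCaching is assumed to set).
    static final String SCANNER_CACHING_KEY = "hbase.client.scanner.caching";

    static int effectiveCaching(int scanCaching, Properties jobConf, int defaultCaching) {
        if (scanCaching > 0) {
            return scanCaching; // scan-level value, if it survived serialization
        }
        String fromConf = jobConf.getProperty(SCANNER_CACHING_KEY);
        return fromConf != null ? Integer.parseInt(fromConf) : defaultCaching;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(SCANNER_CACHING_KEY, "200"); // as if set at the job level

        // Both set and the scan value arrived: the scan-level value wins.
        System.out.println(effectiveCaching(500, conf, 1));
        // Scan value lost in transit (-1): the job-level value applies.
        System.out.println(effectiveCaching(-1, conf, 1));
        // Neither set: the hypothetical default applies.
        System.out.println(effectiveCaching(-1, new Properties(), 1));
    }
}
```

Under this assumed rule, the two settings only compete when caching survives serialization. With the 0.95+ behavior the scan-level value arrives as -1, so the job-level value (or the default) silently applies, which is consistent with the performance degradation the issue describes.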