Ujjawal Kumar created HBASE-29104:
-------------------------------------
Summary: Support reading partial rows via snapshot based MR job
Key: HBASE-29104
URL: https://issues.apache.org/jira/browse/HBASE-29104
Project: HBase
Issue Type: Improvement
Components: snapshots
Affects Versions: 2.5.10
Reporter: Ujjawal Kumar
Reading large rows (larger than hbase.table.max.rowsize) via a snapshot-based MR
job can fail with
org.apache.hadoop.hbase.regionserver.RowTooBigException.
In such cases, one workaround is to increase hbase.table.max.rowsize in the MR
job configuration. However, since the entire row is still read into memory, in
the worst case this can cause an OOM error in the mapper.
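To make the trade-off concrete, here is a toy model in plain Java with no HBase dependency. It assumes, loosely following the region scanner's behavior, that a row is rejected once the accumulated size of its cells exceeds the configured maximum. RowSizeCheckDemo and readWholeRow are hypothetical names, and IllegalStateException stands in for RowTooBigException; this is a sketch, not HBase's actual code.

```java
import java.util.List;

/**
 * Toy model (not HBase code) of a per-row size check, loosely modeled on how a
 * scanner rejects rows larger than hbase.table.max.rowsize. The whole row is
 * accepted in one piece, so raising the limit only trades the exception for
 * higher memory use.
 */
final class RowSizeCheckDemo {

    /**
     * Sums the sizes of the cells that would be buffered for one row and returns
     * the total. Throws once the running total exceeds maxRowSize
     * (IllegalStateException is a stand-in for RowTooBigException).
     */
    static long readWholeRow(List<Long> cellSizes, long maxRowSize) {
        long buffered = 0;
        for (long size : cellSizes) {
            buffered += size;
            if (buffered > maxRowSize) {
                throw new IllegalStateException(
                    "row exceeds max row size: " + buffered + " > " + maxRowSize);
            }
        }
        return buffered;
    }

    public static void main(String[] args) {
        List<Long> row = List.of(400L, 400L, 400L); // a 1200-byte row
        try {
            readWholeRow(row, 1000L);
        } catch (IllegalStateException e) {
            System.out.println("low limit: " + e.getMessage());
        }
        // Raising the limit lets the read succeed, but the whole row is accepted at once.
        System.out.println("raised limit, bytes buffered: " + readWholeRow(row, 2000L));
    }
}
```

Raising the limit makes the read succeed only because the row is still taken in one piece, which is exactly where the mapper's memory pressure comes from.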
Another approach is to let snapshot-based MR jobs read rows partially, using
Scan#maxResultSize and Scan#allowPartialResults. This is currently not possible
because ClientSideRegionScanner uses a [default scanner context while
reading|https://github.com/apache/hbase/blob/5201ae2de2b4b4d18156ab0c00dd42e7726951c0/hbase-server/src/main/java/org/apache/hadoop/hbase/client/ClientSideRegionScanner.java#L104],
which cannot enforce size-based limits.
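With partial results allowed, one row may arrive split across several Result objects, and in HBase 2.x each Result reports via Result#mayHaveMoreCellsInRow() whether more cells of the same row may follow. The toy model below (plain Java, no HBase dependency; PartialChunk and countCellsPerRow are hypothetical names) sketches how a mapper could aggregate per row while holding only a running total rather than the whole row. Because the flag is only a hint ("may have more"), the sketch also treats a change of row key as the end of the previous row.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (no HBase dependency) of consuming partial results. Each
 * PartialChunk stands in for one Result returned when a row is split by a
 * result size limit; the flag mirrors Result#mayHaveMoreCellsInRow().
 */
final class PartialRowConsumerDemo {

    static final class PartialChunk {
        final String row;
        final int cellCount;
        final boolean mayHaveMoreCellsInRow;

        PartialChunk(String row, int cellCount, boolean mayHaveMoreCellsInRow) {
            this.row = row;
            this.cellCount = cellCount;
            this.mayHaveMoreCellsInRow = mayHaveMoreCellsInRow;
        }
    }

    /** Counts cells per row, keeping only a running count for the current row. */
    static Map<String, Integer> countCellsPerRow(List<PartialChunk> chunks) {
        Map<String, Integer> completed = new LinkedHashMap<>();
        String currentRow = null;
        int runningCount = 0;
        for (PartialChunk chunk : chunks) {
            if (currentRow != null && !chunk.row.equals(currentRow)) {
                // The flag is only a hint: a new row key also ends the previous row.
                completed.put(currentRow, runningCount);
                currentRow = null;
            }
            if (currentRow == null) {
                currentRow = chunk.row; // first chunk of a new row
                runningCount = 0;
            }
            runningCount += chunk.cellCount;
            if (!chunk.mayHaveMoreCellsInRow) {
                completed.put(currentRow, runningCount);
                currentRow = null;
            }
        }
        if (currentRow != null) {
            completed.put(currentRow, runningCount); // input ended mid-row
        }
        return completed;
    }

    public static void main(String[] args) {
        List<PartialChunk> chunks = List.of(
            new PartialChunk("big-row", 1000, true),
            new PartialChunk("big-row", 1000, true),
            new PartialChunk("big-row", 500, false),
            new PartialChunk("small-row", 3, false));
        System.out.println(countCellsPerRow(chunks));
    }
}
```

The point of the design is that peak memory is bounded by one chunk rather than by the largest row, which is what a size limit on the snapshot scanner would make possible.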
This can be solved by allowing users to pass a custom scanner context that
enforces a size limit for snapshot reads, similar to the [one used within
RSRpcServices|https://github.com/apache/hbase/blob/5201ae2de2b4b4d18156ab0c00dd42e7726951c0/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java#L3330-L3346]
when reading via the region server.
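A minimal sketch of the idea, under stated assumptions: the plain-Java toy below (no HBase dependency) models a scanner context that carries a size limit, so a per-row scanner can stop mid-row and return a partial batch. SizeLimitedContext, ToyRowScanner and batchSizes are hypothetical names that only loosely mirror the role ScannerContext plays on the region server; they are not the API this proposal would add.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (no HBase dependency) of a size-limited scanner context. */
final class SizeLimitedScanDemo {

    /** Tracks bytes accumulated for the current batch against a limit. */
    static final class SizeLimitedContext {
        private final long sizeLimit; // Long.MAX_VALUE behaves like "no limit"
        private long accumulated;

        SizeLimitedContext(long sizeLimit) { this.sizeLimit = sizeLimit; }

        void add(long cellSize) { accumulated += cellSize; }

        boolean sizeLimitReached() { return accumulated >= sizeLimit; }

        void reset() { accumulated = 0; }
    }

    /** Iterates over the cells (represented by their sizes) of a single row. */
    static final class ToyRowScanner {
        private final List<Long> rowCells;
        private int position;

        ToyRowScanner(List<Long> rowCells) { this.rowCells = rowCells; }

        /**
         * Appends cells to out until the row ends or the size limit is hit.
         * The limit is checked after each cell, so a batch can overshoot by
         * at most one cell. Returns true if more cells of this row remain.
         */
        boolean next(List<Long> out, SizeLimitedContext context) {
            context.reset();
            while (position < rowCells.size()) {
                long size = rowCells.get(position++);
                out.add(size);
                context.add(size);
                if (context.sizeLimitReached()) {
                    return position < rowCells.size();
                }
            }
            return false;
        }
    }

    /** Reads one row in batches and returns the number of cells in each batch. */
    static List<Integer> batchSizes(List<Long> rowCells, long sizeLimit) {
        ToyRowScanner scanner = new ToyRowScanner(rowCells);
        SizeLimitedContext context = new SizeLimitedContext(sizeLimit);
        List<Integer> batches = new ArrayList<>();
        boolean more;
        do {
            List<Long> batch = new ArrayList<>();
            more = scanner.next(batch, context);
            if (!batch.isEmpty()) {
                batches.add(batch.size());
            }
        } while (more);
        return batches;
    }

    public static void main(String[] args) {
        List<Long> row = List.of(400L, 400L, 400L, 400L, 400L); // a 2000-byte row
        // No effective limit (like a default context): the whole row in one batch.
        System.out.println("unlimited: " + batchSizes(row, Long.MAX_VALUE));
        // 1000-byte limit: the row comes back as several partial batches.
        System.out.println("limited:   " + batchSizes(row, 1000L));
    }
}
```

With the limit in place, the caller receives the row in bounded pieces instead of one large list, which is the behavior the snapshot read path would need to honor Scan#maxResultSize.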
--
This message was sent by Atlassian Jira
(v8.20.10#820010)