[ https://issues.apache.org/jira/browse/HBASE-20618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497492#comment-16497492 ]
Swapna commented on HBASE-20618:
--------------------------------

Thanks [~eclark]. I looked into that option, but we have a server-side filter with hasFilterRow set to true. We drop results based on some cells missing for a row, and this is incompatible with partial results because row boundaries are not known.

> Skip large rows instead of throwing an exception to client
> ----------------------------------------------------------
>
>                 Key: HBASE-20618
>                 URL: https://issues.apache.org/jira/browse/HBASE-20618
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Swapna
>            Priority: Minor
>             Fix For: 3.0.0, 2.0.1, 1.4.5
>
>         Attachments: HBASE-20618.hbasemaster.v01.patch,
> HBASE-20618.hbasemaster.v02.patch, HBASE-20618.v1.branch-1.patch,
> HBASE-20618.v1.branch-1.patch
>
>
> Currently HBase throws RowTooBigException when a row has a column family
> whose data exceeds the configured maximum:
> https://issues.apache.org/jira/browse/HBASE-10925?attachmentOrder=desc
> We have some bad rows growing very large. We need a way to skip these rows
> for most of our jobs.
> Some of the options we considered:
> Option 1:
> The HBase client handles the exception and restarts the scanner past the bad
> row by capturing the row key where it failed. This could be done by adding
> the row key to the exception stack trace, which seems brittle. The client
> would ignore the setting if it's upgraded before the server.
> Option 2:
> Skip big rows on the server. Go with a server-level config similar to
> "hbase.table.max.rowsize", or make it request-based by changing the scan
> request API. If allowed per request, based on the scan request config, the
> client will have to ignore the setting if it's upgraded before the server.
> {code}
> try {
>   populateResult(results, this.storeHeap, scannerContext, current);
> } catch (RowTooBigException e) {
>   // getRowArray() returns the whole backing array, so pass offset and length.
>   LOG.info("Row exceeded the limit in storeheap. Skipping row with key: "
>       + Bytes.toStringBinary(current.getRowArray(), current.getRowOffset(),
>           current.getRowLength()));
>   this.storeHeap.reseek(PrivateCellUtil.createLastOnRow(current));
>   results.clear();
>   scannerContext.clearProgress();
>   continue;
> }
> {code}
> I prefer Option 2 with the server-level config. Please share your inputs.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
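
For readers who want to see the Option 2 behavior in isolation, here is a minimal, self-contained sketch that does not use HBase at all. Everything in it is an assumption made for illustration: the `SkipLargeRowsSketch` class, its toy `RowTooBigException`, the `skipLargeRows` flag, and the in-memory table are hypothetical stand-ins, not the real region scanner or the proposed patch. It only shows the control flow: when a row exceeds the size limit, the partially collected cells are discarded and the scan continues with the next row.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SkipLargeRowsSketch {

    /** Toy stand-in for HBase's RowTooBigException; not the real class. */
    static class RowTooBigException extends RuntimeException {
        RowTooBigException(String msg) { super(msg); }
    }

    /** Collects the cells of one row, failing once their total size passes maxRowSize. */
    static List<byte[]> populateResult(List<byte[]> cells, long maxRowSize) {
        List<byte[]> out = new ArrayList<>();
        long size = 0;
        for (byte[] cell : cells) {
            size += cell.length;
            if (size > maxRowSize) {
                throw new RowTooBigException("row size " + size + " > " + maxRowSize);
            }
            out.add(cell);
        }
        return out;
    }

    /** Scans rows in order; with skipLargeRows set, oversized rows are dropped whole. */
    static Map<String, List<byte[]>> scan(Map<String, List<byte[]>> table,
                                          long maxRowSize, boolean skipLargeRows) {
        Map<String, List<byte[]>> results = new LinkedHashMap<>();
        for (Map.Entry<String, List<byte[]>> row : table.entrySet()) {
            try {
                results.put(row.getKey(), populateResult(row.getValue(), maxRowSize));
            } catch (RowTooBigException e) {
                if (!skipLargeRows) {
                    throw e; // existing behavior: surface the error to the caller
                }
                // Proposed behavior: nothing from this row was stored, so just
                // move on to the next row.
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, List<byte[]>> table = new LinkedHashMap<>();
        table.put("row1", List.of(new byte[10], new byte[10]));
        table.put("row2", List.of(new byte[600], new byte[600])); // 1200 bytes, over the limit
        table.put("row3", List.of(new byte[5]));
        System.out.println(scan(table, 1000, true).keySet()); // prints [row1, row3]
    }
}
```

Because a skipped row never contributes any cells, a row-level filter would still only ever see complete rows in this model, which is the property the comment above says partial results cannot provide.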