[ https://issues.apache.org/jira/browse/HBASE-16425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423675#comment-15423675 ]
Jean-Marc Spaggiari commented on HBASE-16425:
---------------------------------------------

I like this thread!

Another thing related to bulk load: if someone bulk loads a cell which is way too big, like a 2GB cell, the region server might not be able to load it and will fail. It might be nice to detect that and alert the user, log the issue, or skip the cell.

> [Operability] Autohandling 'bad data'
> -------------------------------------
>
>                 Key: HBASE-16425
>                 URL: https://issues.apache.org/jira/browse/HBASE-16425
>             Project: HBase
>          Issue Type: Brainstorming
>          Components: Operability
>            Reporter: stack
>
> This is a brainstorming issue. It came up chatting w/ a couple of operators
> talking about 'bad data'; i.e. no matter how you control your clients,
> someone by mistake or under a misconception will load an out-of-spec Cell or
> Row. In this particular case, two types of 'bad data' were talked about:
> (on) The Big Cell: An upload of a 'big cell' came in via bulkload but it so
> happened that their frontend all arrived at the malignant Cell at the same
> time so hundreds of threads requesting the big cell. The RS OOME'd. Then when
> the region opened on the new RS, it OOME'd, etc. Could we switch to chunking
> when a Server sees that it has a large Cell on its hands? I suppose bulk load
> could defeat any Put chunking we had in place but would be good to have this
> too. Chatting w/ Matteo, we probably want to just move to the streaming
> Interface that we've talked of in the past at various times; the Get would
> chunk out the big Cell for assembly on the Client, or just give back the Cell
> in pieces -- an OutputStream for the Application to suck on. New API and/or
> old API could use it when Cells are big.
> (on) The user had a row with 29M Columns in it because the default entity had
> id=-1.... In this case chunking the Scan (v1.1+) helps but the operator was
> having trouble finding the problem row. How could we surface anomalies like
> this for operators? On flush, add even more meta data to the HFile (Yahoo!
> Data Sketches as [~jleach] has been suggesting) and then an offline tool to
> read metadata and run it through a few simple rules. Data Sketches are
> mergeable so could build up a region-view or store-view....
> This is sketchy and I'm pretty sure repeats stuff in old issues but parking
> this note here while the encounter still fresh.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
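
The two checks discussed in this thread (flag a cell that is too big, flag a row with too many columns) could be sketched roughly as below. This is only an illustration under loud assumptions: `BadDataScan`, `CellInfo`, and both thresholds are hypothetical names made up for this note, nothing here is an HBase class or API, and each `CellInfo` is assumed to stand for exactly one column of its row.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types are HBase classes.
final class BadDataScan {

    /** Minimal stand-in for a cell: the row key and the value size in bytes. */
    static final class CellInfo {
        final String row;
        final long valueBytes;

        CellInfo(String row, long valueBytes) {
            this.row = row;
            this.valueBytes = valueBytes;
        }
    }

    /** Returns one warning per cell whose value is larger than maxCellBytes. */
    static List<String> oversizedCells(List<CellInfo> cells, long maxCellBytes) {
        List<String> warnings = new ArrayList<>();
        for (CellInfo c : cells) {
            if (c.valueBytes > maxCellBytes) {
                warnings.add("big cell in row " + c.row + ": " + c.valueBytes + " bytes");
            }
        }
        return warnings;
    }

    /** Returns one warning per row that holds more than maxColumns cells. */
    static List<String> wideRows(List<CellInfo> cells, int maxColumns) {
        // Count cells per row, keeping first-seen row order for stable output.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (CellInfo c : cells) {
            counts.merge(c.row, 1, Integer::sum);
        }
        List<String> warnings = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > maxColumns) {
                warnings.add("wide row " + e.getKey() + ": " + e.getValue() + " columns");
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        List<CellInfo> cells = new ArrayList<>();
        cells.add(new CellInfo("user-1", 120));
        cells.add(new CellInfo("user-2", 5_000_000));
        cells.add(new CellInfo("id=-1", 10));
        cells.add(new CellInfo("id=-1", 10));
        cells.add(new CellInfo("id=-1", 10));

        // Thresholds are arbitrary illustration values, not HBase defaults.
        oversizedCells(cells, 1_000_000).forEach(System.out::println);
        wideRows(cells, 2).forEach(System.out::println);
    }
}
```

A real version would presumably read sizes and counts from HFile metadata or during bulk load validation rather than from an in-memory list; whether to alert, log, or skip once a warning fires is the open question in the thread.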