[ https://issues.apache.org/jira/browse/HBASE-18023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057674#comment-16057674 ]
Josh Elser commented on HBASE-18023: ------------------------------------ {code} @@ -1363,6 +1363,10 @@ public final class HConstants { public static final String DEFAULT_SNAPSHOT_RESTORE_FAILSAFE_NAME = "hbase-failsafe-{snapshot.name}-{restore.timestamp}"; + public static final String BATCH_ROWS_THRESHOLD_NAME = "hbase.rpc.rows.warning.threshold"; + + public static final int BATCH_ROWS_THRESHOLD_DEFAULT = 1000; {code} We try to avoid placing constants into {{HConstants}} unless they're used in a number of places. Can you move this to RSRpcServices, please? {code} + private static LogDelegate DEFAULT_LOG_DELEGATE = new LogDelegate(){ + @Override + public void logBatchWarning(int sum, int rowSizeWarnThreshold) { + LOG.warn("Large batch operation detected (greater than "+rowSizeWarnThreshold+") (HBASE-18023)." + + " Requested Number of Rows: "+sum + + " Client: "+RpcServer.getRequestUserName()+"/"+RpcServer.getRemoteAddress()); + }}; {code} Please wrap the {{LOG.warn(..)}} call in a {{if (LOG.isWarnEnabled())}} conditional. This is a relatively "hot" code path -- we want to avoid the intermediate string generation if we aren't going to log it. Can you make sure the indentation is consistently to 2-spaces in your changes? There are also a couple of long-lines as well which you could wrap which would be great! One final request, could you add a test that verifies the warning happens when there is one RegionAction in the MultiRequest which exceeds the threshold? Other than the above, I think this looks good. Just minor things at this point :) > Log multi-* requests for more than threshold number of rows > ----------------------------------------------------------- > > Key: HBASE-18023 > URL: https://issues.apache.org/jira/browse/HBASE-18023 > Project: HBase > Issue Type: Improvement > Components: regionserver > Reporter: Clay B. > Assignee: David Harju > Priority: Minor > Attachments: HBASE-18023.master.001.patch, > HBASE-18023.master.002.patch > > > Today, if a user happens to do something like a large multi-put, they can get > through request throttling (e.g. it is one request) but still crash a region > server with a garbage storm. We have seen regionservers hit this issue and it > is silent and deadly. The RS will report nothing more than a mysterious > garbage collection and exit out. > Ideally, we could report a large multi-* request before starting it, in case > it happens to be deadly. Knowing the client, user and how many rows are > affected would be a good start to tracking down painful users. -- This message was sent by Atlassian JIRA (v6.4.14#64029)