[jira] [Commented] (HBASE-2231) Compaction events should be written to HLog
[ https://issues.apache.org/jira/browse/HBASE-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13665477#comment-13665477 ] Prakash Khemani commented on HBASE-2231: Hi, Prakash Khemani is no longer at Facebook so this email address is no longer being monitored. If you need assistance, please contact another person who is currently at the company.

Compaction events should be written to HLog --- Key: HBASE-2231 URL: https://issues.apache.org/jira/browse/HBASE-2231 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Todd Lipcon Assignee: stack Priority: Blocker Labels: moved_from_0_20_5 Fix For: 0.98.0, 0.95.1 Attachments: 2231-testcase-0.94.txt, 2231-testcase_v2.txt, 2231-testcase_v3.txt, 2231v2.txt, 2231v3.txt, 2231v4.txt, hbase-2231-testcase.txt, hbase-2231.txt, hbase-2231_v5.patch, hbase-2231_v6.patch, hbase-2231_v7-0.95.patch, hbase-2231_v7.patch, hbase-2231_v7.patch

The sequence for a compaction should look like this:
1. Compact the region to new files
2. Write a Compacted Region entry to the HLog
3. Delete the old files

This deals with the case where the RS has paused between steps 1 and 2 and the regions have since been reassigned.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
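A minimal sketch of the ordering described above, using hypothetical helper and type names (WalWriter, appendCompactionMarker, rewrite) rather than the real HRegion/HLog APIs; the point is that the old files are removed only after the compaction marker is durable in the HLog.
{code}
import java.util.List;

/** Illustrative sketch of the proposed compaction sequence; not the actual HBase code. */
class CompactionSequenceSketch {
  interface WalWriter {
    void appendCompactionMarker(String region, List<String> inputs, List<String> outputs);
    void sync();
  }

  void compact(WalWriter wal, String region, List<String> oldFiles) {
    List<String> newFiles = rewrite(oldFiles);                  // 1. compact region to new files
    wal.appendCompactionMarker(region, oldFiles, newFiles);     // 2. write a "Compacted Region" entry to the HLog
    wal.sync();                                                 //    make the marker durable before any cleanup
    delete(oldFiles);                                           // 3. only now delete/archive the old files
  }

  List<String> rewrite(List<String> oldFiles) { return List.of("new-hfile"); } // stand-in for the real rewrite
  void delete(List<String> files) { /* archive/remove the replaced files */ }
}
{code}
With this ordering, a region server that pauses after step 1 cannot lose the compaction outcome: if the region is reassigned before the marker is written, the old files are still present and the new files are simply discarded.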
[jira] [Commented] (HBASE-6878) DistributerLogSplit can fail to resubmit a task done if there is an exception during the log archiving
[ https://issues.apache.org/jira/browse/HBASE-6878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464874#comment-13464874 ] Prakash Khemani commented on HBASE-6878: The logic to indefinitely retry a failing log-splitting task is not inside SplitLogManager. SplitLogManager will retry a task a finite number of times; if it still fails then it is the outer Master layers that retry indefinitely. The reason for this behavior is to allow tools to be built around distributed log splitting: if distributed log splitting were being driven by a tool, you wouldn't want it to retry indefinitely. So the behavior outlined in this bug report is intentional, but it shouldn't lead to any bug. (There are only a few places in SplitLogManager where it resubmits the task forcefully, disregarding the retry limit. I think the only two cases are when a region server (splitlogworker) dies and when a splitlogworker resigns from the task, i.e. gives up the task even though there were no failures.)

DistributerLogSplit can fail to resubmit a task done if there is an exception during the log archiving -- Key: HBASE-6878 URL: https://issues.apache.org/jira/browse/HBASE-6878 Project: HBase Issue Type: Bug Components: master Reporter: nkeywal Priority: Minor

The code in SplitLogManager#getDataSetWatchSuccess is:
{code}
if (slt.isDone()) {
  LOG.info("task " + path + " entered state: " + slt.toString());
  if (taskFinisher != null && !ZKSplitLog.isRescanNode(watcher, path)) {
    if (taskFinisher.finish(slt.getServerName(), ZKSplitLog.getFileName(path)) == Status.DONE) {
      setDone(path, SUCCESS);
    } else {
      resubmitOrFail(path, CHECK);
    }
  } else {
    setDone(path, SUCCESS);
  }
{code}
resubmitOrFail(path, CHECK); should be resubmitOrFail(path, FORCE); Without it, the task won't be resubmitted if the delay is not reached, and the task will be marked as failed.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
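A minimal sketch of the two resubmit modes being discussed, using a simplified Task shape and hypothetical field names; in the real SplitLogManager the details differ, but FORCE is the mode that ignores the retry budget and the timeout delay, while CHECK respects both.
{code}
/** Illustrative model of the CHECK vs. FORCE resubmit semantics; not the real SplitLogManager code. */
class ResubmitSketch {
  enum Directive { CHECK, FORCE }

  static class Task { int unforcedResubmits; long lastUpdate; }

  static final int MAX_RESUBMIT = 3;       // cf. hbase.splitlog.max.resubmit
  static final long TIMEOUT_MS = 25_000;   // assumed task timeout

  boolean shouldResubmit(Task task, Directive directive, long now) {
    if (directive == Directive.CHECK) {
      // CHECK: respect the retry budget and the timeout delay.
      if (task.unforcedResubmits >= MAX_RESUBMIT) return false;
      if (now - task.lastUpdate < TIMEOUT_MS) return false;   // "delay is not reached"
    }
    // FORCE: resubmit unconditionally (e.g. the worker died or resigned).
    return true;
  }
}
{code}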
[jira] [Commented] (HBASE-5860) splitlogmanager should not unnecessarily resubmit tasks when zk unavailable
[ https://issues.apache.org/jira/browse/HBASE-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265461#comment-13265461 ] Prakash Khemani commented on HBASE-5860: I had missed the fact that isAnyCreateZKNodePending() misses the create of RESCAN nodes. Will provide a fix. I was aware of the race condition where isAnyCreateZKNodePending() will return false even when create-zknode is soon going to be retried. Not worth fixing for the reason you outlined - creating an extra RESCAN node doesn't hurt. (The code change you have outlined will need some more changes to make it work) splitlogmanager should not unnecessarily resubmit tasks when zk unavailable --- Key: HBASE-5860 URL: https://issues.apache.org/jira/browse/HBASE-5860 Project: HBase Issue Type: Improvement Reporter: Prakash Khemani Assignee: Prakash Khemani Attachments: 0001-HBASE-5860-splitlogmanager-should-not-unnecessarily-.patch (Doesn't really impact the run time or correctness of log splitting) say the master has lost connection to zk. splitlogmanager's timeoutmanager will realize that all the tasks that were submitted are still unassigned. It will resubmit those tasks (i.e. create dummy znodes) splitlogmanager should realze that the tasks are unassigned but their znodes have not been created. 012-04-20 13:11:20,516 INFO org.apache.hadoop.hbase.master.SplitLogManager: dead splitlog worker msgstore295.snc4.facebook.com,60020,1334948757026 2012-04-20 13:11:20,517 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: Scheduling batch of logs to split 2012-04-20 13:11:20,517 INFO org.apache.hadoop.hbase.master.SplitLogManager: started splitting logs in [hdfs://msgstore215.snc4.facebook.com:9000/MSGSTORE215-SNC4-HBASE/.logs/msgstore295.snc4.facebook.com,60020,1334948757026-splitting] 2012-04-20 13:11:20,565 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server msgstore235.snc4.facebook.com/10.30.222.186:2181 2012-04-20 13:11:20,566 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to msgstore235.snc4.facebook.com/10.30.222.186:2181, initiating session 2012-04-20 13:11:20,575 INFO org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 4 unassigned = 4 2012-04-20 13:11:20,576 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: resubmitting unassigned task(s) after timeout 2012-04-20 13:11:21,577 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: resubmitting unassigned task(s) after timeout 2012-04-20 13:11:21,683 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x36ccb0f8010002, likely server has closed socket, closing socket connection and attempting reconnect 2012-04-20 13:11:21,683 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x136ccb0f489, likely server has closed socket, closing socket connection and attempting reconnect 2012-04-20 13:11:21,786 WARN org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback: create rc =CONNECTIONLOSS for /hbase/splitlog/hdfs%3A%2F%2Fmsgstore215.snc4.facebook.com%3A9000%2FMSGSTORE215-SNC4-HBASE%2F.logs%2Fmsgstore295.snc4.facebook.com%2C60020%2C1334948757026-splitting%2F10.30.251.186%253A60020.1334951586677 retry=3 2012-04-20 13:11:21,786 WARN org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback: create rc =CONNECTIONLOSS for 
/hbase/splitlog/hdfs%3A%2F%2Fmsgstore215.snc4.facebook.com%3A9000%2FMSGSTORE215-SNC4-HBASE%2F.logs%2Fmsgstore295.snc4.facebook.com%2C60020%2C1334948757026-splitting%2F10.30.251.186%253A60020.1334951920332 retry=3 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
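A minimal sketch of the guard discussed in this issue, assuming a hypothetical pending-create counter; the idea is that the timeout monitor should not treat tasks as unassigned (and create duplicate znodes) while any async creates, including RESCAN nodes, are still outstanding.
{code}
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative sketch of the pending-create guard; field and method names are simplified stand-ins. */
class UnassignedTaskResubmitSketch {
  private final AtomicInteger pendingZkCreates = new AtomicInteger(0); // bumped for every async create, incl. RESCAN nodes

  void beforeAsyncCreate()   { pendingZkCreates.incrementAndGet(); }
  void afterCreateCallback() { pendingZkCreates.decrementAndGet(); }

  boolean isAnyCreateZKNodePending() { return pendingZkCreates.get() > 0; }

  void timeoutMonitorChore() {
    // Don't resubmit "unassigned" tasks whose znodes may simply not have been
    // created yet (e.g. zk connection loss, creates still being retried).
    if (isAnyCreateZKNodePending()) {
      return;
    }
    // ... otherwise resubmit genuinely unassigned tasks ...
  }
}
{code}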
[jira] [Updated] (HBASE-5860) splitlogmanager should not unnecessarily resubmit tasks when zk unavailable
[ https://issues.apache.org/jira/browse/HBASE-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Khemani updated HBASE-5860: --- Attachment: 0001-HBASE-5860-splitlogmanager-should-not-unnecessarily-.patch Nicolas's feedback applied. also reduced the RESCAN retries to 0. splitlogmanager should not unnecessarily resubmit tasks when zk unavailable --- Key: HBASE-5860 URL: https://issues.apache.org/jira/browse/HBASE-5860 Project: HBase Issue Type: Improvement Reporter: Prakash Khemani Assignee: Prakash Khemani Attachments: 0001-HBASE-5860-splitlogmanager-should-not-unnecessarily-.patch, 0001-HBASE-5860-splitlogmanager-should-not-unnecessarily-.patch (Doesn't really impact the run time or correctness of log splitting) say the master has lost connection to zk. splitlogmanager's timeoutmanager will realize that all the tasks that were submitted are still unassigned. It will resubmit those tasks (i.e. create dummy znodes) splitlogmanager should realze that the tasks are unassigned but their znodes have not been created. 012-04-20 13:11:20,516 INFO org.apache.hadoop.hbase.master.SplitLogManager: dead splitlog worker msgstore295.snc4.facebook.com,60020,1334948757026 2012-04-20 13:11:20,517 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: Scheduling batch of logs to split 2012-04-20 13:11:20,517 INFO org.apache.hadoop.hbase.master.SplitLogManager: started splitting logs in [hdfs://msgstore215.snc4.facebook.com:9000/MSGSTORE215-SNC4-HBASE/.logs/msgstore295.snc4.facebook.com,60020,1334948757026-splitting] 2012-04-20 13:11:20,565 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server msgstore235.snc4.facebook.com/10.30.222.186:2181 2012-04-20 13:11:20,566 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to msgstore235.snc4.facebook.com/10.30.222.186:2181, initiating session 2012-04-20 13:11:20,575 INFO org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 4 unassigned = 4 2012-04-20 13:11:20,576 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: resubmitting unassigned task(s) after timeout 2012-04-20 13:11:21,577 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: resubmitting unassigned task(s) after timeout 2012-04-20 13:11:21,683 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x36ccb0f8010002, likely server has closed socket, closing socket connection and attempting reconnect 2012-04-20 13:11:21,683 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x136ccb0f489, likely server has closed socket, closing socket connection and attempting reconnect 2012-04-20 13:11:21,786 WARN org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback: create rc =CONNECTIONLOSS for /hbase/splitlog/hdfs%3A%2F%2Fmsgstore215.snc4.facebook.com%3A9000%2FMSGSTORE215-SNC4-HBASE%2F.logs%2Fmsgstore295.snc4.facebook.com%2C60020%2C1334948757026-splitting%2F10.30.251.186%253A60020.1334951586677 retry=3 2012-04-20 13:11:21,786 WARN org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback: create rc =CONNECTIONLOSS for /hbase/splitlog/hdfs%3A%2F%2Fmsgstore215.snc4.facebook.com%3A9000%2FMSGSTORE215-SNC4-HBASE%2F.logs%2Fmsgstore295.snc4.facebook.com%2C60020%2C1334948757026-splitting%2F10.30.251.186%253A60020.1334951920332 retry=3 -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5890) SplitLog Rescan BusyWaits upon Zk.CONNECTIONLOSS
[ https://issues.apache.org/jira/browse/HBASE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264046#comment-13264046 ] Prakash Khemani commented on HBASE-5890: Most likely, it isn't a good idea to sleep in the zookeeper callback thread (isn't the zk client single threaded?). Can these creates be queued in a DelayQueue (with roughly a socket-timeout delay) and retried from SplitLogManager.TimeoutMonitor.chore()?

SplitLog Rescan BusyWaits upon Zk.CONNECTIONLOSS Key: HBASE-5890 URL: https://issues.apache.org/jira/browse/HBASE-5890 Project: HBase Issue Type: Bug Reporter: Nicolas Spiegelberg Priority: Minor Fix For: 0.94.0, 0.96.0, 0.89-fb Attachments: HBASE-5890.patch

We ran into a production issue yesterday where the SplitLogManager tried to create a Rescan node in ZK. The createAsync() generated a KeeperException.CONNECTIONLOSS that was immediately sent to processResult(), the rescan-node create was retried with a decremented retry_count, and this created a CPU busywait that also clogged up the logs. We should handle this better.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
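A minimal sketch of the retry scheme suggested above, assuming a hypothetical pending-create record: failed creates are parked in a java.util.concurrent.DelayQueue for roughly a socket timeout and re-issued from the periodic chore, instead of re-issuing (or sleeping) in the ZK callback thread.
{code}
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

/** Illustrative sketch of DelayQueue-based retry of rescan-node creates; not the real SplitLogManager code. */
class RescanRetrySketch {
  static class PendingCreate implements Delayed {
    final String path;
    final int retriesLeft;
    final long readyAtMs;
    PendingCreate(String path, int retriesLeft, long delayMs) {
      this.path = path; this.retriesLeft = retriesLeft;
      this.readyAtMs = System.currentTimeMillis() + delayMs;
    }
    public long getDelay(TimeUnit unit) {
      return unit.convert(readyAtMs - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
    }
    public int compareTo(Delayed o) {
      return Long.compare(getDelay(TimeUnit.MILLISECONDS), o.getDelay(TimeUnit.MILLISECONDS));
    }
  }

  private final DelayQueue<PendingCreate> retryQueue = new DelayQueue<>();
  private final long socketTimeoutMs = 30_000;   // assumed socket timeout

  /** Called from the ZK callback on CONNECTIONLOSS: park the retry instead of re-issuing immediately. */
  void onConnectionLoss(String path, int retriesLeft) {
    if (retriesLeft > 0) retryQueue.add(new PendingCreate(path, retriesLeft - 1, socketTimeoutMs));
  }

  /** Called from TimeoutMonitor.chore(): re-issue only the creates whose delay has expired. */
  void chore() {
    for (PendingCreate p; (p = retryQueue.poll()) != null; ) {
      createRescanNodeAsync(p.path, p.retriesLeft);   // hypothetical async create
    }
  }

  void createRescanNodeAsync(String path, int retriesLeft) { /* issue zk.create(...) asynchronously */ }
}
{code}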
[jira] [Updated] (HBASE-5860) splitlogmanager should not unnecessarily resubmit tasks when zk unavailable
[ https://issues.apache.org/jira/browse/HBASE-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Khemani updated HBASE-5860: --- Attachment: 0001-HBASE-5860-splitlogmanager-should-not-unnecessarily-.patch avoid resubmitting tasks to zk when there are pending zkk nodes create. splitlogmanager should not unnecessarily resubmit tasks when zk unavailable --- Key: HBASE-5860 URL: https://issues.apache.org/jira/browse/HBASE-5860 Project: HBase Issue Type: Improvement Reporter: Prakash Khemani Assignee: Prakash Khemani Attachments: 0001-HBASE-5860-splitlogmanager-should-not-unnecessarily-.patch (Doesn't really impact the run time or correctness of log splitting) say the master has lost connection to zk. splitlogmanager's timeoutmanager will realize that all the tasks that were submitted are still unassigned. It will resubmit those tasks (i.e. create dummy znodes) splitlogmanager should realze that the tasks are unassigned but their znodes have not been created. 012-04-20 13:11:20,516 INFO org.apache.hadoop.hbase.master.SplitLogManager: dead splitlog worker msgstore295.snc4.facebook.com,60020,1334948757026 2012-04-20 13:11:20,517 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: Scheduling batch of logs to split 2012-04-20 13:11:20,517 INFO org.apache.hadoop.hbase.master.SplitLogManager: started splitting logs in [hdfs://msgstore215.snc4.facebook.com:9000/MSGSTORE215-SNC4-HBASE/.logs/msgstore295.snc4.facebook.com,60020,1334948757026-splitting] 2012-04-20 13:11:20,565 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server msgstore235.snc4.facebook.com/10.30.222.186:2181 2012-04-20 13:11:20,566 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to msgstore235.snc4.facebook.com/10.30.222.186:2181, initiating session 2012-04-20 13:11:20,575 INFO org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 4 unassigned = 4 2012-04-20 13:11:20,576 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: resubmitting unassigned task(s) after timeout 2012-04-20 13:11:21,577 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: resubmitting unassigned task(s) after timeout 2012-04-20 13:11:21,683 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x36ccb0f8010002, likely server has closed socket, closing socket connection and attempting reconnect 2012-04-20 13:11:21,683 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x136ccb0f489, likely server has closed socket, closing socket connection and attempting reconnect 2012-04-20 13:11:21,786 WARN org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback: create rc =CONNECTIONLOSS for /hbase/splitlog/hdfs%3A%2F%2Fmsgstore215.snc4.facebook.com%3A9000%2FMSGSTORE215-SNC4-HBASE%2F.logs%2Fmsgstore295.snc4.facebook.com%2C60020%2C1334948757026-splitting%2F10.30.251.186%253A60020.1334951586677 retry=3 2012-04-20 13:11:21,786 WARN org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback: create rc =CONNECTIONLOSS for /hbase/splitlog/hdfs%3A%2F%2Fmsgstore215.snc4.facebook.com%3A9000%2FMSGSTORE215-SNC4-HBASE%2F.logs%2Fmsgstore295.snc4.facebook.com%2C60020%2C1334948757026-splitting%2F10.30.251.186%253A60020.1334951920332 retry=3 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-5860) splitlogmanager should not unnecessarily resubmit tasks when zk unavailable
Prakash Khemani created HBASE-5860: -- Summary: splitlogmanager should not unnecessarily resubmit tasks when zk unavailable Key: HBASE-5860 URL: https://issues.apache.org/jira/browse/HBASE-5860 Project: HBase Issue Type: Improvement Reporter: Prakash Khemani Assignee: Prakash Khemani (Doesn't really impact the run time or correctness of log splitting) say the master has lost connection to zk. splitlogmanager's timeoutmanager will realize that all the tasks that were submitted are still unassigned. It will resubmit those tasks (i.e. create dummy znodes) splitlogmanager should realze that the tasks are unassigned but their znodes have not been created. 012-04-20 13:11:20,516 INFO org.apache.hadoop.hbase.master.SplitLogManager: dead splitlog worker msgstore295.snc4.facebook.com,60020,1334948757026 2012-04-20 13:11:20,517 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: Scheduling batch of logs to split 2012-04-20 13:11:20,517 INFO org.apache.hadoop.hbase.master.SplitLogManager: started splitting logs in [hdfs://msgstore215.snc4.facebook.com:9000/MSGSTORE215-SNC4-HBASE/.logs/msgstore295.snc4.facebook.com,60020,1334948757026-splitting] 2012-04-20 13:11:20,565 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server msgstore235.snc4.facebook.com/10.30.222.186:2181 2012-04-20 13:11:20,566 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to msgstore235.snc4.facebook.com/10.30.222.186:2181, initiating session 2012-04-20 13:11:20,575 INFO org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 4 unassigned = 4 2012-04-20 13:11:20,576 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: resubmitting unassigned task(s) after timeout 2012-04-20 13:11:21,577 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: resubmitting unassigned task(s) after timeout 2012-04-20 13:11:21,683 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x36ccb0f8010002, likely server has closed socket, closing socket connection and attempting reconnect 2012-04-20 13:11:21,683 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x136ccb0f489, likely server has closed socket, closing socket connection and attempting reconnect 2012-04-20 13:11:21,786 WARN org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback: create rc =CONNECTIONLOSS for /hbase/splitlog/hdfs%3A%2F%2Fmsgstore215.snc4.facebook.com%3A9000%2FMSGSTORE215-SNC4-HBASE%2F.logs%2Fmsgstore295.snc4.facebook.com%2C60020%2C1334948757026-splitting%2F10.30.251.186%253A60020.1334951586677 retry=3 2012-04-20 13:11:21,786 WARN org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback: create rc =CONNECTIONLOSS for /hbase/splitlog/hdfs%3A%2F%2Fmsgstore215.snc4.facebook.com%3A9000%2FMSGSTORE215-SNC4-HBASE%2F.logs%2Fmsgstore295.snc4.facebook.com%2C60020%2C1334948757026-splitting%2F10.30.251.186%253A60020.1334951920332 retry=3 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4007) distributed log splitting can get indefinitely stuck
[ https://issues.apache.org/jira/browse/HBASE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13100085#comment-13100085 ] Prakash Khemani commented on HBASE-4007: Hi Stack, I have not pushed this out to production yet ... and the way things are it will be a while before we do the next push to the hbase-90 tiers. I will try to get some cluster testing done and will update this thread. Regarding the use of ConcurrentHashMap as opposed to HashSet + ObjectLock : I could not find any nice way to take a snapshot of a concurrent-hash-map. The way the code is written I need to take a snapshot of the deadWorkers set. I have just rebased. I will try to put it up in the reviewboard one more time. Thanks, Prakash distributed log splitting can get indefinitely stuck Key: HBASE-4007 URL: https://issues.apache.org/jira/browse/HBASE-4007 Project: HBase Issue Type: Bug Reporter: Prakash Khemani Assignee: Prakash Khemani Priority: Critical Fix For: 0.92.0 Attachments: 0001-HBASE-4007-distributed-log-splitting-can-get-indefin.patch After the configured number of retries SplitLogManager is not going to resubmit log-split tasks. In this situation even if the splitLogWorker that owns the task dies the task will not get resubmitted. When a regionserver goes away then all the split-log tasks that it owned should be resubmitted by the SplitLogMaster. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4007) distributed log splitting can get indefinitely stuck
[ https://issues.apache.org/jira/browse/HBASE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13091168#comment-13091168 ] Prakash Khemani commented on HBASE-4007: I am not running the patch yet. It is up internally for review. distributed log splitting can get indefinitely stuck Key: HBASE-4007 URL: https://issues.apache.org/jira/browse/HBASE-4007 Project: HBase Issue Type: Bug Reporter: Prakash Khemani Assignee: Prakash Khemani Priority: Critical Attachments: 0001-HBASE-4007-distributed-log-splitting-can-get-indefin.patch After the configured number of retries SplitLogManager is not going to resubmit log-split tasks. In this situation even if the splitLogWorker that owns the task dies the task will not get resubmitted. When a regionserver goes away then all the split-log tasks that it owned should be resubmitted by the SplitLogMaster. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4007) distributed log splitting can get indefinitely stuck
[ https://issues.apache.org/jira/browse/HBASE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091345#comment-13091345 ] Prakash Khemani commented on HBASE-4007: (I had tried using reviewboard yesterday but it kept failing on me with an Internal Server Error 500. Thanks for reviewing the patch the hard way.)

== For deadWorkers - one of the goals was to not block handleDeadWorkers() and worry about deadlocks etc. I could have used ConcurrentSet but I was not sure of the semantics - how an iterator behaves when another item is added. I will change this to use ConcurrentSet ... please let me know.

== I will change registerHeartbeat() to heartbeat().

distributed log splitting can get indefinitely stuck Key: HBASE-4007 URL: https://issues.apache.org/jira/browse/HBASE-4007 Project: HBase Issue Type: Bug Reporter: Prakash Khemani Assignee: Prakash Khemani Priority: Critical Attachments: 0001-HBASE-4007-distributed-log-splitting-can-get-indefin.patch

After the configured number of retries SplitLogManager is not going to resubmit log-split tasks. In this situation even if the splitLogWorker that owns the task dies the task will not get resubmitted. When a regionserver goes away then all the split-log tasks that it owned should be resubmitted by the SplitLogMaster.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
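A minimal sketch of the snapshot pattern being weighed above, with hypothetical field names: a plain HashSet guarded by a small lock keeps handleDeadWorker() non-blocking, while the resubmit path swaps in a fresh set and iterates over the snapshot outside the lock, sidestepping the concurrent-iterator question.
{code}
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch of snapshotting the deadWorkers set; a simplified stand-in for the real SplitLogManager code. */
class DeadWorkerSnapshotSketch {
  private Set<String> deadWorkers = new HashSet<>();   // guarded by deadWorkersLock
  private final Object deadWorkersLock = new Object();

  /** Cheap and non-blocking: just record the dead worker. */
  void handleDeadWorker(String workerName) {
    synchronized (deadWorkersLock) {
      deadWorkers.add(workerName);
    }
  }

  /** Timeout monitor: take a snapshot, then iterate outside the lock. */
  void resubmitTasksOfDeadWorkers() {
    Set<String> snapshot;
    synchronized (deadWorkersLock) {
      if (deadWorkers.isEmpty()) return;
      snapshot = deadWorkers;
      deadWorkers = new HashSet<>();   // start collecting a fresh batch
    }
    for (String worker : snapshot) {
      // resubmit every task still owned by this worker ...
    }
  }
}
{code}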
[jira] [Updated] (HBASE-4007) distributed log splitting can get indefinitely stuck
[ https://issues.apache.org/jira/browse/HBASE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Khemani updated HBASE-4007: --- Status: Patch Available (was: Open) fixes (1) buildup of RESCAN zookeeper nodes in the event that all the regionservers are down (2) a bug in tracking when the last RESCAN node was created which could lead to too frequent RESCAN node creation (3) if master/splitlogmanager fails to complete the task handed over by a region-server/worker then keep retrying indefinitely (4) if a regionserver/worker dies then ensure that all its tasks are resubmitted distributed log splitting can get indefinitely stuck Key: HBASE-4007 URL: https://issues.apache.org/jira/browse/HBASE-4007 Project: HBase Issue Type: Bug Reporter: Prakash Khemani Assignee: Prakash Khemani After the configured number of retries SplitLogManager is not going to resubmit log-split tasks. In this situation even if the splitLogWorker that owns the task dies the task will not get resubmitted. When a regionserver goes away then all the split-log tasks that it owned should be resubmitted by the SplitLogMaster. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4007) distributed log splitting can get indefinitely stuck
[ https://issues.apache.org/jira/browse/HBASE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Khemani updated HBASE-4007: --- Status: Open (was: Patch Available) distributed log splitting can get indefinitely stuck Key: HBASE-4007 URL: https://issues.apache.org/jira/browse/HBASE-4007 Project: HBase Issue Type: Bug Reporter: Prakash Khemani Assignee: Prakash Khemani After the configured number of retries SplitLogManager is not going to resubmit log-split tasks. In this situation even if the splitLogWorker that owns the task dies the task will not get resubmitted. When a regionserver goes away then all the split-log tasks that it owned should be resubmitted by the SplitLogMaster. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4007) distributed log splitting can get indefinitely stuck
[ https://issues.apache.org/jira/browse/HBASE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Khemani updated HBASE-4007: --- Attachment: 0001-HBASE-4007-distributed-log-splitting-can-get-indefin.patch patch distributed log splitting can get indefinitely stuck Key: HBASE-4007 URL: https://issues.apache.org/jira/browse/HBASE-4007 Project: HBase Issue Type: Bug Reporter: Prakash Khemani Assignee: Prakash Khemani Attachments: 0001-HBASE-4007-distributed-log-splitting-can-get-indefin.patch After the configured number of retries SplitLogManager is not going to resubmit log-split tasks. In this situation even if the splitLogWorker that owns the task dies the task will not get resubmitted. When a regionserver goes away then all the split-log tasks that it owned should be resubmitted by the SplitLogMaster. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits
[ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Khemani updated HBASE-3845: --- Attachment: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch patch deployed internally in facebook data loss because lastSeqWritten can miss memstore edits Key: HBASE-3845 URL: https://issues.apache.org/jira/browse/HBASE-3845 Project: HBase Issue Type: Bug Affects Versions: 0.90.3 Reporter: Prakash Khemani Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.90.5 Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch (I don't have a test case to prove this yet but I have run it by Dhruba and Kannan internally and wanted to put this up for some feedback.) In this discussion let us assume that the region has only one column family. That way I can use region/memstore interchangeably. After a memstore flush it is possible for lastSeqWritten to have a log-sequence-id for a region that is not the earliest log-sequence-id for that region's memstore. HLog.append() does a putIfAbsent into lastSequenceWritten. This is to ensure that we only keep track of the earliest log-sequence-number that is present in the memstore. Every time the memstore is flushed we remove the region's entry in lastSequenceWritten and wait for the next append to populate this entry again. This is where the problem happens. step 1: flusher.prepare() snapshots the memstore under HRegion.updatesLock.writeLock(). step 2 : as soon as the updatesLock.writeLock() is released new entries will be added into the memstore. step 3 : wal.completeCacheFlush() is called. This method removes the region's entry from lastSeqWritten. step 4: the next append will create a new entry for the region in lastSeqWritten(). But this will be the log seq id of the current append. All the edits that were added in step 2 are missing. == as a temporary measure, instead of removing the region's entry in step 3 I will replace it with the log-seq-id of the region-flush-event. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits
[ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071086#comment-13071086 ] Prakash Khemani commented on HBASE-3845: patch deployed internally in facebook 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch data loss because lastSeqWritten can miss memstore edits Key: HBASE-3845 URL: https://issues.apache.org/jira/browse/HBASE-3845 Project: HBase Issue Type: Bug Affects Versions: 0.90.3 Reporter: Prakash Khemani Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.90.5 Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch (I don't have a test case to prove this yet but I have run it by Dhruba and Kannan internally and wanted to put this up for some feedback.) In this discussion let us assume that the region has only one column family. That way I can use region/memstore interchangeably. After a memstore flush it is possible for lastSeqWritten to have a log-sequence-id for a region that is not the earliest log-sequence-id for that region's memstore. HLog.append() does a putIfAbsent into lastSequenceWritten. This is to ensure that we only keep track of the earliest log-sequence-number that is present in the memstore. Every time the memstore is flushed we remove the region's entry in lastSequenceWritten and wait for the next append to populate this entry again. This is where the problem happens. step 1: flusher.prepare() snapshots the memstore under HRegion.updatesLock.writeLock(). step 2 : as soon as the updatesLock.writeLock() is released new entries will be added into the memstore. step 3 : wal.completeCacheFlush() is called. This method removes the region's entry from lastSeqWritten. step 4: the next append will create a new entry for the region in lastSeqWritten(). But this will be the log seq id of the current append. All the edits that were added in step 2 are missing. == as a temporary measure, instead of removing the region's entry in step 3 I will replace it with the log-seq-id of the region-flush-event. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits
[ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070542#comment-13070542 ] Prakash Khemani commented on HBASE-3845: In the patch that is deployed internally we have implemented a different approach. We remove the region's entry in startCacheFlush() and save it (as opposed to the current behavior of removing the entry in completeCacheFlush()). If the flush aborts then we restore the saved entry. The approach taken in the latest patch in this jira might also be OK. I have a few comments.

{noformat}
      this.lastSeqWritten.remove(encodedRegionName);
+     Long seqWhileFlush = this.seqWrittenWhileFlush.get(encodedRegionName);
+     if (null != seqWhileFlush) {
+       this.lastSeqWritten.putIfAbsent(encodedRegionName, seqWhileFlush);
+       this.seqWrittenWhileFlush.remove(encodedRegionName);
+     }
{noformat}
The seqWrittenWhileFlush.get() and the subsequent .remove() can be replaced by a single .remove():
{code}
Long seqWhileFlush = this.seqWrittenWhileFlush.remove(encodedRegionName);
if (null != seqWhileFlush) {
  lSW.put(encodedRegionName, seqWhileFlush);
} else {
  lSW.remove(encodedRegionName);
}
{code}

== The bigger problem here is that completeCacheFlush() is not called with updatesLock acquired. Therefore there might still be correctness issues with the latest patch.

== {noformat}
 public void abortCacheFlush() {
+  this.isFlushInProgress.set(false);
   this.cacheFlushLock.unlock();
 }
{noformat}
Shouldn't seqWrittenWhileFlush also be cleaned up in abortCacheFlush()?

data loss because lastSeqWritten can miss memstore edits Key: HBASE-3845 URL: https://issues.apache.org/jira/browse/HBASE-3845 Project: HBase Issue Type: Bug Affects Versions: 0.90.3 Reporter: Prakash Khemani Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.90.5 Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch

(I don't have a test case to prove this yet but I have run it by Dhruba and Kannan internally and wanted to put this up for some feedback.) In this discussion let us assume that the region has only one column family. That way I can use region/memstore interchangeably. After a memstore flush it is possible for lastSeqWritten to have a log-sequence-id for a region that is not the earliest log-sequence-id for that region's memstore. HLog.append() does a putIfAbsent into lastSequenceWritten. This is to ensure that we only keep track of the earliest log-sequence-number that is present in the memstore. Every time the memstore is flushed we remove the region's entry in lastSequenceWritten and wait for the next append to populate this entry again. This is where the problem happens. step 1: flusher.prepare() snapshots the memstore under HRegion.updatesLock.writeLock(). step 2: as soon as the updatesLock.writeLock() is released new entries will be added into the memstore. step 3: wal.completeCacheFlush() is called. This method removes the region's entry from lastSeqWritten. step 4: the next append will create a new entry for the region in lastSeqWritten(). But this will be the log seq id of the current append. All the edits that were added in step 2 are missing. == as a temporary measure, instead of removing the region's entry in step 3 I will replace it with the log-seq-id of the region-flush-event.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits
[ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13070614#comment-13070614 ] Prakash Khemani commented on HBASE-3845: In the method internalFlushcache() I don't see updatesLock.writeLock() being held around the following piece of code. {code} if (wal != null) { wal.completeCacheFlush(this.regionInfo.getEncodedNameAsBytes(), regionInfo.getTableDesc().getName(), completeSequenceId, this.getRegionInfo().isMetaRegion()); } {code} == I will upload the internal patch for reference ... data loss because lastSeqWritten can miss memstore edits Key: HBASE-3845 URL: https://issues.apache.org/jira/browse/HBASE-3845 Project: HBase Issue Type: Bug Affects Versions: 0.90.3 Reporter: Prakash Khemani Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.90.5 Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch (I don't have a test case to prove this yet but I have run it by Dhruba and Kannan internally and wanted to put this up for some feedback.) In this discussion let us assume that the region has only one column family. That way I can use region/memstore interchangeably. After a memstore flush it is possible for lastSeqWritten to have a log-sequence-id for a region that is not the earliest log-sequence-id for that region's memstore. HLog.append() does a putIfAbsent into lastSequenceWritten. This is to ensure that we only keep track of the earliest log-sequence-number that is present in the memstore. Every time the memstore is flushed we remove the region's entry in lastSequenceWritten and wait for the next append to populate this entry again. This is where the problem happens. step 1: flusher.prepare() snapshots the memstore under HRegion.updatesLock.writeLock(). step 2 : as soon as the updatesLock.writeLock() is released new entries will be added into the memstore. step 3 : wal.completeCacheFlush() is called. This method removes the region's entry from lastSeqWritten. step 4: the next append will create a new entry for the region in lastSeqWritten(). But this will be the log seq id of the current append. All the edits that were added in step 2 are missing. == as a temporary measure, instead of removing the region's entry in step 3 I will replace it with the log-seq-id of the region-flush-event. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
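A minimal sketch of the ordering the comment is questioning, with the flush path reduced to stubs and simplified names; the concern is that completeCacheFlush() runs without updatesLock.writeLock() held, so appends can land between the snapshot and the bookkeeping update.
{code}
/** Simplified timeline of the flush path under discussion; not the real HRegion code. */
class FlushOrderingSketch {
  interface Wal { void completeCacheFlush(byte[] regionName, long completeSequenceId); }

  final Object updatesLock = new Object();   // stand-in for HRegion.updatesLock.writeLock()

  void internalFlushcache(Wal wal, byte[] regionName, long completeSequenceId) {
    synchronized (updatesLock) {
      // step 1: snapshot the memstore (flusher.prepare())
    }
    // step 2: updatesLock released -- new appends land in the memstore and
    //         HLog.append() may repopulate lastSeqWritten for this region.

    // step 3: completeCacheFlush() runs *without* updatesLock held; this is the
    //         window the comment worries about, since it may remove/replace the
    //         lastSeqWritten entry created by the step-2 appends.
    wal.completeCacheFlush(regionName, completeSequenceId);
  }
}
{code}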
[jira] [Commented] (HBASE-4007) distributed log splitting can get indefinitely stuck
[ https://issues.apache.org/jira/browse/HBASE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052067#comment-13052067 ] Prakash Khemani commented on HBASE-4007: @mingjian What you are talking about is probably a different issue. The scenario you have described can happen when 1. Master puts up a task. 2. No one acquires the task. 3. Master puts up a RESCAN node asking everyone to re-look at the zk splitlog task list. The bug described in this jira will happen in the following way (I have not encountered it yet but should be easy to reproduce) a/ A splitlog task is slow. Master has already moved the task from one worker to another 3 times. It is with the 4th worker now. Even if the 4th worker takes too long doing this task the master is not going to do anything about it. b/ the 4th worker dies. c/ the task will hang. Master has to resubmit the task when the 4th worker dies. distributed log splitting can get indefinitely stuck Key: HBASE-4007 URL: https://issues.apache.org/jira/browse/HBASE-4007 Project: HBase Issue Type: Bug Reporter: Prakash Khemani Assignee: Prakash Khemani After the configured number of retries SplitLogManager is not going to resubmit log-split tasks. In this situation even if the splitLogWorker that owns the task dies the task will not get resubmitted. When a regionserver goes away then all the split-log tasks that it owned should be resubmitted by the SplitLogMaster. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-4007) distributed log splitting can get indefinitely stuck
distributed log splitting can get indefinitely stuck Key: HBASE-4007 URL: https://issues.apache.org/jira/browse/HBASE-4007 Project: HBase Issue Type: Bug Reporter: Prakash Khemani Assignee: Prakash Khemani After the configured number of retries SplitLogManager is not going to resubmit log-split tasks. In this situation even if the splitLogWorker that owns the task dies the task will not get resubmitted. When a regionserver goes away then all the split-log tasks that it owned should be resubmitted by the SplitLogMaster. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3963) Schedule all log-splitting at startup all at once
[ https://issues.apache.org/jira/browse/HBASE-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048248#comment-13048248 ] Prakash Khemani commented on HBASE-3963: Patch looks good to me. Thanks.

Schedule all log-splitting at startup all at once - Key: HBASE-3963 URL: https://issues.apache.org/jira/browse/HBASE-3963 Project: HBase Issue Type: Improvement Reporter: Prakash Khemani Assignee: Prakash Khemani Attachments: schedule-all-splitlog.patch

When distributed log splitting is enabled then it is better to call splitLog() for all region servers simultaneously. A large number of splitlog tasks will get scheduled - one for each log file. But a splitlog-worker (region server) executes only one task at a time, so there shouldn't be a danger of DFS overload. Scheduling all the tasks at once ensures maximum parallelism.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-3963) Schedule all log-splitting at startup all at once
Schedule all log-splitting at startup all at once - Key: HBASE-3963 URL: https://issues.apache.org/jira/browse/HBASE-3963 Project: HBase Issue Type: Improvement Reporter: Prakash Khemani Assignee: Prakash Khemani

When distributed log splitting is enabled then it is better to call splitLog() for all region servers simultaneously. A large number of splitlog tasks will get scheduled - one for each log file. But a splitlog-worker (region server) executes only one task at a time, so there shouldn't be a danger of DFS overload. Scheduling all the tasks at once ensures maximum parallelism.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
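A minimal sketch of the scheduling change described above, with hypothetical helper names and an illustrative path layout; instead of splitting one dead server's log directory at a time, all directories are handed to splitLog() in one batch so that one task per log file is queued and every worker can pick up work immediately.
{code}
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of batching all log-split scheduling at startup; not the real master code. */
class StartupLogSplitSketch {
  interface LogSplitter { void splitLog(List<String> logDirs); }   // stand-in for SplitLogManager

  void splitLogsAfterStartup(LogSplitter splitter, List<String> deadRegionServers) {
    // Before: one splitLog(dir) call per dead server, which serializes the batches.
    // After: collect every "-splitting" directory and submit them together.
    List<String> allLogDirs = new ArrayList<>();
    for (String server : deadRegionServers) {
      allLogDirs.add("/hbase/.logs/" + server + "-splitting");   // illustrative path layout
    }
    splitter.splitLog(allLogDirs);
  }
}
{code}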
[jira] [Commented] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs
[ https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046044#comment-13046044 ] Prakash Khemani commented on HBASE-1364: Filed https://issues.apache.org/jira/browse/HBASE-3963. Will try to get this done. [performance] Distributed splitting of regionserver commit logs --- Key: HBASE-1364 URL: https://issues.apache.org/jira/browse/HBASE-1364 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: stack Assignee: Prakash Khemani Priority: Critical Fix For: 0.92.0 Attachments: 1364-v5.txt, HBASE-1364.patch, org.apache.hadoop.hbase.master.TestDistributedLogSplitting-output.txt Time Spent: 8h Remaining Estimate: 0h HBASE-1008 has some improvements to our log splitting on regionserver crash; but it needs to run even faster. (Below is from HBASE-1008) In bigtable paper, the split is distributed. If we're going to have 1000 logs, we need to distribute or at least multithread the splitting. 1. As is, regions starting up expect to find one reconstruction log only. Need to make it so pick up a bunch of edit logs and it should be fine that logs are elsewhere in hdfs in an output directory written by all split participants whether multithreaded or a mapreduce-like distributed process (Lets write our distributed sort first as a MR so we learn whats involved; distributed sort, as much as possible should use MR framework pieces). On startup, regions go to this directory and pick up the files written by split participants deleting and clearing the dir when all have been read in. Making it so can take multiple logs for input, can also make the split process more robust rather than current tenuous process which loses all edits if it doesn't make it to the end without error. 2. Each column family rereads the reconstruction log to find its edits. Need to fix that. Split can sort the edits by column family so store only reads its edits. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3889) NPE in Distributed Log Splitting
[ https://issues.apache.org/jira/browse/HBASE-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034139#comment-13034139 ] Prakash Khemani commented on HBASE-3889: Thanks Lars for finding and providing a fix for this issue. I have a few minor comments on the patch ... The following isn't really needed; the earlier check you put in should be good enough.
{code}
+if (wap == null) {
+  continue;
+}
{code}
It might be better to catch Throwable in SplitLogWorker.run() and print the "Unexpected Error" message there. It might not be a good thing to ignore an unexpected exception in SplitLogWorker.grabTask() and continue.
{noformat}
+++ src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java (working copy)
@@ -297,6 +297,8 @@
         }
         break;
       }
+    } catch (Exception e) {
+      LOG.error("An error occurred.", e);
     } finally {
       if (t > 0) {
         LOG.info("worker " + serverName + " done with task " + path +
{noformat}

NPE in Distributed Log Splitting Key: HBASE-3889 URL: https://issues.apache.org/jira/browse/HBASE-3889 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.92.0 Environment: Pseudo-distributed on MacOS Reporter: Lars George Assignee: Lars George Fix For: 0.92.0 Attachments: HBASE-3889.patch

There is an issue with the log splitting under the specific condition of edits belonging to a non-existing region (which went away after a split for example). The HLogSplitter fails to check the condition, which is handled on a lower level; the logging manifests it as
{noformat}
2011-05-16 13:56:10,300 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: This region's directory doesn't exist: hdfs://localhost:8020/hbase/usertable/30c4d0a47703214845d0676d0c7b36f0. It is very likely that it was already split so it's safe to discard those edits.
{noformat}
The code returns a null reference which is not checked in HLogSplitter.splitLogFileToTemp():
{code}
...
WriterAndPath wap = (WriterAndPath)o;
if (wap == null) {
  wap = createWAP(region, entry, rootDir, tmpname, fs, conf);
  if (wap == null) {
    logWriters.put(region, BAD_WRITER);
  } else {
    logWriters.put(region, wap);
  }
}
wap.w.append(entry);
...
{code}
The createWAP() does return null when the above message is logged, based on the obsolete region reference in the edit. What made this difficult to detect is that the error (and others) are silently ignored in SplitLogWorker.grabTask(). I added a catch and error logging to see the NPE that was caused by the above.
{code}
    ...
    break;
  }
} catch (Exception e) {
  LOG.error("An error occurred.", e);
} finally {
  if (t > 0) {
    ...
{code}
As a side note, there are other errors/asserts triggered that this try/finally does not handle.
For example {noformat} 2011-05-16 13:58:30,647 WARN org.apache.hadoop.hbase.regionserver.SplitLogWorker: BADVERSION failed to assert ownership for /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389 at org.apache.zookeeper.KeeperException.create(KeeperException.java:106) at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038) at org.apache.hadoop.hbase.regionserver.SplitLogWorker.ownTask(SplitLogWorker.java:329) at org.apache.hadoop.hbase.regionserver.SplitLogWorker.access$100(SplitLogWorker.java:68) at org.apache.hadoop.hbase.regionserver.SplitLogWorker$2.progress(SplitLogWorker.java:265) at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:432) at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:354) at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:113) at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:260) at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:191) at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:164) at java.lang.Thread.run(Thread.java:680) {noformat} This should probably be handled - or at
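A minimal, self-contained model of the null-writer handling being discussed, with simplified types; the point is that the writer lookup/creation can yield null for an already-split region, so the edit must be discarded (and the region remembered as a bad writer) instead of dereferencing a null wap.
{code}
import java.util.HashMap;
import java.util.Map;

/** Illustrative model of the guard discussed above; simplified, not the real HLogSplitter code. */
class WriterLookupSketch {
  static final Object BAD_WRITER = new Object();           // sentinel for regions with no writer
  static class WriterAndPath { void append(String entry) { /* write the edit */ } }

  private final Map<String, Object> logWriters = new HashMap<>();

  /** Returns true if the edit was written, false if it was safely discarded. */
  boolean writeEdit(String region, String entry) {
    Object o = logWriters.get(region);
    if (o == BAD_WRITER) {
      return false;                                        // region already known to be gone
    }
    WriterAndPath wap = (WriterAndPath) o;
    if (wap == null) {
      wap = createWAP(region);                             // may return null for an already-split region
      if (wap == null) {
        logWriters.put(region, BAD_WRITER);                // remember it, and skip instead of NPE-ing
        return false;
      }
      logWriters.put(region, wap);
    }
    wap.append(entry);
    return true;
  }

  WriterAndPath createWAP(String region) { return null; }  // stand-in: null when the region dir no longer exists
}
{code}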
[jira] [Commented] (HBASE-3890) Scheduled tasks in distributed log splitting not in sync with ZK
[ https://issues.apache.org/jira/browse/HBASE-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13034149#comment-13034149 ] Prakash Khemani commented on HBASE-3890: With the bug you identified in HBASE-3889 this behavior is expected. The SplitLogManager will put up a task, a SplitLogWorker will pick it up and will never complete it because of the bug. Manager will resubmit the task and another worker will pick it up to never complete it. The Manager resubmits at most hbase.splitlog.max.resubmit (default = 3) times after which the task hangs. Scheduled tasks in distributed log splitting not in sync with ZK Key: HBASE-3890 URL: https://issues.apache.org/jira/browse/HBASE-3890 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.92.0 Reporter: Lars George Fix For: 0.92.0 This is in continuation to HBASE-3889: Note that there must be more slightly off here. Although the splitlogs znode is now empty the master is still stuck here: {noformat} Doing distributed log split in hdfs://localhost:8020/hbase/.logs/10.0.0.65,60020,1305406356765 - Waiting for distributed tasks to finish. scheduled=2 done=1 error=0 4380s Master startup - Splitting logs after master startup 4388s {noformat} There seems to be an issue with what is in ZK and what the TaskBatch holds. In my case it could be related to the fact that the task was already in ZK after many faulty restarts because of the NPE. Maybe it was added once (since that is keyed by path, and that is unique on my machine), but the reference count upped twice? Now that the real one is done, the done counter has been increased, but will never match the scheduled. The code could also check if ZK is actually depleted, and therefore treat the scheduled task as bogus? This of course only treats the symptom, not the root cause of this condition. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3828) region server stuck in waitOnAllRegionsToClose
[ https://issues.apache.org/jira/browse/HBASE-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029452#comment-13029452 ] Prakash Khemani commented on HBASE-3828: In my cluster this turned out to be a problem in code that we had modified internally. In the region server abort code path we had put in a check that if the filesystem is unavailable then do not try to close regions. But the main thread went ahead anyway and waited for the regions to close. That was causing the hang in waitOnAllRegionsToClose(). (aside - there is an internal task on this ... when an append to HLog fails, hbase relies on the dfsclient to close the filesystem for the regionserver abort to be triggered. That is very roundabout and there ought to be a more direct and synchronous abort facility) == It is possible that there is no further synchronization necessary when a region is being opened. But I haven't looked at the code closely enough. What happens between the time when the zk node is closed and the region is actually closed on the rs? When is the region removed from onlineRegions - is it possible that one thread adds it and the other immediately removes it ... I will try and spend some time on this soon. ===

region server stuck in waitOnAllRegionsToClose -- Key: HBASE-3828 URL: https://issues.apache.org/jira/browse/HBASE-3828 Project: HBase Issue Type: Bug Reporter: Prakash Khemani

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3815) load balancer should ignore bad region servers
[ https://issues.apache.org/jira/browse/HBASE-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029454#comment-13029454 ] Prakash Khemani commented on HBASE-3815: Yeah, I think there is some overlap. In this case we want the region server to be excluded because of some internal logic in master. load balancer should ignore bad region servers -- Key: HBASE-3815 URL: https://issues.apache.org/jira/browse/HBASE-3815 Project: HBase Issue Type: Bug Reporter: Prakash Khemani the loadbalancer should remember which region server is constantly having trouble opening regions and it should take that rs out of the equation ... otherwise the lb goes into an unproductive loop ... I don't have logs handy for this one. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits
data loss because lastSeqWritten can miss memstore edits Key: HBASE-3845 URL: https://issues.apache.org/jira/browse/HBASE-3845 Project: HBase Issue Type: Bug Reporter: Prakash Khemani (I don't have a test case to prove this yet but I have run it by Dhruba and Kannan internally and wanted to put this up for some feedback.) In this discussion let us assume that the region has only one column family. That way I can use region/memstore interchangeably. After a memstore flush it is possible for lastSeqWritten to have a log-sequence-id for a region that is not the earliest log-sequence-id for that region's memstore. HLog.append() does a putIfAbsent into lastSequenceWritten. This is to ensure that we only keep track of the earliest log-sequence-number that is present in the memstore. Every time the memstore is flushed we remove the region's entry in lastSequenceWritten and wait for the next append to populate this entry again. This is where the problem happens. step 1: flusher.prepare() snapshots the memstore under HRegion.updatesLock.writeLock(). step 2 : as soon as the updatesLock.writeLock() is released new entries will be added into the memstore. step 3 : wal.completeCacheFlush() is called. This method removes the region's entry from lastSeqWritten. step 4: the next append will create a new entry for the region in lastSeqWritten(). But this will be the log seq id of the current append. All the edits that were added in step 2 are missing. == as a temporary measure, instead of removing the region's entry in step 3 I will replace it with the log-seq-id of the region-flush-event. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
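A minimal sketch of the temporary measure described above, with a simplified method shape: completeCacheFlush() replaces the region's lastSeqWritten entry with the flush event's sequence id instead of removing it, so edits appended after the snapshot (step 2) still have a lower bound recorded.
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Illustrative sketch of the temporary fix described above; simplified, not the real HLog code. */
class LastSeqWrittenSketch {
  private final ConcurrentMap<String, Long> lastSeqWritten = new ConcurrentHashMap<>();

  /** HLog.append(): only record the earliest outstanding sequence id for the region. */
  void append(String encodedRegionName, long logSeqId) {
    lastSeqWritten.putIfAbsent(encodedRegionName, logSeqId);
  }

  /** Old behavior: remove the entry, losing the edits appended between the snapshot and this point. */
  void completeCacheFlushOld(String encodedRegionName) {
    lastSeqWritten.remove(encodedRegionName);
  }

  /** Temporary measure: replace the entry with the flush event's seq id instead of dropping it. */
  void completeCacheFlushNew(String encodedRegionName, long flushSeqId) {
    lastSeqWritten.put(encodedRegionName, flushSeqId);
  }
}
{code}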
[jira] [Created] (HBASE-3843) splitLogWorker starts too early
splitLogWorker starts too early --- Key: HBASE-3843 URL: https://issues.apache.org/jira/browse/HBASE-3843 Project: HBase Issue Type: Bug Reporter: Prakash Khemani Assignee: Prakash Khemani

splitlogworker should be started in startServiceThreads() instead of in initializeZookeeper(). This will ensure that the region server accepts split-logging tasks only after it has successfully done reportForDuty() to the master.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
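A minimal sketch of the reordering the issue asks for, with the region-server startup reduced to stubs; the split-log worker is started only after reportForDuty() has succeeded.
{code}
/** Illustrative sketch of the startup ordering proposed above; not the real HRegionServer code. */
class RegionServerStartupSketch {
  private Thread splitLogWorkerThread;

  void initializeZookeeper() {
    // connect to zk, set up trackers, etc.
    // Per HBASE-3843, do NOT start the SplitLogWorker here: the RS has not
    // yet reported for duty and could grab split-logging tasks too early.
  }

  void reportForDuty() { /* register with the master */ }

  void startServiceThreads() {
    // start the split-log worker only after reportForDuty() has succeeded
    splitLogWorkerThread = new Thread(() -> { /* SplitLogWorker task loop */ }, "SplitLogWorker");
    splitLogWorkerThread.start();
  }

  void run() {
    initializeZookeeper();
    reportForDuty();
    startServiceThreads();
  }
}
{code}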
[jira] [Updated] (HBASE-3843) splitLogWorker starts too early
[ https://issues.apache.org/jira/browse/HBASE-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Khemani updated HBASE-3843: --- Status: Open (was: Patch Available) splitLogWorker starts too early --- Key: HBASE-3843 URL: https://issues.apache.org/jira/browse/HBASE-3843 Project: HBase Issue Type: Bug Reporter: Prakash Khemani Assignee: Prakash Khemani splitlogworker should be started in startServiceThreads() instead of in initializeZookeeper(). This will ensure that the region server accepts a split-logging tasks only after it has successfully done reportForDuty() to the master. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-3843) splitLogWorker starts too early
[ https://issues.apache.org/jira/browse/HBASE-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Khemani updated HBASE-3843: --- Status: Patch Available (was: Open) splitLogWorker starts too early --- Key: HBASE-3843 URL: https://issues.apache.org/jira/browse/HBASE-3843 Project: HBase Issue Type: Bug Reporter: Prakash Khemani Assignee: Prakash Khemani splitLogWorker should be started in startServiceThreads() instead of in initializeZookeeper(). This will ensure that the region server accepts split-logging tasks only after it has successfully done reportForDuty() to the master. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-3843) splitLogWorker starts too early
[ https://issues.apache.org/jira/browse/HBASE-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Khemani updated HBASE-3843: --- Status: Patch Available (was: Open) splitLogWorker starts too early --- Key: HBASE-3843 URL: https://issues.apache.org/jira/browse/HBASE-3843 Project: HBase Issue Type: Bug Reporter: Prakash Khemani Assignee: Prakash Khemani Attachments: 0001-HBASE-3843-start-splitLogWorker-later-at-region-serv.patch splitLogWorker should be started in startServiceThreads() instead of in initializeZookeeper(). This will ensure that the region server accepts split-logging tasks only after it has successfully done reportForDuty() to the master. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-3843) splitLogWorker starts too early
[ https://issues.apache.org/jira/browse/HBASE-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Khemani updated HBASE-3843: --- Attachment: 0001-HBASE-3843-start-splitLogWorker-later-at-region-serv.patch splitLogWorker starts too early --- Key: HBASE-3843 URL: https://issues.apache.org/jira/browse/HBASE-3843 Project: HBase Issue Type: Bug Reporter: Prakash Khemani Assignee: Prakash Khemani Attachments: 0001-HBASE-3843-start-splitLogWorker-later-at-region-serv.patch splitLogWorker should be started in startServiceThreads() instead of in initializeZookeeper(). This will ensure that the region server accepts split-logging tasks only after it has successfully done reportForDuty() to the master. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-3828) region server stuck in waitOnAllRegionsToClose
region server stuck in waitOnAllRegionsToClose -- Key: HBASE-3828 URL: https://issues.apache.org/jira/browse/HBASE-3828 Project: HBase Issue Type: Bug Reporter: Prakash Khemani -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-3822) region server stuck in waitOnAllRegionsToClose
[ https://issues.apache.org/jira/browse/HBASE-3822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Khemani resolved HBASE-3822. Resolution: Invalid Release Note: The description is invalid. Will open a new one. region server stuck in waitOnAllRegionsToClose -- Key: HBASE-3822 URL: https://issues.apache.org/jira/browse/HBASE-3822 Project: HBase Issue Type: Bug Reporter: Prakash Khemani The regionserver is not able to exit because the rs thread is stuck here regionserver60020 prio=10 tid=0x2ab2b039e000 nid=0x760a waiting on condition [0x4365e000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hbase.util.Threads.sleep(Threads.java:126) at org.apache.hadoop.hbase.regionserver.HRegionServer.waitOnAllRegionsToClose(HRegionServer.java:736) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:689) at java.lang.Thread.run(Thread.java:619) === In CloseRegionHandler.process() we do not call removeFromOnlineRegions() if there is an exception. (In this case I suspect there was a log-rolling exception because of another issue) // Close the region try { // TODO: If we need to keep updating CLOSING stamp to prevent against // a timeout if this is long-running, need to spin up a thread? if (region.close(abort) == null) { // This region got closed. Most likely due to a split. So instead // of doing the setClosedState() below, let's just ignore and continue. // The split message will clean up the master state. LOG.warn(Can't close region: was already closed during close(): + regionInfo.getRegionNameAsString()); return; } } catch (IOException e) { LOG.error(Unrecoverable exception while closing region + regionInfo.getRegionNameAsString() + , still finishing close, e); } this.rsServices.removeFromOnlineRegions(regionInfo.getEncodedName()); === I think we set the closing flag on the region, it won't be taking any more requests, it is as good as offline. Either we should refine the check in waitOnAllRegionsToClose() or CloseRegionHandler.process() should remove the region from online-regions set. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-3822) region server stuck in waitOnAllRegionsToClose
region server stuck in waitOnAllRegionsToClose -- Key: HBASE-3822 URL: https://issues.apache.org/jira/browse/HBASE-3822 Project: HBase Issue Type: Bug Reporter: Prakash Khemani The regionserver is not able to exit because the rs thread is stuck here regionserver60020 prio=10 tid=0x2ab2b039e000 nid=0x760a waiting on condition [0x4365e000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hbase.util.Threads.sleep(Threads.java:126) at org.apache.hadoop.hbase.regionserver.HRegionServer.waitOnAllRegionsToClose(HRegionServer.java:736) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:689) at java.lang.Thread.run(Thread.java:619) === In CloseRegionHandler.process() we do not call removeFromOnlineRegions() if there is an exception. (In this case I suspect there was a log-rolling exception because of another issue) // Close the region try { // TODO: If we need to keep updating CLOSING stamp to prevent against // a timeout if this is long-running, need to spin up a thread? if (region.close(abort) == null) { // This region got closed. Most likely due to a split. So instead // of doing the setClosedState() below, let's just ignore and continue. // The split message will clean up the master state. LOG.warn(Can't close region: was already closed during close(): + regionInfo.getRegionNameAsString()); return; } } catch (IOException e) { LOG.error(Unrecoverable exception while closing region + regionInfo.getRegionNameAsString() + , still finishing close, e); } this.rsServices.removeFromOnlineRegions(regionInfo.getEncodedName()); === I think we set the closing flag on the region, it won't be taking any more requests, it is as good as offline. Either we should refine the check in waitOnAllRegionsToClose() or CloseRegionHandler.process() should remove the region from online-regions set. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3822) region server stuck in waitOnAllRegionsToClose
[ https://issues.apache.org/jira/browse/HBASE-3822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13025443#comment-13025443 ] Prakash Khemani commented on HBASE-3822: The code snippet that I pointed out doesn't have a problem - that piece of code will remove the region from online regions even if there is an exception. Sorry for the confusion. I don't really know why the onlineRegions set was not cleaned up. region server stuck in waitOnAllRegionsToClose -- Key: HBASE-3822 URL: https://issues.apache.org/jira/browse/HBASE-3822 Project: HBase Issue Type: Bug Reporter: Prakash Khemani The regionserver is not able to exit because the rs thread is stuck here regionserver60020 prio=10 tid=0x2ab2b039e000 nid=0x760a waiting on condition [0x4365e000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hbase.util.Threads.sleep(Threads.java:126) at org.apache.hadoop.hbase.regionserver.HRegionServer.waitOnAllRegionsToClose(HRegionServer.java:736) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:689) at java.lang.Thread.run(Thread.java:619) === In CloseRegionHandler.process() we do not call removeFromOnlineRegions() if there is an exception. (In this case I suspect there was a log-rolling exception because of another issue) // Close the region try { // TODO: If we need to keep updating CLOSING stamp to prevent against // a timeout if this is long-running, need to spin up a thread? if (region.close(abort) == null) { // This region got closed. Most likely due to a split. So instead // of doing the setClosedState() below, let's just ignore and continue. // The split message will clean up the master state. LOG.warn(Can't close region: was already closed during close(): + regionInfo.getRegionNameAsString()); return; } } catch (IOException e) { LOG.error(Unrecoverable exception while closing region + regionInfo.getRegionNameAsString() + , still finishing close, e); } this.rsServices.removeFromOnlineRegions(regionInfo.getEncodedName()); === I think we set the closing flag on the region, it won't be taking any more requests, it is as good as offline. Either we should refine the check in waitOnAllRegionsToClose() or CloseRegionHandler.process() should remove the region from online-regions set. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
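For readers of the snippet quoted above: the plain-text rendering dropped the string quotes, and the control flow behind the correction in this comment is easier to see with them restored. This is the same excerpt (not self-contained code), annotated to show why removeFromOnlineRegions() still runs when close() throws: the catch block logs and falls through instead of rethrowing, so only the already-closed early return skips the removal.
{code}
// Close the region
try {
  // TODO: If we need to keep updating CLOSING stamp to prevent against
  // a timeout if this is long-running, need to spin up a thread?
  if (region.close(abort) == null) {
    // This region got closed. Most likely due to a split. So instead
    // of doing the setClosedState() below, let's just ignore and continue.
    // The split message will clean up the master state.
    LOG.warn("Can't close region: was already closed during close(): "
        + regionInfo.getRegionNameAsString());
    return;                       // only this early return skips the removal below
  }
} catch (IOException e) {
  // logged but NOT rethrown, so execution falls through to the removal below
  LOG.error("Unrecoverable exception while closing region "
      + regionInfo.getRegionNameAsString() + ", still finishing close", e);
}
this.rsServices.removeFromOnlineRegions(regionInfo.getEncodedName());
{code}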
[jira] [Resolved] (HBASE-3823) NPE in ZKAssign.transitionNode
[ https://issues.apache.org/jira/browse/HBASE-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Khemani resolved HBASE-3823. Resolution: Duplicate Release Note: fixed in HBASE-3627 NPE in ZKAssign.transitionNode -- Key: HBASE-3823 URL: https://issues.apache.org/jira/browse/HBASE-3823 Project: HBase Issue Type: Bug Reporter: Prakash Khemani This issue led to a region being multiply assigned. hbck output ERROR: Region realtime_domain_imps_urls,afbe,1295556905482.e7a478b4bd164525052f1dedb832de0a. is listed in META on region server pumahbase107.snc5.facebook.com:60020 but is multiply assigned to region servers pumahbase150.snc5.facebook.com:60020, pumahbase107.snc5.facebook.com:60020 === 2011-04-25 09:11:36,844 ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event M_RS_OPEN_REGION java.lang.NullPointerException at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75) at org.apache.hadoop.hbase.executor.RegionTransitionData.fromBytes(RegionTransitionData.java:198) at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:672) at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNodeOpened(ZKAssign.java:621) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:168) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) byte [] existingBytes = ZKUtil.getDataNoWatch(zkw, node, stat); RegionTransitionData existingData = RegionTransitionData.fromBytes(existingBytes); existingBytes can be null. have to return -1 if null. === master logs 2011-04-25 05:24:03,250 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Creating writer path=hdfs://pumahbase002-snc5-dfs.data.facebook.com:9000/PUMAHBASE002-SNC5-HBASE/realtime_domain_imps_urls/e7a478b4bd164525052f1dedb832de0a/recovered.edits/57528037047 region=e7a478b4bd164525052f1dedb832de0a 2011-04-25 09:09:19,246 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Closed path hdfs://pumahbase002-snc5-dfs.data.facebook.com:9000/PUMAHBASE002-SNC5-HBASE/realtime_domain_imps_urls/e7a478b4bd164525052f1dedb832de0a/recovered.edits/57528037047 (wrote 4342690 edits in 46904ms) 2011-04-25 09:09:26,134 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x32f7bb74e8a Creating (or updating) unassigned node for e7a478b4bd164525052f1dedb832de0a with OFFLINE state 2011-04-25 09:09:26,136 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for realtime_domain_imps_urls,afbe,1295556905482.e7a478b4bd164525052f1dedb832de0a. so generated a random one; hri=realtime_domain_imps_urls,afbe,1295556905482.e7a478b4bd164525052f1dedb832de0a., src=, dest=pumahbase107.snc5.facebook.com,60020,1303450731227; 70 (online=70, exclude=null) available servers 2011-04-25 09:09:26,136 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region realtime_domain_imps_urls,afbe,1295556905482.e7a478b4bd164525052f1dedb832de0a. 
to pumahbase107.snc5.facebook.com,60020,1303450731227 2011-04-25 09:09:26,139 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=pumahbase107.snc5.facebook.com,60020,1303450731227, region=e7a478b4bd164525052f1dedb832de0a 2011-04-25 09:09:44,045 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=pumahbase107.snc5.facebook.com,60020,1303450731227, region=e7a478b4bd164525052f1dedb832de0a 2011-04-25 09:09:59,050 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=pumahbase107.snc5.facebook.com,60020,1303450731227, region=e7a478b4bd164525052f1dedb832de0a 2011-04-25 09:10:14,054 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=pumahbase107.snc5.facebook.com,60020,1303450731227, region=e7a478b4bd164525052f1dedb832de0a 2011-04-25 09:10:29,055 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=pumahbase107.snc5.facebook.com,60020,1303450731227, region=e7a478b4bd164525052f1dedb832de0a 2011-04-25 09:10:44,060 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING,
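A sketch of the null guard suggested above ("have to return -1 if null"). This is an illustrative excerpt, not the full ZKAssign.transitionNode() body; everything around the check is elided, and returning -1 is the failure code the surrounding method already uses.
{code}
// Illustrative guard only -- not the complete ZKAssign.transitionNode() body.
byte[] existingBytes = ZKUtil.getDataNoWatch(zkw, node, stat);
if (existingBytes == null) {
  // Node data vanished (or was never written). Without this check,
  // RegionTransitionData.fromBytes(null) throws the NPE shown above.
  return -1;
}
RegionTransitionData existingData = RegionTransitionData.fromBytes(existingBytes);
{code}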
[jira] [Created] (HBASE-3824) region server timed out during open region
region server timed out during open region -- Key: HBASE-3824 URL: https://issues.apache.org/jira/browse/HBASE-3824 Project: HBase Issue Type: Bug Reporter: Prakash Khemani When replaying a large log file, mestore flushes can happen. But there is no Progressible report being sent during memstore flushes. That can lead to master timing out the region server during region open. === Another related issue and Jonathan's response So if a region server that is handed a region for opening and has done part of the work ... it has created some HFiles (because the logs were so huge that the mestore got flushed while the logs were being replayed) ... and then it is asked to give up because the master thought the region server was taking too long to open the region. When the region server gives up on the region then will it make sure that it removes all the HFiles it had created for that region? Will need to check the code, but would it matter? One issue is whether it cleans up after itself (I'm guessing not). Another issue is whether the replay is idempotent (duplicate KVs across files shouldn't matter in most cases). === 2011-04-25 09:11:36,844 ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event M_RS_OPEN_REGION java.lang.NullPointerException at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75) at org.apache.hadoop.hbase.executor.RegionTransitionData.fromBytes(RegionTransitionData.java:198) at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:672) at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNodeOpened(ZKAssign.java:621) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:168) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) byte [] existingBytes = ZKUtil.getDataNoWatch(zkw, node, stat); RegionTransitionData existingData = RegionTransitionData.fromBytes(existingBytes); existingBytes can be null. have to return -1 if null. === master logs 2011-04-25 05:24:03,250 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Creating writer path=hdfs://pumahbase002-snc5-dfs.data.facebook.com:9000/PUMAHBASE002-SNC5-HBASE/realtime_domain_imps_urls/e7a478b4bd164525052f1dedb832de0a/recovered.edits/57528037047 region=e7a478b4bd164525052f1dedb832de0a 2011-04-25 09:09:19,246 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Closed path hdfs://pumahbase002-snc5-dfs.data.facebook.com:9000/PUMAHBASE002-SNC5-HBASE/realtime_domain_imps_urls/e7a478b4bd164525052f1dedb832de0a/recovered.edits/57528037047 (wrote 4342690 edits in 46904ms) 2011-04-25 09:09:26,134 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x32f7bb74e8a Creating (or updating) unassigned node for e7a478b4bd164525052f1dedb832de0a with OFFLINE state 2011-04-25 09:09:26,136 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for realtime_domain_imps_urls,afbe,1295556905482.e7a478b4bd164525052f1dedb832de0a. 
so generated a random one; hri=realtime_domain_imps_urls,afbe,1295556905482.e7a478b4bd164525052f1dedb832de0a., src=, dest=pumahbase107.snc5.facebook.com,60020,1303450731227; 70 (online=70, exclude=null) available servers 2011-04-25 09:09:26,136 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region realtime_domain_imps_urls,afbe,1295556905482.e7a478b4bd164525052f1dedb832de0a. to pumahbase107.snc5.facebook.com,60020,1303450731227 2011-04-25 09:09:26,139 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=pumahbase107.snc5.facebook.com,60020,1303450731227, region=e7a478b4bd164525052f1dedb832de0a 2011-04-25 09:09:44,045 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=pumahbase107.snc5.facebook.com,60020,1303450731227, region=e7a478b4bd164525052f1dedb832de0a 2011-04-25 09:09:59,050 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=pumahbase107.snc5.facebook.com,60020,1303450731227, region=e7a478b4bd164525052f1dedb832de0a 2011-04-25 09:10:14,054 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=pumahbase107.snc5.facebook.com,60020,1303450731227, region=e7a478b4bd164525052f1dedb832de0a 2011-04-25 09:10:29,055 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=pumahbase107.snc5.facebook.com,60020,1303450731227,
[jira] [Resolved] (HBASE-3824) region server timed out during open region
[ https://issues.apache.org/jira/browse/HBASE-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Khemani resolved HBASE-3824. Resolution: Not A Problem region server timed out during open region -- Key: HBASE-3824 URL: https://issues.apache.org/jira/browse/HBASE-3824 Project: HBase Issue Type: Bug Reporter: Prakash Khemani When replaying a large log file, mestore flushes can happen. But there is no Progressible report being sent during memstore flushes. That can lead to master timing out the region server during region open. === Another related issue and Jonathan's response So if a region server that is handed a region for opening and has done part of the work ... it has created some HFiles (because the logs were so huge that the mestore got flushed while the logs were being replayed) ... and then it is asked to give up because the master thought the region server was taking too long to open the region. When the region server gives up on the region then will it make sure that it removes all the HFiles it had created for that region? Will need to check the code, but would it matter? One issue is whether it cleans up after itself (I'm guessing not). Another issue is whether the replay is idempotent (duplicate KVs across files shouldn't matter in most cases). === 2011-04-25 09:11:36,844 ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event M_RS_OPEN_REGION java.lang.NullPointerException at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75) at org.apache.hadoop.hbase.executor.RegionTransitionData.fromBytes(RegionTransitionData.java:198) at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:672) at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNodeOpened(ZKAssign.java:621) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:168) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) byte [] existingBytes = ZKUtil.getDataNoWatch(zkw, node, stat); RegionTransitionData existingData = RegionTransitionData.fromBytes(existingBytes); existingBytes can be null. have to return -1 if null. === master logs 2011-04-25 05:24:03,250 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Creating writer path=hdfs://pumahbase002-snc5-dfs.data.facebook.com:9000/PUMAHBASE002-SNC5-HBASE/realtime_domain_imps_urls/e7a478b4bd164525052f1dedb832de0a/recovered.edits/57528037047 region=e7a478b4bd164525052f1dedb832de0a 2011-04-25 09:09:19,246 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Closed path hdfs://pumahbase002-snc5-dfs.data.facebook.com:9000/PUMAHBASE002-SNC5-HBASE/realtime_domain_imps_urls/e7a478b4bd164525052f1dedb832de0a/recovered.edits/57528037047 (wrote 4342690 edits in 46904ms) 2011-04-25 09:09:26,134 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x32f7bb74e8a Creating (or updating) unassigned node for e7a478b4bd164525052f1dedb832de0a with OFFLINE state 2011-04-25 09:09:26,136 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for realtime_domain_imps_urls,afbe,1295556905482.e7a478b4bd164525052f1dedb832de0a. 
so generated a random one; hri=realtime_domain_imps_urls,afbe,1295556905482.e7a478b4bd164525052f1dedb832de0a., src=, dest=pumahbase107.snc5.facebook.com,60020,1303450731227; 70 (online=70, exclude=null) available servers 2011-04-25 09:09:26,136 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region realtime_domain_imps_urls,afbe,1295556905482.e7a478b4bd164525052f1dedb832de0a. to pumahbase107.snc5.facebook.com,60020,1303450731227 2011-04-25 09:09:26,139 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=pumahbase107.snc5.facebook.com,60020,1303450731227, region=e7a478b4bd164525052f1dedb832de0a 2011-04-25 09:09:44,045 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=pumahbase107.snc5.facebook.com,60020,1303450731227, region=e7a478b4bd164525052f1dedb832de0a 2011-04-25 09:09:59,050 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=pumahbase107.snc5.facebook.com,60020,1303450731227, region=e7a478b4bd164525052f1dedb832de0a 2011-04-25 09:10:14,054 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling
[jira] [Commented] (HBASE-3824) region server timed out during open region
[ https://issues.apache.org/jira/browse/HBASE-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13025491#comment-13025491 ] Prakash Khemani commented on HBASE-3824: Probably not an issue. The memstore flush happens in the background and cannot cause the log-replay thread to block. My mistake. I will close this. region server timed out during open region -- Key: HBASE-3824 URL: https://issues.apache.org/jira/browse/HBASE-3824 Project: HBase Issue Type: Bug Reporter: Prakash Khemani When replaying a large log file, mestore flushes can happen. But there is no Progressible report being sent during memstore flushes. That can lead to master timing out the region server during region open. === Another related issue and Jonathan's response So if a region server that is handed a region for opening and has done part of the work ... it has created some HFiles (because the logs were so huge that the mestore got flushed while the logs were being replayed) ... and then it is asked to give up because the master thought the region server was taking too long to open the region. When the region server gives up on the region then will it make sure that it removes all the HFiles it had created for that region? Will need to check the code, but would it matter? One issue is whether it cleans up after itself (I'm guessing not). Another issue is whether the replay is idempotent (duplicate KVs across files shouldn't matter in most cases). === 2011-04-25 09:11:36,844 ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event M_RS_OPEN_REGION java.lang.NullPointerException at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75) at org.apache.hadoop.hbase.executor.RegionTransitionData.fromBytes(RegionTransitionData.java:198) at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:672) at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNodeOpened(ZKAssign.java:621) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:168) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) byte [] existingBytes = ZKUtil.getDataNoWatch(zkw, node, stat); RegionTransitionData existingData = RegionTransitionData.fromBytes(existingBytes); existingBytes can be null. have to return -1 if null. 
=== master logs 2011-04-25 05:24:03,250 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Creating writer path=hdfs://pumahbase002-snc5-dfs.data.facebook.com:9000/PUMAHBASE002-SNC5-HBASE/realtime_domain_imps_urls/e7a478b4bd164525052f1dedb832de0a/recovered.edits/57528037047 region=e7a478b4bd164525052f1dedb832de0a 2011-04-25 09:09:19,246 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Closed path hdfs://pumahbase002-snc5-dfs.data.facebook.com:9000/PUMAHBASE002-SNC5-HBASE/realtime_domain_imps_urls/e7a478b4bd164525052f1dedb832de0a/recovered.edits/57528037047 (wrote 4342690 edits in 46904ms) 2011-04-25 09:09:26,134 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x32f7bb74e8a Creating (or updating) unassigned node for e7a478b4bd164525052f1dedb832de0a with OFFLINE state 2011-04-25 09:09:26,136 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for realtime_domain_imps_urls,afbe,1295556905482.e7a478b4bd164525052f1dedb832de0a. so generated a random one; hri=realtime_domain_imps_urls,afbe,1295556905482.e7a478b4bd164525052f1dedb832de0a., src=, dest=pumahbase107.snc5.facebook.com,60020,1303450731227; 70 (online=70, exclude=null) available servers 2011-04-25 09:09:26,136 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region realtime_domain_imps_urls,afbe,1295556905482.e7a478b4bd164525052f1dedb832de0a. to pumahbase107.snc5.facebook.com,60020,1303450731227 2011-04-25 09:09:26,139 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=pumahbase107.snc5.facebook.com,60020,1303450731227, region=e7a478b4bd164525052f1dedb832de0a 2011-04-25 09:09:44,045 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=pumahbase107.snc5.facebook.com,60020,1303450731227, region=e7a478b4bd164525052f1dedb832de0a 2011-04-25 09:09:59,050 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING,
[jira] [Commented] (HBASE-3674) Treat ChecksumException as we would a ParseException splitting logs; else we replay split on every restart
[ https://issues.apache.org/jira/browse/HBASE-3674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13025071#comment-13025071 ] Prakash Khemani commented on HBASE-3674: +1 Treat ChecksumException as we would a ParseException splitting logs; else we replay split on every restart -- Key: HBASE-3674 URL: https://issues.apache.org/jira/browse/HBASE-3674 Project: HBase Issue Type: Bug Components: wal Reporter: stack Assignee: stack Priority: Critical Fix For: 0.90.2 Attachments: 3674-distributed.txt, 3674-v2.txt, 3674.txt In short, a ChecksumException will fail log processing for a server so we skip out w/o archiving logs. On restart, we'll then reprocess the logs -- hit the checksumexception anew, usually -- and so on. Here is the splitLog method (edited): {code} private ListPath splitLog(final FileStatus[] logfiles) throws IOException { outputSink.startWriterThreads(entryBuffers); try { int i = 0; for (FileStatus log : logfiles) { Path logPath = log.getPath(); long logLength = log.getLen(); splitSize += logLength; LOG.debug(Splitting hlog + (i++ + 1) + of + logfiles.length + : + logPath + , length= + logLength); try { recoverFileLease(fs, logPath, conf); parseHLog(log, entryBuffers, fs, conf); processedLogs.add(logPath); } catch (EOFException eof) { // truncated files are expected if a RS crashes (see HBASE-2643) LOG.info(EOF from hlog + logPath + . Continuing); processedLogs.add(logPath); } catch (FileNotFoundException fnfe) { // A file may be missing if the region server was able to archive it // before shutting down. This means the edits were persisted already LOG.info(A log was missing + logPath + , probably because it was moved by the + now dead region server. Continuing); processedLogs.add(logPath); } catch (IOException e) { // If the IOE resulted from bad file format, // then this problem is idempotent and retrying won't help if (e.getCause() instanceof ParseException || e.getCause() instanceof ChecksumException) { LOG.warn(ParseException from hlog + logPath + . continuing); processedLogs.add(logPath); } else { if (skipErrors) { LOG.info(Got while parsing hlog + logPath + . Marking as corrupted, e); corruptedLogs.add(logPath); } else { throw e; } } } } if (fs.listStatus(srcDir).length processedLogs.size() + corruptedLogs.size()) { throw new OrphanHLogAfterSplitException( Discovered orphan hlog after split. Maybe the + HRegionServer was not dead when we started); } archiveLogs(srcDir, corruptedLogs, processedLogs, oldLogDir, fs, conf); } finally { splits = outputSink.finishWritingAndClose(); } return splits; } {code} Notice how we'll only archive logs only if we successfully split all logs. We won't archive 31 of 35 files if we happen to get a checksum exception on file 32. I think we should treat a ChecksumException the same as a ParseException; a retry will not fix it if HDFS could not get around the ChecksumException (seems like in our case all replicas were corrupt). 
Here is a play-by-play from the logs: {code} 813572 2011-03-18 20:31:44,687 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 34 of 35: hdfs://sv2borg170:9000/hbase/.logs/sv2borg182,60020,1300384550664/sv2borg182%3A60020.1300461329481, length=150 65662813573 2011-03-18 20:31:44,687 INFO org.apache.hadoop.hbase.util.FSUtils: Recovering file hdfs://sv2borg170:9000/hbase/.logs/sv2borg182,60020,1300384550664/sv2borg182%3A60020.1300461329481 813617 2011-03-18 20:31:46,238 INFO org.apache.hadoop.fs.FSInputChecker: Found checksum error: b[0, 512]=00cd00502037383661376439656265643938636463343433386132343631323633303239371d6170695f6163636573735f746f6b656e5f7374 6174735f6275636b6574000d9fa4d5dc012ec9c7cbaf000001006d005d0008002337626262663764626431616561366234616130656334383436653732333132643a32390764656661756c746170695f616e64726f69645f6c6f67676564
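For concreteness, here is the per-file try/catch with the proposed change and the string literals restored. This is an illustrative excerpt of the splitLog loop quoted above, not the committed patch; the only substantive addition is accepting org.apache.hadoop.fs.ChecksumException alongside ParseException.
{code}
try {
  recoverFileLease(fs, logPath, conf);
  parseHLog(log, entryBuffers, fs, conf);
  processedLogs.add(logPath);
} catch (EOFException eof) {
  // truncated files are expected if a RS crashes (see HBASE-2643)
  LOG.info("EOF from hlog " + logPath + ". Continuing");
  processedLogs.add(logPath);
} catch (IOException e) {
  // Bad file format *or* a checksum error HDFS could not work around:
  // either way the failure is idempotent, so retrying on the next restart
  // will not help. Count the log as processed so the rest get archived.
  if (e.getCause() instanceof ParseException
      || e.getCause() instanceof org.apache.hadoop.fs.ChecksumException) {
    LOG.warn("ParseException from hlog " + logPath + ". continuing");
    processedLogs.add(logPath);
  } else if (skipErrors) {
    LOG.info("Got while parsing hlog " + logPath + ". Marking as corrupted", e);
    corruptedLogs.add(logPath);
  } else {
    throw e;
  }
}
{code}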
[jira] [Created] (HBASE-3814) force regionserver to halt
force regionserver to halt -- Key: HBASE-3814 URL: https://issues.apache.org/jira/browse/HBASE-3814 Project: HBase Issue Type: Bug Reporter: Prakash Khemani Once abort() on a regionserver is called we should have a timeout thread that does Runtime.halt() if the rs gets stuck somewhere during abort processing. === Pumahbase132 has following the logs .. the dfsclient is not able to set up a write pipeline successfully ... it tries to abort ... but while aborting it gets stuck. I know there is a check that if we are aborting because filesystem is closed then we should not try to flush the logs while aborting. But in this case the fs is up and running, just that it is not functioning. 2011-04-21 23:48:07,082 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 10.38.131.53:50010 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280java.io.IOException: Bad connect ack with firstBadLink 10.38.133.33:50010 2011-04-21 23:48:07,082 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-8967376451767492285_6537229 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280 2011-04-21 23:48:07,125 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 10.38.131.53:50010 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280java.io.IOException: Bad connect ack with firstBadLink 10.38.134.59:50010 2011-04-21 23:48:07,125 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_7172251852699100447_6537229 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280 2011-04-21 23:48:07,169 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 10.38.131.53:50010 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280java.io.IOException: Bad connect ack with firstBadLink 10.38.134.53:50010 2011-04-21 23:48:07,169 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-9153204772467623625_6537229 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280 2011-04-21 23:48:07,213 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 10.38.131.53:50010 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280java.io.IOException: Bad connect ack with firstBadLink 10.38.134.49:50010 2011-04-21 23:48:07,213 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-2513098940934276625_6537229 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280 2011-04-21 23:48:07,214 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block. 
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3560) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2700(DFSClient.java:2720) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2977) 2011-04-21 23:48:07,214 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-2513098940934276625_6537229 bad datanode[1] nodes == null 2011-04-21 23:48:07,214 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280 - Aborting... 2011-04-21 23:48:07,216 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog And then the RS gets stuck trying to roll the logs ... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
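A minimal sketch of the watchdog proposed above: a daemon thread armed when abort() begins that halts the JVM outright if shutdown does not finish within a deadline. The class name, API shape and timeout handling are illustrative, not taken from HBase.
{code}
// Illustrative abort watchdog: halt the JVM if abort processing hangs.
public class AbortWatchdog {
  /** Arm the watchdog; interrupt the returned thread if abort completes in time. */
  public static Thread arm(final long timeoutMillis) {
    Thread watchdog = new Thread(() -> {
      try {
        Thread.sleep(timeoutMillis);
      } catch (InterruptedException ie) {
        return; // abort finished normally, watchdog cancelled
      }
      // Bypass shutdown hooks and finalizers -- abort is stuck
      // (e.g. blocked rolling an HLog against a wedged filesystem).
      Runtime.getRuntime().halt(1);
    }, "abort-watchdog");
    watchdog.setDaemon(true);
    watchdog.start();
    return watchdog;
  }
}
{code}
The idea would be to arm the watchdog at the top of the region server's abort path and interrupt the returned thread once abort completes normally, so a hung log roll like the one above can no longer keep the process alive.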
[jira] [Created] (HBASE-3815) lb should ignore bad region servers
lb should ignore bad region servers --- Key: HBASE-3815 URL: https://issues.apache.org/jira/browse/HBASE-3815 Project: HBase Issue Type: Bug Reporter: Prakash Khemani the loadbalancer should remember which region server is constantly having trouble opening regions and it should take that rs out of the equation ... otherwise the lb goes into an unproductive loop ... I don't have logs handy for this one. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3815) lb should ignore bad region servers
[ https://issues.apache.org/jira/browse/HBASE-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13023352#comment-13023352 ] Prakash Khemani commented on HBASE-3815: Log snippets showing assignment-manager continuously choosing server-132 for region assignment even though it constantly fails. There ought to be a global exclude list in addition to a per region exclude list? 2011-04-17 07:14:06,312 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region realtime_domain_feed_imps_domains,e0a3d6b6,1289467228948.64f5ad9ca3f4d6a235365f10ccc4ae87. to pumahbase132.snc5.facebook.com,60020,1303046136711 2011-04-17 07:14:06,314 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of realtime_domain_feed_imps_domains,e0a3d6b6,1289467228948.64f5ad9ca3f4d6a235365f10ccc4ae87. to serverName=pumahbase132.snc5.facebook.com,60020,1303046136711, load=(requests=0, regions=81, usedHeap=155, maxHeap=31987), trying to assign elsewhere instead; retry=0 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:406) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328) at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:884) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:751) at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257) at $Proxy6.openRegion(Unknown Source) at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:547) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:901) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:730) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:710) at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:92) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) 2011-04-17 07:14:06,314 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for realtime_domain_feed_imps_domains,e0a3d6b6,1289467228948.64f5ad9ca3f4d6a235365f10ccc4ae87. so generated a random one; hri=realtime_domain_feed_imps_domains,e0a3d6b6,1289467228948.64f5ad9ca3f4d6a235365f10ccc4ae87., src=, dest=pumahbase156.snc5.facebook.com,60020,1302847439345; 72 (online=72, exclude=serverName=pumahbase132.snc5.facebook.com,60020,1303046136711, load=(requests=0, regions=81, usedHeap=155, maxHeap=31987)) available servers 2011-04-17 07:19:06,097 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region realtime_domain_acts_urls_hot,9b851e7e,1290555837119.c41a23fd0bd57d1eb4c3a5ef1ed6ccac. to pumahbase132.snc5.facebook.com,60020,1303046136711 2011-04-17 07:19:06,098 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of realtime_domain_acts_urls_hot,9b851e7e,1290555837119.c41a23fd0bd57d1eb4c3a5ef1ed6ccac. 
to serverName=pumahbase132.snc5.facebook.com,60020,1303046136711, load=(requests=0, regions=81, usedHeap=155, maxHeap=31987), trying to assign elsewhere instead; retry=0 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:406) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328) at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:884) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:751) at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257) at $Proxy6.openRegion(Unknown Source) at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:547) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:901) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:730) at
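A sketch of the bookkeeping a global exclude list could use, per the suggestion above. The class and threshold are hypothetical; it only illustrates counting region-open failures per server (such as the repeated ConnectException to server 132 in the log above) and letting the assignment path skip servers that keep failing.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: track region-open failures per server and expose a
// global exclude check the assignment path can consult before picking a target.
public class BadServerTracker {
  private final int maxFailures;
  private final Map<String, AtomicInteger> failures = new ConcurrentHashMap<>();

  public BadServerTracker(int maxFailures) {
    this.maxFailures = maxFailures;
  }

  /** Called when an open RPC to this server fails (e.g. ConnectException). */
  public void recordFailure(String serverName) {
    failures.computeIfAbsent(serverName, s -> new AtomicInteger()).incrementAndGet();
  }

  /** Called when an open on this server succeeds, clearing its history. */
  public void recordSuccess(String serverName) {
    failures.remove(serverName);
  }

  /** True if the balancer/assignment should skip this server for now. */
  public boolean isExcluded(String serverName) {
    AtomicInteger n = failures.get(serverName);
    return n != null && n.get() >= maxFailures;
  }
}
{code}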
[jira] [Commented] (HBASE-3814) force regionserver to halt
[ https://issues.apache.org/jira/browse/HBASE-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13023408#comment-13023408 ] Prakash Khemani commented on HBASE-3814: I don't have access to the logs right now. The server is powered down and I don't want to bring it up. In all likelihood the server that got stuck had a dfs version mismatch problem. It got stuck in a portion of the code that Dhruba has recently introduced and only present in the internal branch. force regionserver to halt -- Key: HBASE-3814 URL: https://issues.apache.org/jira/browse/HBASE-3814 Project: HBase Issue Type: Bug Reporter: Prakash Khemani Once abort() on a regionserver is called we should have a timeout thread that does Runtime.halt() if the rs gets stuck somewhere during abort processing. === Pumahbase132 has following the logs .. the dfsclient is not able to set up a write pipeline successfully ... it tries to abort ... but while aborting it gets stuck. I know there is a check that if we are aborting because filesystem is closed then we should not try to flush the logs while aborting. But in this case the fs is up and running, just that it is not functioning. 2011-04-21 23:48:07,082 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 10.38.131.53:50010 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280java.io.IOException: Bad connect ack with firstBadLink 10.38.133.33:50010 2011-04-21 23:48:07,082 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-8967376451767492285_6537229 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280 2011-04-21 23:48:07,125 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 10.38.131.53:50010 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280java.io.IOException: Bad connect ack with firstBadLink 10.38.134.59:50010 2011-04-21 23:48:07,125 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_7172251852699100447_6537229 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280 2011-04-21 23:48:07,169 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 10.38.131.53:50010 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280java.io.IOException: Bad connect ack with firstBadLink 10.38.134.53:50010 2011-04-21 23:48:07,169 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-9153204772467623625_6537229 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280 2011-04-21 23:48:07,213 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 10.38.131.53:50010 for file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280java.io.IOException: Bad connect ack with firstBadLink 10.38.134.49:50010 2011-04-21 23:48:07,213 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-2513098940934276625_6537229 for file 
/PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280 2011-04-21 23:48:07,214 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block. at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3560) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2700(DFSClient.java:2720) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2977) 2011-04-21 23:48:07,214 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-2513098940934276625_6537229 bad datanode[1] nodes == null 2011-04-21 23:48:07,214 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280 - Aborting... 2011-04-21 23:48:07,216 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting close of hlog And then the RS gets stuck trying to roll the logs ... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3806) distributed log splitting double escapes task names
[ https://issues.apache.org/jira/browse/HBASE-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13022646#comment-13022646 ] Prakash Khemani commented on HBASE-3806: uploaded a patch at https://review.cloudera.org/r/1715/ distributed log splitting double escapes task names --- Key: HBASE-3806 URL: https://issues.apache.org/jira/browse/HBASE-3806 Project: HBase Issue Type: Bug Reporter: Prakash Khemani Assignee: Prakash Khemani During startup master double-escapes the (log split) task names when submitting them ... I had missed this in my testing because I was using task names like foo and bar instead of those that need escaping - like hdfs://... Also at startup even though the master fails to acquire the orphan tasks ... the tasks are acquired anyways when master sees the logs that need splitting. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
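The failure mode is the usual double-encoding one: an already-escaped task name escaped again no longer round-trips, so a worker looking for the original log path never matches the task node. A generic illustration using java.net.URLEncoder as a stand-in (HBase's own task-name escaping differs; this only shows why escaping twice breaks the lookup):
{code}
import java.net.URLDecoder;
import java.net.URLEncoder;

public class DoubleEscapeDemo {
  public static void main(String[] args) throws Exception {
    String task = "hdfs://nn:9000/hbase/.logs/rs%3A60020.12345";

    String once  = URLEncoder.encode(task, "UTF-8");
    String twice = URLEncoder.encode(once, "UTF-8"); // the bug: escaping an already-escaped name

    // A worker that decodes once gets back the *escaped* form, not the
    // original path, so it never matches the log file it should split.
    System.out.println(URLDecoder.decode(once, "UTF-8").equals(task));   // true
    System.out.println(URLDecoder.decode(twice, "UTF-8").equals(task));  // false
  }
}
{code}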
[jira] [Commented] (HBASE-3674) Treat ChecksumException as we would a ParseException splitting logs; else we replay split on every restart
[ https://issues.apache.org/jira/browse/HBASE-3674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13022873#comment-13022873 ] Prakash Khemani commented on HBASE-3674: This change got overwritten when HBASE-1364 was integrated. The change has to be added in HLogSplitter in the method getNextLogLine static private Entry getNextLogLine(Reader in, Path path, boolean skipErrors) throws CorruptedLogFileException, IOException { try { return in.next(); } catch (EOFException eof) { // truncated files are expected if a RS crashes (see HBASE-2643) LOG.info(EOF from hlog + path + . continuing); return null; } catch (IOException e) { // If the IOE resulted from bad file format, // then this problem is idempotent and retrying won't help if (e.getCause() instanceof ParseException) { LOG.warn(ParseException from hlog + path + . continuing); return null; } It might also be necessary to add this change to getReader(...) method protected Reader getReader(FileSystem fs, FileStatus file, Configuration conf, boolean skipErrors) throws IOException, CorruptedLogFileException { Path path = file.getPath(); long length = file.getLen(); Reader in; // Check for possibly empty file. With appends, currently Hadoop reports a // zero length even if the file has been sync'd. Revisit if HDFS-376 or // HDFS-878 is committed. if (length = 0) { LOG.warn(File + path + might be still open, length is 0); } try { recoverFileLease(fs, path, conf); try { in = getReader(fs, path, conf); } catch (EOFException e) { if (length = 0) { // TODO should we ignore an empty, not-last log file if skip.errors // is false? Either way, the caller should decide what to do. E.g. // ignore if this is the last log in sequence. // TODO is this scenario still possible if the log has been // recovered (i.e. closed) LOG.warn(Could not open + path + for reading. File is empty, e); return null; Treat ChecksumException as we would a ParseException splitting logs; else we replay split on every restart -- Key: HBASE-3674 URL: https://issues.apache.org/jira/browse/HBASE-3674 Project: HBase Issue Type: Bug Components: wal Reporter: stack Assignee: stack Priority: Critical Fix For: 0.90.2 Attachments: 3674-v2.txt, 3674.txt In short, a ChecksumException will fail log processing for a server so we skip out w/o archiving logs. On restart, we'll then reprocess the logs -- hit the checksumexception anew, usually -- and so on. Here is the splitLog method (edited): {code} private ListPath splitLog(final FileStatus[] logfiles) throws IOException { outputSink.startWriterThreads(entryBuffers); try { int i = 0; for (FileStatus log : logfiles) { Path logPath = log.getPath(); long logLength = log.getLen(); splitSize += logLength; LOG.debug(Splitting hlog + (i++ + 1) + of + logfiles.length + : + logPath + , length= + logLength); try { recoverFileLease(fs, logPath, conf); parseHLog(log, entryBuffers, fs, conf); processedLogs.add(logPath); } catch (EOFException eof) { // truncated files are expected if a RS crashes (see HBASE-2643) LOG.info(EOF from hlog + logPath + . Continuing); processedLogs.add(logPath); } catch (FileNotFoundException fnfe) { // A file may be missing if the region server was able to archive it // before shutting down. This means the edits were persisted already LOG.info(A log was missing + logPath + , probably because it was moved by the + now dead region server. 
Continuing); processedLogs.add(logPath); } catch (IOException e) { // If the IOE resulted from bad file format, // then this problem is idempotent and retrying won't help if (e.getCause() instanceof ParseException || e.getCause() instanceof ChecksumException) { LOG.warn(ParseException from hlog + logPath + . continuing); processedLogs.add(logPath); } else { if (skipErrors) { LOG.info(Got while parsing hlog + logPath + . Marking as corrupted, e); corruptedLogs.add(logPath); } else { throw e; } } } } if (fs.listStatus(srcDir).length
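A sketch of what re-adding the change inside getNextLogLine() might look like, with string literals restored. Its placement after the HBASE-1364 refactor is an assumption; relative to the snippet quoted above, the only addition is the ChecksumException test, and the skipErrors / CorruptedLogFileException handling that follows in the real method is elided.
{code}
static private Entry getNextLogLine(Reader in, Path path, boolean skipErrors)
    throws CorruptedLogFileException, IOException {
  try {
    return in.next();
  } catch (EOFException eof) {
    // truncated files are expected if a RS crashes (see HBASE-2643)
    LOG.info("EOF from hlog " + path + ". continuing");
    return null;
  } catch (IOException e) {
    // Bad file format, or a ChecksumException HDFS could not mask: the
    // problem is idempotent and retrying won't help, so skip the rest of
    // this file rather than failing the whole split task.
    if (e.getCause() instanceof ParseException
        || e.getCause() instanceof org.apache.hadoop.fs.ChecksumException) {
      LOG.warn("ParseException from hlog " + path + ". continuing");
      return null;
    }
    // (the existing skipErrors / CorruptedLogFileException handling,
    //  elided in the snippet above, would follow here)
    throw e;
  }
}
{code}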
[jira] [Commented] (HBASE-3674) Treat ChecksumException as we would a ParseException splitting logs; else we replay split on every restart
[ https://issues.apache.org/jira/browse/HBASE-3674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13022964#comment-13022964 ] Prakash Khemani commented on HBASE-3674: The patch sets the hbase.hlog.split.skip.errors to true by default. I am wondering why the CheckSumException was not ignored as originally proposed? This patch is there in the trunk. In the serialized log splitting case hbase.hlog.split.skip.errors is set to true. But in the distributed log splitting case hbase.hlog.split.skip.errors is set to false by default. Treat ChecksumException as we would a ParseException splitting logs; else we replay split on every restart -- Key: HBASE-3674 URL: https://issues.apache.org/jira/browse/HBASE-3674 Project: HBase Issue Type: Bug Components: wal Reporter: stack Assignee: stack Priority: Critical Fix For: 0.90.2 Attachments: 3674-v2.txt, 3674.txt In short, a ChecksumException will fail log processing for a server so we skip out w/o archiving logs. On restart, we'll then reprocess the logs -- hit the checksumexception anew, usually -- and so on. Here is the splitLog method (edited): {code} private ListPath splitLog(final FileStatus[] logfiles) throws IOException { outputSink.startWriterThreads(entryBuffers); try { int i = 0; for (FileStatus log : logfiles) { Path logPath = log.getPath(); long logLength = log.getLen(); splitSize += logLength; LOG.debug(Splitting hlog + (i++ + 1) + of + logfiles.length + : + logPath + , length= + logLength); try { recoverFileLease(fs, logPath, conf); parseHLog(log, entryBuffers, fs, conf); processedLogs.add(logPath); } catch (EOFException eof) { // truncated files are expected if a RS crashes (see HBASE-2643) LOG.info(EOF from hlog + logPath + . Continuing); processedLogs.add(logPath); } catch (FileNotFoundException fnfe) { // A file may be missing if the region server was able to archive it // before shutting down. This means the edits were persisted already LOG.info(A log was missing + logPath + , probably because it was moved by the + now dead region server. Continuing); processedLogs.add(logPath); } catch (IOException e) { // If the IOE resulted from bad file format, // then this problem is idempotent and retrying won't help if (e.getCause() instanceof ParseException || e.getCause() instanceof ChecksumException) { LOG.warn(ParseException from hlog + logPath + . continuing); processedLogs.add(logPath); } else { if (skipErrors) { LOG.info(Got while parsing hlog + logPath + . Marking as corrupted, e); corruptedLogs.add(logPath); } else { throw e; } } } } if (fs.listStatus(srcDir).length processedLogs.size() + corruptedLogs.size()) { throw new OrphanHLogAfterSplitException( Discovered orphan hlog after split. Maybe the + HRegionServer was not dead when we started); } archiveLogs(srcDir, corruptedLogs, processedLogs, oldLogDir, fs, conf); } finally { splits = outputSink.finishWritingAndClose(); } return splits; } {code} Notice how we'll only archive logs only if we successfully split all logs. We won't archive 31 of 35 files if we happen to get a checksum exception on file 32. I think we should treat a ChecksumException the same as a ParseException; a retry will not fix it if HDFS could not get around the ChecksumException (seems like in our case all replicas were corrupt). 
Here is a play-by-play from the logs: {code} 813572 2011-03-18 20:31:44,687 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 34 of 35: hdfs://sv2borg170:9000/hbase/.logs/sv2borg182,60020,1300384550664/sv2borg182%3A60020.1300461329481, length=150 65662813573 2011-03-18 20:31:44,687 INFO org.apache.hadoop.hbase.util.FSUtils: Recovering file hdfs://sv2borg170:9000/hbase/.logs/sv2borg182,60020,1300384550664/sv2borg182%3A60020.1300461329481 813617 2011-03-18 20:31:46,238 INFO org.apache.hadoop.fs.FSInputChecker: Found checksum error: b[0, 512]=00cd00502037383661376439656265643938636463343433386132343631323633303239371d6170695f6163636573735f746f6b656e5f7374
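If the goal is simply to make the distributed path behave like the serialized one, the flag can be set explicitly; below is a minimal sketch using the standard Configuration API with the property name quoted above (whether it should default to true for distributed splitting is exactly the open question in this comment).
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class SkipErrorsConfig {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Default differs between the serialized and distributed split paths
    // (per the discussion above); setting it explicitly removes the ambiguity.
    conf.setBoolean("hbase.hlog.split.skip.errors", true);
    System.out.println(conf.getBoolean("hbase.hlog.split.skip.errors", false));
  }
}
{code}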
[jira] [Created] (HBASE-3806) distributed log splitting double escapes task names
distributed log splitting double escapes task names --- Key: HBASE-3806 URL: https://issues.apache.org/jira/browse/HBASE-3806 Project: HBase Issue Type: Bug Reporter: Prakash Khemani Assignee: Prakash Khemani During startup master double-escapes the (log split) task names when submitting them ... I had missed this in my testing because I was using task names like foo and bar instead of those that need escaping - like hdfs://... Also at startup even though the master fails to acquire the orphan tasks ... the tasks are acquired anyways when master sees the logs that need splitting. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs
[ https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020730#comment-13020730 ] Prakash Khemani commented on HBASE-1364: 'fixed' TestDistriButedLogSplitting.testWorkerAbort() by not letting the test fail if the aborting region server completes the split before it closes dfs or zk session. uploaded a new patch in rb [performance] Distributed splitting of regionserver commit logs --- Key: HBASE-1364 URL: https://issues.apache.org/jira/browse/HBASE-1364 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: stack Assignee: Prakash Khemani Priority: Critical Fix For: 0.92.0 Attachments: 1364-v5.txt, HBASE-1364.patch, org.apache.hadoop.hbase.master.TestDistributedLogSplitting-output.txt Time Spent: 8h Remaining Estimate: 0h HBASE-1008 has some improvements to our log splitting on regionserver crash; but it needs to run even faster. (Below is from HBASE-1008) In bigtable paper, the split is distributed. If we're going to have 1000 logs, we need to distribute or at least multithread the splitting. 1. As is, regions starting up expect to find one reconstruction log only. Need to make it so pick up a bunch of edit logs and it should be fine that logs are elsewhere in hdfs in an output directory written by all split participants whether multithreaded or a mapreduce-like distributed process (Lets write our distributed sort first as a MR so we learn whats involved; distributed sort, as much as possible should use MR framework pieces). On startup, regions go to this directory and pick up the files written by split participants deleting and clearing the dir when all have been read in. Making it so can take multiple logs for input, can also make the split process more robust rather than current tenuous process which loses all edits if it doesn't make it to the end without error. 2. Each column family rereads the reconstruction log to find its edits. Need to fix that. Split can sort the edits by column family so store only reads its edits. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs
[ https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020874#comment-13020874 ] Prakash Khemani commented on HBASE-1364: Yes, it passes for me - just ran it again. This is another of timing related errors. 164 waitForCounter(tot_wkr_task_acquired, 0, 1, 100); 165 waitForCounter(tot_wkr_failed_to_grab_task_lost_race, 0, 1, 100); In your case the failure occurred when in line 165 the counter tot_wkr_failed_to_grab_task_lost_race did not change value from 0 to 1 in 100ms. Can you please increase the timeout in both these lines from 100ms to 1000ms and retry ... I will go over all my tests and try to improve them but I won't be able to get to that before the end of this week. [performance] Distributed splitting of regionserver commit logs --- Key: HBASE-1364 URL: https://issues.apache.org/jira/browse/HBASE-1364 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: stack Assignee: Prakash Khemani Priority: Critical Fix For: 0.92.0 Attachments: 1364-v5.txt, HBASE-1364.patch, org.apache.hadoop.hbase.master.TestDistributedLogSplitting-output.txt Time Spent: 8h Remaining Estimate: 0h HBASE-1008 has some improvements to our log splitting on regionserver crash; but it needs to run even faster. (Below is from HBASE-1008) In bigtable paper, the split is distributed. If we're going to have 1000 logs, we need to distribute or at least multithread the splitting. 1. As is, regions starting up expect to find one reconstruction log only. Need to make it so pick up a bunch of edit logs and it should be fine that logs are elsewhere in hdfs in an output directory written by all split participants whether multithreaded or a mapreduce-like distributed process (Lets write our distributed sort first as a MR so we learn whats involved; distributed sort, as much as possible should use MR framework pieces). On startup, regions go to this directory and pick up the files written by split participants deleting and clearing the dir when all have been read in. Making it so can take multiple logs for input, can also make the split process more robust rather than current tenuous process which loses all edits if it doesn't make it to the end without error. 2. Each column family rereads the reconstruction log to find its edits. Need to fix that. Split can sort the edits by column family so store only reads its edits. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
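A waitForCounter of this shape is essentially a bounded poll, which is why 100ms can be too tight on a loaded machine and why bumping the last argument to 1000ms is the low-risk fix. A minimal sketch (the real test helper's signature and behavior may differ):
{code}
import java.util.concurrent.atomic.AtomicLong;

public class CounterWait {
  // oldval is the value the counter starts at (kept only to mirror the helper's signature).
  // Poll every 10ms until the counter reaches newval; fail after timems milliseconds.
  static void waitForCounter(AtomicLong ctr, long oldval, long newval, long timems)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timems;
    while (System.currentTimeMillis() < deadline) {
      if (ctr.get() == newval) {
        return;
      }
      Thread.sleep(10);
    }
    throw new AssertionError("counter stuck at " + ctr.get() + ", expected " + newval);
  }
}
{code}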
[jira] [Commented] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs
[ https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020721#comment-13020721 ] Prakash Khemani commented on HBASE-1364: This is a problem with the test-case which I will fix. You got this error because of (1) not being able to interrupt the split-log-worker thread when it is doing dfs operations (I think the interrupt is swallowed somewhere) (2) timing issues where in the aborting region server the filesystem and the zk session don't close before the split-log-worker thread completes its splitting task ... I will fix this by removing fail(region server completed the split before aborting) from the test case. [performance] Distributed splitting of regionserver commit logs --- Key: HBASE-1364 URL: https://issues.apache.org/jira/browse/HBASE-1364 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: stack Assignee: Prakash Khemani Priority: Critical Fix For: 0.92.0 Attachments: 1364-v5.txt, HBASE-1364.patch, org.apache.hadoop.hbase.master.TestDistributedLogSplitting-output.txt Time Spent: 8h Remaining Estimate: 0h HBASE-1008 has some improvements to our log splitting on regionserver crash; but it needs to run even faster. (Below is from HBASE-1008) In bigtable paper, the split is distributed. If we're going to have 1000 logs, we need to distribute or at least multithread the splitting. 1. As is, regions starting up expect to find one reconstruction log only. Need to make it so pick up a bunch of edit logs and it should be fine that logs are elsewhere in hdfs in an output directory written by all split participants whether multithreaded or a mapreduce-like distributed process (Lets write our distributed sort first as a MR so we learn whats involved; distributed sort, as much as possible should use MR framework pieces). On startup, regions go to this directory and pick up the files written by split participants deleting and clearing the dir when all have been read in. Making it so can take multiple logs for input, can also make the split process more robust rather than current tenuous process which loses all edits if it doesn't make it to the end without error. 2. Each column family rereads the reconstruction log to find its edits. Need to fix that. Split can sort the edits by column family so store only reads its edits. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs
[ https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020216#comment-13020216 ] Prakash Khemani commented on HBASE-1364: TestDistributedLogSplitting.testWorkerAbort test failed because the SplitLogWorker was incrementing the tot_wkr_task_resigned twice. My test runs were passing because of a race - if the test happens to look at the counter between the 2 increments then the test will pass. Fixing this in the latest patch. [performance] Distributed splitting of regionserver commit logs --- Key: HBASE-1364 URL: https://issues.apache.org/jira/browse/HBASE-1364 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: stack Assignee: Prakash Khemani Priority: Critical Fix For: 0.92.0 Attachments: 1364-v5.txt, HBASE-1364.patch Time Spent: 8h Remaining Estimate: 0h HBASE-1008 has some improvements to our log splitting on regionserver crash; but it needs to run even faster. (Below is from HBASE-1008) In bigtable paper, the split is distributed. If we're going to have 1000 logs, we need to distribute or at least multithread the splitting. 1. As is, regions starting up expect to find one reconstruction log only. Need to make it so pick up a bunch of edit logs and it should be fine that logs are elsewhere in hdfs in an output directory written by all split participants whether multithreaded or a mapreduce-like distributed process (Lets write our distributed sort first as a MR so we learn whats involved; distributed sort, as much as possible should use MR framework pieces). On startup, regions go to this directory and pick up the files written by split participants deleting and clearing the dir when all have been read in. Making it so can take multiple logs for input, can also make the split process more robust rather than current tenuous process which loses all edits if it doesn't make it to the end without error. 2. Each column family rereads the reconstruction log to find its edits. Need to fix that. Split can sort the edits by column family so store only reads its edits. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs
[ https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020488#comment-13020488 ] Prakash Khemani commented on HBASE-1364: I uploaded a new diff at the review board https://review.cloudera.org/r/1655/ I think it takes care of all of Stack's comments. added a new test in TestHLogSplit to test that when skip-errors is set to true then corrupted log files are ignored and correctly moved to the .corrupted directory. Some of the tests - especially in TestDistributedLogSplitting - are somewhat timing dependent. For example I will abort a few region servers and wait at most few seconds for all those servers to go down. Sometimes it takes longer and the test fails. Last night I had to bump up the time-limit in one such test (testThreeRSAbort()). I am sure these tests can be made more robust [performance] Distributed splitting of regionserver commit logs --- Key: HBASE-1364 URL: https://issues.apache.org/jira/browse/HBASE-1364 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: stack Assignee: Prakash Khemani Priority: Critical Fix For: 0.92.0 Attachments: 1364-v5.txt, HBASE-1364.patch Time Spent: 8h Remaining Estimate: 0h HBASE-1008 has some improvements to our log splitting on regionserver crash; but it needs to run even faster. (Below is from HBASE-1008) In bigtable paper, the split is distributed. If we're going to have 1000 logs, we need to distribute or at least multithread the splitting. 1. As is, regions starting up expect to find one reconstruction log only. Need to make it so pick up a bunch of edit logs and it should be fine that logs are elsewhere in hdfs in an output directory written by all split participants whether multithreaded or a mapreduce-like distributed process (Lets write our distributed sort first as a MR so we learn whats involved; distributed sort, as much as possible should use MR framework pieces). On startup, regions go to this directory and pick up the files written by split participants deleting and clearing the dir when all have been read in. Making it so can take multiple logs for input, can also make the split process more robust rather than current tenuous process which loses all edits if it doesn't make it to the end without error. 2. Each column family rereads the reconstruction log to find its edits. Need to fix that. Split can sort the edits by column family so store only reads its edits. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs
[ https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018744#comment-13018744 ] Prakash Khemani commented on HBASE-1364: posted a revised patch at https://review.cloudera.org/r/1655/ [performance] Distributed splitting of regionserver commit logs --- Key: HBASE-1364 URL: https://issues.apache.org/jira/browse/HBASE-1364 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: stack Assignee: Prakash Khemani Priority: Critical Fix For: 0.92.0 Attachments: HBASE-1364.patch Time Spent: 8h Remaining Estimate: 0h HBASE-1008 has some improvements to our log splitting on regionserver crash; but it needs to run even faster. (Below is from HBASE-1008) In bigtable paper, the split is distributed. If we're going to have 1000 logs, we need to distribute or at least multithread the splitting. 1. As is, regions starting up expect to find one reconstruction log only. Need to make it so pick up a bunch of edit logs and it should be fine that logs are elsewhere in hdfs in an output directory written by all split participants whether multithreaded or a mapreduce-like distributed process (Lets write our distributed sort first as a MR so we learn whats involved; distributed sort, as much as possible should use MR framework pieces). On startup, regions go to this directory and pick up the files written by split participants deleting and clearing the dir when all have been read in. Making it so can take multiple logs for input, can also make the split process more robust rather than current tenuous process which loses all edits if it doesn't make it to the end without error. 2. Each column family rereads the reconstruction log to find its edits. Need to fix that. Split can sort the edits by column family so store only reads its edits. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs
[ https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018973#comment-13018973 ] Prakash Khemani commented on HBASE-1364: updated patch at https://review.cloudera.org/r/1655/ [performance] Distributed splitting of regionserver commit logs --- Key: HBASE-1364 URL: https://issues.apache.org/jira/browse/HBASE-1364 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: stack Assignee: Prakash Khemani Priority: Critical Fix For: 0.92.0 Attachments: HBASE-1364.patch Time Spent: 8h Remaining Estimate: 0h HBASE-1008 has some improvements to our log splitting on regionserver crash; but it needs to run even faster. (Below is from HBASE-1008) In bigtable paper, the split is distributed. If we're going to have 1000 logs, we need to distribute or at least multithread the splitting. 1. As is, regions starting up expect to find one reconstruction log only. Need to make it so pick up a bunch of edit logs and it should be fine that logs are elsewhere in hdfs in an output directory written by all split participants whether multithreaded or a mapreduce-like distributed process (Lets write our distributed sort first as a MR so we learn whats involved; distributed sort, as much as possible should use MR framework pieces). On startup, regions go to this directory and pick up the files written by split participants deleting and clearing the dir when all have been read in. Making it so can take multiple logs for input, can also make the split process more robust rather than current tenuous process which loses all edits if it doesn't make it to the end without error. 2. Each column family rereads the reconstruction log to find its edits. Need to fix that. Split can sort the edits by column family so store only reads its edits. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs
[ https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Khemani reassigned HBASE-1364: -- Assignee: Prakash Khemani [performance] Distributed splitting of regionserver commit logs --- Key: HBASE-1364 URL: https://issues.apache.org/jira/browse/HBASE-1364 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: stack Assignee: Prakash Khemani Priority: Critical Fix For: 0.92.0 Attachments: HBASE-1364.patch Time Spent: 8h Remaining Estimate: 0h HBASE-1008 has some improvements to our log splitting on regionserver crash; but it needs to run even faster. (Below is from HBASE-1008) In bigtable paper, the split is distributed. If we're going to have 1000 logs, we need to distribute or at least multithread the splitting. 1. As is, regions starting up expect to find one reconstruction log only. Need to make it so pick up a bunch of edit logs and it should be fine that logs are elsewhere in hdfs in an output directory written by all split participants whether multithreaded or a mapreduce-like distributed process (Lets write our distributed sort first as a MR so we learn whats involved; distributed sort, as much as possible should use MR framework pieces). On startup, regions go to this directory and pick up the files written by split participants deleting and clearing the dir when all have been read in. Making it so can take multiple logs for input, can also make the split process more robust rather than current tenuous process which loses all edits if it doesn't make it to the end without error. 2. Each column family rereads the reconstruction log to find its edits. Need to fix that. Split can sort the edits by column family so store only reads its edits. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs
[ https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008973#comment-13008973 ] Prakash Khemani commented on HBASE-1364: I justed posted not yet fully done patch for review https://review.cloudera.org/r/1655/ (For some reason it isn't getting automatically linked) [performance] Distributed splitting of regionserver commit logs --- Key: HBASE-1364 URL: https://issues.apache.org/jira/browse/HBASE-1364 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: stack Assignee: Alex Newman Priority: Critical Fix For: 0.92.0 Attachments: HBASE-1364.patch Time Spent: 8h Remaining Estimate: 0h HBASE-1008 has some improvements to our log splitting on regionserver crash; but it needs to run even faster. (Below is from HBASE-1008) In bigtable paper, the split is distributed. If we're going to have 1000 logs, we need to distribute or at least multithread the splitting. 1. As is, regions starting up expect to find one reconstruction log only. Need to make it so pick up a bunch of edit logs and it should be fine that logs are elsewhere in hdfs in an output directory written by all split participants whether multithreaded or a mapreduce-like distributed process (Lets write our distributed sort first as a MR so we learn whats involved; distributed sort, as much as possible should use MR framework pieces). On startup, regions go to this directory and pick up the files written by split participants deleting and clearing the dir when all have been read in. Making it so can take multiple logs for input, can also make the split process more robust rather than current tenuous process which loses all edits if it doesn't make it to the end without error. 2. Each column family rereads the reconstruction log to find its edits. Need to fix that. Split can sort the edits by column family so store only reads its edits. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3585) isLegalFamilyName() can throw ArrayOutOfBoundException
[ https://issues.apache.org/jira/browse/HBASE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13001884#comment-13001884 ] Prakash Khemani commented on HBASE-3585: The ArrayOutOfBound exception happened when doing admin.create(htd) where the HTableDescriptor htd had zero length family-name. isLegalFamilyName() can throw ArrayOutOfBoundException -- Key: HBASE-3585 URL: https://issues.apache.org/jira/browse/HBASE-3585 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.1 Reporter: Prakash Khemani Priority: Minor org.apache.hadoop.hbase.HColumnDescriptor.isLegalFamilyName(byte[]) accesses byte[0] w/o first checking the array length. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
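A guard of the following shape is all that is being asked for. This is a sketch against the behavior described in the report, not the actual HColumnDescriptor source:
{code}
public class FamilyNameCheck {
  public static byte[] isLegalFamilyName(final byte[] b) {
    if (b == null) {
      return b;
    }
    if (b.length == 0) {
      // the reported case: admin.create(htd) with a zero-length family name
      throw new IllegalArgumentException("Family name can not be empty");
    }
    if (b[0] == '.') {
      // only now is it safe to look at b[0]
      throw new IllegalArgumentException("Family names cannot start with a period: "
          + new String(b));
    }
    return b;
  }
}
{code}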
[jira] Created: (HBASE-3585) isLegalFamilyName() can throw ArrayOutOfBoundException
isLegalFamilyName() can throw ArrayOutOfBoundException -- Key: HBASE-3585 URL: https://issues.apache.org/jira/browse/HBASE-3585 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.1 Reporter: Prakash Khemani Priority: Minor org.apache.hadoop.hbase.HColumnDescriptor.isLegalFamilyName(byte[]) accesses byte[0] w/o first checking the array length. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3476) HFile -m option need not scan key values
[ https://issues.apache.org/jira/browse/HBASE-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987110#action_12987110 ] Prakash Khemani commented on HBASE-3476: I had put up this diff https://review.cloudera.org/r/1489/ . I am not sure why it didn’t get propagated to the JIRA. Please feel free to put your own patch and close this issue. Thanks. HFile -m option need not scan key values Key: HBASE-3476 URL: https://issues.apache.org/jira/browse/HBASE-3476 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Reporter: Prakash Khemani Assignee: Prakash Khemani Priority: Minor bin/hbase org.apache.hadoop.io.hfile.HFile -m -f filename doesn't have to scan the KVs in the file -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HBASE-3476) HFile -m option need not scan key values
[ https://issues.apache.org/jira/browse/HBASE-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Khemani updated HBASE-3476: --- Status: Open (was: Patch Available) HFile -m option need not scan key values Key: HBASE-3476 URL: https://issues.apache.org/jira/browse/HBASE-3476 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Reporter: Prakash Khemani Assignee: Prakash Khemani Priority: Minor bin/hbase org.apache.hadoop.io.hfile.HFile -m -f filename doesn't have to scan the KVs in the file -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HBASE-3476) HFile -m option need not scan key values
[ https://issues.apache.org/jira/browse/HBASE-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Khemani updated HBASE-3476: --- Status: Patch Available (was: Open) HFile -m option need not scan key values Key: HBASE-3476 URL: https://issues.apache.org/jira/browse/HBASE-3476 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Reporter: Prakash Khemani Assignee: Prakash Khemani Priority: Minor bin/hbase org.apache.hadoop.io.hfile.HFile -m -f filename doesn't have to scan the KVs in the file -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HBASE-3476) HFile -m option need not scan key values
HFile -m option need not scan key values Key: HBASE-3476 URL: https://issues.apache.org/jira/browse/HBASE-3476 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Reporter: Prakash Khemani Assignee: Prakash Khemani Priority: Minor bin/hbase org.apache.hadoop.io.hfile.HFile -m -f filename doesn't have to scan the KVs in the file -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HBASE-3444) Bytes.toBytesBinary and Bytes.toStringBinary() should be reversible
Bytes.toBytesBinary and Bytes.toStringBinary() should be reversible Key: HBASE-3444 URL: https://issues.apache.org/jira/browse/HBASE-3444 Project: HBase Issue Type: Bug Reporter: Prakash Khemani Priority: Minor Bytes.toStringBinary() doesn't escape the backslash character, so the transformation isn't reversible. For byte[] a = {'\\', 'x', '0', '0'}, Bytes.toBytesBinary(Bytes.toStringBinary(a)) won't be equal to a. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
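The round trip is easy to check directly; a minimal sketch, assuming the 0.90-era org.apache.hadoop.hbase.util.Bytes helpers behave as described in the report:
{code}
import java.util.Arrays;
import org.apache.hadoop.hbase.util.Bytes;

public class RoundTrip {
  public static void main(String[] args) {
    byte[] a = {'\\', 'x', '0', '0'};
    String s = Bytes.toStringBinary(a);           // the backslash is not escaped here
    byte[] back = Bytes.toBytesBinary(s);         // so "\x00" is re-parsed as a single byte
    System.out.println(Arrays.equals(a, back));   // false if the report holds
  }
}
{code}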
[jira] Resolved: (HBASE-3398) increment(Increment, Integer, boolean) might fail
[ https://issues.apache.org/jira/browse/HBASE-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Khemani resolved HBASE-3398. Resolution: Not A Problem Release Note: maxversion is set to 1 increment(Increment, Integer, boolean) might fail - Key: HBASE-3398 URL: https://issues.apache.org/jira/browse/HBASE-3398 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Reporter: Prakash Khemani Assignee: Jonathan Gray In org.apache.hadoop.hbase.regionserver.HRegion.increment(Increment, Integer, boolean) the following loop assumes that the result from getLastIncrement() has a single entry for a given family, qualifier. But that is not necessarily true. getLastIncrement() does a union of all entries found in each of the store files ... and multiple versions of the same key are quite possible.
{code}
List<KeyValue> results = getLastIncrement(get);
// Iterate the input columns and update existing values if they were
// found, otherwise add new column initialized to the increment amount
int idx = 0;
for (Map.Entry<byte[], Long> column : family.getValue().entrySet()) {
  long amount = column.getValue();
  if (idx < results.size() && results.get(idx).matchingQualifier(column.getKey())) {
    amount += Bytes.toLong(results.get(idx).getValue());
    idx++;
  }
{code}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HBASE-3396) getLastIncrement() can miss some key-values
[ https://issues.apache.org/jira/browse/HBASE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Khemani resolved HBASE-3396. Resolution: Not A Problem Release Note: maxVersion in the scan is set to 1 getLastIncrement() can miss some key-values --- Key: HBASE-3396 URL: https://issues.apache.org/jira/browse/HBASE-3396 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Reporter: Prakash Khemani Assignee: Jonathan Gray In getLastIncrement() there is an assumption that memstore only scan will never return multiple versions of a kv // found everything we were looking for, done if (results.size() == expected) { return results; } Based on this assumption the code does an early out after it finds the expected number of key-value pairs in the memstore. But what if there were multiple versions of the same kv returned by the memstore scan? I think it is possible when the memstore has a snapshot pending to be written out. A version of the key can be returned each from the online and from the snapshot memory. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
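The resolution relies on the internal scan being capped at one version per column, which is why duplicate versions from the memstore and its snapshot cannot both be returned. A client-side analogue, purely for illustration:
{code}
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class OneVersionScan {
  public static void main(String[] args) {
    Scan scan = new Scan(Bytes.toBytes("row"));
    scan.setMaxVersions(1);  // at most one KeyValue per family:qualifier in the result
    scan.addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"));
    System.out.println(scan.getMaxVersions());
  }
}
{code}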
[jira] Created: (HBASE-3398) increment(Increment, Integer, boolean) might fail
increment(Increment, Integer, boolean) might fail - Key: HBASE-3398 URL: https://issues.apache.org/jira/browse/HBASE-3398 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Reporter: Prakash Khemani Assignee: Jonathan Gray In org.apache.hadoop.hbase.regionserver.HRegion.increment(Increment, Integer, boolean) the following loop assumes that the result from getLastIncrement() has a single entry for a given family, qualifier. But that is not necessarily true. getLastIncrement() does a union of all entries found in each of the store files ... and multiple versions of the same key are quite possible.
{code}
List<KeyValue> results = getLastIncrement(get);
// Iterate the input columns and update existing values if they were
// found, otherwise add new column initialized to the increment amount
int idx = 0;
for (Map.Entry<byte[], Long> column : family.getValue().entrySet()) {
  long amount = column.getValue();
  if (idx < results.size() && results.get(idx).matchingQualifier(column.getKey())) {
    amount += Bytes.toLong(results.get(idx).getValue());
    idx++;
  }
{code}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HBASE-3399) upsert doesn't matchFamily() before removing key
upsert doesn't matchFamily() before removing key Key: HBASE-3399 URL: https://issues.apache.org/jira/browse/HBASE-3399 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Reporter: Prakash Khemani Assignee: Jonathan Gray org.apache.hadoop.hbase.regionserver.MemStore.upsert(KeyValue) doesn't match family before deciding to remove a kv in the memstore
{code}
// if the qualifier matches and it's a put, remove it
if (kv.matchingQualifier(cur)) {
  // to be extra safe we only remove Puts that have a memstoreTS==0
  if (kv.getType() == KeyValue.Type.Put.getCode() && kv.getMemstoreTS() == 0) {
    // false means there was a change, so give us the size.
    addedSize -= heapSizeChange(kv, true);
    it.remove();
  }
{code}
shouldn't it be "if the family and qualifier match and it's a Put, remove it"? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
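Why the qualifier-only check is too weak can be shown with two cells that share a qualifier but live in different families. The sketch below uses only the matchingQualifier() call that appears in the snippet above plus KeyValue.getFamily(); it is illustrative, not the fix that was committed:
{code}
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Bytes;

public class FamilyCheckDemo {
  public static void main(String[] args) {
    KeyValue a = new KeyValue(Bytes.toBytes("row"), Bytes.toBytes("f1"),
        Bytes.toBytes("q"), Bytes.toBytes(1L));
    KeyValue b = new KeyValue(Bytes.toBytes("row"), Bytes.toBytes("f2"),
        Bytes.toBytes("q"), Bytes.toBytes(2L));
    // Same qualifier, different families: the qualifier-only check says "same column".
    System.out.println(a.matchingQualifier(b));                        // true
    System.out.println(Bytes.equals(a.getFamily(), b.getFamily()));    // false -- not the same column
  }
}
{code}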
[jira] Created: (HBASE-3395) StoreScanner not being closed?
StoreScanner not being closed? -- Key: HBASE-3395 URL: https://issues.apache.org/jira/browse/HBASE-3395 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Reporter: Prakash Khemani Assignee: Jonathan Gray In StoreScanner::next(List<KeyValue> outResult, int limit)
{code}
case SEEK_NEXT_ROW:
  // This is just a relatively simple end of scan fix, to short-cut end us if there is a
  // endKey in the scan.
  if (!matcher.moreRowsMayExistAfter(kv)) {
    outResult.addAll(results);
    return false;
  }
{code}
close() is not being called before returning false. In all other cases close is called before returning false. Maybe this is a problem. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3292) Expose block cache hit/miss/evict counts into region server metrics
[ https://issues.apache.org/jira/browse/HBASE-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12965762#action_12965762 ] Prakash Khemani commented on HBASE-3292: aggregate hit/miss counts will definitely help. we should also report current block cache hit ratio - it will save us the hassle of deriving it from the aggregate counts. Expose block cache hit/miss/evict counts into region server metrics --- Key: HBASE-3292 URL: https://issues.apache.org/jira/browse/HBASE-3292 Project: HBase Issue Type: Improvement Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Minor Fix For: 0.90.1, 0.92.0 Attachments: HBASE-3292-v1.patch Right now only the hit ratio is exposed into the rs metrics. This value tends to change very slowly and hardly at all once the cluster has been up for some time. We should expose the aggregate hit/miss/evict counts so you can more effectively see how things change over time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HBASE-3248) support Increment::incrementColumn()
support Increment::incrementColumn() Key: HBASE-3248 URL: https://issues.apache.org/jira/browse/HBASE-3248 Project: HBase Issue Type: Improvement Components: client Reporter: Prakash Khemani The Increment.addColumn() API overwrites the old column value if it exists. we need a new method incrementColumn() that will sum up the old and new values. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
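The requested semantics are simple to state in code. This is a sketch of the behavior being asked for, not the actual client API: addColumn() replaces the amount, while the proposed incrementColumn() sums the old and new amounts.
{code}
import java.util.HashMap;
import java.util.Map;

public class IncrementSketch {
  private final Map<String, Long> amounts = new HashMap<String, Long>();

  // Current behavior per the issue: a second call for the same column replaces the amount.
  public void addColumn(String column, long amount) {
    amounts.put(column, amount);
  }

  // Requested behavior: a second call for the same column sums the old and new amounts.
  public void incrementColumn(String column, long amount) {
    Long prev = amounts.get(column);
    amounts.put(column, prev == null ? amount : prev + amount);
  }
}
{code}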
[jira] Commented: (HBASE-3246) Add API to Increment client class that increments rather than replaces the amount for a column when done multiple times
[ https://issues.apache.org/jira/browse/HBASE-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933645#action_12933645 ] Prakash Khemani commented on HBASE-3246: is there any need for addColumn()? it might be cleaner to just have incrementColumn() in this API Add API to Increment client class that increments rather than replaces the amount for a column when done multiple times --- Key: HBASE-3246 URL: https://issues.apache.org/jira/browse/HBASE-3246 Project: HBase Issue Type: Improvement Components: client Reporter: Jonathan Gray Assignee: Jonathan Gray Attachments: HBASE-3246-v1.patch In the new Increment class, the API to add columns is {{addColumn()}}. If you do this multiple times for an individual column, the amount to increment by is replaced. I think this is the right way for this method to work and it is javadoc'd with the behavior. We should add a new method, {{incrementColumn()}} which will increment any existing amount for the specified column rather than replacing it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HBASE-3239) NPE when trying to roll logs
NPE when trying to roll logs Key: HBASE-3239 URL: https://issues.apache.org/jira/browse/HBASE-3239 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.0 Reporter: Prakash Khemani Note from Kannan findMemstoresWithEditsEqualOrOlderThan() can return NULL it seems like. And we don't check NULL before regions.length.
{code}
regions = findMemstoresWithEditsEqualOrOlderThan(this.outputfiles.firstKey(), this.lastSeqWritten);
StringBuilder sb = new StringBuilder();
for (int i = 0; i < regions.length; i++) {
{code}
=== Stack Trace
2010-11-15 19:19:54,258 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=6.1 GB, free=1.71 GB, max=7.81 GB, blocks=385740, accesses=7020255, hits=6329399, hitRatio=90.15%%, cachingAccesses=6765050, cachingHits=6329399, cachingHitsRatio=93.56%%, evictions=1, evicted=49911, evictedPerRun=49911.0
2010-11-15 19:21:05,204 INFO org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs -- HDFS-200
2010-11-15 19:21:05,211 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /PUMAHBASE001-SNC5-HBASE/.logs/pumahbase042.snc5.facebook.com,60020,1289856892583/10.38.28.57%3A60020.1289877154987, entries=649004, filesize=255069060. New hlog /PUMAHBASE001-SNC5-HBASE/.logs/pumahbase042.snc5.facebook.com,60020,1289856892583/10.38.28.57%3A60020.1289877665062
2010-11-15 19:21:05,222 ERROR org.apache.hadoop.hbase.regionserver.LogRoller: Log rolling failed java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.wal.HLog.cleanOldLogs(HLog.java:648) at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:528) at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
2010-11-15 19:21:05,226 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=pumahbase042.snc5.facebook.com,60020,1289856892583, load=(requests=3476, regions=40, usedHeap=8388, maxHeap=15987): Log rolling failed java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.wal.HLog.cleanOldLogs(HLog.java:648) at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:528) at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
2010-11-15 19:21:05,227 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=1264.5834, regions=40, stores=70, storefiles=98, storefileIndexSize=35, memstoreSize=83, compactionQueueSize=0, usedHeap=8370, maxHeap=15987, blockCacheSize=6593768536, blockCacheFree=1788154792, blockCacheCount=388283, blockCacheHitRatio=90, blockCacheHitCachingRatio=93
2010-11-15 19:21:05,227 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Log rolling failed
2010-11-15 19:21:05,227 INFO org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting.
2010-11-15 19:21:07,255 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020
=== -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HBASE-3196) Regionserver stuck when after all IPC Server handlers fatal'd
[ https://issues.apache.org/jira/browse/HBASE-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Khemani updated HBASE-3196: --- Description: The region server is stuck with the following jstack 2010-11-03 22:23:41 Full thread dump Java HotSpot(TM) 64-Bit Server VM (14.0-b16 mixed mode): Attach Listener daemon prio=10 tid=0x2aaeb6774000 nid=0x3974 waiting on condition [0x] java.lang.Thread.State: RUNNABLE RS_CLOSE_REGION-pumahbase028.snc5.facebook.com,60020,1288733355197-2 prio=10 tid=0x2aaeb8449000 nid=0x3bbc waiting on condition [0x43f67000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x2aaab7fd1130 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) RS_CLOSE_REGION-pumahbase028.snc5.facebook.com,60020,1288733355197-1 prio=10 tid=0x2aaeb843f800 nid=0x3bbb waiting on condition [0x43e66000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x2aaab7fd1130 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) RS_CLOSE_REGION-pumahbase028.snc5.facebook.com,60020,1288733355197-0 prio=10 tid=0x2aaeb8447800 nid=0x3bba waiting on condition [0x44068000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x2aaab7fd1130 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) RMI Scheduler(0) daemon prio=10 tid=0x2aaeb48c4800 nid=0x1c97 waiting on condition [0x580a7000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x2aaab773a118 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1963) at java.util.concurrent.DelayQueue.take(DelayQueue.java:164) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:583) at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:576) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) RS_OPEN_REGION-pumahbase028.snc5.facebook.com,60020,1288733355197-2 daemon prio=10 tid=0x2aaeb4804800 nid=0x17a0 waiting on condition [0x582a9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x2aaab7fca538 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) at
[jira] Commented: (HBASE-3196) Regionserver stuck when after all IPC Server handlers fatal'd
[ https://issues.apache.org/jira/browse/HBASE-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12928144#action_12928144 ] Prakash Khemani commented on HBASE-3196: I am not sure where the version mismatch is coming from. It shouldn't. The 3rd datanode in the pipeline is inaccessible and I cannot ssh to it. === In the logs it doesn't appear that there is any region that could not be closed. When I grep for region names, all log lines at the end, look like the following 2010-11-02 17:36:25,490 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing tmp_realtime_domain_feed_imps_urls_hot,f6b05d27d8b93e0691ae35a846f2742cno.blogg.renate87 sffi 2 7ffa8a0f 10150313506035171,1288637339566.612cf51b8553ea552f3c283a542a7fe9.: disabling compactions flushes 2010-11-02 17:36:25,490 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for region tmp_realtime_domain_feed_imps_urls_hot,f6b05d27d8b93e0691ae35a846f2742cno.blogg.renate87 sffi 2 7ffa8a0f 10150313506035171,1288637339566.612cf51b8553ea552f3c283a542a7fe9. 2010-11-02 17:36:25,490 ERROR org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closing region tmp_realtime_domain_feed_imps_urls_hot,f6b05d27d8b93e0691ae35a846f2742cno.blogg.renate87 sffi 2 7ffa8a0f 10150313506035171,1288637339566.612cf51b8553ea552f3c283a542a7fe9. 2010-11-02 17:36:25,490 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closed region tmp_realtime_domain_feed_imps_urls_hot,f6b05d27d8b93e0691ae35a846f2742cno.blogg.renate87 sffi 2 7ffa8a0f 10150313506035171,1288637339566.612cf51b8553ea552f3c283a542a7fe9. 2010-11-02 17:36:25,490 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing close of tmp_realtime_domain_feed_imps_domains,bceda0983f7f83bfb71291220eb619c5com.mediafire.img16 sffi,1288637064513.05ea84f89daf7274ab9a88d54460b03b. 2010-11-02 17:36:25,490 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing tmp_realtime_domain_feed_imps_domains,bceda0983f7f83bfb71291220eb619c5com.mediafire.img16 sffi,1288637064513.05ea84f89daf7274ab9a88d54460b03b.: disabling compactions flushes 2010-11-02 17:36:25,490 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for region tmp_realtime_domain_feed_imps_domains,bceda0983f7f83bfb71291220eb619c5com.mediafire.img16 sffi,1288637064513.05ea84f89daf7274ab9a88d54460b03b. 2010-11-02 17:36:25,490 ERROR org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closing region tmp_realtime_domain_feed_imps_domains,bceda0983f7f83bfb71291220eb619c5com.mediafire.img16 sffi,1288637064513.05ea84f89daf7274ab9a88d54460b03b. 2010-11-02 17:36:25,490 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closed region tmp_realtime_domain_feed_imps_domains,bceda0983f7f83bfb71291220eb619c5com.mediafire.img16 sffi,1288637064513.05ea84f89daf7274ab9a88d54460b03b. 2010-11-02 17:36:25,490 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing close of tmp_realtime_domain_feed_imps_urls,a6c845f0d1b183bc1817e7aef8354199com.brooklynvegan sffi 8332505780,1288637275045.283d82705c44eafc24e13e0e2d2e2bc5. 
2010-11-02 17:36:25,490 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing tmp_realtime_domain_feed_imps_urls,a6c845f0d1b183bc1817e7aef8354199com.brooklynvegan sffi 8332505780,1288637275045.283d82705c44eafc24e13e0e2d2e2bc5.: disabling compactions flushes 2010-11-02 17:36:25,490 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for region tmp_realtime_domain_feed_imps_urls,a6c845f0d1b183bc1817e7aef8354199com.brooklynvegan sffi 8332505780,1288637275045.283d82705c44eafc24e13e0e2d2e2bc5. 2010-11-02 17:36:25,490 ERROR org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closing region tmp_realtime_domain_feed_imps_urls,a6c845f0d1b183bc1817e7aef8354199com.brooklynvegan sffi 8332505780,1288637275045.283d82705c44eafc24e13e0e2d2e2bc5. 2010-11-02 17:36:25,490 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closed region tmp_realtime_domain_feed_imps_urls,a6c845f0d1b183bc1817e7aef8354199com.brooklynvegan sffi 8332505780,1288637275045.283d82705c44eafc24e13e0e2d2e2bc5. 2010-11-02 17:36:25,482 ERROR org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closing region tmp_realtime_domain_feed_imps_urls_hot,35c28f48,1288637318666.e391902cf1e5a5a64c178001a42f055a. 2010-11-02 17:36:25,495 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closed region tmp_realtime_domain_feed_imps_urls_hot,35c28f48,1288637318666.e391902cf1e5a5a64c178001a42f055a. 2010-11-02 17:36:25,482 ERROR org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closing region tmp_realtime_domain_feed_imps_domains,3358dabfafc35708f8de488d6f20f848nl.ronsrovlinks sefi,1288637018644.2feeb71cd2573e8453ef77a9aa40aa38. 2010-11-02
[jira] Created: (HBASE-3196) Regionserver stuck when after all IPC Server handlers fatal'
Regionserver stuck when after all IPC Server handlers fatal' Key: HBASE-3196 URL: https://issues.apache.org/jira/browse/HBASE-3196 Project: HBase Issue Type: Bug Reporter: Prakash Khemani -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-2957) Release row lock when waiting for wal-sync
[ https://issues.apache.org/jira/browse/HBASE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12907380#action_12907380 ] Prakash Khemani commented on HBASE-2957: I agree that the ordering on a given region server will be the same with or without delayed sync. But I am pretty sure that globally there will be inconsistencies. Say a value is updated on RS A. This value is not synced yet. The abovementioned unsynced value on RS A is read by someone and based on that value an update is made on another RS B. Say the update on RS B is synced. Now we have a window where B depends on A, B is in the logs but A isn't. In in this window if RS A dies and comes back up then we will have a situation where the update on RS B is present but update on RS A isn't. Release row lock when waiting for wal-sync -- Key: HBASE-2957 URL: https://issues.apache.org/jira/browse/HBASE-2957 Project: HBase Issue Type: Improvement Components: regionserver, wal Affects Versions: 0.20.0 Reporter: Prakash Khemani Is there a reason to hold on to the row-lock while waiting for the WAL-sync to be completed by the logSyncer thread? I think data consistency will be guaranteed even if the following happens (a) the row lock is held while the row is updated in memory (b) the row lock is released after queuing the KV record for WAL-syncing (c) the log-sync system guarantees that the log records for any given row are synced in order (d) the HBase client only receives a success notification after the sync completes (no change from the current state) I think this should be a huge win. For my use case, and I am sure for others, the handler thread spends the bulk of its row-lock critical section time waiting for sync to complete. Even if the log-sync system cannot guarantee the orderly completion of sync records, the Don't hold row lock while waiting for sync option should be available to HBase clients on a per request basis. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-2957) Release row lock when waiting for wal-sync
[ https://issues.apache.org/jira/browse/HBASE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906884#action_12906884 ] Prakash Khemani commented on HBASE-2957: Sorry, I was out and couldn't reply to this thread. I think a general solution that guarantees consistency for PUTs and ICVs and at the same time doesn't hold the row lock while updating hlog is possible. === Thinking aloud. First why do we want to hold the row lock around the log sync? Because we want the log sync to happen in causal ordering. Here is a scenario of what can go wrong if we release the row lock before the sync completes. 1. client-1 does a put/icv on regionserver-1. releases the row lock before the sync. 2. client-2 comes in and reads the new value. Based on this just read value, client-2 then does a put in regionserver-2. 3. client-2 is able to do its sync on rs-2 before client-1's sync on rs-1 completes. 4. rs-1 is brought down ungracefully. During recovery we will have client-2's update but not client-1's. And that violates the causal ordering of events. === So we don't want anyone to read a value which has not already been synced. I think we can transfer the wait-for-sync to the reader instead of asking all writers to wait. A simple way to do that would be to attach a log-sync-number with every cell. When a cell is updated it will keep the next log-sync-number within itself. A get will not return until the current log-sync-number is at least as big as the log-sync-number stored in the cell. An update can return immediately after queuing the sync. The wait-for-sync is transferred from the writer to the reader. If the reader comes in sufficiently late (which is likely) then there will be no wait-for-syncs in the system. === Even in this scheme we will have to treat ICVs specially. Logically an ICV does (a) GET the old value (b) PUT the new value (c) GET and return the new value. There are 2 cases. (1) The ICV caller doesn't use the return value of the ICV. In this case the ICV need not wait for the earlier sync to complete. (In my use case this is what happens predominantly) (2) The ICV caller uses the return value of the ICV call to make further updates. In this case the ICV has to wait for its sync to complete before it returns. While the ICV is waiting for the sync to complete it need not hold the row lock. (At least in my use case this is a very rare case) === I think that it is true in general that while a GET is forced to wait for a sync to complete, there is no need to hold the row lock. === Release row lock when waiting for wal-sync -- Key: HBASE-2957 URL: https://issues.apache.org/jira/browse/HBASE-2957 Project: HBase Issue Type: Improvement Components: regionserver, wal Affects Versions: 0.20.0 Reporter: Prakash Khemani Is there a reason to hold on to the row-lock while waiting for the WAL-sync to be completed by the logSyncer thread? I think data consistency will be guaranteed even if the following happens (a) the row lock is held while the row is updated in memory (b) the row lock is released after queuing the KV record for WAL-syncing (c) the log-sync system guarantees that the log records for any given row are synced in order (d) the HBase client only receives a success notification after the sync completes (no change from the current state) I think this should be a huge win. For my use case, and I am sure for others, the handler thread spends the bulk of its row-lock critical section time waiting for sync to complete. 
Even if the log-sync system cannot guarantee the orderly completion of sync records, the Don't hold row lock while waiting for sync option should be available to HBase clients on a per request basis. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
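A minimal sketch of the reader-waits-for-sync scheme described in the comment, assuming nothing about HBase internals (all names below are illustrative): writers record the sync sequence number their edit needs, a syncer thread advances a shared high-water mark, and only a reader that happens to see a not-yet-synced cell blocks.
{code}
import java.util.concurrent.atomic.AtomicLong;

public class ReaderWaitsForSync {
  private final AtomicLong nextSyncSeq = new AtomicLong(0);   // assigned at write time
  private final AtomicLong syncedUpTo = new AtomicLong(0);    // advanced by the syncer thread

  static final class Cell {
    final long value;
    final long requiredSyncSeq;   // the log-sync-number stored with the cell
    Cell(long value, long requiredSyncSeq) {
      this.value = value;
      this.requiredSyncSeq = requiredSyncSeq;
    }
  }

  // Writer: update memory, queue the sync, return immediately (no wait-for-sync here).
  Cell put(long value) {
    return new Cell(value, nextSyncSeq.incrementAndGet());
  }

  // Syncer: after the WAL sync completes, expose the new high-water mark.
  void syncCompleted(long seq) {
    synchronized (syncedUpTo) {
      if (seq > syncedUpTo.get()) {
        syncedUpTo.set(seq);
      }
      syncedUpTo.notifyAll();
    }
  }

  // Reader: only waits if the cell it reads has not yet been covered by a sync.
  long get(Cell c) throws InterruptedException {
    synchronized (syncedUpTo) {
      while (syncedUpTo.get() < c.requiredSyncSeq) {
        syncedUpTo.wait();
      }
    }
    return c.value;
  }
}
{code}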
[jira] Created: (HBASE-2957) Release row lock when waiting for wal-sync
Release row lock when waiting for wal-sync -- Key: HBASE-2957 URL: https://issues.apache.org/jira/browse/HBASE-2957 Project: HBase Issue Type: Improvement Components: regionserver, wal Affects Versions: 0.20.0 Reporter: Prakash Khemani Is there a reason to hold on to the row-lock while waiting for the WAL-sync to be completed by the logSyncer thread? I think data consistency will be guaranteed even if the following happens (a) the row lock is held while the row is updated in memory (b) the row lock is released after queuing the KV record for WAL-syncing (c) the log-sync system guarantees that the log records for any given row are synced in order (d) the HBase client only receives a success notification after the sync completes (no change from the current state) I think this should be a huge win. For my use case, and I am sure for others, the handler thread spends the bulk of its row-lock critical section time waiting for sync to complete. Even if the log-sync system cannot guarantee the orderly completion of sync records, the Don't hold row lock while waiting for sync option should be available to HBase clients on a per request basis. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-2957) Release row lock when waiting for wal-sync
[ https://issues.apache.org/jira/browse/HBASE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905798#action_12905798 ] Prakash Khemani commented on HBASE-2957: Actually, data consistency is not guaranteed if we return to the HBase client any value which has not yet been sync'd to WAL. But for my use case, and I think for many others, it is OK. Release row lock when waiting for wal-sync -- Key: HBASE-2957 URL: https://issues.apache.org/jira/browse/HBASE-2957 Project: HBase Issue Type: Improvement Components: regionserver, wal Affects Versions: 0.20.0 Reporter: Prakash Khemani Is there a reason to hold on to the row-lock while waiting for the WAL-sync to be completed by the logSyncer thread? I think data consistency will be guaranteed even if the following happens (a) the row lock is held while the row is updated in memory (b) the row lock is released after queuing the KV record for WAL-syncing (c) the log-sync system guarantees that the log records for any given row are synced in order (d) the HBase client only receives a success notification after the sync completes (no change from the current state) I think this should be a huge win. For my use case, and I am sure for others, the handler thread spends the bulk of its row-lock critical section time waiting for sync to complete. Even if the log-sync system cannot guarantee the orderly completion of sync records, the Don't hold row lock while waiting for sync option should be available to HBase clients on a per request basis. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HBASE-2952) HConnectionManager's shutdown hook interferes with client's operations
HConnectionManager's shutdown hook interferes with client's operations -- Key: HBASE-2952 URL: https://issues.apache.org/jira/browse/HBASE-2952 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.20.0 Reporter: Prakash Khemani My HBase client calls incrementColValue() in pairs. If someone kills the client (SIGINT or SIGTERM) I want my client's increment threads to gracefully exit. If a thread has already done one of the incrementColValue() then I want that thread to complete the other incrementColValue() and then exit. For this purpose I installed my own shutdownHook(). My shutdownHook() thread 'signals' all the threads in my process that it is time to exit and then waits for them to complete. The problem is that HConnectionManager's shutdownHook thread also runs and shuts down all connections and IPC threads. My increment thread keeps waiting to increment and then times out after 240s. Two problems with this - the incrementColValue() didn't go through which will increase the chances of inconsistency in my HBase data. And it took 240s to exit. I am pasting some of the messages that the client thread outputs while it tries to contact the HBase server. Signalled. Exiting ... 2010-09-01 12:11:14,769 DEBUG [HCM.shutdownHook] zookeeper.ZooKeeperWrapper(787): localhost:/hbase,org.apache.hadoop.hbase.client.HConnectionManagerClosed connection with ZooKeeper; /hbase/root-region-server flushing after 7899 2010-09-01 12:11:19,669 DEBUG [Line Processing Thread 0] client.HConnectionManager$TableServers(903): Cache hit for row in tableName .META.: location server hadoop2205.snc3.facebook.com:60020, location region name .META.,,1.1028785192 2010-09-01 12:11:19,671 INFO [Line Processing Thread 0] zookeeper.ZooKeeperWrapper(206): Reconnecting to zookeeper 2010-09-01 12:11:19,671 DEBUG [Line Processing Thread 0] zookeeper.ZooKeeperWrapper(212): localhost:/hbase,org.apache.hadoop.hbase.client.HConnectionManagerConnected to zookeeper again 2010-09-01 12:11:24,679 DEBUG [Line Processing Thread 0] client.HConnectionManager$TableServers(964): Removed .META.,,1.1028785192 for tableName=.META. from cache because of content_action_url_metrics,\x080r B\xF7\x81_T\x07\x08\x16uOrcom.gigya 429934274290948,99 2010-09-01 12:11:24,680 DEBUG [Line Processing Thread 0] client.HConnectionManager$TableServers(857): locateRegionInMeta attempt 0 of 4 failed; retrying after sleep of 5000 because: The client is stopped 2010-09-01 12:11:24,680 DEBUG [Line Processing Thread 0] zookeeper.ZooKeeperWrapper(470): localhost:/hbase,org.apache.hadoop.hbase.client.HConnectionManagerTrying to read /hbase/root-region-server 2010-09-01 12:11:24,681 DEBUG [Line Processing Thread 0] zookeeper.ZooKeeperWrapper(489): localhost:/hbase,org.apache.hadoop.hbase.client.HConnectionManagerRead ZNode /hbase/root-region-server got 10.26.119.190:60020 2010-09-01 12:11:24,681 DEBUG [Line Processing Thread 0] client.HConnectionManager$TableServers(1116): Root region location changed. Sleeping. === It might be a good idea to only run the HCM shutdown code when all the HTables referring to it have been closed. That way the client can control when the shutdown actually happens. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-2952) HConnectionManager's shutdown hook interferes with client's operations
[ https://issues.apache.org/jira/browse/HBASE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905232#action_12905232 ] Prakash Khemani commented on HBASE-2952: I don't think HCM can accept a user-supplied shutdownHook thread. The JVM doesn't allow you to order in any way how shutdown hook threads are run. It will have to be a user 'callback' that HCM's own shutdownHook() thread invokes. Also, I think to get to the HCM instance we have to go through the HTable; there can be multiple instances of HCM in the same process. How about HTable::disableShutdownHook()? It then becomes the caller's responsibility to make sure HTable::close() is called for every instance of HTable. HConnectionManager's shutdown hook interferes with client's operations -- Key: HBASE-2952 URL: https://issues.apache.org/jira/browse/HBASE-2952 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.20.0 Reporter: Prakash Khemani -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
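A sketch of how the suggested opt-out might be used; disableShutdownHook() is only the API proposed in the comment, not an existing HTable method, and the surrounding method exists purely to illustrate that suggestion.
{code}
// Hypothetical only: disableShutdownHook() is the API suggested in the comment,
// not an existing HTable method.
void runWithoutHcmHook(org.apache.hadoop.hbase.client.HTable table) throws java.io.IOException {
  table.disableShutdownHook();   // caller opts out of HCM's automatic shutdown hook
  try {
    // ... paired incrementColValue() calls ...
  } finally {
    table.close();               // with the hook disabled, closing every HTable is the caller's job
  }
}
{code}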
[jira] Commented: (HBASE-2952) HConnectionManager's shutdown hook interferes with client's operations
[ https://issues.apache.org/jira/browse/HBASE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905258#action_12905258 ] Prakash Khemani commented on HBASE-2952: In my experiment each processing thread that invokes incrementColValue() has its own instance of HTable. In an effort to create multiple connections from my process to a region server I had each thread put something unique in its conf. The following code then kicks in and creates multiple HCMs - one per HTable instance. So, yes, it is possible to have multiple HCMs in a process - one per config. public static HConnection getConnection(Configuration conf) { TableServers connection; Integer key = HBaseConfiguration.hashCode(conf); synchronized (HBASE_INSTANCES) { connection = HBASE_INSTANCES.get(key); (BTW, my experiment to create multiple connections by creating multiple connection managers had not worked. I had to modify ConnectionManager::getHRegionConnection() and the servers map to create multiple connections.) HConnectionManager's shutdown hook interferes with client's operations -- Key: HBASE-2952 URL: https://issues.apache.org/jira/browse/HBASE-2952 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.20.0 Reporter: Prakash Khemani -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
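A sketch (against the 0.20-era client API) of the trick described above: a per-thread marker property changes the configuration's hash code, so the getConnection() lookup misses and a distinct connection manager is created per HTable. The property name client.instance.id is made up for illustration.
{code}
import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

// Sketch against the 0.20-era client API; "client.instance.id" is an invented property.
public class PerThreadConnections {
  public HTable openPerThreadTable(String tableName) throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration();
    conf.set("client.instance.id", Thread.currentThread().getName());
    // HBaseConfiguration.hashCode(conf) now differs per thread, so the
    // HBASE_INSTANCES lookup in getConnection() misses and a new
    // connection manager is created for this HTable.
    return new HTable(conf, tableName);
  }
}
{code}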
[jira] Commented: (HBASE-2952) HConnectionManager's shutdown hook interferes with client's operations
[ https://issues.apache.org/jira/browse/HBASE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905396#action_12905396 ] Prakash Khemani commented on HBASE-2952: The shutdown hook prevents the zookeeper logs from getting flooded with unnecessary 'connection timed out' and similar messages. If that is the case then the shutdown hook still serves some good purpose. IMO the behavior ought to be the following: users who properly call HTable::close on all the open HTables should get this nice HCM shutdown hook behavior; others who don't call close() will have their zk logs flooded. This goes to my earlier suggestion that HTable::close should trigger HCM::close and there should be some kind of ref counting in HCM. HConnectionManager's shutdown hook interferes with client's operations -- Key: HBASE-2952 URL: https://issues.apache.org/jira/browse/HBASE-2952 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.20.0 Reporter: Prakash Khemani -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
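A minimal sketch of the ref-counting suggestion, assuming hypothetical retain/release hooks wired to HTable creation and HTable::close; this is not the actual HConnectionManager implementation.
{code}
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical ref-counting sketch: each HTable open would retain the shared
// connection and the last close() would tear it down (and could also remove the
// shutdown hook). Not the actual HConnectionManager code.
public class RefCountedConnection {
  private final AtomicInteger refs = new AtomicInteger(0);

  public void retain() {               // called when an HTable is created
    refs.incrementAndGet();
  }

  public void release() {              // called from HTable::close
    if (refs.decrementAndGet() == 0) {
      closeZooKeeperAndIpc();          // only the last close shuts the connection down
    }
  }

  private void closeZooKeeperAndIpc() {
    // placeholder: close the ZK session and stop the IPC threads here
  }
}
{code}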