[jira] [Commented] (HBASE-2231) Compaction events should be written to HLog

2013-05-23 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13665477#comment-13665477
 ] 

Prakash Khemani commented on HBASE-2231:


Hi, Prakash Khemani is no longer at Facebook so this email address is no longer 
being monitored. If you need assistance, please contact another person who is 
currently at the company.


 Compaction events should be written to HLog
 ---

 Key: HBASE-2231
 URL: https://issues.apache.org/jira/browse/HBASE-2231
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Todd Lipcon
Assignee: stack
Priority: Blocker
  Labels: moved_from_0_20_5
 Fix For: 0.98.0, 0.95.1

 Attachments: 2231-testcase-0.94.txt, 2231-testcase_v2.txt, 
 2231-testcase_v3.txt, 2231v2.txt, 2231v3.txt, 2231v4.txt, 
 hbase-2231-testcase.txt, hbase-2231.txt, hbase-2231_v5.patch, 
 hbase-2231_v6.patch, hbase-2231_v7-0.95.patch, hbase-2231_v7.patch, 
 hbase-2231_v7.patch


 The sequence for a compaction should look like this:
 # Compact region to new files
 # Write a Compacted Region entry to the HLog
 # Delete old files
 This deals with the case where the RS has paused between steps 1 and 2 and the 
 regions have since been reassigned.
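A minimal sketch of that ordering, with hypothetical names (Wal, appendCompactionMarker) standing in for whatever the real patch uses; the point is only that the marker is made durable before any replaced file is deleted:
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class CompactionCommit {
  interface Wal {
    void appendCompactionMarker(String region, List<Path> replaced, Path result) throws IOException;
    void sync() throws IOException;
  }

  static void commit(Wal wal, String region, List<Path> replaced, Path result) throws IOException {
    // Step 2: persist the "Compacted Region" entry before touching old files.
    wal.appendCompactionMarker(region, replaced, result);
    wal.sync();
    // Step 3: only after the marker is durable, delete the replaced store files.
    for (Path p : replaced) {
      Files.deleteIfExists(p);
    }
  }
}
{code}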

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6878) DistributerLogSplit can fail to resubmit a task done if there is an exception during the log archiving

2012-09-27 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13464874#comment-13464874
 ] 

Prakash Khemani commented on HBASE-6878:


The logic to indefinitely retry a failing log-splitting task is not inside 
SplitLogManager. SplitLogManager will retry a task a finite number of times; if 
it still fails, it is the outer Master layers that retry indefinitely. The 
reason for this split is to make it possible to build tools around distributed 
log splitting: if distributed log splitting were being driven by a tool, you 
wouldn't want it to retry indefinitely.

So the behavior outlined in this bug report is correct. But this behavior 
shouldn't lead to any bug.

(There are only a few places in SplitLogManager where it resubmits the task 
forcefully, disregarding the retry limit. I think the only two cases are when a 
region server (splitlogworker) dies and when a splitlogworker resigns from 
the task (i.e. gives up the task even though there were no failures))
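A hypothetical sketch of that layering (the enum and class names are mine, not SplitLogManager's API): ordinary failures count against a finite resubmit limit, while a dead or resigned worker forces a resubmit regardless of the limit.
{code}
enum Directive { CHECK, FORCE }

class TaskRetryPolicy {
  private final int maxResubmit;
  private int resubmits;

  TaskRetryPolicy(int maxResubmit) {
    this.maxResubmit = maxResubmit;
  }

  boolean shouldResubmit(Directive directive) {
    if (directive == Directive.FORCE) {
      // A worker died or resigned: resubmit regardless of the retry limit.
      return true;
    }
    // Ordinary failure: stop after the configured number of resubmits and let
    // the outer master layers (or a tool driving the split) decide what to do.
    if (resubmits >= maxResubmit) {
      return false;
    }
    resubmits++;
    return true;
  }
}
{code}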

 DistributerLogSplit can fail to resubmit a task done if there is an exception 
 during the log archiving
 --

 Key: HBASE-6878
 URL: https://issues.apache.org/jira/browse/HBASE-6878
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: nkeywal
Priority: Minor

 The code in SplitLogManager# getDataSetWatchSuccess is:
 {code}
 if (slt.isDone()) {
   LOG.info("task " + path + " entered state: " + slt.toString());
   if (taskFinisher != null && !ZKSplitLog.isRescanNode(watcher, path)) {
 if (taskFinisher.finish(slt.getServerName(), 
 ZKSplitLog.getFileName(path)) == Status.DONE) {
   setDone(path, SUCCESS);
 } else {
   resubmitOrFail(path, CHECK);
 }
   } else {
 setDone(path, SUCCESS);
   }
 {code}
   resubmitOrFail(path, CHECK);
 should be 
   resubmitOrFail(path, FORCE);
 Without it, the task won't be resubmitted if the delay is not reached, and 
 the task will be marked as failed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-5860) splitlogmanager should not unnecessarily resubmit tasks when zk unavailable

2012-04-30 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265461#comment-13265461
 ] 

Prakash Khemani commented on HBASE-5860:


I had missed the fact that isAnyCreateZKNodePending() misses the create of 
RESCAN nodes. Will provide a fix.

I was aware of the race condition where isAnyCreateZKNodePending() will return 
false even when create-zknode is soon going to be retried. Not worth fixing for 
the reason you outlined - creating an extra RESCAN node doesn't hurt. (The code 
change you have outlined will need some more changes to make it work)
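For illustration, a hypothetical sketch of the check being discussed (names are mine): count outstanding async znode creates, including RESCAN nodes, and skip the timeout-driven resubmit while any create is still pending.
{code}
import java.util.concurrent.atomic.AtomicInteger;

class PendingCreateTracker {
  private final AtomicInteger pendingCreates = new AtomicInteger();

  void beforeAsyncCreate() { pendingCreates.incrementAndGet(); }

  void onCreateCallback() { pendingCreates.decrementAndGet(); }

  boolean isAnyCreateZkNodePending() {
    return pendingCreates.get() > 0;
  }

  boolean mayResubmitUnassigned() {
    // If creates are still in flight (e.g. the ZK connection was lost), the
    // tasks look unassigned only because their znodes do not exist yet, so the
    // timeout monitor should not resubmit them.
    return !isAnyCreateZkNodePending();
  }
}
{code}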

 splitlogmanager should not unnecessarily resubmit tasks when zk unavailable
 ---

 Key: HBASE-5860
 URL: https://issues.apache.org/jira/browse/HBASE-5860
 Project: HBase
  Issue Type: Improvement
Reporter: Prakash Khemani
Assignee: Prakash Khemani
 Attachments: 
 0001-HBASE-5860-splitlogmanager-should-not-unnecessarily-.patch


 (Doesn't really impact the run time or correctness of log splitting)
 Say the master has lost connection to zk. splitlogmanager's timeoutmanager 
 will realize that all the tasks that were submitted are still unassigned. It 
 will resubmit those tasks (i.e. create dummy znodes).
 splitlogmanager should realize that the tasks are unassigned but their znodes 
 have not been created.
 2012-04-20 13:11:20,516 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 dead splitlog worker msgstore295.snc4.facebook.com,60020,1334948757026
 2012-04-20 13:11:20,517 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 Scheduling batch of logs to split
 2012-04-20 13:11:20,517 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 started splitting logs in 
 [hdfs://msgstore215.snc4.facebook.com:9000/MSGSTORE215-SNC4-HBASE/.logs/msgstore295.snc4.facebook.com,60020,1334948757026-splitting]
 2012-04-20 13:11:20,565 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
 connection to server msgstore235.snc4.facebook.com/10.30.222.186:2181
 2012-04-20 13:11:20,566 INFO org.apache.zookeeper.ClientCnxn: Socket 
 connection established to msgstore235.snc4.facebook.com/10.30.222.186:2181, 
 initiating session
 2012-04-20 13:11:20,575 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 total tasks = 4 unassigned = 4
 2012-04-20 13:11:20,576 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 resubmitting unassigned task(s) after timeout
 2012-04-20 13:11:21,577 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 resubmitting unassigned task(s) after timeout
 2012-04-20 13:11:21,683 INFO org.apache.zookeeper.ClientCnxn: Unable to read 
 additional data from server sessionid 0x36ccb0f8010002, likely server has 
 closed socket, closing socket connection and attempting reconnect
 2012-04-20 13:11:21,683 INFO org.apache.zookeeper.ClientCnxn: Unable to read 
 additional data from server sessionid 0x136ccb0f489, likely server has 
 closed socket, closing socket connection and attempting reconnect
 2012-04-20 13:11:21,786 WARN 
 org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback: create rc 
 =CONNECTIONLOSS for 
 /hbase/splitlog/hdfs%3A%2F%2Fmsgstore215.snc4.facebook.com%3A9000%2FMSGSTORE215-SNC4-HBASE%2F.logs%2Fmsgstore295.snc4.facebook.com%2C60020%2C1334948757026-splitting%2F10.30.251.186%253A60020.1334951586677
  retry=3
 2012-04-20 13:11:21,786 WARN 
 org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback: create rc 
 =CONNECTIONLOSS for 
 /hbase/splitlog/hdfs%3A%2F%2Fmsgstore215.snc4.facebook.com%3A9000%2FMSGSTORE215-SNC4-HBASE%2F.logs%2Fmsgstore295.snc4.facebook.com%2C60020%2C1334948757026-splitting%2F10.30.251.186%253A60020.1334951920332
  retry=3

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5860) splitlogmanager should not unnecessarily resubmit tasks when zk unavailable

2012-04-30 Thread Prakash Khemani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani updated HBASE-5860:
---

Attachment: 0001-HBASE-5860-splitlogmanager-should-not-unnecessarily-.patch

Nicolas's feedback applied.

Also reduced the RESCAN retries to 0.

 splitlogmanager should not unnecessarily resubmit tasks when zk unavailable
 ---

 Key: HBASE-5860
 URL: https://issues.apache.org/jira/browse/HBASE-5860
 Project: HBase
  Issue Type: Improvement
Reporter: Prakash Khemani
Assignee: Prakash Khemani
 Attachments: 
 0001-HBASE-5860-splitlogmanager-should-not-unnecessarily-.patch, 
 0001-HBASE-5860-splitlogmanager-should-not-unnecessarily-.patch


 (Doesn't really impact the run time or correctness of log splitting)
 Say the master has lost connection to zk. splitlogmanager's timeoutmanager 
 will realize that all the tasks that were submitted are still unassigned. It 
 will resubmit those tasks (i.e. create dummy znodes).
 splitlogmanager should realize that the tasks are unassigned but their znodes 
 have not been created.
 2012-04-20 13:11:20,516 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 dead splitlog worker msgstore295.snc4.facebook.com,60020,1334948757026
 2012-04-20 13:11:20,517 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 Scheduling batch of logs to split
 2012-04-20 13:11:20,517 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 started splitting logs in 
 [hdfs://msgstore215.snc4.facebook.com:9000/MSGSTORE215-SNC4-HBASE/.logs/msgstore295.snc4.facebook.com,60020,1334948757026-splitting]
 2012-04-20 13:11:20,565 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
 connection to server msgstore235.snc4.facebook.com/10.30.222.186:2181
 2012-04-20 13:11:20,566 INFO org.apache.zookeeper.ClientCnxn: Socket 
 connection established to msgstore235.snc4.facebook.com/10.30.222.186:2181, 
 initiating session
 2012-04-20 13:11:20,575 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 total tasks = 4 unassigned = 4
 2012-04-20 13:11:20,576 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 resubmitting unassigned task(s) after timeout
 2012-04-20 13:11:21,577 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 resubmitting unassigned task(s) after timeout
 2012-04-20 13:11:21,683 INFO org.apache.zookeeper.ClientCnxn: Unable to read 
 additional data from server sessionid 0x36ccb0f8010002, likely server has 
 closed socket, closing socket connection and attempting reconnect
 2012-04-20 13:11:21,683 INFO org.apache.zookeeper.ClientCnxn: Unable to read 
 additional data from server sessionid 0x136ccb0f489, likely server has 
 closed socket, closing socket connection and attempting reconnect
 2012-04-20 13:11:21,786 WARN 
 org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback: create rc 
 =CONNECTIONLOSS for 
 /hbase/splitlog/hdfs%3A%2F%2Fmsgstore215.snc4.facebook.com%3A9000%2FMSGSTORE215-SNC4-HBASE%2F.logs%2Fmsgstore295.snc4.facebook.com%2C60020%2C1334948757026-splitting%2F10.30.251.186%253A60020.1334951586677
  retry=3
 2012-04-20 13:11:21,786 WARN 
 org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback: create rc 
 =CONNECTIONLOSS for 
 /hbase/splitlog/hdfs%3A%2F%2Fmsgstore215.snc4.facebook.com%3A9000%2FMSGSTORE215-SNC4-HBASE%2F.logs%2Fmsgstore295.snc4.facebook.com%2C60020%2C1334948757026-splitting%2F10.30.251.186%253A60020.1334951920332
  retry=3

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5890) SplitLog Rescan BusyWaits upon Zk.CONNECTIONLOSS

2012-04-27 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264046#comment-13264046
 ] 

Prakash Khemani commented on HBASE-5890:


Most likely, it isn't a good idea to sleep in the zookeeper callback thread 
(isn't the zk client single-threaded?).

Can these be queued in a DelayQueue (with the socket timeout as the delay) and 
retried from SplitLogManager.TimeoutMonitor.chore()?
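A sketch of that suggestion using java.util.concurrent.DelayQueue (the surrounding class and method names are hypothetical): the ZK callback only enqueues the failed path, and the periodic chore retries whatever has expired, so there is no sleeping or busy-waiting in the callback thread.
{code}
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class DelayedRetry implements Delayed {
  final String path;
  final long readyAtMs;

  DelayedRetry(String path, long delayMs) {
    this.path = path;
    this.readyAtMs = System.currentTimeMillis() + delayMs;
  }

  @Override public long getDelay(TimeUnit unit) {
    return unit.convert(readyAtMs - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
  }

  @Override public int compareTo(Delayed other) {
    return Long.compare(getDelay(TimeUnit.MILLISECONDS), other.getDelay(TimeUnit.MILLISECONDS));
  }
}

class RescanRetryQueue {
  private final DelayQueue<DelayedRetry> queue = new DelayQueue<>();

  // Called from the ZK callback on CONNECTIONLOSS: no sleep, no busy-wait.
  void scheduleRetry(String path, long socketTimeoutMs) {
    queue.offer(new DelayedRetry(path, socketTimeoutMs));
  }

  // Called from the periodic chore: re-issue creates whose delay has expired.
  void drainAndRetry(Consumer<String> createRescanNode) {
    DelayedRetry retry;
    while ((retry = queue.poll()) != null) {
      createRescanNode.accept(retry.path);
    }
  }
}
{code}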

 SplitLog Rescan BusyWaits upon Zk.CONNECTIONLOSS
 

 Key: HBASE-5890
 URL: https://issues.apache.org/jira/browse/HBASE-5890
 Project: HBase
  Issue Type: Bug
Reporter: Nicolas Spiegelberg
Priority: Minor
 Fix For: 0.94.0, 0.96.0, 0.89-fb

 Attachments: HBASE-5890.patch


 We ran into a production issue yesterday where the SplitLogManager tried to 
 create a Rescan node in ZK.  The createAsync() generated a 
 KeeperException.CONNECTIONLOSS that was immediately sent to processResult(), 
 createRescan node was called again with a decremented retry_count, and this 
 created a CPU busywait that also clogged up the logs.  We should handle this 
 better.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5860) splitlogmanager should not unnecessarily resubmit tasks when zk unavailable

2012-04-25 Thread Prakash Khemani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani updated HBASE-5860:
---

Attachment: 0001-HBASE-5860-splitlogmanager-should-not-unnecessarily-.patch

Avoid resubmitting tasks to zk when there are pending znode creates.

 splitlogmanager should not unnecessarily resubmit tasks when zk unavailable
 ---

 Key: HBASE-5860
 URL: https://issues.apache.org/jira/browse/HBASE-5860
 Project: HBase
  Issue Type: Improvement
Reporter: Prakash Khemani
Assignee: Prakash Khemani
 Attachments: 
 0001-HBASE-5860-splitlogmanager-should-not-unnecessarily-.patch


 (Doesn't really impact the run time or correctness of log splitting)
 Say the master has lost connection to zk. splitlogmanager's timeoutmanager 
 will realize that all the tasks that were submitted are still unassigned. It 
 will resubmit those tasks (i.e. create dummy znodes).
 splitlogmanager should realize that the tasks are unassigned but their znodes 
 have not been created.
 2012-04-20 13:11:20,516 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 dead splitlog worker msgstore295.snc4.facebook.com,60020,1334948757026
 2012-04-20 13:11:20,517 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 Scheduling batch of logs to split
 2012-04-20 13:11:20,517 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 started splitting logs in 
 [hdfs://msgstore215.snc4.facebook.com:9000/MSGSTORE215-SNC4-HBASE/.logs/msgstore295.snc4.facebook.com,60020,1334948757026-splitting]
 2012-04-20 13:11:20,565 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
 connection to server msgstore235.snc4.facebook.com/10.30.222.186:2181
 2012-04-20 13:11:20,566 INFO org.apache.zookeeper.ClientCnxn: Socket 
 connection established to msgstore235.snc4.facebook.com/10.30.222.186:2181, 
 initiating session
 2012-04-20 13:11:20,575 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 total tasks = 4 unassigned = 4
 2012-04-20 13:11:20,576 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 resubmitting unassigned task(s) after timeout
 2012-04-20 13:11:21,577 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 resubmitting unassigned task(s) after timeout
 2012-04-20 13:11:21,683 INFO org.apache.zookeeper.ClientCnxn: Unable to read 
 additional data from server sessionid 0x36ccb0f8010002, likely server has 
 closed socket, closing socket connection and attempting reconnect
 2012-04-20 13:11:21,683 INFO org.apache.zookeeper.ClientCnxn: Unable to read 
 additional data from server sessionid 0x136ccb0f489, likely server has 
 closed socket, closing socket connection and attempting reconnect
 2012-04-20 13:11:21,786 WARN 
 org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback: create rc 
 =CONNECTIONLOSS for 
 /hbase/splitlog/hdfs%3A%2F%2Fmsgstore215.snc4.facebook.com%3A9000%2FMSGSTORE215-SNC4-HBASE%2F.logs%2Fmsgstore295.snc4.facebook.com%2C60020%2C1334948757026-splitting%2F10.30.251.186%253A60020.1334951586677
  retry=3
 2012-04-20 13:11:21,786 WARN 
 org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback: create rc 
 =CONNECTIONLOSS for 
 /hbase/splitlog/hdfs%3A%2F%2Fmsgstore215.snc4.facebook.com%3A9000%2FMSGSTORE215-SNC4-HBASE%2F.logs%2Fmsgstore295.snc4.facebook.com%2C60020%2C1334948757026-splitting%2F10.30.251.186%253A60020.1334951920332
  retry=3

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5860) splitlogmanager should not unnecessarily resubmit tasks when zk unavailable

2012-04-23 Thread Prakash Khemani (JIRA)
Prakash Khemani created HBASE-5860:
--

 Summary: splitlogmanager should not unnecessarily resubmit tasks 
when zk unavailable
 Key: HBASE-5860
 URL: https://issues.apache.org/jira/browse/HBASE-5860
 Project: HBase
  Issue Type: Improvement
Reporter: Prakash Khemani
Assignee: Prakash Khemani


(Doesn't really impact the run time or correctness of log splitting)

Say the master has lost connection to zk. splitlogmanager's timeoutmanager will 
realize that all the tasks that were submitted are still unassigned. It will 
resubmit those tasks (i.e. create dummy znodes).

splitlogmanager should realize that the tasks are unassigned but their znodes 
have not been created.


2012-04-20 13:11:20,516 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
dead splitlog worker msgstore295.snc4.facebook.com,60020,1334948757026
2012-04-20 13:11:20,517 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
Scheduling batch of logs to split
2012-04-20 13:11:20,517 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
started splitting logs in 
[hdfs://msgstore215.snc4.facebook.com:9000/MSGSTORE215-SNC4-HBASE/.logs/msgstore295.snc4.facebook.com,60020,1334948757026-splitting]
2012-04-20 13:11:20,565 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
connection to server msgstore235.snc4.facebook.com/10.30.222.186:2181
2012-04-20 13:11:20,566 INFO org.apache.zookeeper.ClientCnxn: Socket connection 
established to msgstore235.snc4.facebook.com/10.30.222.186:2181, initiating 
session
2012-04-20 13:11:20,575 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
total tasks = 4 unassigned = 4
2012-04-20 13:11:20,576 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
resubmitting unassigned task(s) after timeout
2012-04-20 13:11:21,577 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
resubmitting unassigned task(s) after timeout
2012-04-20 13:11:21,683 INFO org.apache.zookeeper.ClientCnxn: Unable to read 
additional data from server sessionid 0x36ccb0f8010002, likely server has 
closed socket, closing socket connection and attempting reconnect
2012-04-20 13:11:21,683 INFO org.apache.zookeeper.ClientCnxn: Unable to read 
additional data from server sessionid 0x136ccb0f489, likely server has 
closed socket, closing socket connection and attempting reconnect
2012-04-20 13:11:21,786 WARN 
org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback: create rc 
=CONNECTIONLOSS for 
/hbase/splitlog/hdfs%3A%2F%2Fmsgstore215.snc4.facebook.com%3A9000%2FMSGSTORE215-SNC4-HBASE%2F.logs%2Fmsgstore295.snc4.facebook.com%2C60020%2C1334948757026-splitting%2F10.30.251.186%253A60020.1334951586677
 retry=3
2012-04-20 13:11:21,786 WARN 
org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback: create rc 
=CONNECTIONLOSS for 
/hbase/splitlog/hdfs%3A%2F%2Fmsgstore215.snc4.facebook.com%3A9000%2FMSGSTORE215-SNC4-HBASE%2F.logs%2Fmsgstore295.snc4.facebook.com%2C60020%2C1334948757026-splitting%2F10.30.251.186%253A60020.1334951920332
 retry=3

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4007) distributed log splitting can get indefinitely stuck

2011-09-07 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13100085#comment-13100085
 ] 

Prakash Khemani commented on HBASE-4007:


Hi Stack, I have not pushed this out to production yet ... and the way things 
are it will be a while before we do the next push to the hbase-90 tiers.

I will try to get some cluster testing done and will update this thread.

Regarding the use of ConcurrentHashMap as opposed to HashSet + object lock: I 
could not find any nice way to take a snapshot of a ConcurrentHashMap. The 
way the code is written, I need to take a snapshot of the deadWorkers set.
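For illustration, a minimal sketch of the HashSet-plus-lock pattern being referred to (the names are hypothetical): the explicit lock makes copy-and-clear atomic, which is the snapshot that is awkward to get from a ConcurrentHashMap-backed set.
{code}
import java.util.HashSet;
import java.util.Set;

class DeadWorkers {
  private final Object lock = new Object();
  private final Set<String> deadWorkers = new HashSet<>();

  // Called from the dead-worker handler; cheap and never blocks on resubmission.
  void add(String workerName) {
    synchronized (lock) {
      deadWorkers.add(workerName);
    }
  }

  // Called from the timeout monitor: take a consistent snapshot and reset.
  Set<String> snapshotAndClear() {
    synchronized (lock) {
      Set<String> snapshot = new HashSet<>(deadWorkers);
      deadWorkers.clear();
      return snapshot;
    }
  }
}
{code}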

I have just rebased. I will try to put it up on Review Board one more time.

Thanks,
Prakash



 distributed log splitting can get indefinitely stuck
 

 Key: HBASE-4007
 URL: https://issues.apache.org/jira/browse/HBASE-4007
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani
Assignee: Prakash Khemani
Priority: Critical
 Fix For: 0.92.0

 Attachments: 
 0001-HBASE-4007-distributed-log-splitting-can-get-indefin.patch


 After the configured number of retries SplitLogManager is not going to 
 resubmit log-split tasks. In this situation even if the splitLogWorker that 
 owns the task dies the task will not get resubmitted.
 When a regionserver goes away then all the split-log tasks that it owned 
 should be resubmitted by the SplitLogMaster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4007) distributed log splitting can get indefinitely stuck

2011-08-25 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13091168#comment-13091168
 ] 

Prakash Khemani commented on HBASE-4007:


I am not running the patch yet. It is up internally for review.



 distributed log splitting can get indefinitely stuck
 

 Key: HBASE-4007
 URL: https://issues.apache.org/jira/browse/HBASE-4007
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani
Assignee: Prakash Khemani
Priority: Critical
 Attachments: 
 0001-HBASE-4007-distributed-log-splitting-can-get-indefin.patch


 After the configured number of retries SplitLogManager is not going to 
 resubmit log-split tasks. In this situation even if the splitLogWorker that 
 owns the task dies the task will not get resubmitted.
 When a regionserver goes away then all the split-log tasks that it owned 
 should be resubmitted by the SplitLogMaster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4007) distributed log splitting can get indefinitely stuck

2011-08-25 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13091345#comment-13091345
 ] 

Prakash Khemani commented on HBASE-4007:


(I had tried using Review Board yesterday but it kept failing on me with 
Internal Server Error 500. Thanks for reviewing the patch the hard way.)
==

For deadWorkers - one of the goals was to not block handleDeadWorkers() and not 
have to worry about deadlocks etc. I could have used a ConcurrentSet but I was 
not sure of the semantics - how an iterator behaves when another item is added.

I will change this to use a ConcurrentSet ... please let me know.


==

I will change registerHeartbeat() to heartbeat().



 distributed log splitting can get indefinitely stuck
 

 Key: HBASE-4007
 URL: https://issues.apache.org/jira/browse/HBASE-4007
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani
Assignee: Prakash Khemani
Priority: Critical
 Attachments: 
 0001-HBASE-4007-distributed-log-splitting-can-get-indefin.patch


 After the configured number of retries SplitLogManager is not going to 
 resubmit log-split tasks. In this situation even if the splitLogWorker that 
 owns the task dies the task will not get resubmitted.
 When a regionserver goes away then all the split-log tasks that it owned 
 should be resubmitted by the SplitLogMaster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4007) distributed log splitting can get indefinitely stuck

2011-08-24 Thread Prakash Khemani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani updated HBASE-4007:
---

Status: Patch Available  (was: Open)

Fixes:
(1) buildup of RESCAN zookeeper nodes in the event that all the regionservers 
are down
(2) a bug in tracking when the last RESCAN node was created which could lead to 
too frequent RESCAN node creation
(3) if master/splitlogmanager fails to complete the task handed over by a 
region-server/worker then keep retrying indefinitely
(4) if a regionserver/worker dies then ensure that all its tasks are resubmitted

 distributed log splitting can get indefinitely stuck
 

 Key: HBASE-4007
 URL: https://issues.apache.org/jira/browse/HBASE-4007
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani
Assignee: Prakash Khemani

 After the configured number of retries SplitLogManager is not going to 
 resubmit log-split tasks. In this situation even if the splitLogWorker that 
 owns the task dies the task will not get resubmitted.
 When a regionserver goes away then all the split-log tasks that it owned 
 should be resubmitted by the SplitLogMaster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4007) distributed log splitting can get indefinitely stuck

2011-08-24 Thread Prakash Khemani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani updated HBASE-4007:
---

Status: Open  (was: Patch Available)

 distributed log splitting can get indefinitely stuck
 

 Key: HBASE-4007
 URL: https://issues.apache.org/jira/browse/HBASE-4007
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani
Assignee: Prakash Khemani

 After the configured number of retries SplitLogManager is not going to 
 resubmit log-split tasks. In this situation even if the splitLogWorker that 
 owns the task dies the task will not get resubmitted.
 When a regionserver goes away then all the split-log tasks that it owned 
 should be resubmitted by the SplitLogMaster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4007) distributed log splitting can get indefinitely stuck

2011-08-24 Thread Prakash Khemani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani updated HBASE-4007:
---

Attachment: 0001-HBASE-4007-distributed-log-splitting-can-get-indefin.patch

patch

 distributed log splitting can get indefinitely stuck
 

 Key: HBASE-4007
 URL: https://issues.apache.org/jira/browse/HBASE-4007
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani
Assignee: Prakash Khemani
 Attachments: 
 0001-HBASE-4007-distributed-log-splitting-can-get-indefin.patch


 After the configured number of retries SplitLogManager is not going to 
 resubmit log-split tasks. In this situation even if the splitLogWorker that 
 owns the task dies the task will not get resubmitted.
 When a regionserver goes away then all the split-log tasks that it owned 
 should be resubmitted by the SplitLogMaster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

2011-07-26 Thread Prakash Khemani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani updated HBASE-3845:
---

Attachment: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch

Patch deployed internally at Facebook.

 data loss because lastSeqWritten can miss memstore edits
 

 Key: HBASE-3845
 URL: https://issues.apache.org/jira/browse/HBASE-3845
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.3
Reporter: Prakash Khemani
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.90.5

 Attachments: 
 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, 
 HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, 
 HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, 
 HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch


 (I don't have a test case to prove this yet but I have run it by Dhruba and 
 Kannan internally and wanted to put this up for some feedback.)
 In this discussion let us assume that the region has only one column family. 
 That way I can use region/memstore interchangeably.
 After a memstore flush it is possible for lastSeqWritten to have a 
 log-sequence-id for a region that is not the earliest log-sequence-id for 
 that region's memstore.
 HLog.append() does a putIfAbsent into lastSequenceWritten. This is to ensure 
 that we only keep track  of the earliest log-sequence-number that is present 
 in the memstore.
 Every time the memstore is flushed we remove the region's entry in 
 lastSequenceWritten and wait for the next append to populate this entry 
 again. This is where the problem happens.
 step 1:
 flusher.prepare() snapshots the memstore under 
 HRegion.updatesLock.writeLock().
 step 2 :
 as soon as the updatesLock.writeLock() is released new entries will be added 
 into the memstore.
 step 3 :
 wal.completeCacheFlush() is called. This method removes the region's entry 
 from lastSeqWritten.
 step 4:
 the next append will create a new entry for the region in lastSeqWritten(). 
 But this will be the log seq id of the current append. All the edits that 
 were added in step 2 are missing.
 ==
 as a temporary measure, instead of removing the region's entry in step 3 I 
 will replace it with the log-seq-id of the region-flush-event.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

2011-07-26 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071086#comment-13071086
 ] 

Prakash Khemani commented on HBASE-3845:


Patch deployed internally at Facebook: 
0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch

 data loss because lastSeqWritten can miss memstore edits
 

 Key: HBASE-3845
 URL: https://issues.apache.org/jira/browse/HBASE-3845
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.3
Reporter: Prakash Khemani
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.90.5

 Attachments: 
 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, 
 HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, 
 HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, 
 HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch


 (I don't have a test case to prove this yet but I have run it by Dhruba and 
 Kannan internally and wanted to put this up for some feedback.)
 In this discussion let us assume that the region has only one column family. 
 That way I can use region/memstore interchangeably.
 After a memstore flush it is possible for lastSeqWritten to have a 
 log-sequence-id for a region that is not the earliest log-sequence-id for 
 that region's memstore.
 HLog.append() does a putIfAbsent into lastSequenceWritten. This is to ensure 
 that we only keep track  of the earliest log-sequence-number that is present 
 in the memstore.
 Every time the memstore is flushed we remove the region's entry in 
 lastSequenceWritten and wait for the next append to populate this entry 
 again. This is where the problem happens.
 step 1:
 flusher.prepare() snapshots the memstore under 
 HRegion.updatesLock.writeLock().
 step 2 :
 as soon as the updatesLock.writeLock() is released new entries will be added 
 into the memstore.
 step 3 :
 wal.completeCacheFlush() is called. This method removes the region's entry 
 from lastSeqWritten.
 step 4:
 the next append will create a new entry for the region in lastSeqWritten(). 
 But this will be the log seq id of the current append. All the edits that 
 were added in step 2 are missing.
 ==
 as a temporary measure, instead of removing the region's entry in step 3 I 
 will replace it with the log-seq-id of the region-flush-event.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

2011-07-25 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13070542#comment-13070542
 ] 

Prakash Khemani commented on HBASE-3845:


In the patch that is deployed internally we have implemented a different 
approach. We remove the region's entry in startCacheFlush() and save it (as 
opposed to the current behavior of removing the entry in completeCacheFlush()). 
If the flush aborts then we restore the saved entry.
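A hypothetical sketch of that internally deployed approach (class and method names are mine; regions keyed by encoded name as a String for simplicity): remove the earliest-seq entry when the flush starts, remember it, and restore it if the flush aborts.
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class FlushSeqTracker {
  private final ConcurrentMap<String, Long> lastSeqWritten = new ConcurrentHashMap<>();
  private final ConcurrentMap<String, Long> savedDuringFlush = new ConcurrentHashMap<>();

  void startCacheFlush(String encodedRegionName) {
    // Remove the entry at flush start, but remember it in case the flush aborts.
    Long earliest = lastSeqWritten.remove(encodedRegionName);
    if (earliest != null) {
      savedDuringFlush.put(encodedRegionName, earliest);
    }
  }

  void completeCacheFlush(String encodedRegionName) {
    // Flush succeeded: the saved entry is no longer needed.
    savedDuringFlush.remove(encodedRegionName);
  }

  void abortCacheFlush(String encodedRegionName) {
    // Flush failed: restore the saved entry unless an append has already
    // re-populated the map for this region.
    Long saved = savedDuringFlush.remove(encodedRegionName);
    if (saved != null) {
      lastSeqWritten.putIfAbsent(encodedRegionName, saved);
    }
  }
}
{code}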

The approach taken in the latest patch in this jira might also be OK. I have a 
few comments:

{noformat}
   this.lastSeqWritten.remove(encodedRegionName);
+  Long seqWhileFlush = this.seqWrittenWhileFlush.get(encodedRegionName);
+  if (null != seqWhileFlush) {
+    this.lastSeqWritten.putIfAbsent(encodedRegionName, seqWhileFlush);
+    this.seqWrittenWhileFlush.remove(encodedRegionName);
+  }
{noformat}

The seqWrittenWhileFlush.get() and the subsequent .remove() can be replaced by a 
single .remove():
{code}
Long seqWhileFlush = this.seqWrittenWhileFlush.remove(encodedRegionName);
if (null != seqWhileFlush) {
  lSW.put(encodedRegionName, seqWhileFlush);
} else {
  lSW.remove(encodedRegionName);
}
{code}

==
The bigger problem here is that completeCacheFlush() is not called with the 
updatesLock acquired. Therefore there might still be correctness issues with 
the latest patch.

==

{noformat}
   public void abortCacheFlush() {
+this.isFlushInProgress.set(false);
 this.cacheFlushLock.unlock();
   }
{noformat}
Shouldn't seqWrittenWhileFlush be cleaned up in abortCacheFlush() as well?


 data loss because lastSeqWritten can miss memstore edits
 

 Key: HBASE-3845
 URL: https://issues.apache.org/jira/browse/HBASE-3845
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.3
Reporter: Prakash Khemani
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.90.5

 Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch, 
 HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845__trunk.patch, 
 HBASE-3845_trunk_2.patch


 (I don't have a test case to prove this yet but I have run it by Dhruba and 
 Kannan internally and wanted to put this up for some feedback.)
 In this discussion let us assume that the region has only one column family. 
 That way I can use region/memstore interchangeably.
 After a memstore flush it is possible for lastSeqWritten to have a 
 log-sequence-id for a region that is not the earliest log-sequence-id for 
 that region's memstore.
 HLog.append() does a putIfAbsent into lastSequenceWritten. This is to ensure 
 that we only keep track  of the earliest log-sequence-number that is present 
 in the memstore.
 Every time the memstore is flushed we remove the region's entry in 
 lastSequenceWritten and wait for the next append to populate this entry 
 again. This is where the problem happens.
 step 1:
 flusher.prepare() snapshots the memstore under 
 HRegion.updatesLock.writeLock().
 step 2 :
 as soon as the updatesLock.writeLock() is released new entries will be added 
 into the memstore.
 step 3 :
 wal.completeCacheFlush() is called. This method removes the region's entry 
 from lastSeqWritten.
 step 4:
 the next append will create a new entry for the region in lastSeqWritten(). 
 But this will be the log seq id of the current append. All the edits that 
 were added in step 2 are missing.
 ==
 as a temporary measure, instead of removing the region's entry in step 3 I 
 will replace it with the log-seq-id of the region-flush-event.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

2011-07-25 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13070614#comment-13070614
 ] 

Prakash Khemani commented on HBASE-3845:


In the method internalFlushcache() I don't see updatesLock.writeLock() being 
held around the following piece of code.

{code}
if (wal != null) {
  wal.completeCacheFlush(this.regionInfo.getEncodedNameAsBytes(),
regionInfo.getTableDesc().getName(), completeSequenceId,
this.getRegionInfo().isMetaRegion());
}
{code}

==

I will upload the internal patch for reference ...





 data loss because lastSeqWritten can miss memstore edits
 

 Key: HBASE-3845
 URL: https://issues.apache.org/jira/browse/HBASE-3845
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.3
Reporter: Prakash Khemani
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.90.5

 Attachments: HBASE-3845_1.patch, HBASE-3845_2.patch, 
 HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, 
 HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch


 (I don't have a test case to prove this yet but I have run it by Dhruba and 
 Kannan internally and wanted to put this up for some feedback.)
 In this discussion let us assume that the region has only one column family. 
 That way I can use region/memstore interchangeably.
 After a memstore flush it is possible for lastSeqWritten to have a 
 log-sequence-id for a region that is not the earliest log-sequence-id for 
 that region's memstore.
 HLog.append() does a putIfAbsent into lastSequenceWritten. This is to ensure 
 that we only keep track  of the earliest log-sequence-number that is present 
 in the memstore.
 Every time the memstore is flushed we remove the region's entry in 
 lastSequenceWritten and wait for the next append to populate this entry 
 again. This is where the problem happens.
 step 1:
 flusher.prepare() snapshots the memstore under 
 HRegion.updatesLock.writeLock().
 step 2 :
 as soon as the updatesLock.writeLock() is released new entries will be added 
 into the memstore.
 step 3 :
 wal.completeCacheFlush() is called. This method removes the region's entry 
 from lastSeqWritten.
 step 4:
 the next append will create a new entry for the region in lastSeqWritten(). 
 But this will be the log seq id of the current append. All the edits that 
 were added in step 2 are missing.
 ==
 as a temporary measure, instead of removing the region's entry in step 3 I 
 will replace it with the log-seq-id of the region-flush-event.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4007) distributed log splitting can get indefinitely stuck

2011-06-20 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052067#comment-13052067
 ] 

Prakash Khemani commented on HBASE-4007:


@mingjian What you are talking about is probably a different issue. The 
scenario you have described can happen when 1. Master puts up a task. 2. No one 
acquires the task. 3. Master puts up a RESCAN node asking everyone to re-look 
at the zk splitlog task list.

The bug described in this jira will happen in the following way (I have not 
encountered it yet, but it should be easy to reproduce):

a/ A splitlog task is slow. Master has already moved the task from one worker 
to another 3 times. It is with the 4th worker now. Even if the 4th worker takes 
too long doing this task the master is not going to do anything about it.

b/ The 4th worker dies.

c/ The task will hang.

Master has to resubmit the task when the 4th worker dies. 



 distributed log splitting can get indefinitely stuck
 

 Key: HBASE-4007
 URL: https://issues.apache.org/jira/browse/HBASE-4007
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani
Assignee: Prakash Khemani

 After the configured number of retries SplitLogManager is not going to 
 resubmit log-split tasks. In this situation even if the splitLogWorker that 
 owns the task dies the task will not get resubmitted.
 When a regionserver goes away then all the split-log tasks that it owned 
 should be resubmitted by the SplitLogMaster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4007) distributed log splitting can get indefinitely stuck

2011-06-18 Thread Prakash Khemani (JIRA)
distributed log splitting can get indefinitely stuck


 Key: HBASE-4007
 URL: https://issues.apache.org/jira/browse/HBASE-4007
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani
Assignee: Prakash Khemani


After the configured number of retries SplitLogManager is not going to resubmit 
log-split tasks. In this situation even if the splitLogWorker that owns the 
task dies the task will not get resubmitted.

When a regionserver goes away then all the split-log tasks that it owned should 
be resubmitted by the SplitLogMaster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3963) Schedule all log-splitting at startup all at once

2011-06-11 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13048248#comment-13048248
 ] 

Prakash Khemani commented on HBASE-3963:


Patch looks good to me. Thanks.



 Schedule all log-splitting at startup all at once
 -

 Key: HBASE-3963
 URL: https://issues.apache.org/jira/browse/HBASE-3963
 Project: HBase
  Issue Type: Improvement
Reporter: Prakash Khemani
Assignee: Prakash Khemani
 Attachments: schedule-all-splitlog.patch


 When distributed log splitting is enabled then it is better to call 
 splitLog() for all region servers simultaneously. A large number of splitlog 
 tasks will get scheduled - one for each log file. But a splitlog-worker 
 (region server) executes only one task at a time and there shouldn't be a 
 danger of DFS overload. Scheduling all the tasks at once ensures maximum 
 parallelism.
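A hypothetical sketch of that scheduling (the interfaces are stand-ins for the real file-system and task-queue plumbing): one task is installed per log file for every dead server up front, and the per-worker one-task-at-a-time behavior is what keeps DFS load bounded.
{code}
import java.util.ArrayList;
import java.util.List;

class StartupLogSplitScheduler {
  interface TaskQueue { void installTask(String logFilePath); }
  interface LogLister { List<String> listLogs(String serverLogDir); }

  static int scheduleAll(List<String> deadServerLogDirs, LogLister lister, TaskQueue queue) {
    List<String> allLogs = new ArrayList<>();
    for (String dir : deadServerLogDirs) {
      allLogs.addAll(lister.listLogs(dir));
    }
    // Each worker still grabs only one task at a time, so enqueueing everything
    // at once maximizes parallelism without overloading DFS.
    for (String log : allLogs) {
      queue.installTask(log);
    }
    return allLogs.size();
  }
}
{code}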

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-3963) Schedule all log-splitting at startup all at once

2011-06-08 Thread Prakash Khemani (JIRA)
Schedule all log-splitting at startup all at once
-

 Key: HBASE-3963
 URL: https://issues.apache.org/jira/browse/HBASE-3963
 Project: HBase
  Issue Type: Improvement
Reporter: Prakash Khemani
Assignee: Prakash Khemani


When distributed log splitting is enabled then it is better to call splitLog() 
for all region servers simultaneously. A large number of splitlog tasks will 
get scheduled - one for each log file. But a splitlog-worker (region server) 
executes only one task at a time and there shouldn't be a danger of DFS 
overload. Scheduling all the tasks at once ensures maximum parallelism.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs

2011-06-08 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046044#comment-13046044
 ] 

Prakash Khemani commented on HBASE-1364:


Filed https://issues.apache.org/jira/browse/HBASE-3963. Will try to get this 
done.



 [performance] Distributed splitting of regionserver commit logs
 ---

 Key: HBASE-1364
 URL: https://issues.apache.org/jira/browse/HBASE-1364
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: stack
Assignee: Prakash Khemani
Priority: Critical
 Fix For: 0.92.0

 Attachments: 1364-v5.txt, HBASE-1364.patch, 
 org.apache.hadoop.hbase.master.TestDistributedLogSplitting-output.txt

  Time Spent: 8h
  Remaining Estimate: 0h

 HBASE-1008 has some improvements to our log splitting on regionserver crash; 
 but it needs to run even faster.
 (Below is from HBASE-1008)
 In bigtable paper, the split is distributed. If we're going to have 1000 
 logs, we need to distribute or at least multithread the splitting.
 1. As is, regions starting up expect to find one reconstruction log only. 
 Need to make it so pick up a bunch of edit logs and it should be fine that 
 logs are elsewhere in hdfs in an output directory written by all split 
 participants whether multithreaded or a mapreduce-like distributed process 
 (Lets write our distributed sort first as a MR so we learn whats involved; 
 distributed sort, as much as possible should use MR framework pieces). On 
 startup, regions go to this directory and pick up the files written by split 
 participants deleting and clearing the dir when all have been read in. Making 
 it so can take multiple logs for input, can also make the split process more 
 robust rather than current tenuous process which loses all edits if it 
 doesn't make it to the end without error.
 2. Each column family rereads the reconstruction log to find its edits. Need 
 to fix that. Split can sort the edits by column family so store only reads 
 its edits.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3889) NPE in Distributed Log Splitting

2011-05-16 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13034139#comment-13034139
 ] 

Prakash Khemani commented on HBASE-3889:


Thanks Lars for finding and providing a fix for this issue. I have a few minor 
comments on the patch ...

The following isn't really needed, the earlier check you put in should be good 
enough.
{code}
+if (wap == null) {
+  continue;
+}
{code}


It might be better to catch Throwable in SplitLogWorker.run() and print the 
Unexpected Error message there. It might not be a good thing to ignore an 
unexpected exception in SplitLogWorker.grabTask() and continue.
{noformat}
+++ src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java (working copy)
@@ -297,6 +297,8 @@
   }
   break;
   }
+} catch (Exception e) {
+  LOG.error("An error occurred.", e);
 } finally {
   if (t > 0) {
  LOG.info("worker " + serverName + " done with task " + path +
{noformat}
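A minimal sketch of the alternative suggested here (a hypothetical class, not the real SplitLogWorker): catch Throwable once in run() so unexpected errors surface there, instead of being swallowed inside grabTask() and silently continuing.
{code}
class SplitLogWorkerSketch implements Runnable {
  @Override public void run() {
    try {
      taskLoop();
    } catch (Throwable t) {
      // Unexpected error: surface it rather than silently dropping the worker.
      System.err.println("SplitLogWorker exiting on unexpected error: " + t);
    }
  }

  private void taskLoop() {
    // grab and execute split tasks
  }
}
{code}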





 NPE in Distributed Log Splitting
 

 Key: HBASE-3889
 URL: https://issues.apache.org/jira/browse/HBASE-3889
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0
 Environment: Pseudo-distributed on MacOS
Reporter: Lars George
Assignee: Lars George
 Fix For: 0.92.0

 Attachments: HBASE-3889.patch


 There is an issue with the log splitting under the specific condition of 
 edits belonging to a non existing region (which went away after a split for 
 example). The HLogSplitter fails to check the condition, which is handled on 
 a lower level, logging manifests it as 
 {noformat}
 2011-05-16 13:56:10,300 INFO 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: This region's 
 directory doesn't exist: 
 hdfs://localhost:8020/hbase/usertable/30c4d0a47703214845d0676d0c7b36f0. It is 
 very likely that it was already split so it's safe to discard those edits.
 {noformat}
 The code returns a null reference which is not checked in 
 HLogSplitter.splitLogFileToTemp():
 {code}
 ...
 WriterAndPath wap = (WriterAndPath)o;
 if (wap == null) {
   wap = createWAP(region, entry, rootDir, tmpname, fs, conf);
   if (wap == null) {
 logWriters.put(region, BAD_WRITER);
   } else {
 logWriters.put(region, wap);
   }
 }
 wap.w.append(entry);
 ...
 {code}
 createWAP() does return null when the above message is logged, based on 
 the obsolete region reference in the edit.
 What made this difficult to detect is that the error (and others) are 
 silently ignored in SplitLogWorker.grabTask(). I added a catch and error 
 logging to see the NPE that was caused by the above.
 {code}
 ...
   break;
   }
 } catch (Exception e) {
   LOG.error("An error occurred.", e);
 } finally {
   if (t > 0) {
 ...
 {code}
 As a side note, there are other errors/asserts triggered that this 
 try/finally does not handle. For example:
 {noformat}
 2011-05-16 13:58:30,647 WARN 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: BADVERSION failed to 
 assert ownership for 
 /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
 org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = 
 BadVersion for 
 /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:106)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
 at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.ownTask(SplitLogWorker.java:329)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.access$100(SplitLogWorker.java:68)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker$2.progress(SplitLogWorker.java:265)
 at 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:432)
 at 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:354)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:113)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:260)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:191)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:164)
 at java.lang.Thread.run(Thread.java:680)
 {noformat}
 This should probably be handled - or at 

[jira] [Commented] (HBASE-3890) Scheduled tasks in distributed log splitting not in sync with ZK

2011-05-16 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13034149#comment-13034149
 ] 

Prakash Khemani commented on HBASE-3890:


With the bug you identified in HBASE-3889 this behavior is expected. The 
SplitLogManager will put up a task, a SplitLogWorker will pick it up and will 
never complete it because of the bug. Manager will resubmit the task and 
another worker will pick it up to never complete it. The Manager resubmits at 
most hbase.splitlog.max.resubmit (default = 3) times after which the task hangs.



 Scheduled tasks in distributed log splitting not in sync with ZK
 

 Key: HBASE-3890
 URL: https://issues.apache.org/jira/browse/HBASE-3890
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars George
 Fix For: 0.92.0


 This is in continuation to HBASE-3889:
 Note that there must be something more that is slightly off here. Although the 
 splitlog znode is now empty the master is still stuck here:
 {noformat}
 Doing distributed log split in 
 hdfs://localhost:8020/hbase/.logs/10.0.0.65,60020,1305406356765
 - Waiting for distributed tasks to finish. scheduled=2 done=1 error=0   4380s
 Master startup
 - Splitting logs after master startup   4388s
 {noformat}
 There seems to be an issue with what is in ZK and what the TaskBatch holds. 
 In my case it could be related to the fact that the task was already in ZK 
 after many faulty restarts because of the NPE. Maybe it was added once (since 
 that is keyed by path, and that is unique on my machine), but the reference 
 count upped twice? Now that the real one is done, the done counter has been 
 increased, but will never match the scheduled.
 The code could also check if ZK is actually depleted, and therefore treat the 
 scheduled task as bogus? This of course only treats the symptom, not the root 
 cause of this condition. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3828) region server stuck in waitOnAllRegionsToClose

2011-05-05 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029452#comment-13029452
 ] 

Prakash Khemani commented on HBASE-3828:


In my cluster this turned out to be a problem in code that we had modified 
internally. In the region server abort code path we had put in a check that if 
the filesystem is unavailable then we do not try to close regions. But the main 
thread went ahead anyway and waited for the regions to close. That was causing 
the hang in waitOnAllRegionsToClose(). (Aside: there is an internal task on this 
... when an append to the HLog fails, hbase relies on the dfsclient to close the 
filesystem for the regionserver abort to be triggered. That is very roundabout 
and there ought to be a more direct and synchronous abort facility.)

==

It is possible that no further synchronization is necessary when a region is 
being opened, but I haven't looked at the code closely enough. What happens 
between the time when the zk node is closed and the region is actually closed 
on the rs? When is the region removed from onlineRegions - is it possible that 
one thread adds it and the other immediately removes it? ... I will try to 
spend some time on this soon.

===



 region server stuck in waitOnAllRegionsToClose
 --

 Key: HBASE-3828
 URL: https://issues.apache.org/jira/browse/HBASE-3828
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3815) load balancer should ignore bad region servers

2011-05-05 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029454#comment-13029454
 ] 

Prakash Khemani commented on HBASE-3815:


Yeah, I think there is some overlap. In this case we want the region server to 
be excluded because of some internal logic in master.


 load balancer should ignore bad region servers
 --

 Key: HBASE-3815
 URL: https://issues.apache.org/jira/browse/HBASE-3815
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani

 The load balancer should remember which region server is constantly having 
 trouble opening regions and it should take that rs out of the equation ... 
 otherwise the lb goes into an unproductive loop ... 
 I don't have logs handy for this one.
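A hypothetical sketch of the exclusion being asked for (the names and threshold are made up): count region-open failures per server and drop repeat offenders from the balancer's candidate list.
{code}
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class BadServerFilter {
  private final Map<String, Integer> openFailures = new HashMap<>();
  private final int maxFailures;

  BadServerFilter(int maxFailures) {
    this.maxFailures = maxFailures;
  }

  void recordOpenFailure(String serverName) {
    openFailures.merge(serverName, 1, Integer::sum);
  }

  // Servers that keep failing to open regions are excluded from balancing.
  List<String> filterCandidates(List<String> servers) {
    return servers.stream()
        .filter(s -> openFailures.getOrDefault(s, 0) < maxFailures)
        .collect(Collectors.toList());
  }
}
{code}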

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits

2011-05-02 Thread Prakash Khemani (JIRA)
data loss because lastSeqWritten can miss memstore edits


 Key: HBASE-3845
 URL: https://issues.apache.org/jira/browse/HBASE-3845
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani


(I don't have a test case to prove this yet but I have run it by Dhruba and 
Kannan internally and wanted to put this up for some feedback.)

In this discussion let us assume that the region has only one column family. 
That way I can use region/memstore interchangeably.

After a memstore flush it is possible for lastSeqWritten to have a 
log-sequence-id for a region that is not the earliest log-sequence-id for that 
region's memstore.

HLog.append() does a putIfAbsent into lastSeqWritten. This is to ensure that we 
only keep track of the earliest log-sequence-number that is present in the 
memstore.

Every time the memstore is flushed we remove the region's entry from 
lastSeqWritten and wait for the next append to populate this entry again. 
This is where the problem happens.

step 1:
flusher.prepare() snapshots the memstore under HRegion.updatesLock.writeLock().

step 2 :
as soon as the updatesLock.writeLock() is released new entries will be added 
into the memstore.

step 3 :
wal.completeCacheFlush() is called. This method removes the region's entry from 
lastSeqWritten.

step 4:
the next append will create a new entry for the region in lastSeqWritten(). But 
this will be the log seq id of the current append. All the edits that were 
added in step 2 are missing.

==

As a temporary measure, instead of removing the region's entry in step 3 I will 
replace it with the log-seq-id of the region-flush-event.
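
A minimal sketch of that temporary measure (class and method names here are 
illustrative; the real map lives inside HLog and is keyed by the encoded region 
name):

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch only: the two touch points of lastSeqWritten, with the flush path
// replacing the entry instead of removing it.
public class LastSeqWrittenSketch {
  // earliest log-sequence-id not yet flushed, per region
  private final ConcurrentMap<String, Long> lastSeqWritten =
      new ConcurrentHashMap<String, Long>();

  // append path: putIfAbsent keeps the earliest outstanding seq id
  void onAppend(String encodedRegionName, long seqId) {
    lastSeqWritten.putIfAbsent(encodedRegionName, seqId);
  }

  // flush path: instead of lastSeqWritten.remove(encodedRegionName), park the
  // flush event's seq id there, so edits appended between the snapshot (step 2)
  // and this call are still covered by a lower bound
  void onCompleteCacheFlush(String encodedRegionName, long flushSeqId) {
    lastSeqWritten.put(encodedRegionName, flushSeqId);
  }
}
{code}

The putIfAbsent/put pair keeps the per-region value a conservative lower bound, 
so log files are never considered safe to archive while they may still hold 
unflushed edits.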



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-3843) splitLogWorker starts too early

2011-05-01 Thread Prakash Khemani (JIRA)
splitLogWorker starts too early
---

 Key: HBASE-3843
 URL: https://issues.apache.org/jira/browse/HBASE-3843
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani
Assignee: Prakash Khemani


splitlogworker should be started in startServiceThreads() instead of in 
initializeZookeeper(). This will ensure that the region server accepts 
split-logging tasks only after it has successfully done reportForDuty() to the 
master.
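
A toy sketch of the intended startup ordering (class, thread and method names 
are illustrative, not the real HRegionServer wiring):

{code}
public class RegionServerStartupSketch {
  private Thread splitLogWorker;

  public static void main(String[] args) {
    RegionServerStartupSketch rs = new RegionServerStartupSketch();
    rs.initializeZookeeper();   // ZK trackers only, no split-log worker yet
    rs.reportForDuty();         // register with the master first
    rs.startServiceThreads();   // only now start accepting split-log tasks
  }

  void initializeZookeeper() { /* set up ZK watchers/trackers */ }

  void reportForDuty() { /* report to the master; retried until it succeeds */ }

  void startServiceThreads() {
    // the worker is created only after reportForDuty() has succeeded
    splitLogWorker = new Thread(() -> { /* poll ZK for split-log tasks */ },
        "SplitLogWorker-sketch");
    splitLogWorker.setDaemon(true);
    splitLogWorker.start();
  }
}
{code}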

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3843) splitLogWorker starts too early

2011-05-01 Thread Prakash Khemani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani updated HBASE-3843:
---

Status: Open  (was: Patch Available)

 splitLogWorker starts too early
 ---

 Key: HBASE-3843
 URL: https://issues.apache.org/jira/browse/HBASE-3843
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani
Assignee: Prakash Khemani

 splitlogworker should be started in startServiceThreads() instead of in 
 initializeZookeeper(). This will ensure that the region server accepts 
 split-logging tasks only after it has successfully done reportForDuty() to 
 the master.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3843) splitLogWorker starts too early

2011-05-01 Thread Prakash Khemani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani updated HBASE-3843:
---

Status: Patch Available  (was: Open)

 splitLogWorker starts too early
 ---

 Key: HBASE-3843
 URL: https://issues.apache.org/jira/browse/HBASE-3843
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani
Assignee: Prakash Khemani

 splitlogworker should be started in startServiceThreads() instead of in 
 initializeZookeeper(). This will ensure that the region server accepts 
 split-logging tasks only after it has successfully done reportForDuty() to 
 the master.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3843) splitLogWorker starts too early

2011-05-01 Thread Prakash Khemani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani updated HBASE-3843:
---

Status: Patch Available  (was: Open)

 splitLogWorker starts too early
 ---

 Key: HBASE-3843
 URL: https://issues.apache.org/jira/browse/HBASE-3843
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani
Assignee: Prakash Khemani
 Attachments: 
 0001-HBASE-3843-start-splitLogWorker-later-at-region-serv.patch


 splitlogworker should be started in startServiceThreads() instead of in 
 initializeZookeeper(). This will ensure that the region server accepts 
 split-logging tasks only after it has successfully done reportForDuty() to 
 the master.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3843) splitLogWorker starts too early

2011-05-01 Thread Prakash Khemani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani updated HBASE-3843:
---

Attachment: 0001-HBASE-3843-start-splitLogWorker-later-at-region-serv.patch

 splitLogWorker starts too early
 ---

 Key: HBASE-3843
 URL: https://issues.apache.org/jira/browse/HBASE-3843
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani
Assignee: Prakash Khemani
 Attachments: 
 0001-HBASE-3843-start-splitLogWorker-later-at-region-serv.patch


 splitlogworker should be started in startServiceThreads() instead of in 
 initializeZookeeper(). This will ensure that the region server accepts 
 split-logging tasks only after it has successfully done reportForDuty() to 
 the master.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-3828) region server stuck in waitOnAllRegionsToClose

2011-04-28 Thread Prakash Khemani (JIRA)
region server stuck in waitOnAllRegionsToClose
--

 Key: HBASE-3828
 URL: https://issues.apache.org/jira/browse/HBASE-3828
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HBASE-3822) region server stuck in waitOnAllRegionsToClose

2011-04-28 Thread Prakash Khemani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani resolved HBASE-3822.


  Resolution: Invalid
Release Note: The description is invalid. Will open a new one.

 region server stuck in waitOnAllRegionsToClose
 --

 Key: HBASE-3822
 URL: https://issues.apache.org/jira/browse/HBASE-3822
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani

 The regionserver is not able to exit because the rs thread is stuck here
 regionserver60020 prio=10 tid=0x2ab2b039e000 nid=0x760a waiting on 
 condition [0x4365e000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
 at java.lang.Thread.sleep(Native Method)
 at org.apache.hadoop.hbase.util.Threads.sleep(Threads.java:126)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.waitOnAllRegionsToClose(HRegionServer.java:736)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:689)
 at java.lang.Thread.run(Thread.java:619)
 ===
 In CloseRegionHandler.process() we do not call removeFromOnlineRegions() if 
 there is an exception. (In this case I suspect there was a log-rolling 
 exception because of another issue)
 // Close the region
 try {
   // TODO: If we need to keep updating CLOSING stamp to prevent against
   // a timeout if this is long-running, need to spin up a thread?
   if (region.close(abort) == null) {
     // This region got closed.  Most likely due to a split. So instead
     // of doing the setClosedState() below, let's just ignore and continue.
     // The split message will clean up the master state.
     LOG.warn("Can't close region: was already closed during close(): " +
       regionInfo.getRegionNameAsString());
     return;
   }
 } catch (IOException e) {
   LOG.error("Unrecoverable exception while closing region " +
     regionInfo.getRegionNameAsString() + ", still finishing close", e);
 }
 this.rsServices.removeFromOnlineRegions(regionInfo.getEncodedName());
 ===
 I think once we set the closing flag on the region, it won't be taking any more 
 requests; it is as good as offline.
 Either we should refine the check in waitOnAllRegionsToClose() or 
 CloseRegionHandler.process() should remove the region from online-regions set.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-3822) region server stuck in waitOnAllRegionsToClose

2011-04-26 Thread Prakash Khemani (JIRA)
region server stuck in waitOnAllRegionsToClose
--

 Key: HBASE-3822
 URL: https://issues.apache.org/jira/browse/HBASE-3822
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani


The regionserver is not able to exit because the rs thread is stuck here



regionserver60020 prio=10 tid=0x2ab2b039e000 nid=0x760a waiting on 
condition [0x4365e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hbase.util.Threads.sleep(Threads.java:126)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.waitOnAllRegionsToClose(HRegionServer.java:736)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:689)
at java.lang.Thread.run(Thread.java:619)


===

In CloseRegionHandler.process() we do not call removeFromOnlineRegions() if 
there is an exception. (In this case I suspect there was a log-rolling 
exception because of another issue)

// Close the region
try {
  // TODO: If we need to keep updating CLOSING stamp to prevent against
  // a timeout if this is long-running, need to spin up a thread?
  if (region.close(abort) == null) {
    // This region got closed.  Most likely due to a split. So instead
    // of doing the setClosedState() below, let's just ignore and continue.
    // The split message will clean up the master state.
    LOG.warn("Can't close region: was already closed during close(): " +
      regionInfo.getRegionNameAsString());
    return;
  }
} catch (IOException e) {
  LOG.error("Unrecoverable exception while closing region " +
    regionInfo.getRegionNameAsString() + ", still finishing close", e);
}

this.rsServices.removeFromOnlineRegions(regionInfo.getEncodedName());


===

I think once we set the closing flag on the region, it won't be taking any more 
requests; it is as good as offline.

Either we should refine the check in waitOnAllRegionsToClose() or 
CloseRegionHandler.process() should remove the region from online-regions set.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3822) region server stuck in waitOnAllRegionsToClose

2011-04-26 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13025443#comment-13025443
 ] 

Prakash Khemani commented on HBASE-3822:


The code snippet that I pointed out doesn't have a problem - that piece of code 
will remove the region from online regions even if there is an exception. Sorry 
for the confusion. I don't really know why the onlineRegions set was not 
cleaned up.

 region server stuck in waitOnAllRegionsToClose
 --

 Key: HBASE-3822
 URL: https://issues.apache.org/jira/browse/HBASE-3822
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani

 The regionserver is not able to exit because the rs thread is stuck here
 regionserver60020 prio=10 tid=0x2ab2b039e000 nid=0x760a waiting on 
 condition [0x4365e000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
 at java.lang.Thread.sleep(Native Method)
 at org.apache.hadoop.hbase.util.Threads.sleep(Threads.java:126)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.waitOnAllRegionsToClose(HRegionServer.java:736)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:689)
 at java.lang.Thread.run(Thread.java:619)
 ===
 In CloseRegionHandler.process() we do not call removeFromOnlineRegions() if 
 there is an exception. (In this case I suspect there was a log-rolling 
 exception because of another issue)
 // Close the region
 try {
   // TODO: If we need to keep updating CLOSING stamp to prevent against
   // a timeout if this is long-running, need to spin up a thread?
   if (region.close(abort) == null) {
     // This region got closed.  Most likely due to a split. So instead
     // of doing the setClosedState() below, let's just ignore and continue.
     // The split message will clean up the master state.
     LOG.warn("Can't close region: was already closed during close(): " +
       regionInfo.getRegionNameAsString());
     return;
   }
 } catch (IOException e) {
   LOG.error("Unrecoverable exception while closing region " +
     regionInfo.getRegionNameAsString() + ", still finishing close", e);
 }
 this.rsServices.removeFromOnlineRegions(regionInfo.getEncodedName());
 ===
 I think once we set the closing flag on the region, it won't be taking any more 
 requests; it is as good as offline.
 Either we should refine the check in waitOnAllRegionsToClose() or 
 CloseRegionHandler.process() should remove the region from online-regions set.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HBASE-3823) NPE in ZKAssign.transitionNode

2011-04-26 Thread Prakash Khemani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani resolved HBASE-3823.


  Resolution: Duplicate
Release Note: fixed in HBASE-3627

 NPE in ZKAssign.transitionNode
 --

 Key: HBASE-3823
 URL: https://issues.apache.org/jira/browse/HBASE-3823
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani

 This issue led to a region being multiply assigned.
 hbck output
 ERROR: Region 
 realtime_domain_imps_urls,afbe,1295556905482.e7a478b4bd164525052f1dedb832de0a.
  is listed in META on region server pumahbase107.snc5.facebook.com:60020 but 
 is multiply assigned to region servers pumahbase150.snc5.facebook.com:60020, 
 pumahbase107.snc5.facebook.com:60020
 ===
 2011-04-25 09:11:36,844 ERROR org.apache.hadoop.hbase.executor.EventHandler: 
 Caught throwable while processing event M_RS_OPEN_REGION
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
 at 
 org.apache.hadoop.hbase.executor.RegionTransitionData.fromBytes(RegionTransitionData.java:198)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:672)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNodeOpened(ZKAssign.java:621)
 at 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:168)
 at 
 org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
 byte [] existingBytes =
   ZKUtil.getDataNoWatch(zkw, node, stat);
 RegionTransitionData existingData =
   RegionTransitionData.fromBytes(existingBytes);
 existingBytes can be null. have to return -1 if null.
 ===
 master logs
 2011-04-25 05:24:03,250 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Creating writer 
 path=hdfs://pumahbase002-snc5-dfs.data.facebook.com:9000/PUMAHBASE002-SNC5-HBASE/realtime_domain_imps_urls/e7a478b4bd164525052f1dedb832de0a/recovered.edits/57528037047
  region=e7a478b4bd164525052f1dedb832de0a
 2011-04-25 09:09:19,246 INFO 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Closed path 
 hdfs://pumahbase002-snc5-dfs.data.facebook.com:9000/PUMAHBASE002-SNC5-HBASE/realtime_domain_imps_urls/e7a478b4bd164525052f1dedb832de0a/recovered.edits/57528037047
  (wrote 4342690 edits in 46904ms)
 2011-04-25 09:09:26,134 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x32f7bb74e8a Creating (or updating) unassigned node for 
 e7a478b4bd164525052f1dedb832de0a with OFFLINE state
 2011-04-25 09:09:26,136 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan 
 was found (or we are ignoring an existing plan) for 
 realtime_domain_imps_urls,afbe,1295556905482.e7a478b4bd164525052f1dedb832de0a.
  so generated a random one; 
 hri=realtime_domain_imps_urls,afbe,1295556905482.e7a478b4bd164525052f1dedb832de0a.,
  src=, dest=pumahbase107.snc5.facebook.com,60020,1303450731227; 70 
 (online=70, exclude=null) available servers
 2011-04-25 09:09:26,136 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 realtime_domain_imps_urls,afbe,1295556905482.e7a478b4bd164525052f1dedb832de0a.
  to pumahbase107.snc5.facebook.com,60020,1303450731227
 2011-04-25 09:09:26,139 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, 
 server=pumahbase107.snc5.facebook.com,60020,1303450731227, 
 region=e7a478b4bd164525052f1dedb832de0a
 2011-04-25 09:09:44,045 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, 
 server=pumahbase107.snc5.facebook.com,60020,1303450731227, 
 region=e7a478b4bd164525052f1dedb832de0a
 2011-04-25 09:09:59,050 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, 
 server=pumahbase107.snc5.facebook.com,60020,1303450731227, 
 region=e7a478b4bd164525052f1dedb832de0a
 2011-04-25 09:10:14,054 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, 
 server=pumahbase107.snc5.facebook.com,60020,1303450731227, 
 region=e7a478b4bd164525052f1dedb832de0a
 2011-04-25 09:10:29,055 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, 
 server=pumahbase107.snc5.facebook.com,60020,1303450731227, 
 region=e7a478b4bd164525052f1dedb832de0a
 2011-04-25 09:10:44,060 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, 
 

[jira] [Created] (HBASE-3824) region server timed out during open region

2011-04-26 Thread Prakash Khemani (JIRA)
region server timed out during open region
--

 Key: HBASE-3824
 URL: https://issues.apache.org/jira/browse/HBASE-3824
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani


When replaying a large log file, memstore flushes can happen. But there is no 
Progressable report being sent during memstore flushes. That can lead to the 
master timing out the region server during region open.

===
Another related issue and Jonathan's response

 So if a region server that is handed a region for opening and has done part of
 the work ... it has created some HFiles (because the logs were so huge that
 the memstore got flushed while the logs were being replayed) ... and then it is
 asked to give up because the master thought the region server was taking
 too long to open the region.
 
 When the region server gives up on the region then will it make sure that it
 removes all the HFiles it had created for that region?


Will need to check the code, but would it matter?  One issue is whether it 
cleans up after itself (I'm guessing not).  Another issue is whether the replay 
is idempotent (duplicate KVs across files shouldn't matter in most cases).

===

2011-04-25 09:11:36,844 ERROR org.apache.hadoop.hbase.executor.EventHandler: 
Caught throwable while processing event M_RS_OPEN_REGION
java.lang.NullPointerException
at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
at 
org.apache.hadoop.hbase.executor.RegionTransitionData.fromBytes(RegionTransitionData.java:198)
at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:672)
at 
org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNodeOpened(ZKAssign.java:621)
at 
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:168)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

byte [] existingBytes =
ZKUtil.getDataNoWatch(zkw, node, stat);
RegionTransitionData existingData =
RegionTransitionData.fromBytes(existingBytes);

existingBytes can be null. have to return -1 if null.
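
A minimal sketch of that null guard (the class below is a self-contained 
stand-in; the real fix belongs in ZKAssign.transitionNode(), returning -1 as 
suggested above instead of passing a null byte[] into fromBytes()):

{code}
// Sketch only: a self-contained stand-in for the null check described above.
public class TransitionNodeGuardSketch {
  static final int TRANSITION_FAILED = -1;

  // existingBytes is what ZKUtil.getDataNoWatch() returned for the znode
  static int transitionWithGuard(byte[] existingBytes) {
    if (existingBytes == null) {
      // the znode has no data (or vanished): fail the transition instead of
      // letting RegionTransitionData.fromBytes() throw the NPE seen above
      return TRANSITION_FAILED;
    }
    // ... deserialize existingBytes and continue the normal transition ...
    return 0; // placeholder for the real return value
  }
}
{code}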


===

master logs

2011-04-25 05:24:03,250 DEBUG 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Creating writer 
path=hdfs://pumahbase002-snc5-dfs.data.facebook.com:9000/PUMAHBASE002-SNC5-HBASE/realtime_domain_imps_urls/e7a478b4bd164525052f1dedb832de0a/recovered.edits/57528037047
 region=e7a478b4bd164525052f1dedb832de0a
2011-04-25 09:09:19,246 INFO 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Closed path 
hdfs://pumahbase002-snc5-dfs.data.facebook.com:9000/PUMAHBASE002-SNC5-HBASE/realtime_domain_imps_urls/e7a478b4bd164525052f1dedb832de0a/recovered.edits/57528037047
 (wrote 4342690 edits in 46904ms)
2011-04-25 09:09:26,134 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:6-0x32f7bb74e8a Creating (or updating) unassigned node for 
e7a478b4bd164525052f1dedb832de0a with OFFLINE state
2011-04-25 09:09:26,136 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
No previous transition plan was found (or we are ignoring an existing plan) for 
realtime_domain_imps_urls,afbe,1295556905482.e7a478b4bd164525052f1dedb832de0a.
 so generated a random one; 
hri=realtime_domain_imps_urls,afbe,1295556905482.e7a478b4bd164525052f1dedb832de0a.,
 src=, dest=pumahbase107.snc5.facebook.com,60020,1303450731227; 70 (online=70, 
exclude=null) available servers
2011-04-25 09:09:26,136 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Assigning region 
realtime_domain_imps_urls,afbe,1295556905482.e7a478b4bd164525052f1dedb832de0a.
 to pumahbase107.snc5.facebook.com,60020,1303450731227
2011-04-25 09:09:26,139 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENING, 
server=pumahbase107.snc5.facebook.com,60020,1303450731227, 
region=e7a478b4bd164525052f1dedb832de0a
2011-04-25 09:09:44,045 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENING, 
server=pumahbase107.snc5.facebook.com,60020,1303450731227, 
region=e7a478b4bd164525052f1dedb832de0a
2011-04-25 09:09:59,050 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENING, 
server=pumahbase107.snc5.facebook.com,60020,1303450731227, 
region=e7a478b4bd164525052f1dedb832de0a
2011-04-25 09:10:14,054 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENING, 
server=pumahbase107.snc5.facebook.com,60020,1303450731227, 
region=e7a478b4bd164525052f1dedb832de0a
2011-04-25 09:10:29,055 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENING, 
server=pumahbase107.snc5.facebook.com,60020,1303450731227, 

[jira] [Resolved] (HBASE-3824) region server timed out during open region

2011-04-26 Thread Prakash Khemani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani resolved HBASE-3824.


Resolution: Not A Problem

 region server timed out during open region
 --

 Key: HBASE-3824
 URL: https://issues.apache.org/jira/browse/HBASE-3824
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani

 When replaying a large log file, memstore flushes can happen. But there is no 
 Progressable report being sent during memstore flushes. That can lead to the 
 master timing out the region server during region open.
 ===
 Another related issue and Jonathan's response
  So if a region server that is handed a region for opening and has done part 
  of
  the work ... it has created some HFiles (because the logs were so huge that
  the memstore got flushed while the logs were being replayed) ... and then it 
  is
  asked to give up because the master thought the region server was taking
  too long to open the region.
  
  When the region server gives up on the region then will it make sure that it
  removes all the HFiles it had created for that region?
 Will need to check the code, but would it matter?  One issue is whether it 
 cleans up after itself (I'm guessing not).  Another issue is whether the 
 replay is idempotent (duplicate KVs across files shouldn't matter in most 
 cases).
 ===
 2011-04-25 09:11:36,844 ERROR org.apache.hadoop.hbase.executor.EventHandler: 
 Caught throwable while processing event M_RS_OPEN_REGION
 java.lang.NullPointerException
 at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
 at 
 org.apache.hadoop.hbase.executor.RegionTransitionData.fromBytes(RegionTransitionData.java:198)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:672)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNodeOpened(ZKAssign.java:621)
 at 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:168)
 at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
 byte [] existingBytes =
 ZKUtil.getDataNoWatch(zkw, node, stat);
 RegionTransitionData existingData =
 RegionTransitionData.fromBytes(existingBytes);
 existingBytes can be null. have to return -1 if null.
 ===
 master logs
 2011-04-25 05:24:03,250 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Creating writer 
 path=hdfs://pumahbase002-snc5-dfs.data.facebook.com:9000/PUMAHBASE002-SNC5-HBASE/realtime_domain_imps_urls/e7a478b4bd164525052f1dedb832de0a/recovered.edits/57528037047
  region=e7a478b4bd164525052f1dedb832de0a
 2011-04-25 09:09:19,246 INFO 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Closed path 
 hdfs://pumahbase002-snc5-dfs.data.facebook.com:9000/PUMAHBASE002-SNC5-HBASE/realtime_domain_imps_urls/e7a478b4bd164525052f1dedb832de0a/recovered.edits/57528037047
  (wrote 4342690 edits in 46904ms)
 2011-04-25 09:09:26,134 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x32f7bb74e8a Creating (or updating) unassigned node for 
 e7a478b4bd164525052f1dedb832de0a with OFFLINE state
 2011-04-25 09:09:26,136 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan 
 was found (or we are ignoring an existing plan) for 
 realtime_domain_imps_urls,afbe,1295556905482.e7a478b4bd164525052f1dedb832de0a.
  so generated a random one; 
 hri=realtime_domain_imps_urls,afbe,1295556905482.e7a478b4bd164525052f1dedb832de0a.,
  src=, dest=pumahbase107.snc5.facebook.com,60020,1303450731227; 70 
 (online=70, exclude=null) available servers
 2011-04-25 09:09:26,136 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 realtime_domain_imps_urls,afbe,1295556905482.e7a478b4bd164525052f1dedb832de0a.
  to pumahbase107.snc5.facebook.com,60020,1303450731227
 2011-04-25 09:09:26,139 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, 
 server=pumahbase107.snc5.facebook.com,60020,1303450731227, 
 region=e7a478b4bd164525052f1dedb832de0a
 2011-04-25 09:09:44,045 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, 
 server=pumahbase107.snc5.facebook.com,60020,1303450731227, 
 region=e7a478b4bd164525052f1dedb832de0a
 2011-04-25 09:09:59,050 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, 
 server=pumahbase107.snc5.facebook.com,60020,1303450731227, 
 region=e7a478b4bd164525052f1dedb832de0a
 2011-04-25 09:10:14,054 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 

[jira] [Commented] (HBASE-3824) region server timed out during open region

2011-04-26 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13025491#comment-13025491
 ] 

Prakash Khemani commented on HBASE-3824:


Probably not an issue. The memstore flush happens in the background and cannot 
cause the log-replay thread to block. My mistake. I will close this.

 



 region server timed out during open region
 --

 Key: HBASE-3824
 URL: https://issues.apache.org/jira/browse/HBASE-3824
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani

 When replaying a large log file, memstore flushes can happen. But there is no 
 Progressable report being sent during memstore flushes. That can lead to the 
 master timing out the region server during region open.
 ===
 Another related issue and Jonathan's response
  So if a region server that is handed a region for opening and has done part 
  of
  the work ... it has created some HFiles (because the logs were so huge that
  the memstore got flushed while the logs were being replayed) ... and then it 
  is
  asked to give up because the master thought the region server was taking
  too long to open the region.
  
  When the region server gives up on the region then will it make sure that it
  removes all the HFiles it had created for that region?
 Will need to check the code, but would it matter?  One issue is whether it 
 cleans up after itself (I'm guessing not).  Another issue is whether the 
 replay is idempotent (duplicate KVs across files shouldn't matter in most 
 cases).
 ===
 2011-04-25 09:11:36,844 ERROR org.apache.hadoop.hbase.executor.EventHandler: 
 Caught throwable while processing event M_RS_OPEN_REGION
 java.lang.NullPointerException
 at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
 at 
 org.apache.hadoop.hbase.executor.RegionTransitionData.fromBytes(RegionTransitionData.java:198)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:672)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNodeOpened(ZKAssign.java:621)
 at 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:168)
 at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
 byte [] existingBytes =
 ZKUtil.getDataNoWatch(zkw, node, stat);
 RegionTransitionData existingData =
 RegionTransitionData.fromBytes(existingBytes);
 existingBytes can be null. have to return -1 if null.
 ===
 master logs
 2011-04-25 05:24:03,250 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Creating writer 
 path=hdfs://pumahbase002-snc5-dfs.data.facebook.com:9000/PUMAHBASE002-SNC5-HBASE/realtime_domain_imps_urls/e7a478b4bd164525052f1dedb832de0a/recovered.edits/57528037047
  region=e7a478b4bd164525052f1dedb832de0a
 2011-04-25 09:09:19,246 INFO 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Closed path 
 hdfs://pumahbase002-snc5-dfs.data.facebook.com:9000/PUMAHBASE002-SNC5-HBASE/realtime_domain_imps_urls/e7a478b4bd164525052f1dedb832de0a/recovered.edits/57528037047
  (wrote 4342690 edits in 46904ms)
 2011-04-25 09:09:26,134 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x32f7bb74e8a Creating (or updating) unassigned node for 
 e7a478b4bd164525052f1dedb832de0a with OFFLINE state
 2011-04-25 09:09:26,136 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan 
 was found (or we are ignoring an existing plan) for 
 realtime_domain_imps_urls,afbe,1295556905482.e7a478b4bd164525052f1dedb832de0a.
  so generated a random one; 
 hri=realtime_domain_imps_urls,afbe,1295556905482.e7a478b4bd164525052f1dedb832de0a.,
  src=, dest=pumahbase107.snc5.facebook.com,60020,1303450731227; 70 
 (online=70, exclude=null) available servers
 2011-04-25 09:09:26,136 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 realtime_domain_imps_urls,afbe,1295556905482.e7a478b4bd164525052f1dedb832de0a.
  to pumahbase107.snc5.facebook.com,60020,1303450731227
 2011-04-25 09:09:26,139 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, 
 server=pumahbase107.snc5.facebook.com,60020,1303450731227, 
 region=e7a478b4bd164525052f1dedb832de0a
 2011-04-25 09:09:44,045 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, 
 server=pumahbase107.snc5.facebook.com,60020,1303450731227, 
 region=e7a478b4bd164525052f1dedb832de0a
 2011-04-25 09:09:59,050 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, 
 

[jira] [Commented] (HBASE-3674) Treat ChecksumException as we would a ParseException splitting logs; else we replay split on every restart

2011-04-25 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13025071#comment-13025071
 ] 

Prakash Khemani commented on HBASE-3674:


+1

 Treat ChecksumException as we would a ParseException splitting logs; else we 
 replay split on every restart
 --

 Key: HBASE-3674
 URL: https://issues.apache.org/jira/browse/HBASE-3674
 Project: HBase
  Issue Type: Bug
  Components: wal
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.90.2

 Attachments: 3674-distributed.txt, 3674-v2.txt, 3674.txt


 In short, a ChecksumException will fail log processing for a server so we 
 skip out w/o archiving logs.  On restart, we'll then reprocess the logs -- 
 hit the checksumexception anew, usually -- and so on.
 Here is the splitLog method (edited):
 {code}
 private List<Path> splitLog(final FileStatus[] logfiles) throws IOException {

   outputSink.startWriterThreads(entryBuffers);

   try {
     int i = 0;
     for (FileStatus log : logfiles) {
       Path logPath = log.getPath();
       long logLength = log.getLen();
       splitSize += logLength;
       LOG.debug("Splitting hlog " + (i++ + 1) + " of " + logfiles.length
           + ": " + logPath + ", length=" + logLength);
       try {
         recoverFileLease(fs, logPath, conf);
         parseHLog(log, entryBuffers, fs, conf);
         processedLogs.add(logPath);
       } catch (EOFException eof) {
         // truncated files are expected if a RS crashes (see HBASE-2643)
         LOG.info("EOF from hlog " + logPath + ". Continuing");
         processedLogs.add(logPath);
       } catch (FileNotFoundException fnfe) {
         // A file may be missing if the region server was able to archive it
         // before shutting down. This means the edits were persisted already
         LOG.info("A log was missing " + logPath +
             ", probably because it was moved by the " +
             "now dead region server. Continuing");
         processedLogs.add(logPath);
       } catch (IOException e) {
         // If the IOE resulted from bad file format,
         // then this problem is idempotent and retrying won't help
         if (e.getCause() instanceof ParseException ||
             e.getCause() instanceof ChecksumException) {
           LOG.warn("ParseException from hlog " + logPath + ". continuing");
           processedLogs.add(logPath);
         } else {
           if (skipErrors) {
             LOG.info("Got while parsing hlog " + logPath +
                 ". Marking as corrupted", e);
             corruptedLogs.add(logPath);
           } else {
             throw e;
           }
         }
       }
     }
     if (fs.listStatus(srcDir).length > processedLogs.size()
         + corruptedLogs.size()) {
       throw new OrphanHLogAfterSplitException(
           "Discovered orphan hlog after split. Maybe the "
           + "HRegionServer was not dead when we started");
     }
     archiveLogs(srcDir, corruptedLogs, processedLogs, oldLogDir, fs, conf);
   } finally {
     splits = outputSink.finishWritingAndClose();
   }
   return splits;
 }
 {code}
 Notice how we'll only archive logs if we successfully split all logs. We won't 
 archive 31 of 35 files if we happen to get a checksum exception on file 32.
 I think we should treat a ChecksumException the same as a ParseException; a 
 retry will not fix it if HDFS could not get around the ChecksumException 
 (seems like in our case all replicas were corrupt).
 Here is a play-by-play from the logs:
 {code}
 813572 2011-03-18 20:31:44,687 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 34 of 
 35: 
 hdfs://sv2borg170:9000/hbase/.logs/sv2borg182,60020,1300384550664/sv2borg182%3A60020.1300461329481,
  length=150   65662813573 2011-03-18 20:31:44,687 INFO 
 org.apache.hadoop.hbase.util.FSUtils: Recovering file 
 hdfs://sv2borg170:9000/hbase/.logs/sv2borg182,60020,1300384550664/sv2borg182%3A60020.1300461329481
 
 813617 2011-03-18 20:31:46,238 INFO org.apache.hadoop.fs.FSInputChecker: 
 Found checksum error: b[0, 
 512]=00cd00502037383661376439656265643938636463343433386132343631323633303239371d6170695f6163636573735f746f6b656e5f7374

 6174735f6275636b6574000d9fa4d5dc012ec9c7cbaf000001006d005d0008002337626262663764626431616561366234616130656334383436653732333132643a32390764656661756c746170695f616e64726f69645f6c6f67676564

 

[jira] [Created] (HBASE-3814) force regionserver to halt

2011-04-22 Thread Prakash Khemani (JIRA)
force regionserver to halt
--

 Key: HBASE-3814
 URL: https://issues.apache.org/jira/browse/HBASE-3814
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani


Once abort() on a regionserver is called we should have a timeout thread that 
does Runtime.halt() if the rs gets stuck somewhere during abort processing.
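
A minimal sketch of such a watchdog (the class name and the caller-supplied 
timeout are assumptions; Runtime.halt() is the part taken from the proposal 
above):

{code}
// Sketch only: a watchdog armed at the start of abort(). If abort processing
// has not finished within the deadline, halt the JVM so the rs cannot hang.
public class AbortHaltTimer {
  public static Thread arm(final long timeoutMillis) {
    Thread t = new Thread(() -> {
      try {
        Thread.sleep(timeoutMillis);
      } catch (InterruptedException ie) {
        return; // abort finished in time and cancelled the watchdog
      }
      // abort is wedged (e.g. a log roll hanging on a sick DFS): force exit
      Runtime.getRuntime().halt(1);
    }, "abort-halt-timer");
    t.setDaemon(true);
    t.start();
    return t;
  }
}
{code}

abort() would call AbortHaltTimer.arm(...) before doing anything else and 
interrupt the returned thread once the clean shutdown completes.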

===


Pumahbase132 has the following logs ... the dfsclient is not able to set up a 
write pipeline successfully ... it tries to abort ... but while aborting it 
gets stuck. I know there is a check that if we are aborting because the 
filesystem is closed then we should not try to flush the logs while aborting. 
But in this case the fs is up and running, it's just not functioning.

2011-04-21 23:48:07,082 INFO org.apache.hadoop.hdfs.DFSClient: Exception in 
createBlockOutputStream 10.38.131.53:50010  for file 
/PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280java.io.IOException:
 Bad connect ack with firstBadLink 10.38.133.33:50010
2011-04-21 23:48:07,082 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block 
blk_-8967376451767492285_6537229 for file 
/PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280
2011-04-21 23:48:07,125 INFO org.apache.hadoop.hdfs.DFSClient: Exception in 
createBlockOutputStream 10.38.131.53:50010  for file 
/PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280java.io.IOException:
 Bad connect ack with firstBadLink 10.38.134.59:50010
2011-04-21 23:48:07,125 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block 
blk_7172251852699100447_6537229 for file 
/PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280
 

2011-04-21 23:48:07,169 INFO org.apache.hadoop.hdfs.DFSClient: Exception in 
createBlockOutputStream 10.38.131.53:50010  for file 
/PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280java.io.IOException:
 Bad connect ack with firstBadLink 10.38.134.53:50010
2011-04-21 23:48:07,169 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block 
blk_-9153204772467623625_6537229 for file 
/PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280
2011-04-21 23:48:07,213 INFO org.apache.hadoop.hdfs.DFSClient: Exception in 
createBlockOutputStream 10.38.131.53:50010  for file 
/PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280java.io.IOException:
 Bad connect ack with firstBadLink 10.38.134.49:50010
2011-04-21 23:48:07,213 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block 
blk_-2513098940934276625_6537229 for file 
/PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280
2011-04-21 23:48:07,214 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer 
Exception: java.io.IOException: Unable to create new block.
at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3560)
at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2700(DFSClient.java:2720)
at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2977)

2011-04-21 23:48:07,214 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery 
for block blk_-2513098940934276625_6537229 bad datanode[1] nodes == null
2011-04-21 23:48:07,214 WARN org.apache.hadoop.hdfs.DFSClient: Could not get 
block locations. Source file 
/PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280
 - Aborting...
2011-04-21 23:48:07,216 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: 
Could not append. Requesting close of hlog

And then the RS gets stuck trying to roll the logs ...



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-3815) lb should ignore bad region servers

2011-04-22 Thread Prakash Khemani (JIRA)
lb should ignore bad region servers
---

 Key: HBASE-3815
 URL: https://issues.apache.org/jira/browse/HBASE-3815
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani


the loadbalancer should remember which region server is constantly having 
trouble opening regions and it should take that rs out of the equation ... 
otherwise the lb goes into an unproductive loop ... 

I don't have logs handy for this one.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3815) lb should ignore bad region servers

2011-04-22 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13023352#comment-13023352
 ] 

Prakash Khemani commented on HBASE-3815:


Log snippets showing assignment-manager continuously choosing server-132 for 
region assignment even though it constantly fails. There ought to be a global 
exclude list in addition to a per region exclude list?
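
A rough sketch of such a global exclude list (names and the failure threshold 
are made up; the real change would live in the AssignmentManager / load 
balancer):

{code}
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: servers that keep failing region opens get skipped by assignment
// until an operator (or a timer, not shown) clears them.
public class BadServerTracker {
  private static final int MAX_FAILURES = 3; // assumed threshold

  private final Map<String, Integer> failures = new ConcurrentHashMap<>();
  private final Set<String> excluded = ConcurrentHashMap.newKeySet();

  // called when an open-region RPC to this server fails
  public void recordFailure(String serverName) {
    if (failures.merge(serverName, 1, Integer::sum) >= MAX_FAILURES) {
      excluded.add(serverName);
    }
  }

  // consulted by the balancer / assignment manager before picking a destination
  public boolean isExcluded(String serverName) {
    return excluded.contains(serverName);
  }
}
{code}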




2011-04-17 07:14:06,312 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Assigning region 
realtime_domain_feed_imps_domains,e0a3d6b6,1289467228948.64f5ad9ca3f4d6a235365f10ccc4ae87.
 to pumahbase132.snc5.facebook.com,60020,1303046136711
2011-04-17 07:14:06,314 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
Failed assignment of 
realtime_domain_feed_imps_domains,e0a3d6b6,1289467228948.64f5ad9ca3f4d6a235365f10ccc4ae87.
 to serverName=pumahbase132.snc5.facebook.com,60020,1303046136711, 
load=(requests=0, regions=81, usedHeap=155, maxHeap=31987), trying to assign 
elsewhere instead; retry=0
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:406)
at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
at 
org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:884)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:751)
at 
org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
at $Proxy6.openRegion(Unknown Source)
at 
org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:547)
at 
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:901)
at 
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:730)
at 
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:710)
at 
org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:92)
at 
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
2011-04-17 07:14:06,314 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
No previous transition plan was found (or we are ignoring an existing plan) for 
realtime_domain_feed_imps_domains,e0a3d6b6,1289467228948.64f5ad9ca3f4d6a235365f10ccc4ae87.
 so generated a random one; 
hri=realtime_domain_feed_imps_domains,e0a3d6b6,1289467228948.64f5ad9ca3f4d6a235365f10ccc4ae87.,
 src=, dest=pumahbase156.snc5.facebook.com,60020,1302847439345; 72 (online=72, 
exclude=serverName=pumahbase132.snc5.facebook.com,60020,1303046136711, 
load=(requests=0, regions=81, usedHeap=155, maxHeap=31987)) available servers

2011-04-17 07:19:06,097 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Assigning region 
realtime_domain_acts_urls_hot,9b851e7e,1290555837119.c41a23fd0bd57d1eb4c3a5ef1ed6ccac.
 to pumahbase132.snc5.facebook.com,60020,1303046136711
2011-04-17 07:19:06,098 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
Failed assignment of 
realtime_domain_acts_urls_hot,9b851e7e,1290555837119.c41a23fd0bd57d1eb4c3a5ef1ed6ccac.
 to serverName=pumahbase132.snc5.facebook.com,60020,1303046136711, 
load=(requests=0, regions=81, usedHeap=155, maxHeap=31987), trying to assign 
elsewhere instead; retry=0
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:406)
at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
at 
org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:884)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:751)
at 
org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
at $Proxy6.openRegion(Unknown Source)
at 
org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:547)
at 
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:901)
at 
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:730)
at 

[jira] [Commented] (HBASE-3814) force regionserver to halt

2011-04-22 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13023408#comment-13023408
 ] 

Prakash Khemani commented on HBASE-3814:


I don't have access to the logs right now. The server is powered down and I 
don't want to bring it up.

In all likelihood the server that got stuck had a dfs version mismatch problem. 
It got stuck in a portion of the code that Dhruba has recently introduced and 
that is only present in the internal branch.



 force regionserver to halt
 --

 Key: HBASE-3814
 URL: https://issues.apache.org/jira/browse/HBASE-3814
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani

 Once abort() on a regionserver is called we should have a timeout thread that 
 does Runtime.halt() if the rs gets stuck somewhere during abort processing.
 ===
 Pumahbase132 has the following logs ... the dfsclient is not able to set up a 
 write pipeline successfully ... it tries to abort ... but while aborting it 
 gets stuck. I know there is a check that if we are aborting because the 
 filesystem is closed then we should not try to flush the logs while aborting. 
 But in this case the fs is up and running, it's just not functioning.
 2011-04-21 23:48:07,082 INFO org.apache.hadoop.hdfs.DFSClient: Exception in 
 createBlockOutputStream 10.38.131.53:50010  for file 
 /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280java.io.IOException:
  Bad connect ack with firstBadLink 10.38.133.33:50010
 2011-04-21 23:48:07,082 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning 
 block blk_-8967376451767492285_6537229 for file 
 /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280
 2011-04-21 23:48:07,125 INFO org.apache.hadoop.hdfs.DFSClient: Exception in 
 createBlockOutputStream 10.38.131.53:50010  for file 
 /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280java.io.IOException:
  Bad connect ack with firstBadLink 10.38.134.59:50010
 2011-04-21 23:48:07,125 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning 
 block blk_7172251852699100447_6537229 for file 
 /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280
  
 2011-04-21 23:48:07,169 INFO org.apache.hadoop.hdfs.DFSClient: Exception in 
 createBlockOutputStream 10.38.131.53:50010  for file 
 /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280java.io.IOException:
  Bad connect ack with firstBadLink 10.38.134.53:50010
 2011-04-21 23:48:07,169 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning 
 block blk_-9153204772467623625_6537229 for file 
 /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280
 2011-04-21 23:48:07,213 INFO org.apache.hadoop.hdfs.DFSClient: Exception in 
 createBlockOutputStream 10.38.131.53:50010  for file 
 /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280java.io.IOException:
  Bad connect ack with firstBadLink 10.38.134.49:50010
 2011-04-21 23:48:07,213 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning 
 block blk_-2513098940934276625_6537229 for file 
 /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280
 2011-04-21 23:48:07,214 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer 
 Exception: java.io.IOException: Unable to create new block.
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3560)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2700(DFSClient.java:2720)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2977)
 2011-04-21 23:48:07,214 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery 
 for block blk_-2513098940934276625_6537229 bad datanode[1] nodes == null
 2011-04-21 23:48:07,214 WARN org.apache.hadoop.hdfs.DFSClient: Could not get 
 block locations. Source file 
 /PUMAHBASE002-SNC5-HBASE/.logs/pumahbase132.snc5.facebook.com,60020,1303450732026/pumahbase132.snc5.facebook.com%3A60020.1303450732280
  - Aborting...
 2011-04-21 23:48:07,216 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: 
 Could not append. Requesting close of hlog
 And then the RS gets stuck trying to roll the logs ...

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3806) distributed log splitting double escapes task names

2011-04-21 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13022646#comment-13022646
 ] 

Prakash Khemani commented on HBASE-3806:


uploaded a patch at https://review.cloudera.org/r/1715/

 distributed log splitting double escapes task names
 ---

 Key: HBASE-3806
 URL: https://issues.apache.org/jira/browse/HBASE-3806
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani
Assignee: Prakash Khemani

 During startup the master double-escapes the (log split) task names when 
 submitting them ... I had missed this in my testing because I was using task 
 names like foo and bar instead of those that need escaping - like hdfs://... 
 Also at startup, even though the master fails to acquire the orphan tasks ... 
 the tasks are acquired anyway when the master sees the logs that need splitting.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3674) Treat ChecksumException as we would a ParseException splitting logs; else we replay split on every restart

2011-04-21 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13022873#comment-13022873
 ] 

Prakash Khemani commented on HBASE-3674:


This change got overwritten when HBASE-1364 was integrated.

The change has to be added in HLogSplitter, in the method getNextLogLine:

static private Entry getNextLogLine(Reader in, Path path, boolean skipErrors)
    throws CorruptedLogFileException, IOException {
  try {
    return in.next();
  } catch (EOFException eof) {
    // truncated files are expected if a RS crashes (see HBASE-2643)
    LOG.info("EOF from hlog " + path + ". continuing");
    return null;
  } catch (IOException e) {
    // If the IOE resulted from bad file format,
    // then this problem is idempotent and retrying won't help
    if (e.getCause() instanceof ParseException) {
      LOG.warn("ParseException from hlog " + path + ". continuing");
      return null;
    }
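
For completeness, a self-contained sketch of that decision with the overwritten 
branch restored, treating ChecksumException exactly like ParseException as the 
issue title proposes (an illustration, not the committed patch):

{code}
import java.io.IOException;
import java.text.ParseException;

import org.apache.hadoop.fs.ChecksumException;

// Sketch only: returns true when the entry should be skipped (same semantics
// as getNextLogLine returning null above), rethrows anything retryable.
public class SkipNonRetryableSketch {
  static boolean shouldSkipEntry(IOException e) throws IOException {
    // bad file format is idempotent; a ChecksumException shared by all
    // replicas is treated the same way as a ParseException
    if (e.getCause() instanceof ParseException
        || e.getCause() instanceof ChecksumException) {
      return true;
    }
    throw e;
  }
}
{code}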


It might also be necessary to add this change to the getReader(...) method:

protected Reader getReader(FileSystem fs, FileStatus file, Configuration conf,
    boolean skipErrors)
    throws IOException, CorruptedLogFileException {
  Path path = file.getPath();
  long length = file.getLen();
  Reader in;

  // Check for possibly empty file. With appends, currently Hadoop reports a
  // zero length even if the file has been sync'd. Revisit if HDFS-376 or
  // HDFS-878 is committed.
  if (length <= 0) {
    LOG.warn("File " + path + " might be still open, length is 0");
  }

  try {
    recoverFileLease(fs, path, conf);
    try {
      in = getReader(fs, path, conf);
    } catch (EOFException e) {
      if (length <= 0) {
        // TODO should we ignore an empty, not-last log file if skip.errors
        // is false? Either way, the caller should decide what to do. E.g.
        // ignore if this is the last log in sequence.
        // TODO is this scenario still possible if the log has been
        // recovered (i.e. closed)
        LOG.warn("Could not open " + path + " for reading. File is empty", e);
        return null;




 Treat ChecksumException as we would a ParseException splitting logs; else we 
 replay split on every restart
 --

 Key: HBASE-3674
 URL: https://issues.apache.org/jira/browse/HBASE-3674
 Project: HBase
  Issue Type: Bug
  Components: wal
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.90.2

 Attachments: 3674-v2.txt, 3674.txt


 In short, a ChecksumException will fail log processing for a server so we 
 skip out w/o archiving logs.  On restart, we'll then reprocess the logs -- 
 hit the checksumexception anew, usually -- and so on.
 Here is the splitLog method (edited):
 {code}
   private ListPath splitLog(final FileStatus[] logfiles) throws IOException 
 {
 
 outputSink.startWriterThreads(entryBuffers);
 
 try {
   int i = 0;
   for (FileStatus log : logfiles) {
Path logPath = log.getPath();
 long logLength = log.getLen();
 splitSize += logLength;
 LOG.debug(Splitting hlog  + (i++ + 1) +  of  + logfiles.length
 + :  + logPath + , length= + logLength);
 try {
   recoverFileLease(fs, logPath, conf);
   parseHLog(log, entryBuffers, fs, conf);
   processedLogs.add(logPath);
 } catch (EOFException eof) {
   // truncated files are expected if a RS crashes (see HBASE-2643)
   LOG.info(EOF from hlog  + logPath + . Continuing);
   processedLogs.add(logPath);
 } catch (FileNotFoundException fnfe) {
   // A file may be missing if the region server was able to archive it
   // before shutting down. This means the edits were persisted already
   LOG.info("A log was missing " + logPath +
   ", probably because it was moved by the " +
"now dead region server. Continuing");
   processedLogs.add(logPath);
 } catch (IOException e) {
   // If the IOE resulted from bad file format,
   // then this problem is idempotent and retrying won't help
   if (e.getCause() instanceof ParseException ||
   e.getCause() instanceof ChecksumException) {
 LOG.warn("ParseException from hlog " + logPath + ".  continuing");
 processedLogs.add(logPath);
   } else {
 if (skipErrors) {
   LOG.info("Got while parsing hlog " + logPath +
 ". Marking as corrupted", e);
   corruptedLogs.add(logPath);
 } else {
   throw e;
 }
   }
 }
   }
   if (fs.listStatus(srcDir).length > 

[jira] [Commented] (HBASE-3674) Treat ChecksumException as we would a ParseException splitting logs; else we replay split on every restart

2011-04-21 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13022964#comment-13022964
 ] 

Prakash Khemani commented on HBASE-3674:


The patch sets hbase.hlog.split.skip.errors to true by default. I am 
wondering why the ChecksumException was not ignored as originally proposed?

This patch is there in the trunk. In the serialized log splitting case 
hbase.hlog.split.skip.errors is set to true. But in the distributed log 
splitting case hbase.hlog.split.skip.errors is set to false by default.
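
If someone wants the old behavior back for distributed splitting, a minimal sketch of the override (assuming the standard Configuration plumbing; the property name is the one discussed above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class EnableSkipErrors {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // opt back in to skipping corrupted hlogs, overriding the distributed-splitting default of false
    conf.setBoolean("hbase.hlog.split.skip.errors", true);
    System.out.println(conf.getBoolean("hbase.hlog.split.skip.errors", false));
  }
}

In a real deployment the same property would normally be set in the site configuration rather than in code.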

 Treat ChecksumException as we would a ParseException splitting logs; else we 
 replay split on every restart
 --

 Key: HBASE-3674
 URL: https://issues.apache.org/jira/browse/HBASE-3674
 Project: HBase
  Issue Type: Bug
  Components: wal
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.90.2

 Attachments: 3674-v2.txt, 3674.txt


 In short, a ChecksumException will fail log processing for a server so we 
 skip out w/o archiving logs.  On restart, we'll then reprocess the logs -- 
 hit the checksumexception anew, usually -- and so on.
 Here is the splitLog method (edited):
 {code}
   private List<Path> splitLog(final FileStatus[] logfiles) throws IOException 
 {
 
 outputSink.startWriterThreads(entryBuffers);
 
 try {
   int i = 0;
   for (FileStatus log : logfiles) {
Path logPath = log.getPath();
 long logLength = log.getLen();
 splitSize += logLength;
 LOG.debug("Splitting hlog " + (i++ + 1) + " of " + logfiles.length
 + ": " + logPath + ", length=" + logLength);
 try {
   recoverFileLease(fs, logPath, conf);
   parseHLog(log, entryBuffers, fs, conf);
   processedLogs.add(logPath);
 } catch (EOFException eof) {
   // truncated files are expected if a RS crashes (see HBASE-2643)
   LOG.info("EOF from hlog " + logPath + ". Continuing");
   processedLogs.add(logPath);
 } catch (FileNotFoundException fnfe) {
   // A file may be missing if the region server was able to archive it
   // before shutting down. This means the edits were persisted already
   LOG.info("A log was missing " + logPath +
   ", probably because it was moved by the " +
"now dead region server. Continuing");
   processedLogs.add(logPath);
 } catch (IOException e) {
   // If the IOE resulted from bad file format,
   // then this problem is idempotent and retrying won't help
   if (e.getCause() instanceof ParseException ||
   e.getCause() instanceof ChecksumException) {
 LOG.warn("ParseException from hlog " + logPath + ".  continuing");
 processedLogs.add(logPath);
   } else {
 if (skipErrors) {
   LOG.info("Got while parsing hlog " + logPath +
 ". Marking as corrupted", e);
   corruptedLogs.add(logPath);
 } else {
   throw e;
 }
   }
 }
   }
   if (fs.listStatus(srcDir).length > processedLogs.size()
   + corruptedLogs.size()) {
 throw new OrphanHLogAfterSplitException(
 "Discovered orphan hlog after split. Maybe the "
 + "HRegionServer was not dead when we started");
   }
   archiveLogs(srcDir, corruptedLogs, processedLogs, oldLogDir, fs, conf); 
  
 } finally {
   splits = outputSink.finishWritingAndClose();
 }
 return splits;
   }
 {code}
 Notice how we'll only archive logs if we successfully split all logs.  
 We won't archive 31 of 35 files if we happen to get a checksum exception on 
 file 32.
 I think we should treat a ChecksumException the same as a ParseException; a 
 retry will not fix it if HDFS could not get around the ChecksumException 
 (seems like in our case all replicas were corrupt).
 Here is a play-by-play from the logs:
 {code}
 813572 2011-03-18 20:31:44,687 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 34 of 
 35: 
 hdfs://sv2borg170:9000/hbase/.logs/sv2borg182,60020,1300384550664/sv2borg182%3A60020.1300461329481,
  length=15065662
 813573 2011-03-18 20:31:44,687 INFO 
 org.apache.hadoop.hbase.util.FSUtils: Recovering file 
 hdfs://sv2borg170:9000/hbase/.logs/sv2borg182,60020,1300384550664/sv2borg182%3A60020.1300461329481
 
 813617 2011-03-18 20:31:46,238 INFO org.apache.hadoop.fs.FSInputChecker: 
 Found checksum error: b[0, 
 512]=00cd00502037383661376439656265643938636463343433386132343631323633303239371d6170695f6163636573735f746f6b656e5f7374

 

[jira] [Created] (HBASE-3806) distributed log splitting double escapes task names

2011-04-20 Thread Prakash Khemani (JIRA)
distributed log splitting double escapes task names
---

 Key: HBASE-3806
 URL: https://issues.apache.org/jira/browse/HBASE-3806
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani
Assignee: Prakash Khemani


During startup the master double-escapes the (log split) task names when submitting 
them ... I had missed this in my testing because I was using task names like 
"foo" and "bar" instead of ones that need escaping, like hdfs://... Also, at 
startup, even though the master fails to acquire the orphan tasks, the tasks 
are acquired anyway when the master sees the logs that need splitting.
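
Purely as an illustration of the failure mode (this is not the SplitLogManager code, and the path is made up), encoding an already-encoded task name produces a string that no longer matches the hlog it refers to:

import java.net.URLEncoder;

public class DoubleEscapeDemo {
  public static void main(String[] args) throws Exception {
    // hypothetical hlog path of the kind a split task is named after
    String task = "hdfs://nn:9000/hbase/.logs/rs1,60020,1300000000000/rs1%3A60020.1300000000001";
    String once = URLEncoder.encode(task, "UTF-8");
    String twice = URLEncoder.encode(once, "UTF-8");   // the double escape: '%' becomes "%25"
    System.out.println(once);
    System.out.println(twice);
    System.out.println(once.equals(twice));            // false: the doubly-escaped name is a different string
  }
}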

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs

2011-04-17 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020730#comment-13020730
 ] 

Prakash Khemani commented on HBASE-1364:


'fixed' TestDistributedLogSplitting.testWorkerAbort() by not letting the test 
fail if the aborting region server completes the split before it closes its dfs 
or zk session.

uploaded a new patch in rb

 [performance] Distributed splitting of regionserver commit logs
 ---

 Key: HBASE-1364
 URL: https://issues.apache.org/jira/browse/HBASE-1364
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: stack
Assignee: Prakash Khemani
Priority: Critical
 Fix For: 0.92.0

 Attachments: 1364-v5.txt, HBASE-1364.patch, 
 org.apache.hadoop.hbase.master.TestDistributedLogSplitting-output.txt

  Time Spent: 8h
  Remaining Estimate: 0h

 HBASE-1008 has some improvements to our log splitting on regionserver crash; 
 but it needs to run even faster.
 (Below is from HBASE-1008)
 In bigtable paper, the split is distributed. If we're going to have 1000 
 logs, we need to distribute or at least multithread the splitting.
 1. As is, regions starting up expect to find one reconstruction log only. 
 Need to make it so pick up a bunch of edit logs and it should be fine that 
 logs are elsewhere in hdfs in an output directory written by all split 
 participants whether multithreaded or a mapreduce-like distributed process 
 (Lets write our distributed sort first as a MR so we learn whats involved; 
 distributed sort, as much as possible should use MR framework pieces). On 
 startup, regions go to this directory and pick up the files written by split 
 participants deleting and clearing the dir when all have been read in. Making 
 it so can take multiple logs for input, can also make the split process more 
 robust rather than current tenuous process which loses all edits if it 
 doesn't make it to the end without error.
 2. Each column family rereads the reconstruction log to find its edits. Need 
 to fix that. Split can sort the edits by column family so store only reads 
 its edits.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs

2011-04-17 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020874#comment-13020874
 ] 

Prakash Khemani commented on HBASE-1364:


Yes, it passes for me - just ran it again.

This is another of timing related errors.


164 waitForCounter(tot_wkr_task_acquired, 0, 1, 100);
165 waitForCounter(tot_wkr_failed_to_grab_task_lost_race, 0, 1, 100);


In your case the failure occurred at line 165, when the counter 
tot_wkr_failed_to_grab_task_lost_race did not change from 0 to 1 within 100ms.

Can you please increase the timeout in both these lines from 100ms to 1000ms 
and retry ... 

I will go over all my tests and try to improve them but I won't be able to get 
to that before the end of this week.






 [performance] Distributed splitting of regionserver commit logs
 ---

 Key: HBASE-1364
 URL: https://issues.apache.org/jira/browse/HBASE-1364
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: stack
Assignee: Prakash Khemani
Priority: Critical
 Fix For: 0.92.0

 Attachments: 1364-v5.txt, HBASE-1364.patch, 
 org.apache.hadoop.hbase.master.TestDistributedLogSplitting-output.txt

  Time Spent: 8h
  Remaining Estimate: 0h

 HBASE-1008 has some improvements to our log splitting on regionserver crash; 
 but it needs to run even faster.
 (Below is from HBASE-1008)
 In bigtable paper, the split is distributed. If we're going to have 1000 
 logs, we need to distribute or at least multithread the splitting.
 1. As is, regions starting up expect to find one reconstruction log only. 
 Need to make it so pick up a bunch of edit logs and it should be fine that 
 logs are elsewhere in hdfs in an output directory written by all split 
 participants whether multithreaded or a mapreduce-like distributed process 
 (Lets write our distributed sort first as a MR so we learn whats involved; 
 distributed sort, as much as possible should use MR framework pieces). On 
 startup, regions go to this directory and pick up the files written by split 
 participants deleting and clearing the dir when all have been read in. Making 
 it so can take multiple logs for input, can also make the split process more 
 robust rather than current tenuous process which loses all edits if it 
 doesn't make it to the end without error.
 2. Each column family rereads the reconstruction log to find its edits. Need 
 to fix that. Split can sort the edits by column family so store only reads 
 its edits.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs

2011-04-16 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020721#comment-13020721
 ] 

Prakash Khemani commented on HBASE-1364:


This is a problem with the test-case which I will fix.

You got this error because of (1) not being able to interrupt the 
split-log-worker thread when it is doing dfs operations (I think the interrupt 
is swallowed somewhere) and (2) timing issues where, in the aborting region 
server, the filesystem and the zk session don't close before the 
split-log-worker thread completes its splitting task ...

I will fix this by removing fail("region server completed the split before 
aborting") from the test case.





 [performance] Distributed splitting of regionserver commit logs
 ---

 Key: HBASE-1364
 URL: https://issues.apache.org/jira/browse/HBASE-1364
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: stack
Assignee: Prakash Khemani
Priority: Critical
 Fix For: 0.92.0

 Attachments: 1364-v5.txt, HBASE-1364.patch, 
 org.apache.hadoop.hbase.master.TestDistributedLogSplitting-output.txt

  Time Spent: 8h
  Remaining Estimate: 0h

 HBASE-1008 has some improvements to our log splitting on regionserver crash; 
 but it needs to run even faster.
 (Below is from HBASE-1008)
 In bigtable paper, the split is distributed. If we're going to have 1000 
 logs, we need to distribute or at least multithread the splitting.
 1. As is, regions starting up expect to find one reconstruction log only. 
 Need to make it so pick up a bunch of edit logs and it should be fine that 
 logs are elsewhere in hdfs in an output directory written by all split 
 participants whether multithreaded or a mapreduce-like distributed process 
 (Lets write our distributed sort first as a MR so we learn whats involved; 
 distributed sort, as much as possible should use MR framework pieces). On 
 startup, regions go to this directory and pick up the files written by split 
 participants deleting and clearing the dir when all have been read in. Making 
 it so can take multiple logs for input, can also make the split process more 
 robust rather than current tenuous process which loses all edits if it 
 doesn't make it to the end without error.
 2. Each column family rereads the reconstruction log to find its edits. Need 
 to fix that. Split can sort the edits by column family so store only reads 
 its edits.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs

2011-04-15 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020216#comment-13020216
 ] 

Prakash Khemani commented on HBASE-1364:


TestDistributedLogSplitting.testWorkerAbort test failed because the 
SplitLogWorker was incrementing the tot_wkr_task_resigned twice. My test runs 
were passing because of a race - if the test happens to look at the counter 
between the 2 increments then the test will pass. Fixing this in the latest 
patch.



 [performance] Distributed splitting of regionserver commit logs
 ---

 Key: HBASE-1364
 URL: https://issues.apache.org/jira/browse/HBASE-1364
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: stack
Assignee: Prakash Khemani
Priority: Critical
 Fix For: 0.92.0

 Attachments: 1364-v5.txt, HBASE-1364.patch

  Time Spent: 8h
  Remaining Estimate: 0h

 HBASE-1008 has some improvements to our log splitting on regionserver crash; 
 but it needs to run even faster.
 (Below is from HBASE-1008)
 In bigtable paper, the split is distributed. If we're going to have 1000 
 logs, we need to distribute or at least multithread the splitting.
 1. As is, regions starting up expect to find one reconstruction log only. 
 Need to make it so pick up a bunch of edit logs and it should be fine that 
 logs are elsewhere in hdfs in an output directory written by all split 
 participants whether multithreaded or a mapreduce-like distributed process 
 (Lets write our distributed sort first as a MR so we learn whats involved; 
 distributed sort, as much as possible should use MR framework pieces). On 
 startup, regions go to this directory and pick up the files written by split 
 participants deleting and clearing the dir when all have been read in. Making 
 it so can take multiple logs for input, can also make the split process more 
 robust rather than current tenuous process which loses all edits if it 
 doesn't make it to the end without error.
 2. Each column family rereads the reconstruction log to find its edits. Need 
 to fix that. Split can sort the edits by column family so store only reads 
 its edits.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs

2011-04-15 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020488#comment-13020488
 ] 

Prakash Khemani commented on HBASE-1364:


I uploaded a new diff at the review board https://review.cloudera.org/r/1655/

I think it takes care of all of Stack's comments.

added a new test in TestHLogSplit to test that when skip-errors is set to true 
then corrupted log files are ignored and correctly moved to the .corrupted 
directory.

Some of the tests - especially in TestDistributedLogSplitting - are somewhat 
timing dependent. For example, I will abort a few region servers and wait at 
most a few seconds for all those servers to go down. Sometimes it takes longer 
and the test fails. Last night I had to bump up the time limit in one such test 
(testThreeRSAbort()). I am sure these tests can be made more robust.

 [performance] Distributed splitting of regionserver commit logs
 ---

 Key: HBASE-1364
 URL: https://issues.apache.org/jira/browse/HBASE-1364
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: stack
Assignee: Prakash Khemani
Priority: Critical
 Fix For: 0.92.0

 Attachments: 1364-v5.txt, HBASE-1364.patch

  Time Spent: 8h
  Remaining Estimate: 0h

 HBASE-1008 has some improvements to our log splitting on regionserver crash; 
 but it needs to run even faster.
 (Below is from HBASE-1008)
 In bigtable paper, the split is distributed. If we're going to have 1000 
 logs, we need to distribute or at least multithread the splitting.
 1. As is, regions starting up expect to find one reconstruction log only. 
 Need to make it so pick up a bunch of edit logs and it should be fine that 
 logs are elsewhere in hdfs in an output directory written by all split 
 participants whether multithreaded or a mapreduce-like distributed process 
 (Lets write our distributed sort first as a MR so we learn whats involved; 
 distributed sort, as much as possible should use MR framework pieces). On 
 startup, regions go to this directory and pick up the files written by split 
 participants deleting and clearing the dir when all have been read in. Making 
 it so can take multiple logs for input, can also make the split process more 
 robust rather than current tenuous process which loses all edits if it 
 doesn't make it to the end without error.
 2. Each column family rereads the reconstruction log to find its edits. Need 
 to fix that. Split can sort the edits by column family so store only reads 
 its edits.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs

2011-04-12 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018744#comment-13018744
 ] 

Prakash Khemani commented on HBASE-1364:


posted a revised patch at https://review.cloudera.org/r/1655/

 [performance] Distributed splitting of regionserver commit logs
 ---

 Key: HBASE-1364
 URL: https://issues.apache.org/jira/browse/HBASE-1364
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: stack
Assignee: Prakash Khemani
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-1364.patch

  Time Spent: 8h
  Remaining Estimate: 0h

 HBASE-1008 has some improvements to our log splitting on regionserver crash; 
 but it needs to run even faster.
 (Below is from HBASE-1008)
 In bigtable paper, the split is distributed. If we're going to have 1000 
 logs, we need to distribute or at least multithread the splitting.
 1. As is, regions starting up expect to find one reconstruction log only. 
 Need to make it so pick up a bunch of edit logs and it should be fine that 
 logs are elsewhere in hdfs in an output directory written by all split 
 participants whether multithreaded or a mapreduce-like distributed process 
 (Lets write our distributed sort first as a MR so we learn whats involved; 
 distributed sort, as much as possible should use MR framework pieces). On 
 startup, regions go to this directory and pick up the files written by split 
 participants deleting and clearing the dir when all have been read in. Making 
 it so can take multiple logs for input, can also make the split process more 
 robust rather than current tenuous process which loses all edits if it 
 doesn't make it to the end without error.
 2. Each column family rereads the reconstruction log to find its edits. Need 
 to fix that. Split can sort the edits by column family so store only reads 
 its edits.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs

2011-04-12 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018973#comment-13018973
 ] 

Prakash Khemani commented on HBASE-1364:


updated patch at https://review.cloudera.org/r/1655/

 [performance] Distributed splitting of regionserver commit logs
 ---

 Key: HBASE-1364
 URL: https://issues.apache.org/jira/browse/HBASE-1364
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: stack
Assignee: Prakash Khemani
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-1364.patch

  Time Spent: 8h
  Remaining Estimate: 0h

 HBASE-1008 has some improvements to our log splitting on regionserver crash; 
 but it needs to run even faster.
 (Below is from HBASE-1008)
 In bigtable paper, the split is distributed. If we're going to have 1000 
 logs, we need to distribute or at least multithread the splitting.
 1. As is, regions starting up expect to find one reconstruction log only. 
 Need to make it so pick up a bunch of edit logs and it should be fine that 
 logs are elsewhere in hdfs in an output directory written by all split 
 participants whether multithreaded or a mapreduce-like distributed process 
 (Lets write our distributed sort first as a MR so we learn whats involved; 
 distributed sort, as much as possible should use MR framework pieces). On 
 startup, regions go to this directory and pick up the files written by split 
 participants deleting and clearing the dir when all have been read in. Making 
 it so can take multiple logs for input, can also make the split process more 
 robust rather than current tenuous process which loses all edits if it 
 doesn't make it to the end without error.
 2. Each column family rereads the reconstruction log to find its edits. Need 
 to fix that. Split can sort the edits by column family so store only reads 
 its edits.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs

2011-04-05 Thread Prakash Khemani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani reassigned HBASE-1364:
--

Assignee: Prakash Khemani

 [performance] Distributed splitting of regionserver commit logs
 ---

 Key: HBASE-1364
 URL: https://issues.apache.org/jira/browse/HBASE-1364
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: stack
Assignee: Prakash Khemani
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-1364.patch

  Time Spent: 8h
  Remaining Estimate: 0h

 HBASE-1008 has some improvements to our log splitting on regionserver crash; 
 but it needs to run even faster.
 (Below is from HBASE-1008)
 In bigtable paper, the split is distributed. If we're going to have 1000 
 logs, we need to distribute or at least multithread the splitting.
 1. As is, regions starting up expect to find one reconstruction log only. 
 Need to make it so pick up a bunch of edit logs and it should be fine that 
 logs are elsewhere in hdfs in an output directory written by all split 
 participants whether multithreaded or a mapreduce-like distributed process 
 (Lets write our distributed sort first as a MR so we learn whats involved; 
 distributed sort, as much as possible should use MR framework pieces). On 
 startup, regions go to this directory and pick up the files written by split 
 participants deleting and clearing the dir when all have been read in. Making 
 it so can take multiple logs for input, can also make the split process more 
 robust rather than current tenuous process which loses all edits if it 
 doesn't make it to the end without error.
 2. Each column family rereads the reconstruction log to find its edits. Need 
 to fix that. Split can sort the edits by column family so store only reads 
 its edits.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs

2011-03-20 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008973#comment-13008973
 ] 

Prakash Khemani commented on HBASE-1364:


I just posted a not-yet-fully-done patch for review at 
https://review.cloudera.org/r/1655/ (for some reason it isn't getting 
automatically linked).

 [performance] Distributed splitting of regionserver commit logs
 ---

 Key: HBASE-1364
 URL: https://issues.apache.org/jira/browse/HBASE-1364
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: stack
Assignee: Alex Newman
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-1364.patch

  Time Spent: 8h
  Remaining Estimate: 0h

 HBASE-1008 has some improvements to our log splitting on regionserver crash; 
 but it needs to run even faster.
 (Below is from HBASE-1008)
 In bigtable paper, the split is distributed. If we're going to have 1000 
 logs, we need to distribute or at least multithread the splitting.
 1. As is, regions starting up expect to find one reconstruction log only. 
 Need to make it so pick up a bunch of edit logs and it should be fine that 
 logs are elsewhere in hdfs in an output directory written by all split 
 participants whether multithreaded or a mapreduce-like distributed process 
 (Lets write our distributed sort first as a MR so we learn whats involved; 
 distributed sort, as much as possible should use MR framework pieces). On 
 startup, regions go to this directory and pick up the files written by split 
 participants deleting and clearing the dir when all have been read in. Making 
 it so can take multiple logs for input, can also make the split process more 
 robust rather than current tenuous process which loses all edits if it 
 doesn't make it to the end without error.
 2. Each column family rereads the reconstruction log to find its edits. Need 
 to fix that. Split can sort the edits by column family so store only reads 
 its edits.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (HBASE-3585) isLegalFamilyName() can throw ArrayOutOfBoundException

2011-03-02 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13001884#comment-13001884
 ] 

Prakash Khemani commented on HBASE-3585:


The ArrayOutOfBound exception happened when doing admin.create(htd) where the 
HTableDescriptor htd had a zero-length family name.
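
A hedged sketch of the kind of guard that avoids the exception (not the actual HColumnDescriptor code, just the shape of the missing length check):

public class FamilyNameCheck {
  public static byte[] isLegalFamilyName(final byte[] b) {
    if (b == null) {
      return b;
    }
    // the missing check: a zero-length name would otherwise hit b[0] below
    if (b.length == 0) {
      throw new IllegalArgumentException("Family name can not be empty");
    }
    if (b[0] == '.') {
      throw new IllegalArgumentException("Family names cannot start with a period: "
          + new String(b));
    }
    // ... remaining character checks as in HColumnDescriptor
    return b;
  }
}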





 isLegalFamilyName() can throw ArrayOutOfBoundException
 --

 Key: HBASE-3585
 URL: https://issues.apache.org/jira/browse/HBASE-3585
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.1
Reporter: Prakash Khemani
Priority: Minor

 org.apache.hadoop.hbase.HColumnDescriptor.isLegalFamilyName(byte[]) accesses 
 byte[0] w/o first checking the array length.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Created: (HBASE-3585) isLegalFamilyName() can throw ArrayOutOfBoundException

2011-03-01 Thread Prakash Khemani (JIRA)
isLegalFamilyName() can throw ArrayOutOfBoundException
--

 Key: HBASE-3585
 URL: https://issues.apache.org/jira/browse/HBASE-3585
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.1
Reporter: Prakash Khemani
Priority: Minor


org.apache.hadoop.hbase.HColumnDescriptor.isLegalFamilyName(byte[]) accesses 
byte[0] w/o first checking the array length.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HBASE-3476) HFile -m option need not scan key values

2011-01-26 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987110#action_12987110
 ] 

Prakash Khemani commented on HBASE-3476:


I had put up this diff https://review.cloudera.org/r/1489/ . I am not sure why 
it didn’t get propagated to the JIRA.

Please feel free to put your own patch and close this issue.

Thanks.





 HFile -m option need not scan key values
 

 Key: HBASE-3476
 URL: https://issues.apache.org/jira/browse/HBASE-3476
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: Prakash Khemani
Assignee: Prakash Khemani
Priority: Minor

 bin/hbase org.apache.hadoop.io.hfile.HFile -m -f filename doesn't have to 
 scan the KVs in the file

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HBASE-3476) HFile -m option need not scan key values

2011-01-25 Thread Prakash Khemani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani updated HBASE-3476:
---

Status: Open  (was: Patch Available)

 HFile -m option need not scan key values
 

 Key: HBASE-3476
 URL: https://issues.apache.org/jira/browse/HBASE-3476
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: Prakash Khemani
Assignee: Prakash Khemani
Priority: Minor

 bin/hbase org.apache.hadoop.io.hfile.HFile -m -f filename doesn't have to 
 scan the KVs in the file

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HBASE-3476) HFile -m option need not scan key values

2011-01-25 Thread Prakash Khemani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani updated HBASE-3476:
---

Status: Patch Available  (was: Open)

 HFile -m option need not scan key values
 

 Key: HBASE-3476
 URL: https://issues.apache.org/jira/browse/HBASE-3476
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: Prakash Khemani
Assignee: Prakash Khemani
Priority: Minor

 bin/hbase org.apache.hadoop.io.hfile.HFile -m -f filename doesn't have to 
 scan the KVs in the file

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HBASE-3476) HFile -m option need not scan key values

2011-01-24 Thread Prakash Khemani (JIRA)
HFile -m option need not scan key values


 Key: HBASE-3476
 URL: https://issues.apache.org/jira/browse/HBASE-3476
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: Prakash Khemani
Assignee: Prakash Khemani
Priority: Minor


bin/hbase org.apache.hadoop.io.hfile.HFile -m -f filename doesn't have to 
scan the KVs in the file

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HBASE-3444) Bytes.toBytesBinary and Bytes.toStringBinary() should be reversible

2011-01-14 Thread Prakash Khemani (JIRA)
Bytes.toBytesBinary and Bytes.toStringBinary()  should be reversible


 Key: HBASE-3444
 URL: https://issues.apache.org/jira/browse/HBASE-3444
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani
Priority: Minor


Bytes.toStringBinary() doesn't escape the backslash character ('\\').

Without that, the transformation isn't reversible:

byte[] a = {'\\', 'x', '0', '0'};

Bytes.toBytesBinary(Bytes.toStringBinary(a)) won't be equal to a.
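
A minimal round-trip check that shows the asymmetry, assuming org.apache.hadoop.hbase.util.Bytes is on the classpath (exact escaping details depend on the Bytes version):

import java.util.Arrays;
import org.apache.hadoop.hbase.util.Bytes;

public class ToStringBinaryRoundTrip {
  public static void main(String[] args) {
    // a literal backslash followed by "x00" looks like an escape sequence after toStringBinary()
    byte[] a = {'\\', 'x', '0', '0'};
    String s = Bytes.toStringBinary(a);   // the backslash is passed through unescaped
    byte[] b = Bytes.toBytesBinary(s);    // and is now parsed as the start of a \xNN escape
    System.out.println(s + " round-trips: " + Arrays.equals(a, b));   // prints false
  }
}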


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HBASE-3398) increment(Increment, Integer, boolean) might fail

2010-12-29 Thread Prakash Khemani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani resolved HBASE-3398.


  Resolution: Not A Problem
Release Note: maxversion is set to 1

 increment(Increment, Integer, boolean) might fail
 -

 Key: HBASE-3398
 URL: https://issues.apache.org/jira/browse/HBASE-3398
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: Prakash Khemani
Assignee: Jonathan Gray

 In org.apache.hadoop.hbase.regionserver.HRegion.increment(Increment, Integer, 
 boolean) the following loop assumes that the result from getLastIncrement() 
 has a single entry for a given family, qualifier. But that is not 
 necessarily true. getLastIncrement() does a union of all entries found in 
 each of the store files ... and multiple versions of the same key are quite 
 possible.
   List<KeyValue> results = getLastIncrement(get);
   // Iterate the input columns and update existing values if they were
   // found, otherwise add new column initialized to the increment 
 amount
   int idx = 0;
   for (Map.Entry<byte [], Long> column : 
 family.getValue().entrySet()) {
 long amount = column.getValue();
 if (idx < results.size() &&
 results.get(idx).matchingQualifier(column.getKey())) {
   amount += Bytes.toLong(results.get(idx).getValue());
   idx++;
 }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HBASE-3396) getLastIncrement() can miss some key-values

2010-12-29 Thread Prakash Khemani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani resolved HBASE-3396.


  Resolution: Not A Problem
Release Note: maxVersion in the scan is set to 1

 getLastIncrement() can miss some key-values
 ---

 Key: HBASE-3396
 URL: https://issues.apache.org/jira/browse/HBASE-3396
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: Prakash Khemani
Assignee: Jonathan Gray

 In getLastIncrement() there is an assumption that a memstore-only scan will 
 never return multiple versions of a kv
 // found everything we were looking for, done
 if (results.size() == expected) {
   return results;
 }
 Based on this assumption the code does an early out after it finds the 
 expected number of key-value pairs in the memstore. But what if there were 
 multiple versions of the same kv returned by the memstore scan? I think it is 
 possible when the memstore has a snapshot pending to be written out. A 
 version of the key can be returned both from the online memory and from the 
 snapshot memory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HBASE-3398) increment(Increment, Integer, boolean) might fail

2010-12-28 Thread Prakash Khemani (JIRA)
increment(Increment, Integer, boolean) might fail
-

 Key: HBASE-3398
 URL: https://issues.apache.org/jira/browse/HBASE-3398
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: Prakash Khemani
Assignee: Jonathan Gray


In org.apache.hadoop.hbase.regionserver.HRegion.increment(Increment, Integer, 
boolean) the following loop assumes that the result from getLastIncrement() has 
a single entry for a given family, qualifier. But that is not necessarily 
true. getLastIncrement() does a union of all entries found in each of the store 
files ... and multiple versions of the same key are quite possible.

  List<KeyValue> results = getLastIncrement(get);

  // Iterate the input columns and update existing values if they were
  // found, otherwise add new column initialized to the increment amount
  int idx = 0;
  for (Map.Entry<byte [], Long> column : family.getValue().entrySet()) {
long amount = column.getValue();
if (idx < results.size() &&
results.get(idx).matchingQualifier(column.getKey())) {
  amount += Bytes.toLong(results.get(idx).getValue());
  idx++;
}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HBASE-3399) upsert doesn't matchFamily() before removing key

2010-12-28 Thread Prakash Khemani (JIRA)
upsert doesn't matchFamily() before removing key


 Key: HBASE-3399
 URL: https://issues.apache.org/jira/browse/HBASE-3399
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: Prakash Khemani
Assignee: Jonathan Gray


org.apache.hadoop.hbase.regionserver.MemStore.upsert(KeyValue) doesn't match 
family before deciding to remove a kv in the memstore

  // if the qualifier matches and it's a put, remove it
  if (kv.matchingQualifier(cur)) {

// to be extra safe we only remove Puts that have a memstoreTS==0
if (kv.getType() == KeyValue.Type.Put.getCode() &&
kv.getMemstoreTS() == 0) {
  // false means there was a change, so give us the size.
  addedSize -= heapSizeChange(kv, true);
  it.remove();
}

shouldn't it be "if the family and qualifier match and it's a Put, remove it"?
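
A hedged sketch of the extra check, using an explicit byte comparison (Bytes here is org.apache.hadoop.hbase.util.Bytes) rather than assuming a matchingFamily(KeyValue) overload exists in this version:

      // if the family and qualifier match and it's a put, remove it
      if (Bytes.equals(kv.getFamily(), cur.getFamily()) &&
          kv.matchingQualifier(cur)) {

        // to be extra safe we only remove Puts that have a memstoreTS==0
        if (kv.getType() == KeyValue.Type.Put.getCode() &&
            kv.getMemstoreTS() == 0) {
          // false means there was a change, so give us the size.
          addedSize -= heapSizeChange(kv, true);
          it.remove();
        }
      }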



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HBASE-3395) StoreScanner not being closed?

2010-12-27 Thread Prakash Khemani (JIRA)
StoreScanner not being closed?
--

 Key: HBASE-3395
 URL: https://issues.apache.org/jira/browse/HBASE-3395
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: Prakash Khemani
Assignee: Jonathan Gray


In StoreScanner::next(List<KeyValue> outResult, int limit)

case SEEK_NEXT_ROW:
  // This is just a relatively simple end of scan fix, to short-cut end 
us if there is a
  // endKey in the scan.
  if (!matcher.moreRowsMayExistAfter(kv)) {
outResult.addAll(results);
return false;
  }

close() is not being called before returning false here, while in all other 
cases close() is called before returning false. Maybe this is a problem.
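
A hedged sketch of the suggested fix, mirroring what the other early-return paths in next() do (not necessarily the exact patch that went in):

    case SEEK_NEXT_ROW:
      // This is just a relatively simple end of scan fix, to short-cut end us if there is a
      // endKey in the scan.
      if (!matcher.moreRowsMayExistAfter(kv)) {
        outResult.addAll(results);
        close();            // release the underlying scanners before bailing out
        return false;
      }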

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3292) Expose block cache hit/miss/evict counts into region server metrics

2010-12-01 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12965762#action_12965762
 ] 

Prakash Khemani commented on HBASE-3292:


Aggregate hit/miss counts will definitely help. We should also report the 
current block cache hit ratio - it will save us the hassle of deriving it from 
the aggregate counts.
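
For context, deriving the instantaneous ratio from the aggregate counters means sampling them twice and dividing the deltas, roughly like this (a sketch only; both samples are made-up illustrative numbers):

public class BlockCacheHitRatio {
  // hit ratio over the interval between two samples of the aggregate counters
  public static double hitRatioBetween(long hits0, long accesses0, long hits1, long accesses1) {
    long accesses = accesses1 - accesses0;
    return accesses == 0 ? 0.0 : (double) (hits1 - hits0) / accesses;
  }

  public static void main(String[] args) {
    System.out.println(hitRatioBetween(6000000L, 6600000L, 6329399L, 7020255L));
  }
}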

 Expose block cache hit/miss/evict counts into region server metrics
 ---

 Key: HBASE-3292
 URL: https://issues.apache.org/jira/browse/HBASE-3292
 Project: HBase
  Issue Type: Improvement
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Minor
 Fix For: 0.90.1, 0.92.0

 Attachments: HBASE-3292-v1.patch


 Right now only the hit ratio is exposed into the rs metrics.  This value 
 tends to change very slowly and hardly at all once the cluster has been up 
 for some time.
 We should expose the aggregate hit/miss/evict counts so you can more 
 effectively see how things change over time.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HBASE-3248) support Increment::incrementColumn()

2010-11-18 Thread Prakash Khemani (JIRA)
support Increment::incrementColumn()


 Key: HBASE-3248
 URL: https://issues.apache.org/jira/browse/HBASE-3248
 Project: HBase
  Issue Type: Improvement
  Components: client
Reporter: Prakash Khemani


The Increment.addColumn() API overwrites the old column value if it exists. We 
need a new method, incrementColumn(), that will sum up the old and new values.
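
A hedged sketch of the accumulate-instead-of-replace semantics being asked for, using String keys instead of HBase's byte[]-keyed maps to keep it self-contained:

import java.util.HashMap;
import java.util.Map;

public class IncrementSemantics {
  private final Map<String, Map<String, Long>> familyMap =
      new HashMap<String, Map<String, Long>>();

  // addColumn-style: the last call for a column wins
  public void addColumn(String family, String qualifier, long amount) {
    qualifiers(family).put(qualifier, amount);
  }

  // incrementColumn-style: amounts for the same column are summed
  public void incrementColumn(String family, String qualifier, long amount) {
    Map<String, Long> q = qualifiers(family);
    Long old = q.get(qualifier);
    q.put(qualifier, old == null ? amount : old + amount);
  }

  private Map<String, Long> qualifiers(String family) {
    Map<String, Long> q = familyMap.get(family);
    if (q == null) {
      q = new HashMap<String, Long>();
      familyMap.put(family, q);
    }
    return q;
  }
}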

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3246) Add API to Increment client class that increments rather than replaces the amount for a column when done multiple times

2010-11-18 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933645#action_12933645
 ] 

Prakash Khemani commented on HBASE-3246:


Is there any need for addColumn()? It might be cleaner to just have 
incrementColumn() in this API.

 Add API to Increment client class that increments rather than replaces the 
 amount for a column when done multiple times
 ---

 Key: HBASE-3246
 URL: https://issues.apache.org/jira/browse/HBASE-3246
 Project: HBase
  Issue Type: Improvement
  Components: client
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Attachments: HBASE-3246-v1.patch


 In the new Increment class, the API to add columns is {{addColumn()}}.  If 
 you do this multiple times for an individual column, the amount to increment 
 by is replaced.  I think this is the right way for this method to work and it 
 is javadoc'd with the behavior.
 We should add a new method, {{incrementColumn()}} which will increment any 
 existing amount for the specified column rather than replacing it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HBASE-3239) NPE when trying to roll logs

2010-11-16 Thread Prakash Khemani (JIRA)
NPE when trying to roll logs


 Key: HBASE-3239
 URL: https://issues.apache.org/jira/browse/HBASE-3239
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.0
Reporter: Prakash Khemani


Note from Kannan

findMemstoresWithEditsEqualOrOlderThan() can return null, it seems, and we 
don't check for null before accessing regions.length.

  regions = 
findMemstoresWithEditsEqualOrOlderThan(this.outputfiles.firstKey(),
this.lastSeqWritten);
  StringBuilder sb = new StringBuilder();
  for (int i = 0; i < regions.length; i++) {
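
A hedged sketch of the missing guard (not the actual cleanOldLogs() fix, just the shape of it):

      regions = findMemstoresWithEditsEqualOrOlderThan(this.outputfiles.firstKey(),
          this.lastSeqWritten);
      if (regions != null) {   // guard the message-building loop that currently NPEs
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < regions.length; i++) {
          // ... as before
        }
      }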

===


Stack Trace

2010-11-15 19:19:54,258 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: 
LRU Stats: total=6.1 GB, free=1.71 GB, max=7.81 GB, blocks=385740, 
accesses=7020255, hits=6329399, hitRatio=90.15%%, cachingAccesses=6765050, 
cachingHits=6329399, cachingHitsRatio=93.56%%, evictions=1, evicted=49911, 
evictedPerRun=49911.0
2010-11-15 19:21:05,204 INFO 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs -- 
HDFS-200
2010-11-15 19:21:05,211 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: 
Roll 
/PUMAHBASE001-SNC5-HBASE/.logs/pumahbase042.snc5.facebook.com,60020,1289856892583/10.38.28.57%3A60020.1289877154987,
 entries=649004, filesize=255069060. New hlog 
/PUMAHBASE001-SNC5-HBASE/.logs/pumahbase042.snc5.facebook.com,60020,1289856892583/10.38.28.57%3A60020.1289877665062
2010-11-15 19:21:05,222 ERROR org.apache.hadoop.hbase.regionserver.LogRoller: 
Log rolling failed
java.lang.NullPointerException
at 
org.apache.hadoop.hbase.regionserver.wal.HLog.cleanOldLogs(HLog.java:648)
at 
org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:528)
at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
2010-11-15 19:21:05,226 FATAL 
org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
serverName=pumahbase042.snc5.facebook.com,60020,1289856892583, 
load=(requests=3476, regions=40, usedHeap=8388, maxHeap=15987): Log rolling 
failed
java.lang.NullPointerException
at 
org.apache.hadoop.hbase.regionserver.wal.HLog.cleanOldLogs(HLog.java:648)
at 
org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:528)
at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
2010-11-15 19:21:05,227 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: 
request=1264.5834, regions=40, stores=70, storefiles=98, storefileIndexSize=35, 
memstoreSize=83, compactionQueueSize=0, usedHeap=8370, maxHeap=15987, 
blockCacheSize=6593768536, blockCacheFree=1788154792, blockCacheCount=388283, 
blockCacheHitRatio=90, blockCacheHitCachingRatio=93
2010-11-15 19:21:05,227 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Log rolling failed
2010-11-15 19:21:05,227 INFO org.apache.hadoop.hbase.regionserver.LogRoller: 
LogRoller exiting.
2010-11-15 19:21:07,255 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server 
on 60020


===




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HBASE-3196) Regionserver stuck when after all IPC Server handlers fatal'd

2010-11-04 Thread Prakash Khemani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani updated HBASE-3196:
---

  Description: 
The region server is stuck with the following jstack

2010-11-03 22:23:41
Full thread dump Java HotSpot(TM) 64-Bit Server VM (14.0-b16 mixed mode):

Attach Listener daemon prio=10 tid=0x2aaeb6774000 nid=0x3974 waiting on 
condition [0x]
   java.lang.Thread.State: RUNNABLE

RS_CLOSE_REGION-pumahbase028.snc5.facebook.com,60020,1288733355197-2 prio=10 
tid=0x2aaeb8449000 nid=0x3bbc waiting on condition [0x43f67000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  0x2aaab7fd1130 (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:619)

RS_CLOSE_REGION-pumahbase028.snc5.facebook.com,60020,1288733355197-1 prio=10 
tid=0x2aaeb843f800 nid=0x3bbb waiting on condition [0x43e66000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  0x2aaab7fd1130 (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:619)

RS_CLOSE_REGION-pumahbase028.snc5.facebook.com,60020,1288733355197-0 prio=10 
tid=0x2aaeb8447800 nid=0x3bba waiting on condition [0x44068000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  0x2aaab7fd1130 (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:619)

RMI Scheduler(0) daemon prio=10 tid=0x2aaeb48c4800 nid=0x1c97 waiting on 
condition [0x580a7000]
   java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  0x2aaab773a118 (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1963)
at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:583)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:576)
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:619)

RS_OPEN_REGION-pumahbase028.snc5.facebook.com,60020,1288733355197-2 daemon 
prio=10 tid=0x2aaeb4804800 nid=0x17a0 waiting on condition 
[0x582a9000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  0x2aaab7fca538 (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
at 

[jira] Commented: (HBASE-3196) Regionserver stuck when after all IPC Server handlers fatal'd

2010-11-04 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12928144#action_12928144
 ] 

Prakash Khemani commented on HBASE-3196:


I am not sure where the version mismatch is coming from. It shouldn't. The 3rd 
datanode in the pipeline is inaccessible and I cannot ssh to it.

===

In the logs it doesn't appear that there is any region that could not be closed.

When I grep for region names, all log lines at the end, look like the following

2010-11-02 17:36:25,490 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Closing 
tmp_realtime_domain_feed_imps_urls_hot,f6b05d27d8b93e0691ae35a846f2742cno.blogg.renate87
 sffi 2 7ffa8a0f 
10150313506035171,1288637339566.612cf51b8553ea552f3c283a542a7fe9.: disabling 
compactions & flushes
2010-11-02 17:36:25,490 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Updates disabled for region 
tmp_realtime_domain_feed_imps_urls_hot,f6b05d27d8b93e0691ae35a846f2742cno.blogg.renate87
 sffi 2 7ffa8a0f 
10150313506035171,1288637339566.612cf51b8553ea552f3c283a542a7fe9.
2010-11-02 17:36:25,490 ERROR 
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closing region 
tmp_realtime_domain_feed_imps_urls_hot,f6b05d27d8b93e0691ae35a846f2742cno.blogg.renate87
 sffi 2 7ffa8a0f 
10150313506035171,1288637339566.612cf51b8553ea552f3c283a542a7fe9.
2010-11-02 17:36:25,490 DEBUG 
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closed region 
tmp_realtime_domain_feed_imps_urls_hot,f6b05d27d8b93e0691ae35a846f2742cno.blogg.renate87
 sffi 2 7ffa8a0f 
10150313506035171,1288637339566.612cf51b8553ea552f3c283a542a7fe9.
2010-11-02 17:36:25,490 DEBUG 
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing 
close of 
tmp_realtime_domain_feed_imps_domains,bceda0983f7f83bfb71291220eb619c5com.mediafire.img16
 sffi,1288637064513.05ea84f89daf7274ab9a88d54460b03b.
2010-11-02 17:36:25,490 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Closing 
tmp_realtime_domain_feed_imps_domains,bceda0983f7f83bfb71291220eb619c5com.mediafire.img16
 sffi,1288637064513.05ea84f89daf7274ab9a88d54460b03b.: disabling compactions & 
flushes
2010-11-02 17:36:25,490 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Updates disabled for region 
tmp_realtime_domain_feed_imps_domains,bceda0983f7f83bfb71291220eb619c5com.mediafire.img16
 sffi,1288637064513.05ea84f89daf7274ab9a88d54460b03b.
2010-11-02 17:36:25,490 ERROR 
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closing region 
tmp_realtime_domain_feed_imps_domains,bceda0983f7f83bfb71291220eb619c5com.mediafire.img16
 sffi,1288637064513.05ea84f89daf7274ab9a88d54460b03b.
2010-11-02 17:36:25,490 DEBUG 
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closed region 
tmp_realtime_domain_feed_imps_domains,bceda0983f7f83bfb71291220eb619c5com.mediafire.img16
 sffi,1288637064513.05ea84f89daf7274ab9a88d54460b03b.
2010-11-02 17:36:25,490 DEBUG 
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing 
close of 
tmp_realtime_domain_feed_imps_urls,a6c845f0d1b183bc1817e7aef8354199com.brooklynvegan
 sffi 8332505780,1288637275045.283d82705c44eafc24e13e0e2d2e2bc5.
2010-11-02 17:36:25,490 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Closing 
tmp_realtime_domain_feed_imps_urls,a6c845f0d1b183bc1817e7aef8354199com.brooklynvegan
 sffi 8332505780,1288637275045.283d82705c44eafc24e13e0e2d2e2bc5.: disabling 
compactions & flushes
2010-11-02 17:36:25,490 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Updates disabled for region 
tmp_realtime_domain_feed_imps_urls,a6c845f0d1b183bc1817e7aef8354199com.brooklynvegan
 sffi 8332505780,1288637275045.283d82705c44eafc24e13e0e2d2e2bc5.
2010-11-02 17:36:25,490 ERROR 
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closing region 
tmp_realtime_domain_feed_imps_urls,a6c845f0d1b183bc1817e7aef8354199com.brooklynvegan
 sffi 8332505780,1288637275045.283d82705c44eafc24e13e0e2d2e2bc5.
2010-11-02 17:36:25,490 DEBUG 
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closed region 
tmp_realtime_domain_feed_imps_urls,a6c845f0d1b183bc1817e7aef8354199com.brooklynvegan
 sffi 8332505780,1288637275045.283d82705c44eafc24e13e0e2d2e2bc5.
2010-11-02 17:36:25,482 ERROR 
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closing region 
tmp_realtime_domain_feed_imps_urls_hot,35c28f48,1288637318666.e391902cf1e5a5a64c178001a42f055a.
2010-11-02 17:36:25,495 DEBUG 
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closed region 
tmp_realtime_domain_feed_imps_urls_hot,35c28f48,1288637318666.e391902cf1e5a5a64c178001a42f055a.
2010-11-02 17:36:25,482 ERROR 
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closing region 
tmp_realtime_domain_feed_imps_domains,3358dabfafc35708f8de488d6f20f848nl.ronsrovlinks
 sefi,1288637018644.2feeb71cd2573e8453ef77a9aa40aa38.
2010-11-02 

[jira] Created: (HBASE-3196) Regionserver stuck when after all IPC Server handlers fatal'd

2010-11-03 Thread Prakash Khemani (JIRA)
Regionserver stuck when after all IPC Server handlers fatal'd


 Key: HBASE-3196
 URL: https://issues.apache.org/jira/browse/HBASE-3196
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-2957) Release row lock when waiting for wal-sync

2010-09-08 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12907380#action_12907380
 ] 

Prakash Khemani commented on HBASE-2957:



I agree that the ordering on a given region server will be the same with or 
without delayed sync. But I am pretty sure that globally there will be 
inconsistencies.

Say a value is updated on RS A. This value is not synced yet.

The abovementioned unsynced value on RS A is read by someone and based on that 
value an update is made on another RS B. Say the update on RS B is synced.

Now we have a window where B depends on A: B is in the logs but A isn't. In 
this window, if RS A dies and comes back up, we will have a situation where 
the update on RS B is present but the update on RS A isn't.






 Release row lock when waiting for wal-sync
 --

 Key: HBASE-2957
 URL: https://issues.apache.org/jira/browse/HBASE-2957
 Project: HBase
  Issue Type: Improvement
  Components: regionserver, wal
Affects Versions: 0.20.0
Reporter: Prakash Khemani

 Is there a reason to hold on to the row-lock while waiting for the WAL-sync 
 to be completed by the logSyncer thread?
 I think data consistency will be guaranteed even if the following happens (a) 
 the row lock is held while the row is updated in memory (b) the row lock is 
 released after queuing the KV record for WAL-syncing (c) the log-sync system 
 guarantees that the log records for any given row are synced in order (d) the 
 HBase client only receives a success notification after the sync completes 
 (no change from the current state)
 I think this should be a huge win. For my use case, and I am sure for others, 
 the handler thread spends the bulk of its row-lock critical-section time 
 waiting for sync to complete.
 Even if the log-sync system cannot guarantee the orderly completion of sync 
 records, the "Don't hold row lock while waiting for sync" option should be 
 available to HBase clients on a per-request basis.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-2957) Release row lock when waiting for wal-sync

2010-09-07 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906884#action_12906884
 ] 

Prakash Khemani commented on HBASE-2957:


Sorry, I was out and couldn't reply to this thread.

I think a general solution that guarantees consistency for PUTs and ICVs and at 
the same time doesn't hold the row lock while updating hlog is possible.

===

Thinking aloud. First, why do we want to hold the row lock around the log 
sync? Because we want the log syncs to happen in causal order. Here is a 
scenario of what can go wrong if we release the row lock before the sync 
completes.
1. client-1 does a put/icv on regionserver-1 and releases the row lock 
before the sync.
2. client-2 comes in and reads the new value. Based on this just-read 
value, client-2 then does a put on regionserver-2.
3. client-2 is able to do its sync on rs-2 before client-1's sync on 
rs-1 completes.
4. rs-1 is brought down ungracefully. During recovery we will have 
client-2's update but not client-1's, and that violates the causal ordering 
of events.

===
So we don't want anyone to read a value which has not already been synced. I 
think we can transfer the wait-for-sync to the reader instead of asking all 
writers to wait.

A simple way to do that would be to attach a log-sync-number to every cell. 
When a cell is updated it will keep the next log-sync-number within itself. A 
get will not return until the current log-sync-number is at least as big as 
the log-sync-number stored in the cell.

An update can return immediately after queuing the sync. The wait-for-sync is 
transferred from the writer to the reader. If the reader comes in sufficiently 
late (which is likely) then there will be no wait-for-syncs in the system.
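
For illustration only, here is a minimal sketch of the reader-waits idea under 
the assumptions above. The class, field, and method names (SyncGate, 
log-sync-numbers as longs, a single global synced counter) are hypothetical and 
not HBase APIs:

{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: writers stamp each cell with the log-sync-number of the
// pending WAL sync; readers block until that number has actually been synced.
public class SyncGate {
  private long nextSeq = 0;     // next log-sync-number to hand out
  private long syncedSeq = 0;   // highest log-sync-number known to be durable
  private final Map<String, Long> cellSeq = new HashMap<>();
  private final Map<String, byte[]> cellValue = new HashMap<>();

  // Writer path: update memory, stamp the cell, return without waiting.
  public synchronized long put(String row, byte[] value) {
    long seq = ++nextSeq;
    cellValue.put(row, value);
    cellSeq.put(row, seq);
    // In the real system the edit would now be queued for the log-syncer thread.
    return seq;
  }

  // Called by a (hypothetical) log-syncer thread after a group sync completes.
  public synchronized void syncCompleted(long upToSeq) {
    syncedSeq = Math.max(syncedSeq, upToSeq);
    notifyAll();
  }

  // Reader path: wait until the cell's stamped log-sync-number is durable.
  public synchronized byte[] get(String row) throws InterruptedException {
    long needed = cellSeq.getOrDefault(row, 0L);
    while (syncedSeq < needed) {
      wait();   // the reader pays the wait-for-sync, not the writer
    }
    return cellValue.get(row);
  }
}
{code}

A reader that arrives well after the write, which the comment above expects to 
be the common case, finds syncedSeq already past the stamped number and never 
blocks.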

===
Even in this scheme we will have to treat ICVs specially. Logically an ICV 
does (a) a GET of the old value, (b) a PUT of the new value, and (c) a GET 
that returns the new value.

There are 2 cases:
(1) The ICV caller doesn't use the return value of the ICV. In this case the 
ICV need not wait for the earlier sync to complete. (In my use case this is 
what happens predominantly.)

(2) The ICV caller uses the return value of the ICV call to make further 
updates. In this case the ICV has to wait for its sync to complete before it 
returns. While the ICV is waiting for the sync to complete it need not hold the 
row lock. (At least in my use case this is a very rare case)
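
Again purely as a sketch of the two cases, with made-up names (a single object 
monitor standing in for the per-row lock, and a counter standing in for the 
WAL sync point); none of this is actual HBase code:

{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the two ICV cases above.
public class IcvSketch {
  private final Map<String, Long> values = new HashMap<>();
  private long nextSeq = 0;     // log-sync-number handed to the queued edit
  private long syncedSeq = 0;   // highest log-sync-number that has been synced

  public long increment(String row, long delta, boolean callerUsesResult)
      throws InterruptedException {
    long newValue;
    long seq;
    synchronized (this) {              // stands in for the per-row lock
      newValue = values.getOrDefault(row, 0L) + delta;
      values.put(row, newValue);
      seq = ++nextSeq;                 // edit is queued for the log-syncer here
    }                                  // "row lock" released before any waiting
    if (callerUsesResult) {
      waitForSync(seq);                // case (2): result feeds further updates
    }
    return newValue;                   // case (1): caller ignores it, no wait
  }

  private synchronized void waitForSync(long seq) throws InterruptedException {
    while (syncedSeq < seq) {
      wait();
    }
  }

  // Invoked by a (hypothetical) log-syncer thread after a group sync.
  public synchronized void syncCompleted(long upToSeq) {
    syncedSeq = Math.max(syncedSeq, upToSeq);
    notifyAll();
  }
}
{code}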

===
I think that it is true in general that while a GET is forced to wait for a 
sync to complete, there is no need to hold the row lock.

===






 Release row lock when waiting for wal-sync
 --

 Key: HBASE-2957
 URL: https://issues.apache.org/jira/browse/HBASE-2957
 Project: HBase
  Issue Type: Improvement
  Components: regionserver, wal
Affects Versions: 0.20.0
Reporter: Prakash Khemani

 Is there a reason to hold on to the row-lock while waiting for the WAL-sync 
 to be completed by the logSyncer thread?
 I think data consistency will be guaranteed even if the following happens (a) 
 the row lock is held while the row is updated in memory (b) the row lock is 
 released after queuing the KV record for WAL-syncing (c) the log-sync system 
 guarantees that the log records for any given row are synced in order (d) the 
 HBase client only receives a success notification after the sync completes 
 (no change from the current state)
 I think this should be a huge win. For my use case, and I am sure for others, 
 the handler thread spends the bulk of its row-lock critical-section time 
 waiting for sync to complete.
 Even if the log-sync system cannot guarantee the orderly completion of sync 
 records, the "Don't hold row lock while waiting for sync" option should be 
 available to HBase clients on a per-request basis.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HBASE-2957) Release row lock when waiting for wal-sync

2010-09-02 Thread Prakash Khemani (JIRA)
Release row lock when waiting for wal-sync
--

 Key: HBASE-2957
 URL: https://issues.apache.org/jira/browse/HBASE-2957
 Project: HBase
  Issue Type: Improvement
  Components: regionserver, wal
Affects Versions: 0.20.0
Reporter: Prakash Khemani


Is there a reason to hold on to the row-lock while waiting for the WAL-sync to 
be completed by the logSyncer thread?

I think data consistency will be guaranteed even if the following happens (a) 
the row lock is held while the row is updated in memory (b) the row lock is 
released after queuing the KV record for WAL-syncing (c) the log-sync system 
guarantees that the log records for any given row are synced in order (d) the 
HBase client only receives a success notification after the sync completes (no 
change from the current state)

I think this should be a huge win. For my use case, and I am sure for others, 
the handler thread spends the bulk of its row-lock critical-section time 
waiting for sync to complete.

Even if the log-sync system cannot guarantee the orderly completion of sync 
records, the "Don't hold row lock while waiting for sync" option should be 
available to HBase clients on a per-request basis.
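
For illustration, a minimal sketch of the (a)-(d) ordering described above; the 
class, the latch-per-edit, and the helper names are invented for this example 
and are not the actual region server code:

{code}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative only: (a) update in memory under the row lock, (b) release the
// lock after queuing the WAL edit, (c) the syncer preserves queue order,
// (d) acknowledge the client only after the sync completes.
public class ProposedWritePath {
  private final ReentrantLock rowLock = new ReentrantLock(); // per-row lock stand-in

  public void put(byte[] row, byte[] value) throws InterruptedException {
    CountDownLatch synced = new CountDownLatch(1);
    rowLock.lock();
    try {
      applyToMemstore(row, value);        // (a) in-memory update under the row lock
      queueWalEdit(row, value, synced);   // (b) hand the edit to the log-syncer
    } finally {
      rowLock.unlock();                   // lock released before waiting for the sync
    }
    synced.await();                       // (d) success returned only once durable
  }

  private void applyToMemstore(byte[] row, byte[] value) { /* omitted */ }

  private void queueWalEdit(byte[] row, byte[] value, CountDownLatch synced) {
    // (c) a real log-syncer appends edits in queue order and releases each
    // latch when the group sync covering that edit finishes; simulated here.
    new Thread(synced::countDown).start();
  }
}
{code}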

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-2957) Release row lock when waiting for wal-sync

2010-09-02 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905798#action_12905798
 ] 

Prakash Khemani commented on HBASE-2957:


Actually, data consistency is not guaranteed if we return to the HBase client 
any value which has not yet been sync'd to WAL. But for my use case, and I 
think for many others, it is OK.






 Release row lock when waiting for wal-sync
 --

 Key: HBASE-2957
 URL: https://issues.apache.org/jira/browse/HBASE-2957
 Project: HBase
  Issue Type: Improvement
  Components: regionserver, wal
Affects Versions: 0.20.0
Reporter: Prakash Khemani

 Is there a reason to hold on to the row-lock while waiting for the WAL-sync 
 to be completed by the logSyncer thread?
 I think data consistency will be guaranteed even if the following happens (a) 
 the row lock is held while the row is updated in memory (b) the row lock is 
 released after queuing the KV record for WAL-syncing (c) the log-sync system 
 guarantees that the log records for any given row are synced in order (d) the 
 HBase client only receives a success notification after the sync completes 
 (no change from the current state)
 I think this should be a huge win. For my use case, and I am sure for others, 
 the handler thread spends the bulk of its row-lock critical-section time 
 waiting for sync to complete.
 Even if the log-sync system cannot guarantee the orderly completion of sync 
 records, the "Don't hold row lock while waiting for sync" option should be 
 available to HBase clients on a per-request basis.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HBASE-2952) HConnectionManager's shutdown hook interferes with client's operations

2010-09-01 Thread Prakash Khemani (JIRA)
HConnectionManager's shutdown hook interferes with client's operations
--

 Key: HBASE-2952
 URL: https://issues.apache.org/jira/browse/HBASE-2952
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.20.0
Reporter: Prakash Khemani


My HBase client calls incrementColValue() in pairs. If someone kills the client 
(SIGINT or SIGTERM) I want my client's increment threads to gracefully exit. If 
a thread has already done one of the incrementColValue() then I want that 
thread to complete the other incrementColValue() and then exit.

For this purpose I installed my own shutdownHook(). My shutdownHook() thread 
'signals' all the threads in my process that it is time to exit and then waits 
for them to complete.
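
For illustration, a minimal sketch of the kind of client-side shutdown hook 
described here (class, method, and thread names are made up, not part of the 
actual client):

{code}
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch: the hook signals the increment threads that it is time
// to exit and then waits for each of them to finish its second increment.
public class GracefulShutdown {
  private static final AtomicBoolean shuttingDown = new AtomicBoolean(false);
  private static final List<Thread> workers = new CopyOnWriteArrayList<>();

  public static void install() {
    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
      shuttingDown.set(true);   // signal all registered worker threads
      for (Thread t : workers) {
        try {
          t.join();             // wait for the worker to complete its pair of increments
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    }, "client-shutdown"));
  }

  public static void register(Thread worker) { workers.add(worker); }

  public static boolean isShuttingDown() { return shuttingDown.get(); }
}
{code}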

The problem is that HConnectionManager's shutdownHook thread also runs and 
shuts down all connections and IPC threads.

My increment thread keeps waiting to increment and then times out after 240s. 
Two problems with this - the incrementColValue() didn't go through, which will 
increase the chances of inconsistency in my HBase data. And it took 240s to 
exit. I am pasting some of the messages that the client thread outputs while it 
tries to contact the HBase server.

Signalled. Exiting ...
2010-09-01 12:11:14,769 DEBUG [HCM.shutdownHook] 
zookeeper.ZooKeeperWrapper(787): 
localhost:/hbase,org.apache.hadoop.hbase.client.HConnectionManagerClosed 
connection with ZooKeeper; /hbase/root-region-server
flushing after 7899
2010-09-01 12:11:19,669 DEBUG [Line Processing Thread 0] 
client.HConnectionManager$TableServers(903): Cache hit for row  in tableName 
.META.: location server hadoop2205.snc3.facebook.com:60020, location region 
name .META.,,1.1028785192
2010-09-01 12:11:19,671 INFO  [Line Processing Thread 0] 
zookeeper.ZooKeeperWrapper(206): Reconnecting to zookeeper
2010-09-01 12:11:19,671 DEBUG [Line Processing Thread 0] 
zookeeper.ZooKeeperWrapper(212): 
localhost:/hbase,org.apache.hadoop.hbase.client.HConnectionManagerConnected 
to zookeeper again
2010-09-01 12:11:24,679 DEBUG [Line Processing Thread 0] 
client.HConnectionManager$TableServers(964): Removed .META.,,1.1028785192 for 
tableName=.META. from cache because of content_action_url_metrics,\x080r 
B\xF7\x81_T\x07\x08\x16uOrcom.gigya 429934274290948,99
2010-09-01 12:11:24,680 DEBUG [Line Processing Thread 0] 
client.HConnectionManager$TableServers(857): locateRegionInMeta attempt 0 of 4 
failed; retrying after sleep of 5000 because: The client is stopped
2010-09-01 12:11:24,680 DEBUG [Line Processing Thread 0] 
zookeeper.ZooKeeperWrapper(470): 
localhost:/hbase,org.apache.hadoop.hbase.client.HConnectionManagerTrying to 
read /hbase/root-region-server
2010-09-01 12:11:24,681 DEBUG [Line Processing Thread 0] 
zookeeper.ZooKeeperWrapper(489): 
localhost:/hbase,org.apache.hadoop.hbase.client.HConnectionManagerRead ZNode 
/hbase/root-region-server got 10.26.119.190:60020
2010-09-01 12:11:24,681 DEBUG [Line Processing Thread 0] 
client.HConnectionManager$TableServers(1116): Root region location changed. 
Sleeping.

===

It might be a good idea to only run the HCM shutdown code when all the HTables 
referring to it have been closed. That way the client can control when the 
shutdown actually happens.




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-2952) HConnectionManager's shutdown hook interferes with client's operations

2010-09-01 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905232#action_12905232
 ] 

Prakash Khemani commented on HBASE-2952:


I don't think it can be a shutdown-hook thread that HCM accepts. The JVM 
doesn't let you control the order in which shutdown-hook threads run. It will 
have to be a user 'callback' that HCM's shutdownHook() thread invokes.

Also, I think to get to the HCM instance we have to go through the HTable. 
There can be multiple instances of HCM in the same process.

How about HTable::disableShutdownHook()? Then it becomes the caller's 
responsibility to make sure HTable::close() is called for every instance of 
HTable.






 HConnectionManager's shutdown hook interferes with client's operations
 --

 Key: HBASE-2952
 URL: https://issues.apache.org/jira/browse/HBASE-2952
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.20.0
Reporter: Prakash Khemani

 My HBase client calls incrementColValue() in pairs. If someone kills the 
 client (SIGINT or SIGTERM) I want my client's increment threads to gracefully 
 exit. If a thread has already done one of the incrementColValue() then I want 
 that thread to complete the other incrementColValue() and then exit.
 For this purpose I installed my own shutdownHook(). My shutdownHook() thread 
 'signals' all the threads in my process that it is time to exit and then 
 waits for them to complete.
 The problem is that HConnectionManager's shutdownHook thread also runs and 
 shuts down all connections and IPC threads.
 My increment thread keeps waiting to increment and then times out after 240s. 
 Two problems with this - the incrementColValue() didn't go through, which 
 will increase the chances of inconsistency in my HBase data. And it took 240s 
 to exit. I am pasting some of the messages that the client thread outputs 
 while it tries to contact the HBase server.
 Signalled. Exiting ...
 2010-09-01 12:11:14,769 DEBUG [HCM.shutdownHook] 
 zookeeper.ZooKeeperWrapper(787): 
 localhost:/hbase,org.apache.hadoop.hbase.client.HConnectionManagerClosed 
 connection with ZooKeeper; /hbase/root-region-server
 flushing after 7899
 2010-09-01 12:11:19,669 DEBUG [Line Processing Thread 0] 
 client.HConnectionManager$TableServers(903): Cache hit for row  in 
 tableName .META.: location server hadoop2205.snc3.facebook.com:60020, 
 location region name .META.,,1.1028785192
 2010-09-01 12:11:19,671 INFO  [Line Processing Thread 0] 
 zookeeper.ZooKeeperWrapper(206): Reconnecting to zookeeper
 2010-09-01 12:11:19,671 DEBUG [Line Processing Thread 0] 
 zookeeper.ZooKeeperWrapper(212): 
 localhost:/hbase,org.apache.hadoop.hbase.client.HConnectionManagerConnected 
 to zookeeper again
 2010-09-01 12:11:24,679 DEBUG [Line Processing Thread 0] 
 client.HConnectionManager$TableServers(964): Removed .META.,,1.1028785192 for 
 tableName=.META. from cache because of content_action_url_metrics,\x080r 
 B\xF7\x81_T\x07\x08\x16uOrcom.gigya 429934274290948,99
 2010-09-01 12:11:24,680 DEBUG [Line Processing Thread 0] 
 client.HConnectionManager$TableServers(857): locateRegionInMeta attempt 0 of 
 4 failed; retrying after sleep of 5000 because: The client is stopped
 2010-09-01 12:11:24,680 DEBUG [Line Processing Thread 0] 
 zookeeper.ZooKeeperWrapper(470): 
 localhost:/hbase,org.apache.hadoop.hbase.client.HConnectionManagerTrying to 
 read /hbase/root-region-server
 2010-09-01 12:11:24,681 DEBUG [Line Processing Thread 0] 
 zookeeper.ZooKeeperWrapper(489): 
 localhost:/hbase,org.apache.hadoop.hbase.client.HConnectionManagerRead 
 ZNode /hbase/root-region-server got 10.26.119.190:60020
 2010-09-01 12:11:24,681 DEBUG [Line Processing Thread 0] 
 client.HConnectionManager$TableServers(1116): Root region location changed. 
 Sleeping.
 ===
 It might be a good idea to only run the HCM shutdown code when all the 
 HTables referring to it have been closed. That way the client can control 
 when the shutdown actually happens.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-2952) HConnectionManager's shutdown hook interferes with client's operations

2010-09-01 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905258#action_12905258
 ] 

Prakash Khemani commented on HBASE-2952:


In my experiment each processing thread that invokes incrementColValue() has 
its own instance of HTable. In an effort to create multiple connections from 
my process to a region server, I had each thread put something unique in its 
conf file. The following code then kicks in and creates multiple HCMs - one 
per HTable instance. So, yes, it is possible to have multiple HCMs in a 
process - one per config.

  public static HConnection getConnection(Configuration conf) {
    TableServers connection;
    Integer key = HBaseConfiguration.hashCode(conf);
    synchronized (HBASE_INSTANCES) {
      connection = HBASE_INSTANCES.get(key);

(BTW, my experiment to create multiple connections by creating multiple 
connection-managers had not worked. I had to modify 
ConnectionManager::getHRegionConnection() and the servers map to create 
multiple connections.)
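
Purely as a sketch of the workaround described above: giving each thread's 
configuration a unique, otherwise-unused property makes 
HBaseConfiguration.hashCode(conf) differ per thread, so each HTable ends up 
keyed to its own HCM instance. The property key used here is made up, and (as 
noted above) this alone did not produce multiple RPC connections to a single 
region server:

{code}
import org.apache.hadoop.hbase.HBaseConfiguration;

public class PerThreadConf {
  // Hypothetical helper: one distinct Configuration (and hence one distinct
  // HConnectionManager entry in HBASE_INSTANCES) per processing thread.
  public static HBaseConfiguration newConfForThread(int threadId) {
    HBaseConfiguration conf = new HBaseConfiguration();
    conf.set("client.instance.id", Integer.toString(threadId)); // made-up key
    return conf;
  }
}
{code}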






 HConnectionManager's shutdown hook interferes with client's operations
 --

 Key: HBASE-2952
 URL: https://issues.apache.org/jira/browse/HBASE-2952
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.20.0
Reporter: Prakash Khemani

 My HBase client calls incrementColValue() in pairs. If someone kills the 
 client (SIGINT or SIGTERM) I want my client's increment threads to gracefully 
 exit. If a thread has already done one of the incrementColValue() then I want 
 that thread to complete the other incrementColValue() and then exit.
 For this purpose I installed my own shutdownHook(). My shutdownHook() thread 
 'signals' all the threads in my process that it is time to exit and then 
 waits for them to complete.
 The problem is that HConnectionManager's shutdownHook thread also runs and 
 shuts down all connections and IPC threads.
 My increment thread keeps waiting to increment and then times out after 240s. 
 Two problems with this - the incrementColValue() didn't go through, which 
 will increase the chances of inconsistency in my HBase data. And it took 240s 
 to exit. I am pasting some of the messages that the client thread outputs 
 while it tries to contact the HBase server.
 Signalled. Exiting ...
 2010-09-01 12:11:14,769 DEBUG [HCM.shutdownHook] 
 zookeeper.ZooKeeperWrapper(787): 
 localhost:/hbase,org.apache.hadoop.hbase.client.HConnectionManagerClosed 
 connection with ZooKeeper; /hbase/root-region-server
 flushing after 7899
 2010-09-01 12:11:19,669 DEBUG [Line Processing Thread 0] 
 client.HConnectionManager$TableServers(903): Cache hit for row  in 
 tableName .META.: location server hadoop2205.snc3.facebook.com:60020, 
 location region name .META.,,1.1028785192
 2010-09-01 12:11:19,671 INFO  [Line Processing Thread 0] 
 zookeeper.ZooKeeperWrapper(206): Reconnecting to zookeeper
 2010-09-01 12:11:19,671 DEBUG [Line Processing Thread 0] 
 zookeeper.ZooKeeperWrapper(212): 
 localhost:/hbase,org.apache.hadoop.hbase.client.HConnectionManagerConnected 
 to zookeeper again
 2010-09-01 12:11:24,679 DEBUG [Line Processing Thread 0] 
 client.HConnectionManager$TableServers(964): Removed .META.,,1.1028785192 for 
 tableName=.META. from cache because of content_action_url_metrics,\x080r 
 B\xF7\x81_T\x07\x08\x16uOrcom.gigya 429934274290948,99
 2010-09-01 12:11:24,680 DEBUG [Line Processing Thread 0] 
 client.HConnectionManager$TableServers(857): locateRegionInMeta attempt 0 of 
 4 failed; retrying after sleep of 5000 because: The client is stopped
 2010-09-01 12:11:24,680 DEBUG [Line Processing Thread 0] 
 zookeeper.ZooKeeperWrapper(470): 
 localhost:/hbase,org.apache.hadoop.hbase.client.HConnectionManagerTrying to 
 read /hbase/root-region-server
 2010-09-01 12:11:24,681 DEBUG [Line Processing Thread 0] 
 zookeeper.ZooKeeperWrapper(489): 
 localhost:/hbase,org.apache.hadoop.hbase.client.HConnectionManagerRead 
 ZNode /hbase/root-region-server got 10.26.119.190:60020
 2010-09-01 12:11:24,681 DEBUG [Line Processing Thread 0] 
 client.HConnectionManager$TableServers(1116): Root region location changed. 
 Sleeping.
 ===
 It might be a good idea to only run the HCM shutdown code when all the 
 HTables referring to it have been closed. That way the client can control 
 when the shutdown actually happens.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-2952) HConnectionManager's shutdown hook interferes with client's operations

2010-09-01 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905396#action_12905396
 ] 

Prakash Khemani commented on HBASE-2952:


The shutdown hook prevents the zookeeper logs from getting flooded with 
unnecessary 'connection timed out' or similar messages. If that is the case 
then the shutdown hook still serves some good purpose. IMO the behavior ought 
to be the following - users who properly call HTable::close on all the open 
HTables should see this nice HCM shutdown-hook behavior. Others who don't call 
close() will have their zk logs flooded. This goes back to my earlier 
suggestion that HTable::close should trigger HCM::close and that there should 
be some kind of ref counting in HCM.
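
A minimal sketch of the ref-counting idea, with invented class and method names 
rather than the actual HConnectionManager internals:

{code}
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical: each HTable retains its connection on open and releases it on
// close; the connection's shutdown work (ZK session, IPC threads) runs only
// when the last table has been closed.
public class RefCountedConnection {
  private final AtomicInteger refCount = new AtomicInteger(0);
  private volatile boolean closed = false;

  public void retain() {                  // called when an HTable is constructed
    refCount.incrementAndGet();
  }

  public void release() {                 // called from HTable.close()
    if (refCount.decrementAndGet() == 0) {
      shutdown();                         // last reference gone: safe to tear down
    }
  }

  private synchronized void shutdown() {
    if (!closed) {
      closed = true;
      // close the ZooKeeper session and IPC connections here
    }
  }
}
{code}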





 HConnectionManager's shutdown hook interferes with client's operations
 --

 Key: HBASE-2952
 URL: https://issues.apache.org/jira/browse/HBASE-2952
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.20.0
Reporter: Prakash Khemani

 My HBase client calls incrementColValue() in pairs. If someone kills the 
 client (SIGINT or SIGTERM) I want my client's increment threads to gracefully 
 exit. If a thread has already done one of the incrementColValue() then I want 
 that thread to complete the other incrementColValue() and then exit.
 For this purpose I installed my own shutdownHook(). My shutdownHook() thread 
 'signals' all the threads in my process that it is time to exit and then 
 waits for them to complete.
 The problem is that HConnectionManager's shutdownHook thread also runs and 
 shuts down all connections and IPC threads.
 My increment thread keeps waiting to increment and then times out after 240s. 
 Two problems with this - the incrementColValue() didn't go through, which 
 will increase the chances of inconsistency in my HBase data. And it took 240s 
 to exit. I am pasting some of the messages that the client thread outputs 
 while it tries to contact the HBase server.
 Signalled. Exiting ...
 2010-09-01 12:11:14,769 DEBUG [HCM.shutdownHook] 
 zookeeper.ZooKeeperWrapper(787): 
 localhost:/hbase,org.apache.hadoop.hbase.client.HConnectionManagerClosed 
 connection with ZooKeeper; /hbase/root-region-server
 flushing after 7899
 2010-09-01 12:11:19,669 DEBUG [Line Processing Thread 0] 
 client.HConnectionManager$TableServers(903): Cache hit for row  in 
 tableName .META.: location server hadoop2205.snc3.facebook.com:60020, 
 location region name .META.,,1.1028785192
 2010-09-01 12:11:19,671 INFO  [Line Processing Thread 0] 
 zookeeper.ZooKeeperWrapper(206): Reconnecting to zookeeper
 2010-09-01 12:11:19,671 DEBUG [Line Processing Thread 0] 
 zookeeper.ZooKeeperWrapper(212): 
 localhost:/hbase,org.apache.hadoop.hbase.client.HConnectionManagerConnected 
 to zookeeper again
 2010-09-01 12:11:24,679 DEBUG [Line Processing Thread 0] 
 client.HConnectionManager$TableServers(964): Removed .META.,,1.1028785192 for 
 tableName=.META. from cache because of content_action_url_metrics,\x080r 
 B\xF7\x81_T\x07\x08\x16uOrcom.gigya 429934274290948,99
 2010-09-01 12:11:24,680 DEBUG [Line Processing Thread 0] 
 client.HConnectionManager$TableServers(857): locateRegionInMeta attempt 0 of 
 4 failed; retrying after sleep of 5000 because: The client is stopped
 2010-09-01 12:11:24,680 DEBUG [Line Processing Thread 0] 
 zookeeper.ZooKeeperWrapper(470): 
 localhost:/hbase,org.apache.hadoop.hbase.client.HConnectionManagerTrying to 
 read /hbase/root-region-server
 2010-09-01 12:11:24,681 DEBUG [Line Processing Thread 0] 
 zookeeper.ZooKeeperWrapper(489): 
 localhost:/hbase,org.apache.hadoop.hbase.client.HConnectionManagerRead 
 ZNode /hbase/root-region-server got 10.26.119.190:60020
 2010-09-01 12:11:24,681 DEBUG [Line Processing Thread 0] 
 client.HConnectionManager$TableServers(1116): Root region location changed. 
 Sleeping.
 ===
 It might be a good idea to only run the HCM shutdown code when all the 
 HTables referring to it have been closed. That way the client can control 
 when the shutdown actually happens.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.