[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed log splitting

2012-07-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406299#comment-13406299
 ] 

stack commented on HBASE-6134:
--

Ok.  Then it was a figment of my imagination that you had.  You fellas fix so 
much, it seemed possible.

 Improvement for split-worker to speed up distributed log splitting
 --

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.96.0

 Attachments: 6134v4.patch, HBASE-6134.patch, HBASE-6134v2.patch, 
 HBASE-6134v3-92.patch, HBASE-6134v3.patch, HBASE-6134v4-94.patch, 
 HBASE-6134v4.patch


 First, we ran a test comparing local-master-splitting and 
 distributed-log-splitting.
 Environment: 34 hlog files, 5 regionservers (after killing one, only 4 RS do the 
 splitting work), 400 regions in one hlog file
 local-master-split: 60s+
 distributed-log-splitting: 165s+
 In fact, in our production environment, distributed-log-splitting also took 
 60s with 30 regionservers for 34 hlog files (the regionservers may be under high load)
 We found that a split-worker took about 20s to split one log file
 (30ms~50ms per writer.close(); 10ms to create each writer)
 I think we could improve this by
 parallelizing the creation and closing of writers in threads.
 The patch changes the distributed-log-splitting logic to match 
 local-master-splitting and parallelizes the closes in threads.
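
The proposed parallelization can be sketched roughly like this (a minimal illustration, not the attached patch; the Writer interface and closeAll() helper are hypothetical stand-ins for the HLog writer plumbing):

{code}
// Sketch: close N writers in parallel instead of sequentially.
import java.util.*;
import java.util.concurrent.*;

public class ParallelClose {
  // Stand-in for an HLog writer whose close() takes ~30-50ms each.
  interface Writer { void close() throws Exception; }

  static void closeAll(List<Writer> writers, int threads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (Writer w : writers) {
        // Submit each close as a task; they run concurrently.
        futures.add(pool.submit(() -> { w.close(); return null; }));
      }
      for (Future<?> f : futures) {
        f.get();  // propagate any close() failure to the caller
      }
    } finally {
      pool.shutdown();
    }
  }
}
{code}

With hundreds of region writers at 30ms~50ms per close, a small pool reduces the serial close time to roughly the cost of the slowest batch.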

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6283) [region_mover.rb] Add option to exclude list of hosts on unload instead of just assuming the source node.

2012-07-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406301#comment-13406301
 ] 

stack commented on HBASE-6283:
--

bq. Thanks for the pointer to Aravind's work – this is the first I've seen the 
blog. Have we encouraged Aravind to contribute his work?

He has contrib'd the non-SU stuff: i.e. the bit where one can register in zk 
which regionservers are being rolled.

 [region_mover.rb] Add option to exclude list of hosts on unload instead of 
 just assuming the source node.
 -

 Key: HBASE-6283
 URL: https://issues.apache.org/jira/browse/HBASE-6283
 Project: HBase
  Issue Type: Improvement
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
  Labels: jruby
 Attachments: hbase-6283.patch


 Currently, the region_mover.rb script excludes a single host, the host 
 offloading data, as a region move target.  This essentially limits the number 
 of machines that can be shut down at a time to one.  For larger clusters, it 
 is manageable to have several nodes down at a time, and desirable to get this 
 process done more quickly.
 The proposed patch adds an exclude-file option that allows multiple hosts to 
 be excluded as targets.





[jira] [Commented] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.

2012-07-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406305#comment-13406305
 ] 

stack commented on HBASE-6299:
--

It looks to me like we have the same issue in trunk.  Your suggested fix looks 
right, Maryann.  Put up a patch and I'll have a go at making a unit test for it.

 RS starts region open while fails ack to HMaster.sendRegionOpen() causes 
 inconsistency in HMaster's region state and a series of successive problems.
 -

 Key: HBASE-6299
 URL: https://issues.apache.org/jira/browse/HBASE-6299
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6, 0.94.0
Reporter: Maryann Xue
Assignee: Maryann Xue
Priority: Critical
 Attachments: HBASE-6299.patch


 1. HMaster tries to assign a region to an RS.
 2. HMaster creates a RegionState for this region and puts it into 
 regionsInTransition.
 3. In the first assign attempt, HMaster calls RS.openRegion(). The RS 
 receives the open region request and starts to proceed, with success 
 eventually. However, due to network problems, HMaster fails to receive the 
 response for the openRegion() call, and the call times out.
 4. HMaster attempts the assignment a second time, choosing another RS. 
 5. But since HMaster's OpenedRegionHandler has already been triggered by the 
 region open on the previous RS, and the RegionState has already been removed 
 from regionsInTransition, HMaster considers the unassigned ZK node 
 RS_ZK_REGION_OPENING updated by the second attempt invalid and ignores it.
 6. The unassigned ZK node remains, and a later unassign fails because 
 RS_ZK_REGION_CLOSING cannot be created.
 {code}
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for 
 region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.;
  
 plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.,
  src=swbss-hadoop-004,60020,1340890123243, 
 dest=swbss-hadoop-006,60020,1340890678078
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  to swbss-hadoop-006,60020,1340890678078
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:28,882 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,291 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,299 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENED, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,299 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
 event for 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  from serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, 
 regions=575, usedHeap=15282, maxHeap=31301); deleting unassigned node
 2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x2377fee2ae80007 Deleting existing unassigned node for 
 b713fd655fa02395496c5a6e39ddf568 that is in expected state RS_ZK_REGION_OPENED
 2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x2377fee2ae80007 Successfully deleted unassigned node for 
 region b713fd655fa02395496c5a6e39ddf568 in expected state RS_ZK_REGION_OPENED
 2012-06-29 07:06:32,301 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has 
 opened the region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  that was online on serverName=swbss-hadoop-006,60020,1340890678078, 
 load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301)
 2012-06-29 07:07:41,140 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  to serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=0, 
 regions=575, usedHeap=0, maxHeap=0), trying 

[jira] [Commented] (HBASE-6326) Nested retry loops in HConnectionManager

2012-07-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406308#comment-13406308
 ] 

stack commented on HBASE-6326:
--

+1 Simple but ugly.  Good enough for a 0.94.1.

 Nested retry loops in HConnectionManager
 

 Key: HBASE-6326
 URL: https://issues.apache.org/jira/browse/HBASE-6326
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.1

 Attachments: 6326.txt


 While testing client timeouts when HBase is not available, we found that 
 even with aggressive settings it takes the client 10 minutes or more to 
 finally receive an exception.
 Part of this is due to nested retry loops in locateRegion.
 locateRegion will first try to locate the table in meta (which is retried), 
 then it will try to locate the meta table in root (which is also retried).
 So for each retry of the meta lookup we retry the root lookup as well.
 I have a patch that avoids locateRegion retrying if it is called from code 
 that already has a retry loop.
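
The multiplicative effect of the nested loops can be shown with a toy sketch (an illustration only, not HBase code; both loops are assumed to use the same retry count and all names are invented):

{code}
// Sketch: if the meta lookup retries R times and each attempt itself
// retries the root lookup R times, a dead cluster costs R*R root probes
// before the client finally sees an exception.
public class NestedRetry {
  static int rootProbes = 0;

  static void locateRoot(int retries) {
    for (int i = 0; i < retries; i++) rootProbes++;  // every probe fails
    throw new RuntimeException("root not found");
  }

  static void locateMeta(int retries) {
    for (int i = 0; i < retries; i++) {
      try { locateRoot(retries); } catch (RuntimeException e) { /* retry */ }
    }
    throw new RuntimeException("meta not found");
  }
}
{code}

With 10 retries at each level that is 100 root probes, which is why suppressing the inner retries when the caller already retries shortens the failure path so much.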





[jira] [Commented] (HBASE-6309) [MTTR] Do NN operations outside of the ZK EventThread in SplitLogManager

2012-07-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406310#comment-13406310
 ] 

stack commented on HBASE-6309:
--

@Chunhui What about the case where we fail a log splitting... how would the 
cleanup go?  If into a tmp dir, it's easy to remove the tmp dir (otherwise, 
sounds like a fine idea).

 [MTTR] Do NN operations outside of the ZK EventThread in SplitLogManager
 

 Key: HBASE-6309
 URL: https://issues.apache.org/jira/browse/HBASE-6309
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.1, 0.94.0, 0.96.0
Reporter: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.96.0


 We found this issue during the leap second cataclysm which prompted a 
 distributed splitting of all our logs.
 I saw that none of the RS were splitting after some time while the master was 
 showing that it wasn't even 30% done. jstack'ing I saw this:
 {noformat}
 main-EventThread daemon prio=10 tid=0x7f6ce46d8800 nid=0x5376 in
 Object.wait() [0x7f6ce2ecb000]
java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 at java.lang.Object.wait(Object.java:485)
 at org.apache.hadoop.ipc.Client.call(Client.java:1093)
 - locked 0x0005fdd661a0 (a org.apache.hadoop.ipc.Client$Call)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
 at $Proxy9.rename(Unknown Source)
 at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
 at $Proxy9.rename(Unknown Source)
 at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:759)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:253)
 at 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.moveRecoveredEditsFromTemp(HLogSplitter.java:553)
 at 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.moveRecoveredEditsFromTemp(HLogSplitter.java:519)
 at 
 org.apache.hadoop.hbase.master.SplitLogManager$1.finish(SplitLogManager.java:138)
 at 
 org.apache.hadoop.hbase.master.SplitLogManager.getDataSetWatchSuccess(SplitLogManager.java:431)
 at 
 org.apache.hadoop.hbase.master.SplitLogManager.access$1200(SplitLogManager.java:95)
 at 
 org.apache.hadoop.hbase.master.SplitLogManager$GetDataAsyncCallback.processResult(SplitLogManager.java:1011)
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:571)
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
 {noformat}
 We are effectively bottlenecking on doing NN operations and whatever else is 
 happening in GetDataAsyncCallback. It was so bad that on our 100 offline 
 cluster it took a few hours for the master to process all the incoming ZK 
 events while the actual splitting took a fraction of that time.
 I'm marking this as critical and against 0.96 but depending on how involved 
 the fix is we might want to backport.
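
One shape the fix could take (a hedged sketch, not the committed change; AsyncSplitFinish and onTaskDone are invented names) is to have the ZK callback merely hand the slow rename work off to a worker pool, so the single EventThread never blocks on an RPC:

{code}
// Sketch: the ZooKeeper callback only enqueues; a worker pool performs the
// slow NN operations (e.g. the recovered-edits renames) off-thread.
import java.util.concurrent.*;

public class AsyncSplitFinish {
  private final ExecutorService nnOpsPool = Executors.newFixedThreadPool(4);

  // Called from the ZooKeeper EventThread: must return quickly.
  public Future<?> onTaskDone(Runnable moveRecoveredEdits) {
    return nnOpsPool.submit(moveRecoveredEdits);
  }

  public void shutdown() { nnOpsPool.shutdown(); }
}
{code}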





[jira] [Commented] (HBASE-6309) [MTTR] Do NN operations outside of the ZK EventThread in SplitLogManager

2012-07-04 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406316#comment-13406316
 ] 

chunhui shen commented on HBASE-6309:
-

bq.how would the cleanup go?
In HLogSplitter#createWAP
{code}
if ((tmpname == null) && fs.exists(regionedits)) {
  LOG.warn("Found existing old edits file. It could be the "
      + "result of a previous failed split attempt. Deleting "
      + regionedits + ", length="
      + fs.getFileStatus(regionedits).getLen());
  if (!fs.delete(regionedits, false)) {
    LOG.warn("Failed delete of old " + regionedits);
  }
}
{code}
A log splitting can also fail when using master-local-splitting; the cleanup 
happens in the next splitting, as per the above code.

 [MTTR] Do NN operations outside of the ZK EventThread in SplitLogManager
 

 Key: HBASE-6309
 URL: https://issues.apache.org/jira/browse/HBASE-6309
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.1, 0.94.0, 0.96.0
Reporter: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.96.0


 We found this issue during the leap second cataclysm which prompted a 
 distributed splitting of all our logs.
 I saw that none of the RS were splitting after some time while the master was 
 showing that it wasn't even 30% done. jstack'ing I saw this:
 {noformat}
 main-EventThread daemon prio=10 tid=0x7f6ce46d8800 nid=0x5376 in
 Object.wait() [0x7f6ce2ecb000]
java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 at java.lang.Object.wait(Object.java:485)
 at org.apache.hadoop.ipc.Client.call(Client.java:1093)
 - locked 0x0005fdd661a0 (a org.apache.hadoop.ipc.Client$Call)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
 at $Proxy9.rename(Unknown Source)
 at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
 at $Proxy9.rename(Unknown Source)
 at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:759)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:253)
 at 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.moveRecoveredEditsFromTemp(HLogSplitter.java:553)
 at 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.moveRecoveredEditsFromTemp(HLogSplitter.java:519)
 at 
 org.apache.hadoop.hbase.master.SplitLogManager$1.finish(SplitLogManager.java:138)
 at 
 org.apache.hadoop.hbase.master.SplitLogManager.getDataSetWatchSuccess(SplitLogManager.java:431)
 at 
 org.apache.hadoop.hbase.master.SplitLogManager.access$1200(SplitLogManager.java:95)
 at 
 org.apache.hadoop.hbase.master.SplitLogManager$GetDataAsyncCallback.processResult(SplitLogManager.java:1011)
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:571)
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
 {noformat}
 We are effectively bottlenecking on doing NN operations and whatever else is 
 happening in GetDataAsyncCallback. It was so bad that on our 100 offline 
 cluster it took a few hours for the master to process all the incoming ZK 
 events while the actual splitting took a fraction of that time.
 I'm marking this as critical and against 0.96 but depending on how involved 
 the fix is we might want to backport.





[jira] [Commented] (HBASE-5450) Support for wire-compatibility in inter-cluster replication (ZK, etc)

2012-07-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406315#comment-13406315
 ] 

stack commented on HBASE-5450:
--

@Chris See HBASE-5965.  Looks like I abandoned it on the last lap.  Looks 
like it needs some polish to get it over the finish line.  If you are on for 
it, be my guest.  Thanks.

 Support for wire-compatibility in inter-cluster replication (ZK, etc)
 -

 Key: HBASE-5450
 URL: https://issues.apache.org/jira/browse/HBASE-5450
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Todd Lipcon
Assignee: Chris Trezzo







[jira] [Updated] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.

2012-07-04 Thread Maryann Xue (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maryann Xue updated HBASE-6299:
---

Attachment: HBASE-6299-v2.patch

Make handling of RegionAlreadyInTransitionException work.

 RS starts region open while fails ack to HMaster.sendRegionOpen() causes 
 inconsistency in HMaster's region state and a series of successive problems.
 -

 Key: HBASE-6299
 URL: https://issues.apache.org/jira/browse/HBASE-6299
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6, 0.94.0
Reporter: Maryann Xue
Assignee: Maryann Xue
Priority: Critical
 Attachments: HBASE-6299-v2.patch, HBASE-6299.patch


 1. HMaster tries to assign a region to an RS.
 2. HMaster creates a RegionState for this region and puts it into 
 regionsInTransition.
 3. In the first assign attempt, HMaster calls RS.openRegion(). The RS 
 receives the open region request and starts to proceed, with success 
 eventually. However, due to network problems, HMaster fails to receive the 
 response for the openRegion() call, and the call times out.
 4. HMaster attempts the assignment a second time, choosing another RS. 
 5. But since HMaster's OpenedRegionHandler has already been triggered by the 
 region open on the previous RS, and the RegionState has already been removed 
 from regionsInTransition, HMaster considers the unassigned ZK node 
 RS_ZK_REGION_OPENING updated by the second attempt invalid and ignores it.
 6. The unassigned ZK node remains, and a later unassign fails because 
 RS_ZK_REGION_CLOSING cannot be created.
 {code}
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for 
 region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.;
  
 plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.,
  src=swbss-hadoop-004,60020,1340890123243, 
 dest=swbss-hadoop-006,60020,1340890678078
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  to swbss-hadoop-006,60020,1340890678078
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:28,882 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,291 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,299 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENED, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,299 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
 event for 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  from serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, 
 regions=575, usedHeap=15282, maxHeap=31301); deleting unassigned node
 2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x2377fee2ae80007 Deleting existing unassigned node for 
 b713fd655fa02395496c5a6e39ddf568 that is in expected state RS_ZK_REGION_OPENED
 2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x2377fee2ae80007 Successfully deleted unassigned node for 
 region b713fd655fa02395496c5a6e39ddf568 in expected state RS_ZK_REGION_OPENED
 2012-06-29 07:06:32,301 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has 
 opened the region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  that was online on serverName=swbss-hadoop-006,60020,1340890678078, 
 load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301)
 2012-06-29 07:07:41,140 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  to serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=0, 
 regions=575, usedHeap=0, maxHeap=0), trying to assign elsewhere instead; 
 retry=0
 java.net.SocketTimeoutException: Call to 

[jira] [Commented] (HBASE-5705) Introduce Protocol Buffer RPC engine

2012-07-04 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406330#comment-13406330
 ] 

Devaraj Das commented on HBASE-5705:


Thanks for looking at the patch, Ted. I'll update it soon.

 Introduce Protocol Buffer RPC engine
 

 Key: HBASE-5705
 URL: https://issues.apache.org/jira/browse/HBASE-5705
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Devaraj Das
Assignee: Devaraj Das
 Attachments: 5705-1.patch


 Introduce Protocol Buffer RPC engine in the RPC core. Protocols that are PB 
 aware can be made to go through this RPC engine. The approach, in my current 
 thinking, would be similar to HADOOP-7773.





[jira] [Updated] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.

2012-07-04 Thread Maryann Xue (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maryann Xue updated HBASE-6299:
---

Status: Patch Available  (was: Open)

 RS starts region open while fails ack to HMaster.sendRegionOpen() causes 
 inconsistency in HMaster's region state and a series of successive problems.
 -

 Key: HBASE-6299
 URL: https://issues.apache.org/jira/browse/HBASE-6299
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.94.0, 0.90.6
Reporter: Maryann Xue
Assignee: Maryann Xue
Priority: Critical
 Attachments: HBASE-6299-v2.patch, HBASE-6299.patch


 1. HMaster tries to assign a region to an RS.
 2. HMaster creates a RegionState for this region and puts it into 
 regionsInTransition.
 3. In the first assign attempt, HMaster calls RS.openRegion(). The RS 
 receives the open region request and starts to proceed, with success 
 eventually. However, due to network problems, HMaster fails to receive the 
 response for the openRegion() call, and the call times out.
 4. HMaster attempts the assignment a second time, choosing another RS. 
 5. But since HMaster's OpenedRegionHandler has already been triggered by the 
 region open on the previous RS, and the RegionState has already been removed 
 from regionsInTransition, HMaster considers the unassigned ZK node 
 RS_ZK_REGION_OPENING updated by the second attempt invalid and ignores it.
 6. The unassigned ZK node remains, and a later unassign fails because 
 RS_ZK_REGION_CLOSING cannot be created.
 {code}
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for 
 region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.;
  
 plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.,
  src=swbss-hadoop-004,60020,1340890123243, 
 dest=swbss-hadoop-006,60020,1340890678078
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  to swbss-hadoop-006,60020,1340890678078
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:28,882 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,291 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,299 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENED, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,299 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
 event for 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  from serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, 
 regions=575, usedHeap=15282, maxHeap=31301); deleting unassigned node
 2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x2377fee2ae80007 Deleting existing unassigned node for 
 b713fd655fa02395496c5a6e39ddf568 that is in expected state RS_ZK_REGION_OPENED
 2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x2377fee2ae80007 Successfully deleted unassigned node for 
 region b713fd655fa02395496c5a6e39ddf568 in expected state RS_ZK_REGION_OPENED
 2012-06-29 07:06:32,301 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has 
 opened the region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  that was online on serverName=swbss-hadoop-006,60020,1340890678078, 
 load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301)
 2012-06-29 07:07:41,140 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  to serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=0, 
 regions=575, usedHeap=0, maxHeap=0), trying to assign elsewhere instead; 
 retry=0
 java.net.SocketTimeoutException: Call to /172.16.0.6:60020 failed on socket 
 timeout exception: 

[jira] [Updated] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.

2012-07-04 Thread Maryann Xue (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maryann Xue updated HBASE-6299:
---

Status: Open  (was: Patch Available)

 RS starts region open while fails ack to HMaster.sendRegionOpen() causes 
 inconsistency in HMaster's region state and a series of successive problems.
 -

 Key: HBASE-6299
 URL: https://issues.apache.org/jira/browse/HBASE-6299
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.94.0, 0.90.6
Reporter: Maryann Xue
Assignee: Maryann Xue
Priority: Critical
 Attachments: HBASE-6299-v2.patch, HBASE-6299.patch


 1. HMaster tries to assign a region to an RS.
 2. HMaster creates a RegionState for this region and puts it into 
 regionsInTransition.
 3. In the first assign attempt, HMaster calls RS.openRegion(). The RS 
 receives the open region request and starts to proceed, with success 
 eventually. However, due to network problems, HMaster fails to receive the 
 response for the openRegion() call, and the call times out.
 4. HMaster attempts the assignment a second time, choosing another RS. 
 5. But since HMaster's OpenedRegionHandler has already been triggered by the 
 region open on the previous RS, and the RegionState has already been removed 
 from regionsInTransition, HMaster considers the unassigned ZK node 
 RS_ZK_REGION_OPENING updated by the second attempt invalid and ignores it.
 6. The unassigned ZK node remains, and a later unassign fails because 
 RS_ZK_REGION_CLOSING cannot be created.
 {code}
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for 
 region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.;
  
 plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.,
  src=swbss-hadoop-004,60020,1340890123243, 
 dest=swbss-hadoop-006,60020,1340890678078
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  to swbss-hadoop-006,60020,1340890678078
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:28,882 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,291 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,299 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENED, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,299 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
 event for 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  from serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, 
 regions=575, usedHeap=15282, maxHeap=31301); deleting unassigned node
 2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x2377fee2ae80007 Deleting existing unassigned node for 
 b713fd655fa02395496c5a6e39ddf568 that is in expected state RS_ZK_REGION_OPENED
 2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x2377fee2ae80007 Successfully deleted unassigned node for 
 region b713fd655fa02395496c5a6e39ddf568 in expected state RS_ZK_REGION_OPENED
 2012-06-29 07:06:32,301 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has 
 opened the region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  that was online on serverName=swbss-hadoop-006,60020,1340890678078, 
 load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301)
 2012-06-29 07:07:41,140 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  to serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=0, 
 regions=575, usedHeap=0, maxHeap=0), trying to assign elsewhere instead; 
 retry=0
 java.net.SocketTimeoutException: Call to /172.16.0.6:60020 failed on socket 
 timeout exception: 

[jira] [Commented] (HBASE-6309) [MTTR] Do NN operations outside of the ZK EventThread in SplitLogManager

2012-07-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406345#comment-13406345
 ] 

stack commented on HBASE-6309:
--

What about the logic in moveRecoveredEditsFromTemp?  It flags corrupted logs 
and does some other cleanup.  It also seems to find recovered.edits files with 
a .corrupt ending: see ZKSplitLog.isCorruptFlagFile.  That'd need refactoring 
and a rename from moveRecoveredEditsFromTemp to 'completeLogSplit' or 'finish'?

Otherwise, looking through HLogSplitter and trying to recall issues we've run 
into w/ recovered.edits, I think doing it in place can work.

Would suggest you look at the region open and replay of recovered.edits stuff 
too, to see if you spot any possible issues there (I only went through 
HLogSplitter).

(That renaming stuff is pretty heavy duty stuff... but I'd have done the same 
to cordon off a distributed operation)

Good stuff Chunhui.

 [MTTR] Do NN operations outside of the ZK EventThread in SplitLogManager
 

 Key: HBASE-6309
 URL: https://issues.apache.org/jira/browse/HBASE-6309
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.1, 0.94.0, 0.96.0
Reporter: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.96.0


 We found this issue during the leap second cataclysm which prompted a 
 distributed splitting of all our logs.
 I saw that none of the RS were splitting after some time while the master was 
 showing that it wasn't even 30% done. jstack'ing I saw this:
 {noformat}
 main-EventThread daemon prio=10 tid=0x7f6ce46d8800 nid=0x5376 in
 Object.wait() [0x7f6ce2ecb000]
java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 at java.lang.Object.wait(Object.java:485)
 at org.apache.hadoop.ipc.Client.call(Client.java:1093)
 - locked 0x0005fdd661a0 (a org.apache.hadoop.ipc.Client$Call)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
 at $Proxy9.rename(Unknown Source)
 at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
 at $Proxy9.rename(Unknown Source)
 at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:759)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:253)
 at 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.moveRecoveredEditsFromTemp(HLogSplitter.java:553)
 at 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.moveRecoveredEditsFromTemp(HLogSplitter.java:519)
 at 
 org.apache.hadoop.hbase.master.SplitLogManager$1.finish(SplitLogManager.java:138)
 at 
 org.apache.hadoop.hbase.master.SplitLogManager.getDataSetWatchSuccess(SplitLogManager.java:431)
 at 
 org.apache.hadoop.hbase.master.SplitLogManager.access$1200(SplitLogManager.java:95)
 at 
 org.apache.hadoop.hbase.master.SplitLogManager$GetDataAsyncCallback.processResult(SplitLogManager.java:1011)
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:571)
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
 {noformat}
 We are effectively bottlenecking on doing NN operations and whatever else is 
 happening in GetDataAsyncCallback. It was so bad that on our 100 offline 
 cluster it took a few hours for the master to process all the incoming ZK 
 events while the actual splitting took a fraction of that time.
 I'm marking this as critical and against 0.96 but depending on how involved 
 the fix is we might want to backport.
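
The fix direction named in the summary (do NN operations outside the ZK EventThread) can be sketched as below. This is a minimal illustration, not the actual SplitLogManager code; the class and method names are invented for the example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: the ZK callback thread only enqueues the slow filesystem work;
// a worker pool performs it, so the EventThread stays free to process
// further ZK events instead of blocking on NN RPCs like rename().
public class AsyncFinishSketch {
    private final ExecutorService fsPool = Executors.newFixedThreadPool(4);

    // Called from the ZooKeeper EventThread; must return quickly.
    public Future<Boolean> onTaskDone(String logPath) {
        return fsPool.submit(() -> finishLogSplit(logPath));
    }

    // Stand-in for the real finish work (renames, deletes, etc.).
    private boolean finishLogSplit(String logPath) {
        return logPath != null && !logPath.isEmpty();
    }

    public void shutdown() { fsPool.shutdown(); }

    public static void main(String[] args) throws Exception {
        AsyncFinishSketch s = new AsyncFinishSketch();
        boolean ok = s.onTaskDone("/hbase/.logs/rs1/wal.123").get();
        System.out.println(ok);
        s.shutdown();
    }
}
```

The point of the sketch is only the hand-off: the callback thread never touches the filesystem itself.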

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6309) [MTTR] Do NN operations outside of the ZK EventThread in SplitLogManager

2012-07-04 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406353#comment-13406353
 ] 

ramkrishna.s.vasudevan commented on HBASE-6309:
---

Currently there are 3 renames in this path: the one that renames the temp file 
to the recovered.edits path, and then two more in log archiving, one for 
corrupted logs and the other for the archived path.

In between there are a lot of delete and exists calls.  I think we can reduce 
the number of NN operations.  How costly are the delete and exists checks? I 
will check on this more.


 [MTTR] Do NN operations outside of the ZK EventThread in SplitLogManager
 

 Key: HBASE-6309
 URL: https://issues.apache.org/jira/browse/HBASE-6309
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.1, 0.94.0, 0.96.0
Reporter: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.96.0







[jira] [Commented] (HBASE-6306) TestFSUtils fails against hadoop 2.0

2012-07-04 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406384#comment-13406384
 ] 

ramkrishna.s.vasudevan commented on HBASE-6306:
---

In 0.94 this is not there, right, Jon?  Because for us this testcase in 0.94 
passes on hadoop 2.0.

 TestFSUtils fails against hadoop 2.0
 

 Key: HBASE-6306
 URL: https://issues.apache.org/jira/browse/HBASE-6306
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.96.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Fix For: 0.96.0

 Attachments: hbase-6306-trunk.patch


 trunk: mvn clean test -Dhadoop.profile=2.0 -Dtest=TestFSUtils
 {code}
 java.io.FileNotFoundException: File 
 /home/jon/proj/hbase-trunk/hbase-server/target/test-data/02beb8c8-06c1-47ea-829b-6e7ce0570cf8/hbase.version
  does not exist
 at 
 org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:315)
 at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1279)
 at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1319)
 at 
 org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:557)
 at 
 org.apache.hadoop.fs.FilterFileSystem.listStatus(FilterFileSystem.java:213)
 at org.apache.hadoop.hbase.util.FSUtils.getVersion(FSUtils.java:270)
 at 
 org.apache.hadoop.hbase.util.TestFSUtils.testVersion(TestFSUtils.java:58)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 ... 
 {code}





[jira] [Updated] (HBASE-6313) Client hangs because the client is not notified

2012-07-04 Thread binlijin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

binlijin updated HBASE-6313:


Attachment: HBASE-6313-0.94-2.patch

 Client hangs because the client is not notified 
 

 Key: HBASE-6313
 URL: https://issues.apache.org/jira/browse/HBASE-6313
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: binlijin
 Fix For: 0.94.1

 Attachments: HBASE-6313-0.92-2.patch, HBASE-6313-0.92.patch, 
 HBASE-6313-0.94-2.patch, HBASE-6313-0.94.patch, HBASE-6313-trunk.patch, 
 clienthangthread.out


 If the call is first removed from the calls map, and some exception then 
 happens while reading from the DataInputStream, the call will not be 
 notified, causing the client to hang.
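
The failure mode can be illustrated with a small self-contained sketch (the names are illustrative, not HBase's actual IPC classes): once the call is removed from the pending-calls map, the response reader must still mark it failed and notify on an exception, otherwise the waiting caller hangs forever.

```java
import java.util.concurrent.ConcurrentHashMap;

public class CallNotifySketch {
    static class Call {
        boolean done;
        Throwable error;
        synchronized void finish(Throwable t) { error = t; done = true; notifyAll(); }
        synchronized void await() throws InterruptedException {
            while (!done) wait();
        }
    }

    final ConcurrentHashMap<Integer, Call> calls = new ConcurrentHashMap<>();

    void receiveResponse(int id) {
        Call call = calls.remove(id);   // call is gone from the map here
        if (call == null) return;
        try {
            readResponse();             // may throw mid-read
            call.finish(null);
        } catch (Exception e) {
            call.finish(e);             // crucial: still wake the caller
        }
    }

    // Stand-in for reading from the DataInputStream; always fails here.
    void readResponse() { throw new RuntimeException("connection reset"); }

    public static void main(String[] args) throws Exception {
        CallNotifySketch c = new CallNotifySketch();
        Call call = new Call();
        c.calls.put(1, call);
        c.receiveResponse(1);
        call.await();                   // returns instead of hanging
        System.out.println(call.error.getMessage());
    }
}
```

Without the `catch` branch calling `finish(e)`, `await()` would block forever, which is the hang the report describes.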





[jira] [Updated] (HBASE-6313) Client hangs because the client is not notified

2012-07-04 Thread binlijin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

binlijin updated HBASE-6313:


Attachment: HBASE-6313-0.92-3.patch

 Client hangs because the client is not notified 
 

 Key: HBASE-6313
 URL: https://issues.apache.org/jira/browse/HBASE-6313
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: binlijin
 Fix For: 0.94.1

 Attachments: HBASE-6313-0.92-2.patch, HBASE-6313-0.92-3.patch, 
 HBASE-6313-0.92.patch, HBASE-6313-0.94-2.patch, HBASE-6313-0.94.patch, 
 HBASE-6313-trunk.patch, clienthangthread.out


 If the call is first removed from the calls map, and some exception then 
 happens while reading from the DataInputStream, the call will not be 
 notified, causing the client to hang.





[jira] [Updated] (HBASE-6313) Client hangs because the client is not notified

2012-07-04 Thread binlijin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

binlijin updated HBASE-6313:


Attachment: HBASE-6313-trunk-2.patch

 Client hangs because the client is not notified 
 

 Key: HBASE-6313
 URL: https://issues.apache.org/jira/browse/HBASE-6313
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: binlijin
 Fix For: 0.94.1

 Attachments: HBASE-6313-0.92-2.patch, HBASE-6313-0.92-3.patch, 
 HBASE-6313-0.92.patch, HBASE-6313-0.94-2.patch, HBASE-6313-0.94.patch, 
 HBASE-6313-trunk-2.patch, HBASE-6313-trunk.patch, clienthangthread.out


 If the call is first removed from the calls map, and some exception then 
 happens while reading from the DataInputStream, the call will not be 
 notified, causing the client to hang.





[jira] [Commented] (HBASE-4955) Use the official versions of surefire & junit

2012-07-04 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406429#comment-13406429
 ] 

nkeywal commented on HBASE-4955:


Update: still waiting. There is some life on Surefire; for JUnit there won't 
be anything before Q4, I guess.

 Use the official versions of surefire & junit
 -

 Key: HBASE-4955
 URL: https://issues.apache.org/jira/browse/HBASE-4955
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor

 We currently use private versions of Surefire & JUnit since HBASE-4763.
 This JIRA tracks what we need to move to the official versions.
 Surefire 2.11 is just out but, after some tests, it does not contain 
 everything we need.
 JUnit: could be for JUnit 4.11. Issue to monitor:
 https://github.com/KentBeck/junit/issues/359: fixed in our version, no 
 feedback for an integration on trunk.
 Surefire: could be for Surefire 2.12. Issues to monitor are:
 329 (category support): fixed, we use the official implementation from the 
 trunk
 786 (@Category with forkMode=always): fixed, we use the official 
 implementation from the trunk
 791 (incorrect elapsed time on test failure): fixed, we use the official 
 implementation from the trunk
 793 (incorrect time in the XML report): not fixed (reopened) on trunk, fixed 
 in our version.
 760 (does not take into account the test method): fixed in trunk, not fixed 
 in our version
 798 (print immediately the test class name): not fixed in trunk, not fixed in 
 our version
 799 (allow test parallelization when forkMode=always): not fixed in trunk, 
 not fixed in our version
 800 (redirectTestOutputToFile not taken into account): not yet fixed on 
 trunk, fixed in our version
 800 & 793 are the most important to monitor; they are the only ones that are 
 fixed in our version but not on trunk.





[jira] [Commented] (HBASE-6309) [MTTR] Do NN operations outside of the ZK EventThread in SplitLogManager

2012-07-04 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406434#comment-13406434
 ] 

nkeywal commented on HBASE-6309:


bq. How costly is delete and exists check?
A remote call to the NN, but no socket creation (the connection is 
persistent). There is no cache on the client side, so every exists call does 
the network loop. Exists is pretty fast (not much more cost than the network 
roundtrip), but it adds a little something to the NN and network workload, 
which can already be high when there is a major failure...
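
One way to trim a round trip per file, sketched here with java.nio rather than the HDFS client: delete already reports whether anything was removed, so a preceding exists() check is a redundant NN call. This assumes the caller only needs to know whether the file was there; it is an illustration, not the HBase code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Files.deleteIfExists() plays the role of HDFS FileSystem.delete():
// it returns whether anything was deleted, so exists() + delete()
// collapses into a single call.
public class DeleteWithoutExists {
    public static boolean cleanup(Path p) throws IOException {
        // One call instead of exists() followed by delete().
        return Files.deleteIfExists(p);
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("recovered-edits", ".tmp");
        System.out.println(cleanup(p));  // true: file was removed
        System.out.println(cleanup(p));  // false: already gone, no error
    }
}
```

The same shape applies to the rename/archive path: prefer calls whose return value already answers the "does it exist" question.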

 [MTTR] Do NN operations outside of the ZK EventThread in SplitLogManager
 

 Key: HBASE-6309
 URL: https://issues.apache.org/jira/browse/HBASE-6309
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.1, 0.94.0, 0.96.0
Reporter: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.96.0







[jira] [Created] (HBASE-6327) HLog can be null when create table

2012-07-04 Thread ShiXing (JIRA)
ShiXing created HBASE-6327:
--

 Summary: HLog can be null when create table
 Key: HBASE-6327
 URL: https://issues.apache.org/jira/browse/HBASE-6327
 Project: HBase
  Issue Type: Bug
Reporter: ShiXing
Assignee: ShiXing
 Attachments: createTableFailedMaster.log

As HBASE-4010 discussed, the HLog can be null.

We have seen createTable fail because of this unused hlog.

In createHRegion, the HLog.LogSyncer runs sync(), which at the lower layer 
calls DFSClient.DFSOutputStream.sync(). 

Then hlog.closeAndDelete() is called. First, HLog.close() interrupts the 
LogSyncer, which interrupts DFSClient.DFSOutputStream.sync(). The 
DFSClient.DFSOutputStream stores the exception and throws it when we call 
DFSClient.close(). 

HLog.close() calls writer.close()/DFSClient.close() after interrupting the 
LogSyncer, and there is no exception handling around the close().

So the Master throws the exception to the client. There is no need to throw 
this exception; furthermore, the hlog is of no use.

Our cluster is 0.90; the logs are attached. After closing the hlog writer, 
there is no log output for the createTable().

On trunk and 0.92/0.94 we use just one hlog, and if the exception happens the 
client will see createTable fail, but in fact all the regions for the table 
can still be assigned.

I will give the patch for this later.
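
A sketch of the direction the description suggests (not the actual patch): when the hlog is only a temporary helper for region creation, a failure while closing it can be logged and swallowed instead of propagating to the client. The interface and names here are stand-ins.

```java
// Hypothetical shape of the fix: wrap the close of a throwaway hlog so an
// interrupt-induced exception does not fail the whole createTable call.
public class CloseQuietlySketch {
    interface Closer { void close() throws Exception; }

    static boolean closeAndDeleteQuietly(Closer hlog) {
        try {
            hlog.close();
            return true;
        } catch (Exception e) {
            // The log is about to be deleted anyway; don't fail createTable.
            System.err.println("Ignoring exception closing unused hlog: " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        boolean ok = closeAndDeleteQuietly(() -> {
            throw new java.io.InterruptedIOException("sync interrupted");
        });
        System.out.println(ok);  // false, but no exception escapes
    }
}
```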





[jira] [Updated] (HBASE-6327) HLog can be null when create table

2012-07-04 Thread ShiXing (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ShiXing updated HBASE-6327:
---

Attachment: createTableFailedMaster.log

 HLog can be null when create table
 --

 Key: HBASE-6327
 URL: https://issues.apache.org/jira/browse/HBASE-6327
 Project: HBase
  Issue Type: Bug
Reporter: ShiXing
Assignee: ShiXing
 Attachments: createTableFailedMaster.log


 As HBASE-4010 discussed, the HLog can be null.
 We have seen createTable fail because of this unused hlog.
 In createHRegion, the HLog.LogSyncer runs sync(), which at the lower layer 
 calls DFSClient.DFSOutputStream.sync(). 
 Then hlog.closeAndDelete() is called. First, HLog.close() interrupts the 
 LogSyncer, which interrupts DFSClient.DFSOutputStream.sync(). The 
 DFSClient.DFSOutputStream stores the exception and throws it when we call 
 DFSClient.close(). 
 HLog.close() calls writer.close()/DFSClient.close() after interrupting the 
 LogSyncer, and there is no exception handling around the close().
 So the Master throws the exception to the client. There is no need to throw 
 this exception; furthermore, the hlog is of no use.
 Our cluster is 0.90; the logs are attached. After closing the hlog writer, 
 there is no log output for the createTable().
 On trunk and 0.92/0.94 we use just one hlog, and if the exception happens 
 the client will see createTable fail, but in fact all the regions for the 
 table can still be assigned.
 I will give the patch for this later.





[jira] [Commented] (HBASE-6272) In-memory region state is inconsistent

2012-07-04 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406445#comment-13406445
 ] 

ramkrishna.s.vasudevan commented on HBASE-6272:
---

Some comments (should say questions) added in RB.  Thanks.

 In-memory region state is inconsistent
 --

 Key: HBASE-6272
 URL: https://issues.apache.org/jira/browse/HBASE-6272
 Project: HBase
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang

 AssignmentManger stores region state related information in several places: 
 regionsInTransition, regions (region info to server name map), and servers 
 (server name to region info set map).  However the access to these places is 
 not coordinated properly.  It leads to inconsistent in-memory region state 
 information.  Sometimes, some region could even be offline, and not in 
 transition.
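
A minimal sketch of the coordination the description asks for, with all names invented for the example: updating the three views of region state under a single lock makes the combined update atomic, so a region can never be observed as offline yet missing from regionsInTransition.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// All three maps are only ever touched while holding the object's monitor,
// so every reader sees a consistent snapshot of the three views.
public class RegionStateBook {
    private final Map<String, String> regionsInTransition = new HashMap<>();
    private final Map<String, String> regionToServer = new HashMap<>();
    private final Map<String, Set<String>> serverToRegions = new HashMap<>();

    public synchronized void regionOpened(String region, String server) {
        regionsInTransition.remove(region);
        regionToServer.put(region, server);
        serverToRegions.computeIfAbsent(server, s -> new HashSet<>()).add(region);
    }

    public synchronized void regionOffline(String region) {
        String server = regionToServer.remove(region);
        if (server != null) serverToRegions.get(server).remove(region);
        regionsInTransition.put(region, "OFFLINE");  // stays tracked
    }

    // The invariant: a region is either assigned or in transition, never lost.
    public synchronized boolean isTracked(String region) {
        return regionToServer.containsKey(region)
            || regionsInTransition.containsKey(region);
    }

    public static void main(String[] args) {
        RegionStateBook book = new RegionStateBook();
        book.regionOpened("r1", "server-a");
        book.regionOffline("r1");
        System.out.println(book.isTracked("r1"));
    }
}
```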





[jira] [Commented] (HBASE-5876) TestImportExport has been failing against hadoop 0.23 profile

2012-07-04 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406455#comment-13406455
 ] 

ramkrishna.s.vasudevan commented on HBASE-5876:
---

I tried running the patch for 0.94 on hadoop 2.0. It passed (though I'm not 
very aware of the changes). :)

 TestImportExport has been failing against hadoop 0.23 profile
 -

 Key: HBASE-5876
 URL: https://issues.apache.org/jira/browse/HBASE-5876
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0, 0.96.0
Reporter: Zhihong Ted Yu
Assignee: Jonathan Hsieh
 Fix For: 0.96.0, 0.94.1

 Attachments: hbase-5876-94-v3.patch, hbase-5876-94.patch, 
 hbase-5876-trunk-v3.patch, hbase-5876-v2.patch, hbase-5876.patch


 TestImportExport has been failing against hadoop 0.23 profile





[jira] [Commented] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.

2012-07-04 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406476#comment-13406476
 ] 

ramkrishna.s.vasudevan commented on HBASE-6299:
---

@Maryann
We just checked over here in 0.94.
{code}
if (t instanceof RegionAlreadyInTransitionException) {
  String errorMsg = "Failed assignment in: " + plan.getDestination()
      + " due to " + t.getMessage();
  LOG.error(errorMsg, t);
  return;
}
{code}
This piece of code is correct.  If we directly check instanceof, it doesn't 
match.  Thanks..
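
One plausible reason a direct instanceof can miss, sketched with a stand-in wrapper class: exceptions thrown on the region server come back wrapped (Hadoop's RemoteException works roughly this way), so the wrapper has to be unwrapped before the type check. IllegalStateException stands in for RegionAlreadyInTransitionException to keep the example self-contained.

```java
// A direct "t instanceof SomeException" fails when t is still wrapped;
// unwrapping first restores the original type.
public class UnwrapSketch {
    static class RemoteException extends RuntimeException {
        final String className;
        RemoteException(String className, String msg) {
            super(msg);
            this.className = className;
        }
        // Rebuild the original exception type by reflection, as the
        // real Hadoop RemoteException.unwrapRemoteException() does.
        Throwable unwrap() {
            try {
                return (Throwable) Class.forName(className)
                        .getConstructor(String.class).newInstance(getMessage());
            } catch (Exception e) { return this; }
        }
    }

    public static boolean isAlreadyInTransition(Throwable t) {
        if (t instanceof RemoteException) t = ((RemoteException) t).unwrap();
        return t instanceof IllegalStateException;  // stand-in target type
    }

    public static void main(String[] args) {
        Throwable wrapped = new RemoteException(
                "java.lang.IllegalStateException", "already in transition");
        System.out.println(wrapped instanceof IllegalStateException);  // false
        System.out.println(isAlreadyInTransition(wrapped));            // true
    }
}
```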

 RS starts region open while fails ack to HMaster.sendRegionOpen() causes 
 inconsistency in HMaster's region state and a series of successive problems.
 -

 Key: HBASE-6299
 URL: https://issues.apache.org/jira/browse/HBASE-6299
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6, 0.94.0
Reporter: Maryann Xue
Assignee: Maryann Xue
Priority: Critical
 Attachments: HBASE-6299-v2.patch, HBASE-6299.patch


 1. HMaster tries to assign a region to an RS.
 2. HMaster creates a RegionState for this region and puts it into 
 regionsInTransition.
 3. In the first assign attempt, HMaster calls RS.openRegion(). The RS 
 receives the open region request and starts to proceed, with success 
 eventually. However, due to network problems, HMaster fails to receive the 
 response for the openRegion() call, and the call times out.
 4. HMaster attempts to assign for a second time, choosing another RS. 
 5. But since the HMaster's OpenedRegionHandler has already been triggered by 
 the region open on the previous RS, and the RegionState has already been 
 removed from regionsInTransition, HMaster considers the unassigned ZK node 
 RS_ZK_REGION_OPENING updated by the second attempt invalid and ignores it.
 6. The unassigned ZK node stays, and a later unassign fails because 
 RS_ZK_REGION_CLOSING cannot be created.

[jira] [Commented] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.

2012-07-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406533#comment-13406533
 ] 

stack commented on HBASE-6299:
--

bq. This piece of code is correct. If we directly check instanceof it doesn't 
match. Thanks..

Is it correct or incorrect Ram?  I'm not sure going by the above.

 RS starts region open while fails ack to HMaster.sendRegionOpen() causes 
 inconsistency in HMaster's region state and a series of successive problems.
 -

 Key: HBASE-6299
 URL: https://issues.apache.org/jira/browse/HBASE-6299
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6, 0.94.0
Reporter: Maryann Xue
Assignee: Maryann Xue
Priority: Critical
 Attachments: HBASE-6299-v2.patch, HBASE-6299.patch


 1. HMaster tries to assign a region to an RS.
 2. HMaster creates a RegionState for this region and puts it into 
 regionsInTransition.
 3. In the first assign attempt, HMaster calls RS.openRegion(). The RS 
 receives the open region request and starts to proceed, with success 
 eventually. However, due to network problems, HMaster fails to receive the 
 response for the openRegion() call, and the call times out.
 4. HMaster attempts to assign a second time, choosing another RS. 
 5. But since HMaster's OpenedRegionHandler has already been triggered by the 
 region open on the previous RS, and the RegionState has already been removed 
 from regionsInTransition, HMaster considers the unassigned ZK node 
 RS_ZK_REGION_OPENING updated by the second attempt invalid and ignores it.
 6. The unassigned ZK node stays, and a later unassign fails because 
 RS_ZK_REGION_CLOSING cannot be created.
 {code}
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for 
 region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.;
  
 plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.,
  src=swbss-hadoop-004,60020,1340890123243, 
 dest=swbss-hadoop-006,60020,1340890678078
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  to swbss-hadoop-006,60020,1340890678078
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:28,882 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,291 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,299 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENED, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,299 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
 event for 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  from serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, 
 regions=575, usedHeap=15282, maxHeap=31301); deleting unassigned node
 2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x2377fee2ae80007 Deleting existing unassigned node for 
 b713fd655fa02395496c5a6e39ddf568 that is in expected state RS_ZK_REGION_OPENED
 2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x2377fee2ae80007 Successfully deleted unassigned node for 
 region b713fd655fa02395496c5a6e39ddf568 in expected state RS_ZK_REGION_OPENED
 2012-06-29 07:06:32,301 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has 
 opened the region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  that was online on serverName=swbss-hadoop-006,60020,1340890678078, 
 load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301)
 2012-06-29 07:07:41,140 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  to serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=0, 
 regions=575, 

[jira] [Commented] (HBASE-6319) ReplicationSource can call terminate on itself and deadlock

2012-07-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406539#comment-13406539
 ] 

stack commented on HBASE-6319:
--

How does 'this' get shutdown then?

 ReplicationSource can call terminate on itself and deadlock
 ---

 Key: HBASE-6319
 URL: https://issues.apache.org/jira/browse/HBASE-6319
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6, 0.92.1, 0.94.0
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.90.7, 0.92.2, 0.94.2

 Attachments: HBASE-6319-0.92.patch


 In a few places in the ReplicationSource code calls terminate on itself which 
 is a problem since in terminate() we wait on that thread to die.
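The deadlock mode described above is essentially a thread waiting for itself to die. A minimal, self-contained sketch (not HBase code; a timed join is used only so the demo terminates, where an untimed self-join would hang forever):

```java
public class SelfJoinDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread self = Thread.currentThread();
        // An untimed self.join() would block forever: join() returns only
        // when the target thread dies, and the target is the caller itself.
        long start = System.currentTimeMillis();
        self.join(100); // the timed variant returns after ~100 ms instead
        System.out.println("timed self-join returned after ~"
            + (System.currentTimeMillis() - start) + " ms");
    }
}
```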

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6312) Make BlockCache eviction thresholds configurable

2012-07-04 Thread Jason Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406553#comment-13406553
 ] 

Jason Dai commented on HBASE-6312:
--

If we do not expose the acceptable factor and the minimum factor, the 
hfile.block.cache.size parameter can be very confusing for the user trying to 
configure the cache size behavior properly. Just as J-D mentioned, if the user 
wants a 2GB cache, he needs to set the parameter to ~2.35GB, and he needs to 
understand HBase implementation details to do that. This looks a lot like a 
hack, not a user-friendly interface. Maybe we should evict only after the cache 
size is larger than hfile.block.cache.size, and allow ~15% burstiness before 
blocking.
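For context, a quick sketch of the arithmetic behind the ~2.35GB figure above, assuming an eviction factor of 0.85 (treat that exact constant, hard-coded in LruBlockCache at the time of this discussion, as an assumption here):

```java
public class CacheSizeMath {
    public static void main(String[] args) {
        // Assumption: the cache is evicted back down once usage exceeds
        // factor * configured size, so only ~85% of the configured cache
        // is reliably usable by the application.
        double evictionFactor = 0.85;
        double desiredUsableGb = 2.0;
        double configuredGb = desiredUsableGb / evictionFactor;
        // 2.0 / 0.85 ≈ 2.35, matching the figure quoted in the comment.
        System.out.printf("configure hfile.block.cache.size for ~%.2f GB%n",
            configuredGb);
    }
}
```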


 Make BlockCache eviction thresholds configurable
 

 Key: HBASE-6312
 URL: https://issues.apache.org/jira/browse/HBASE-6312
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jie Huang
Priority: Minor
 Attachments: hbase-6312.patch


 Some of our customers found that tuning the BlockCache eviction thresholds 
 made test results different in their test environment. However, those 
 thresholds are not configurable in the current implementation. The only way 
 to change those values is to re-compile the HBase source code. We wonder if 
 it is possible to make them configurable.





[jira] [Commented] (HBASE-5705) Introduce Protocol Buffer RPC engine

2012-07-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406557#comment-13406557
 ] 

stack commented on HBASE-5705:
--

I added some comments up in RB.  Seems like pb stuff goes via Writables still?  
Would be nice if I did not have to read hadoop-7773 patch to figure out what 
this change is doing.  Any chance of a sentence or two on intent?  Good stuff 
DD.

 Introduce Protocol Buffer RPC engine
 

 Key: HBASE-5705
 URL: https://issues.apache.org/jira/browse/HBASE-5705
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Devaraj Das
Assignee: Devaraj Das
 Attachments: 5705-1.patch


 Introduce Protocol Buffer RPC engine in the RPC core. Protocols that are PB 
 aware can be made to go through this RPC engine. The approach, in my current 
 thinking, would be similar to HADOOP-7773.





[jira] [Commented] (HBASE-6325) [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive

2012-07-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406562#comment-13406562
 ] 

stack commented on HBASE-6325:
--

Below returns true if we added regionservers.  What if we are adding 
regionservers we already had in otherRegionServers (how do I know only a 
newRsList is returned?  Because it is called on construction and on nodeCreated?)

{code}
   /**
+   * Reads the list of region servers from ZK and updates the
+   * local view of it
+   * @return true if the update was successful, else false
+   */
+  private boolean refreshOtherRegionServersList() {
+    List<String> newRsList = zkHelper.getRegisteredRegionServers();
+    if (newRsList == null) {
+      return false;
+    } else {
+      synchronized (otherRegionServers) {
+        otherRegionServers.clear();
+        otherRegionServers.addAll(newRsList);
+      }
+    }
+    return true;
+  }
{code}


This synchronize is not needed anymore since it's done inside 
refreshOtherRegionServersList?

{code}
 synchronized (otherRegionServers) {
+  refreshOtherRegionServersList();
   LOG.info("Current list of replicators: " + currentReplicators
       + " other RSs: " + otherRegionServers);
 }
{code}
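On the nested-synchronization question: Java intrinsic locks are reentrant, so the outer synchronized block cannot self-deadlock against the one inside the refresh method; it is at most redundant for the refresh itself. A self-contained illustration (not HBase code):

```java
public class ReentrantLockDemo {
    private final Object lock = new Object();
    private int refreshCount;

    void refresh() {
        synchronized (lock) { refreshCount++; }
    }

    void refreshFromOuterBlock() {
        // Re-acquiring an intrinsic lock the thread already holds succeeds
        // immediately; this nesting does not deadlock.
        synchronized (lock) {
            refresh();
        }
    }

    int refreshCount() { return refreshCount; }

    public static void main(String[] args) {
        ReentrantLockDemo demo = new ReentrantLockDemo();
        demo.refreshFromOuterBlock();
        System.out.println(demo.refreshCount()); // prints 1
    }
}
```

Whether the outer block is still wanted for a different reason — keeping the refresh and the subsequent LOG.info read of otherRegionServers atomic — is a separate question from deadlock safety.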

 [replication] Race in ReplicationSourceManager.init can initiate a failover 
 even if the node is alive
 -

 Key: HBASE-6325
 URL: https://issues.apache.org/jira/browse/HBASE-6325
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6, 0.92.1, 0.94.0
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.2

 Attachments: HBASE-6325-0.92.patch


 Yet another bug found during the leap second madness, it's possible to miss 
 the registration of new region servers so that in 
 ReplicationSourceManager.init we start the failover of a live and replicating 
 region server. I don't think there's data loss but the RS that's being failed 
 over will die on:
 {noformat}
 2012-07-01 06:25:15,604 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 sv4r23s48,10304,1341112194623: Writing replication status
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
 NoNode for 
 /hbase/replication/rs/sv4r23s48,10304,1341112194623/4/sv4r23s48%2C10304%2C1341112194623.1341112195369
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:655)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:697)
 at 
 org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:470)
 at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:154)
 at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:607)
 at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:368)
 {noformat}
 It seems to me that just refreshing {{otherRegionServers}} after getting the 
 list of {{currentReplicators}} would be enough to fix this.





[jira] [Updated] (HBASE-6322) Unnecessary creation of finalizers in HTablePool

2012-07-04 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6322:
-

Attachment: HBASE-6322-trunk.1.patch

What I applied to trunk.

 Unnecessary creation of finalizers in HTablePool
 

 Key: HBASE-6322
 URL: https://issues.apache.org/jira/browse/HBASE-6322
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.92.0, 0.92.1, 0.94.0
Reporter: Ryan Brush
 Attachments: HBASE-6322-0.92.1.patch, HBASE-6322-trunk.1.patch


 From a mailing list question:
 While generating some load against a library that makes extensive use of 
 HTablePool in 0.92, I noticed that the largest heap consumer was 
 java.lang.ref.Finalizer.  Digging in, I discovered that HTablePool's internal 
 PooledHTable extends HTable, which instantiates a ThreadPoolExecutor and 
 supporting objects every time a pooled HTable is retrieved.  Since 
 ThreadPoolExecutor has a finalizer, it and its dependencies can't get garbage 
 collected until the finalizer runs.  The result is by using HTablePool, we're 
 creating a ton of objects to be finalized that are stuck on the heap longer 
 than they should be, creating our largest source of pressure on the garbage 
 collector.  It looks like this will also be a problem in 0.94 and trunk.
 The easy fix is just to have PooledHTable implement HTableInterface (rather 
 than subclass HTable), but this does break a unit test that explicitly checks 
 that PooledHTable implements HTable -- I can only assume this test is there 
 for some historical passivity reason.
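The "easy fix" described above can be sketched as follows; all names here are illustrative stand-ins, not the real HBase client API. The point is that a pooled wrapper implementing the interface allocates nothing finalizable per checkout, unlike a subclass whose superclass constructor builds a ThreadPoolExecutor each time:

```java
// Illustrative stand-in for HTableInterface.
interface TableHandle {
    byte[] get(byte[] row);
}

// Pooled wrapper: implements the interface and delegates to one shared
// underlying table, so checking a handle out of the pool creates no
// executor (and thus no object with a finalizer) per checkout.
class PooledTableHandle implements TableHandle {
    private final TableHandle delegate;

    PooledTableHandle(TableHandle delegate) {
        this.delegate = delegate;
    }

    @Override
    public byte[] get(byte[] row) {
        return delegate.get(row);
    }
}
```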





[jira] [Created] (HBASE-6328) FSHDFSUtils#recoverFileLease tries to rethrow InterruptedException but actually shallows it

2012-07-04 Thread nkeywal (JIRA)
nkeywal created HBASE-6328:
--

 Summary: FSHDFSUtils#recoverFileLease tries to rethrow 
InterruptedException but actually shallows it
 Key: HBASE-6328
 URL: https://issues.apache.org/jira/browse/HBASE-6328
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Priority: Minor


Coding error is:

{noformat}
try {
  Thread.sleep(1000);
} catch (InterruptedException ex) {
  new InterruptedIOException().initCause(ex);
}
{noformat}

The exception is created but not thrown...
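A sketch of what the catch block presumably intended, with the missing throw added (and the interrupt status restored, which is good practice); this is an illustration, not the actual HBase patch:

```java
import java.io.InterruptedIOException;

public class RecoverLeaseSleep {
    static void sleepOneSecond() throws InterruptedIOException {
        try {
            Thread.sleep(1000);
        } catch (InterruptedException ex) {
            // Restore the interrupt flag and actually rethrow: the original
            // code built the InterruptedIOException but never threw it.
            Thread.currentThread().interrupt();
            throw (InterruptedIOException)
                new InterruptedIOException().initCause(ex);
        }
    }
}
```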





[jira] [Resolved] (HBASE-6322) Unnecessary creation of finalizers in HTablePool

2012-07-04 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-6322.
--

   Resolution: Fixed
Fix Version/s: 0.92.2
 Hadoop Flags: Reviewed

Applied to 0.92.  Thanks for the patch Ryan.  Do we need something like this on 
0.94 and trunk too?

 Unnecessary creation of finalizers in HTablePool
 

 Key: HBASE-6322
 URL: https://issues.apache.org/jira/browse/HBASE-6322
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.92.0, 0.92.1, 0.94.0
Reporter: Ryan Brush
 Fix For: 0.92.2

 Attachments: HBASE-6322-0.92.1.patch, HBASE-6322-trunk.1.patch


 From a mailing list question:
 While generating some load against a library that makes extensive use of 
 HTablePool in 0.92, I noticed that the largest heap consumer was 
 java.lang.ref.Finalizer.  Digging in, I discovered that HTablePool's internal 
 PooledHTable extends HTable, which instantiates a ThreadPoolExecutor and 
 supporting objects every time a pooled HTable is retrieved.  Since 
 ThreadPoolExecutor has a finalizer, it and its dependencies can't get garbage 
 collected until the finalizer runs.  The result is by using HTablePool, we're 
 creating a ton of objects to be finalized that are stuck on the heap longer 
 than they should be, creating our largest source of pressure on the garbage 
 collector.  It looks like this will also be a problem in 0.94 and trunk.
 The easy fix is just to have PooledHTable implement HTableInterface (rather 
 than subclass HTable), but this does break a unit test that explicitly checks 
 that PooledHTable implements HTable -- I can only assume this test is there 
 for some historical passivity reason.





[jira] [Updated] (HBASE-6281) Assignment need not be called for disabling table regions during clean cluster start up.

2012-07-04 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6281:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to 0.92.  Thanks for patch Rajesh.

 Assignment need not be called for disabling table regions during clean 
 cluster start up.
 

 Key: HBASE-6281
 URL: https://issues.apache.org/jira/browse/HBASE-6281
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: rajeshbabu
Assignee: rajeshbabu
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: 6281-trunk-v2.txt, HBASE-6281_92.patch, 
 HBASE-6281_94.patch, HBASE-6281_94_2.patch, HBASE-6281_trunk.patch


 Currently during clean cluster start up, if there are tables in DISABLING 
 state, we do bulk assignment through assignAllUserRegions(), and after a 
 region is OPENED on the RS, the master checks if the table is in 
 DISABLING/DISABLED state (in Am.regionOnline) and calls unassign again.  This 
 roundtrip can be avoided by checking the table state even before calling 
 assignment.
 This JIRA is to address the above scenario.





[jira] [Commented] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.

2012-07-04 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406594#comment-13406594
 ] 

ramkrishna.s.vasudevan commented on HBASE-6299:
---

I am very sorry for not making it clear.
{code}
if (t instanceof RegionAlreadyInTransitionException) {
  String errorMsg = "Failed assignment in: " + plan.getDestination()
      + " due to " + t.getMessage();
  LOG.error(errorMsg, t);
  return;
}
{code}
The above piece of code is correct.  The RegionAlreadyInTransitionException 
comes back as a RemoteException, so we need to unwrap it.  
In the current patch 
{code}
if (t instanceof RegionAlreadyInTransitionException) {
+  String errorMsg = "Failed assignment in: " + plan.getDestination()
+      + " due to " + t.getMessage();
+  LOG.error(errorMsg, t);
+  return;
+}
{code}
This is what the current patch does, and it will not work.  We just did a small 
verification of this.  
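The unwrapping point can be illustrated with a self-contained sketch; the class names here are stand-ins for Hadoop's RemoteException (whose unwrapRemoteException() reconstructs the declared server-side cause) and for RegionAlreadyInTransitionException, not the real API:

```java
// Stand-in for a server-side exception type the master wants to detect.
class RegionInTransition extends Exception {}

// Stand-in for Hadoop's RemoteException wrapper.
class WrappedRemoteException extends Exception {
    WrappedRemoteException(Throwable cause) { super(cause); }
    Throwable unwrap() { return getCause(); }
}

public class UnwrapDemo {
    public static void main(String[] args) {
        Throwable t = new WrappedRemoteException(new RegionInTransition());
        // A direct instanceof test on the wrapper fails...
        System.out.println(t instanceof RegionInTransition);   // false
        // ...so the wrapper must be unwrapped before the check.
        if (t instanceof WrappedRemoteException) {
            t = ((WrappedRemoteException) t).unwrap();
        }
        System.out.println(t instanceof RegionInTransition);   // true
    }
}
```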

 RS starts region open while fails ack to HMaster.sendRegionOpen() causes 
 inconsistency in HMaster's region state and a series of successive problems.
 -

 Key: HBASE-6299
 URL: https://issues.apache.org/jira/browse/HBASE-6299
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6, 0.94.0
Reporter: Maryann Xue
Assignee: Maryann Xue
Priority: Critical
 Attachments: HBASE-6299-v2.patch, HBASE-6299.patch


 1. HMaster tries to assign a region to an RS.
 2. HMaster creates a RegionState for this region and puts it into 
 regionsInTransition.
 3. In the first assign attempt, HMaster calls RS.openRegion(). The RS 
 receives the open region request and starts to proceed, with success 
 eventually. However, due to network problems, HMaster fails to receive the 
 response for the openRegion() call, and the call times out.
 4. HMaster attempts to assign a second time, choosing another RS. 
 5. But since HMaster's OpenedRegionHandler has already been triggered by the 
 region open on the previous RS, and the RegionState has already been removed 
 from regionsInTransition, HMaster considers the unassigned ZK node 
 RS_ZK_REGION_OPENING updated by the second attempt invalid and ignores it.
 6. The unassigned ZK node stays, and a later unassign fails because 
 RS_ZK_REGION_CLOSING cannot be created.
 {code}
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for 
 region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.;
  
 plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.,
  src=swbss-hadoop-004,60020,1340890123243, 
 dest=swbss-hadoop-006,60020,1340890678078
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  to swbss-hadoop-006,60020,1340890678078
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:28,882 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,291 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,299 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENED, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,299 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
 event for 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  from serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, 
 regions=575, usedHeap=15282, maxHeap=31301); deleting unassigned node
 2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x2377fee2ae80007 Deleting existing unassigned node for 
 b713fd655fa02395496c5a6e39ddf568 that is in expected state RS_ZK_REGION_OPENED
 2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x2377fee2ae80007 Successfully deleted unassigned node for 
 region b713fd655fa02395496c5a6e39ddf568 in expected state RS_ZK_REGION_OPENED
 2012-06-29 

[jira] [Commented] (HBASE-6306) TestFSUtils fails against hadoop 2.0

2012-07-04 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406593#comment-13406593
 ] 

Jonathan Hsieh commented on HBASE-6306:
---

Ram, correct.  The test is new for 0.96, and the check function is different in 
0.94.  (previously used fs.exists, now uses fs.filestatus).

 TestFSUtils fails against hadoop 2.0
 

 Key: HBASE-6306
 URL: https://issues.apache.org/jira/browse/HBASE-6306
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.96.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Fix For: 0.96.0

 Attachments: hbase-6306-trunk.patch


 trunk: mvn clean test -Dhadoop.profile=2.0 -Dtest=TestFSUtils
 {code}
 java.io.FileNotFoundException: File 
 /home/jon/proj/hbase-trunk/hbase-server/target/test-data/02beb8c8-06c1-47ea-829b-6e7ce0570cf8/hbase.version
  does not exist
 at 
 org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:315)
 at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1279)
 at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1319)
 at 
 org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:557)
 at 
 org.apache.hadoop.fs.FilterFileSystem.listStatus(FilterFileSystem.java:213)
 at org.apache.hadoop.hbase.util.FSUtils.getVersion(FSUtils.java:270)
 at 
 org.apache.hadoop.hbase.util.TestFSUtils.testVersion(TestFSUtils.java:58)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 ... 
 {code}





[jira] [Updated] (HBASE-6327) HLog can be null when create table

2012-07-04 Thread ShiXing (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ShiXing updated HBASE-6327:
---

Description: 
As HBASE-4010 discussed, the HLog can be null.

We have meet createTable failed because the no use hlog.

When createHReagion, the HLog.LogSyncer is run sync(), in under layer it call 
the DFSClient.DFSOutputStream.sync(). 

Then the hlog.closeAndDelete() was called,firstly the HLog.close() will 
interrupt the LogSyncer, and interrupt DFSClient.DFSOutputStream.sync().The 
DFSClient.DFSOutputStream will store the exception and throw it when we called 
DFSClient.close(). 

The HLog.close() call the writer.close()/DFSClient.close() after interrupt the 
LogSyncer. And there is no catch exception for the close().

So the Master throw exception to the client. There is no need to throw this 
exception, further, the hlog is no use.

Our cluster is 0.90, the logs is attached, after closing hlog writer, there 
is no log for the createTable().

The trunk and 0.92, 0.94, we used just one hlog, and if the exception happends, 
the client will got createTable failed, but indeed ,we expect all the regions 
for the table can also be assigned.

I will give the patch for this later.

  was:
As HBASE-4010 discussed, the HLog can be null.

We have meet createTable failed because the no use hlog.

When createHReagion, the HLog.LogSyncer is run sync(), in under layer it call 
the DFSClient.DFSOutputStream.sync(). 

Then the hlog.closeAndDelete() was called,firstly the HLog.close() will 
interrupt the LogSyncer, and interrupt DFSClient.DFSOutputStream.sync().The 
DFSClient.DFSOutputStream will store the exception and throw it when we called 
DFSClient.close(). 

The HLog.close() call the writer.close()/DFSClient.close() after interrupt the 
LogSyncer. And there is no catch exception for the close().

So the Master throw exception to the client. There is no need to throw this 
exception, further, the hlog is no use.

Our cluster is 0.90, the logs is attached, after closing hlog writer, there 
is no log for the createTable().

The trunk and 0.92, 0.94, we used just one hlog, and if the exception happends, 
the client will got createTable failed, but indeed ,all the regions for the 
table can also be assigned.

I will give the patch for this later.


 HLog can be null when create table
 --

 Key: HBASE-6327
 URL: https://issues.apache.org/jira/browse/HBASE-6327
 Project: HBase
  Issue Type: Bug
Reporter: ShiXing
Assignee: ShiXing
 Attachments: createTableFailedMaster.log


 As HBASE-4010 discussed, the HLog can be null.
 We have seen createTable fail because of this unused hlog.
 During createHRegion, the HLog.LogSyncer thread runs sync(), which under the 
 hood calls DFSClient.DFSOutputStream.sync(). 
 Then hlog.closeAndDelete() is called. First, HLog.close() interrupts the 
 LogSyncer, which interrupts DFSClient.DFSOutputStream.sync(). The 
 DFSClient.DFSOutputStream stores the exception and throws it when we call 
 DFSClient.close(). 
 HLog.close() calls writer.close()/DFSClient.close() after interrupting the 
 LogSyncer, and there is no catch for the exception from close().
 So the Master throws the exception to the client. There is no need to throw 
 this exception; moreover, the hlog is unused.
 Our cluster is on 0.90; the logs are attached. After closing the hlog 
 writer, there are no further logs for the createTable().
 On trunk and 0.92/0.94, we use just one hlog, and if the exception happens 
 the client sees createTable fail, but we expect all the regions for the 
 table to still be assigned.
 I will post a patch for this later.





[jira] [Commented] (HBASE-6283) [region_mover.rb] Add option to exclude list of hosts on unload instead of just assuming the source node.

2012-07-04 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406614#comment-13406614
 ] 

Jonathan Hsieh commented on HBASE-6283:
---

bq. He has contrib'd the non-SU stuff: i.e. the bit where you can register in 
zk which regionservers are being rolled.

I diffed his region_mover.rb script against trunk's and there are still some 
significant differences between the two related to the zk bits on the ruby 
script side.  In my case, I'm trying to help a customer in a particular 
situation who is still on 0.90 (which didn't get HBASE-4298), so the draining 
zk bit isn't going to be helpful.

For this patch, I think I'll tweak to address you comments, commit to trunk 
(should I do the other versions too?), and then we should encourage aravind to 
contribute/port the jruby bits as well.

Sound good?

 [region_mover.rb] Add option to exclude list of hosts on unload instead of 
 just assuming the source node.
 -

 Key: HBASE-6283
 URL: https://issues.apache.org/jira/browse/HBASE-6283
 Project: HBase
  Issue Type: Improvement
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
  Labels: jruby
 Attachments: hbase-6283.patch


 Currently, the region_mover.rb script excludes a single host, the host 
 offloading data, as a region move target.  This essentially limits the number 
 of machines that can be shut down at a time to one.  For larger clusters, it 
 is manageable to have several nodes down at a time, and desirable to get this 
 process done more quickly.
 The proposed patch adds an exclude-file option that allows multiple hosts to 
 be excluded as targets.





[jira] [Updated] (HBASE-6327) HLog can be null when create table

2012-07-04 Thread ShiXing (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ShiXing updated HBASE-6327:
---

Attachment: HBASE-6327-trunk-V1.patch

The trunk code of HLog.sync() uses group sync, so the interruption described 
above will not affect createTable().

But I think we can save a little time and simplify the createTable() logic.

There is no unit test.

 HLog can be null when create table
 --

 Key: HBASE-6327
 URL: https://issues.apache.org/jira/browse/HBASE-6327
 Project: HBase
  Issue Type: Bug
Reporter: ShiXing
Assignee: ShiXing
 Attachments: HBASE-6327-trunk-V1.patch, createTableFailedMaster.log


 As HBASE-4010 discussed, the HLog can be null.
 We have seen createTable fail because of this unused hlog.
 When createHRegion is called, the HLog.LogSyncer runs sync(), which under the 
 hood calls DFSClient.DFSOutputStream.sync().
 Then hlog.closeAndDelete() is called. HLog.close() first interrupts the 
 LogSyncer, and thereby interrupts DFSClient.DFSOutputStream.sync(). The 
 DFSClient.DFSOutputStream stores the exception and throws it when 
 DFSClient.close() is called.
 HLog.close() calls writer.close()/DFSClient.close() after interrupting the 
 LogSyncer, and the exception from close() is not caught.
 So the Master throws the exception to the client. There is no need to throw 
 this exception; moreover, the hlog serves no purpose here.
 Our cluster is on 0.90; the logs are attached. After the hlog writer is 
 closed, there is no log output for the createTable().
 In trunk, 0.92, and 0.94, just one hlog is used, and if the exception 
 happens the client gets a createTable failure, but we expect all the 
 regions of the table to still be assigned.
 I will provide the patch for this later.





[jira] [Updated] (HBASE-6117) Revisit default condition added to Switch cases in Trunk

2012-07-04 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-6117:
--

Attachment: HBASE-6117_1.patch

This is what I committed.  Thanks for the review, Stack.

 Revisit default condition added to Switch cases in Trunk
 

 Key: HBASE-6117
 URL: https://issues.apache.org/jira/browse/HBASE-6117
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.96.0

 Attachments: HBASE-6117.patch, HBASE-6117_1.patch


 We found that in some cases the default case in a switch block was just 
 throwing IllegalArgumentException. There are cases where we may get some other 
 state for which we should not throw IllegalArgumentException.
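
 The pattern in question can be sketched as follows; the enum and state names 
 here are hypothetical stand-ins, not the actual HBase event/state enums. The 
 point is that a default branch may see genuinely possible states and should 
 not unconditionally throw IllegalArgumentException:

 {code}
 public class SwitchDefaultSketch {
     // Hypothetical states; the real code switches over HBase state enums.
     enum State { OPEN, CLOSED, SPLITTING }

     static String handle(State s) {
         switch (s) {
             case OPEN:   return "handled-open";
             case CLOSED: return "handled-closed";
             default:
                 // Instead of throwing IllegalArgumentException for every
                 // unhandled value, treat possible-but-unhandled states
                 // as a logged no-op.
                 return "ignored-" + s;
         }
     }

     public static void main(String[] args) {
         System.out.println(handle(State.SPLITTING));
     }
 }
 {code}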





[jira] [Updated] (HBASE-6117) Revisit default condition added to Switch cases in Trunk

2012-07-04 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-6117:
--

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

 Revisit default condition added to Switch cases in Trunk
 

 Key: HBASE-6117
 URL: https://issues.apache.org/jira/browse/HBASE-6117
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.96.0

 Attachments: HBASE-6117.patch, HBASE-6117_1.patch


 We found that in some cases the default case in a switch block was just 
 throwing IllegalArgumentException. There are cases where we may get some other 
 state for which we should not throw IllegalArgumentException.





[jira] [Updated] (HBASE-6281) Assignment need not be called for disabling table regions during clean cluster start up.

2012-07-04 Thread rajeshbabu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rajeshbabu updated HBASE-6281:
--

Attachment: 6281.addendum

Sorry for the mistake. Added an addendum addressing Ted's comment.

 Assignment need not be called for disabling table regions during clean 
 cluster start up.
 

 Key: HBASE-6281
 URL: https://issues.apache.org/jira/browse/HBASE-6281
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: rajeshbabu
Assignee: rajeshbabu
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: 6281-trunk-v2.txt, 6281.addendum, HBASE-6281_92.patch, 
 HBASE-6281_94.patch, HBASE-6281_94_2.patch, HBASE-6281_trunk.patch


 Currently during clean cluster start up, if there are tables in DISABLING 
 state, we do bulk assignment through assignAllUserRegions(); after a region 
 is OPENED on the RS, the master checks whether the table is in 
 DISABLING/DISABLED state (in AM.regionOnline) and calls unassign again.  This 
 roundtrip can be avoided by checking even before calling assignment.
 This JIRA is to address the above scenario.





[jira] [Updated] (HBASE-6328) FSHDFSUtils#recoverFileLease tries to rethrow InterruptedException but actually shallows it

2012-07-04 Thread nkeywal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-6328:
---

Attachment: 6328.v1.patch

 FSHDFSUtils#recoverFileLease tries to rethrow InterruptedException but 
 actually shallows it
 ---

 Key: HBASE-6328
 URL: https://issues.apache.org/jira/browse/HBASE-6328
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Priority: Minor
 Attachments: 6328.v1.patch


 Coding error is:
 {noformat}
   try {
 Thread.sleep(1000);
   } catch (InterruptedException ex) {
 new InterruptedIOException().initCause(ex);
   }
 {noformat}
 The exception is created but not thrown...
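
 A minimal stdlib-only sketch of the corrected pattern: the 
 InterruptedException is converted to an InterruptedIOException and actually 
 thrown. sleepInterruptibly is a hypothetical helper mirroring the intent of 
 the fix, not the actual FSHDFSUtils method:

 {code}
 import java.io.InterruptedIOException;

 public class RecoverLeaseSketch {
     static void sleepInterruptibly(long millis) throws InterruptedIOException {
         try {
             Thread.sleep(millis);
         } catch (InterruptedException ex) {
             // The bug: the exception was created but never thrown.
             // The fix: actually throw it.
             InterruptedIOException iioe = new InterruptedIOException();
             iioe.initCause(ex);
             throw iioe;
         }
     }

     public static void main(String[] args) {
         Thread.currentThread().interrupt(); // force sleep() to be interrupted
         boolean thrown = false;
         try {
             sleepInterruptibly(1000);
         } catch (InterruptedIOException expected) {
             thrown = true;
         }
         if (!thrown) throw new AssertionError("expected InterruptedIOException");
         System.out.println("ok");
     }
 }
 {code}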





[jira] [Commented] (HBASE-6328) FSHDFSUtils#recoverFileLease tries to rethrow InterruptedException but actually shallows it

2012-07-04 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406660#comment-13406660
 ] 

nkeywal commented on HBASE-6328:


Trivial patch. Unit tests OK. Will commit in ~3 days if I don't get a no-go.

 FSHDFSUtils#recoverFileLease tries to rethrow InterruptedException but 
 actually shallows it
 ---

 Key: HBASE-6328
 URL: https://issues.apache.org/jira/browse/HBASE-6328
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Priority: Minor
 Attachments: 6328.v1.patch


 Coding error is:
 {noformat}
   try {
 Thread.sleep(1000);
   } catch (InterruptedException ex) {
 new InterruptedIOException().initCause(ex);
   }
 {noformat}
 The exception is created but not thrown...





[jira] [Assigned] (HBASE-6328) FSHDFSUtils#recoverFileLease tries to rethrow InterruptedException but actually shallows it

2012-07-04 Thread nkeywal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal reassigned HBASE-6328:
--

Assignee: nkeywal

 FSHDFSUtils#recoverFileLease tries to rethrow InterruptedException but 
 actually shallows it
 ---

 Key: HBASE-6328
 URL: https://issues.apache.org/jira/browse/HBASE-6328
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 6328.v1.patch


 Coding error is:
 {noformat}
   try {
 Thread.sleep(1000);
   } catch (InterruptedException ex) {
 new InterruptedIOException().initCause(ex);
   }
 {noformat}
 The exception is created but not thrown...





[jira] [Updated] (HBASE-4791) Allow Secure Zookeeper JAAS configuration to be programmatically set (rather than only by reading JAAS configuration file)

2012-07-04 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-4791:
---

Attachment: (was: HBASE-4791-v1.patch)

 Allow Secure Zookeeper JAAS configuration to be programmatically set (rather 
 than only by reading JAAS configuration file)
 --

 Key: HBASE-4791
 URL: https://issues.apache.org/jira/browse/HBASE-4791
 Project: HBase
  Issue Type: Improvement
  Components: security, zookeeper
Reporter: Eugene Koontz
Assignee: Eugene Koontz
  Labels: security, zookeeper
 Attachments: DemoConfig.java


 In the currently proposed fix for HBASE-2418, there must be a JAAS file 
 specified in System.setProperty(java.security.auth.login.config). 
 However, it might be preferable to construct a JAAS configuration 
 programmatically, as is done with secure Hadoop (see 
 https://github.com/apache/hadoop-common/blob/a48eceb62c9b5c1a5d71ee2945d9eea2ed62527b/src/java/org/apache/hadoop/security/UserGroupInformation.java#L175).
 This would have the benefit of avoiding the use of a system property setting, 
 and instead allowing an HBase-local configuration setting. 





[jira] [Updated] (HBASE-4791) Allow Secure Zookeeper JAAS configuration to be programmatically set (rather than only by reading JAAS configuration file)

2012-07-04 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-4791:
---

Attachment: (was: HBASE-4791-v0.patch)

 Allow Secure Zookeeper JAAS configuration to be programmatically set (rather 
 than only by reading JAAS configuration file)
 --

 Key: HBASE-4791
 URL: https://issues.apache.org/jira/browse/HBASE-4791
 Project: HBase
  Issue Type: Improvement
  Components: security, zookeeper
Reporter: Eugene Koontz
Assignee: Eugene Koontz
  Labels: security, zookeeper
 Attachments: DemoConfig.java


 In the currently proposed fix for HBASE-2418, there must be a JAAS file 
 specified in System.setProperty(java.security.auth.login.config). 
 However, it might be preferable to construct a JAAS configuration 
 programmatically, as is done with secure Hadoop (see 
 https://github.com/apache/hadoop-common/blob/a48eceb62c9b5c1a5d71ee2945d9eea2ed62527b/src/java/org/apache/hadoop/security/UserGroupInformation.java#L175).
 This would have the benefit of avoiding the use of a system property setting, 
 and instead allowing an HBase-local configuration setting. 





[jira] [Commented] (HBASE-6322) Unnecessary creation of finalizers in HTablePool

2012-07-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406665#comment-13406665
 ] 

Hudson commented on HBASE-6322:
---

Integrated in HBase-0.92 #466 (See 
[https://builds.apache.org/job/HBase-0.92/466/])
HBASE-6322 Unnecessary creation of finalizers in HTablePool (Revision 
1357291)
HBASE-6322 Unnecessary creation of finalizers in HTablePool (Revision 1357285)

 Result = FAILURE
stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt

stack : 
Files : 
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/HTablePool.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/client/TestHTablePool.java


 Unnecessary creation of finalizers in HTablePool
 

 Key: HBASE-6322
 URL: https://issues.apache.org/jira/browse/HBASE-6322
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.92.0, 0.92.1, 0.94.0
Reporter: Ryan Brush
 Fix For: 0.92.2

 Attachments: HBASE-6322-0.92.1.patch, HBASE-6322-trunk.1.patch


 From a mailing list question:
 While generating some load against a library that makes extensive use of 
 HTablePool in 0.92, I noticed that the largest heap consumer was 
 java.lang.ref.Finalizer.  Digging in, I discovered that HTablePool's internal 
 PooledHTable extends HTable, which instantiates a ThreadPoolExecutor and 
 supporting objects every time a pooled HTable is retrieved.  Since 
 ThreadPoolExecutor has a finalizer, it and its dependencies can't get garbage 
 collected until the finalizer runs.  The result is by using HTablePool, we're 
 creating a ton of objects to be finalized that are stuck on the heap longer 
 than they should be, creating our largest source of pressure on the garbage 
 collector.  It looks like this will also be a problem in 0.94 and trunk.
 The easy fix is just to have PooledHTable implement HTableInterface (rather 
 than subclass HTable), but this does break a unit test that explicitly checks 
 that PooledHTable implements HTable -- I can only assume this test is there 
 for some historical passivity reason.
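
 The proposed fix (implement the interface and delegate, rather than subclass 
 the heavyweight class) can be illustrated with a stdlib-only sketch. All 
 names here (TableView, HeavyTable, PooledView) are hypothetical stand-ins, 
 not HBase API:

 {code}
 import java.util.concurrent.atomic.AtomicInteger;

 interface TableView {
     String name();
 }

 class HeavyTable implements TableView {
     static final AtomicInteger INSTANCES = new AtomicInteger();
     HeavyTable() {
         // In HTable's case, this is where a ThreadPoolExecutor (which
         // carries a finalizer) is created on every instantiation.
         INSTANCES.incrementAndGet();
     }
     public String name() { return "heavy"; }
 }

 // Delegation, not inheritance: the wrapper carries no finalizable state.
 class PooledView implements TableView {
     private final TableView delegate;
     PooledView(TableView delegate) { this.delegate = delegate; }
     public String name() { return delegate.name(); }
 }

 public class DelegationSketch {
     public static void main(String[] args) {
         TableView shared = new HeavyTable();
         for (int i = 0; i < 1000; i++) {
             new PooledView(shared); // cheap wrappers, one heavy instance
         }
         System.out.println(HeavyTable.INSTANCES.get());
     }
 }
 {code}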





[jira] [Commented] (HBASE-6281) Assignment need not be called for disabling table regions during clean cluster start up.

2012-07-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406664#comment-13406664
 ] 

Hudson commented on HBASE-6281:
---

Integrated in HBase-0.92 #466 (See 
[https://builds.apache.org/job/HBase-0.92/466/])
HBASE-6281 Assignment need not be called for disabling table regions during 
clean cluster start up (Revision 1357302)

 Result = FAILURE
stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java


 Assignment need not be called for disabling table regions during clean 
 cluster start up.
 

 Key: HBASE-6281
 URL: https://issues.apache.org/jira/browse/HBASE-6281
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: rajeshbabu
Assignee: rajeshbabu
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: 6281-trunk-v2.txt, 6281.addendum, HBASE-6281_92.patch, 
 HBASE-6281_94.patch, HBASE-6281_94_2.patch, HBASE-6281_trunk.patch


 Currently during clean cluster start up, if there are tables in DISABLING 
 state, we do bulk assignment through assignAllUserRegions(); after a region 
 is OPENED on the RS, the master checks whether the table is in 
 DISABLING/DISABLED state (in AM.regionOnline) and calls unassign again.  This 
 roundtrip can be avoided by checking even before calling assignment.
 This JIRA is to address the above scenario.





[jira] [Updated] (HBASE-4791) Allow Secure Zookeeper JAAS configuration to be programmatically set (rather than only by reading JAAS configuration file)

2012-07-04 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-4791:
---

Attachment: HBASE-4791-v1.patch

Attached a patch that depends on ZOOKEEPER-1497, just to be able to start a 
secure ZooKeeper from HBase (non-distributed mode).

It uses the following hbase-site.xml configuration instead:
 * hbase.zookeeper.client.keytab.file
 * hbase.zookeeper.client.kerberos.principal
The client properties are used by the HBase Master and Region Servers.
 * hbase.zookeeper.server.keytab.file
 * hbase.zookeeper.server.kerberos.principal
The server properties are used by the Quorum Peer when ZooKeeper is not 
external.
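
The properties above could be laid out as an hbase-site.xml fragment like the 
following; the property names come from the list above, while the keytab paths 
and principals are placeholder values, not from the patch:

{noformat}
<configuration>
  <!-- Client side: used by the HBase Master and Region Servers -->
  <property>
    <name>hbase.zookeeper.client.keytab.file</name>
    <value>/etc/security/keytabs/hbase-client.keytab</value> <!-- placeholder -->
  </property>
  <property>
    <name>hbase.zookeeper.client.kerberos.principal</name>
    <value>hbase/_HOST@EXAMPLE.COM</value> <!-- placeholder -->
  </property>
  <!-- Server side: used by the Quorum Peer when ZooKeeper is not external -->
  <property>
    <name>hbase.zookeeper.server.keytab.file</name>
    <value>/etc/security/keytabs/zookeeper.keytab</value> <!-- placeholder -->
  </property>
  <property>
    <name>hbase.zookeeper.server.kerberos.principal</name>
    <value>zookeeper/_HOST@EXAMPLE.COM</value> <!-- placeholder -->
  </property>
</configuration>
{noformat}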


 Allow Secure Zookeeper JAAS configuration to be programmatically set (rather 
 than only by reading JAAS configuration file)
 --

 Key: HBASE-4791
 URL: https://issues.apache.org/jira/browse/HBASE-4791
 Project: HBase
  Issue Type: Improvement
  Components: security, zookeeper
Reporter: Eugene Koontz
Assignee: Eugene Koontz
  Labels: security, zookeeper
 Attachments: DemoConfig.java, HBASE-4791-v1.patch


 In the currently proposed fix for HBASE-2418, there must be a JAAS file 
 specified in System.setProperty(java.security.auth.login.config). 
 However, it might be preferable to construct a JAAS configuration 
 programmatically, as is done with secure Hadoop (see 
 https://github.com/apache/hadoop-common/blob/a48eceb62c9b5c1a5d71ee2945d9eea2ed62527b/src/java/org/apache/hadoop/security/UserGroupInformation.java#L175).
 This would have the benefit of avoiding the use of a system property setting, 
 and instead allowing an HBase-local configuration setting. 





[jira] [Commented] (HBASE-5955) Guava 11 drops MapEvictionListener and Hadoop 2.0.0-alpha requires it

2012-07-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406674#comment-13406674
 ] 

Hudson commented on HBASE-5955:
---

Integrated in HBase-0.94 #290 (See 
[https://builds.apache.org/job/HBase-0.94/290/])
HBASE-5955 Guava 11 drops MapEvictionListener and Hadoop 2.0.0-alpha 
requires it (Revision 1356379)

 Result = SUCCESS
larsh : 
Files : 
* /hbase/branches/0.94/pom.xml
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SingleSizeCache.java


 Guava 11 drops MapEvictionListener and Hadoop 2.0.0-alpha requires it
 -

 Key: HBASE-5955
 URL: https://issues.apache.org/jira/browse/HBASE-5955
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Andrew Purtell
Assignee: Lars Hofhansl
 Fix For: 0.94.1

 Attachments: 5955.txt


 Hadoop 2.0.0-alpha depends on Guava 11.0.2. Updating HBase dependencies to 
 match produces the following compilation errors:
 {code}
 [ERROR] SingleSizeCache.java:[41,32] cannot find symbol
 [ERROR] symbol  : class MapEvictionListener
 [ERROR] location: package com.google.common.collect
 [ERROR] 
 [ERROR] SingleSizeCache.java:[94,4] cannot find symbol
 [ERROR] symbol  : class MapEvictionListener
 [ERROR] location: class org.apache.hadoop.hbase.io.hfile.slab.SingleSizeCache
 [ERROR] 
 [ERROR] SingleSizeCache.java:[94,69] cannot find symbol
 [ERROR] symbol  : class MapEvictionListener
 [ERROR] location: class org.apache.hadoop.hbase.io.hfile.slab.SingleSizeCache
 {code}





[jira] [Commented] (HBASE-6281) Assignment need not be called for disabling table regions during clean cluster start up.

2012-07-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406675#comment-13406675
 ] 

Hudson commented on HBASE-6281:
---

Integrated in HBase-0.94 #290 (See 
[https://builds.apache.org/job/HBase-0.94/290/])
HBASE-6281 Assignment need not be called for disabling table regions during 
clean cluster start up. (Rajesh)

Submitted by:Rajesh 
Reviewed by:Ram, Stack, Ted (Revision 1356396)

 Result = SUCCESS
ramkrishna : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java


 Assignment need not be called for disabling table regions during clean 
 cluster start up.
 

 Key: HBASE-6281
 URL: https://issues.apache.org/jira/browse/HBASE-6281
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: rajeshbabu
Assignee: rajeshbabu
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: 6281-trunk-v2.txt, 6281.addendum, HBASE-6281_92.patch, 
 HBASE-6281_94.patch, HBASE-6281_94_2.patch, HBASE-6281_trunk.patch


 Currently during clean cluster start up, if there are tables in DISABLING 
 state, we do bulk assignment through assignAllUserRegions(); after a region 
 is OPENED on the RS, the master checks whether the table is in 
 DISABLING/DISABLED state (in AM.regionOnline) and calls unassign again.  This 
 roundtrip can be avoided by checking even before calling assignment.
 This JIRA is to address the above scenario.





[jira] [Commented] (HBASE-6303) HCD.setCompressionType should use Enum support for storing compression types as strings

2012-07-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406676#comment-13406676
 ] 

Hudson commented on HBASE-6303:
---

Integrated in HBase-0.94 #290 (See 
[https://builds.apache.org/job/HBase-0.94/290/])
Amend HBASE-6303. Likewise for HCD.setCompactionCompressionType (Revision 
1356569)
HBASE-6303. HCD.setCompressionType should use Enum support for storing 
compression types as strings (Revision 1356518)

 Result = SUCCESS
apurtell : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java

apurtell : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java


 HCD.setCompressionType should use Enum support for storing compression types 
 as strings
 ---

 Key: HBASE-6303
 URL: https://issues.apache.org/jira/browse/HBASE-6303
 Project: HBase
  Issue Type: Bug
  Components: io
Affects Versions: 0.94.0, 0.96.0
Reporter: Gopinathan A
Assignee: Andrew Purtell
Priority: Minor
 Fix For: 0.96.0, 0.94.1

 Attachments: HBASE-6303-0.94.patch, HBASE-6303-addendum-0.94.patch, 
 HBASE-6303-addendum-trunk.patch, HBASE-6303-trunk.patch


 Let's not require an update to HCD every time the HFile compression enum is 
 changed.





[jira] [Commented] (HBASE-5876) TestImportExport has been failing against hadoop 0.23 profile

2012-07-04 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406709#comment-13406709
 ] 

Jonathan Hsieh commented on HBASE-5876:
---

It's been a few days; I'm going to commit later today unless I hear anything 
suggesting not to.

 TestImportExport has been failing against hadoop 0.23 profile
 -

 Key: HBASE-5876
 URL: https://issues.apache.org/jira/browse/HBASE-5876
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0, 0.96.0
Reporter: Zhihong Ted Yu
Assignee: Jonathan Hsieh
 Fix For: 0.96.0, 0.94.1

 Attachments: hbase-5876-94-v3.patch, hbase-5876-94.patch, 
 hbase-5876-trunk-v3.patch, hbase-5876-v2.patch, hbase-5876.patch


 TestImportExport has been failing against hadoop 0.23 profile





[jira] [Commented] (HBASE-6281) Assignment need not be called for disabling table regions during clean cluster start up.

2012-07-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406711#comment-13406711
 ] 

stack commented on HBASE-6281:
--

Added the addendum.  Thanks Rajesh.

 Assignment need not be called for disabling table regions during clean 
 cluster start up.
 

 Key: HBASE-6281
 URL: https://issues.apache.org/jira/browse/HBASE-6281
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: rajeshbabu
Assignee: rajeshbabu
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: 6281-trunk-v2.txt, 6281.addendum, HBASE-6281_92.patch, 
 HBASE-6281_94.patch, HBASE-6281_94_2.patch, HBASE-6281_trunk.patch


 Currently during clean cluster start up, if there are tables in DISABLING 
 state, we do bulk assignment through assignAllUserRegions(); after a region 
 is OPENED on the RS, the master checks whether the table is in 
 DISABLING/DISABLED state (in AM.regionOnline) and calls unassign again.  This 
 roundtrip can be avoided by checking even before calling assignment.
 This JIRA is to address the above scenario.





[jira] [Commented] (HBASE-6283) [region_mover.rb] Add option to exclude list of hosts on unload instead of just assuming the source node.

2012-07-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406712#comment-13406712
 ] 

stack commented on HBASE-6283:
--

Other versions, yes.

Aravind doesn't work on this stuff any more.  If you open a new issue, one of 
the two of us can take in the diff.  Good on you, J.

 [region_mover.rb] Add option to exclude list of hosts on unload instead of 
 just assuming the source node.
 -

 Key: HBASE-6283
 URL: https://issues.apache.org/jira/browse/HBASE-6283
 Project: HBase
  Issue Type: Improvement
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
  Labels: jruby
 Attachments: hbase-6283.patch


 Currently, the region_mover.rb script excludes a single host, the host 
 offloading data, as a region move target.  This essentially limits the number 
 of machines that can be shut down at a time to one.  For larger clusters, it 
 is manageable to have several nodes down at a time, and desirable to get this 
 process done more quickly.
 The proposed patch adds an exclude-file option that allows multiple hosts to 
 be excluded as targets.





[jira] [Commented] (HBASE-6228) Fixup daughters twice cause daughter region assigned twice

2012-07-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406715#comment-13406715
 ] 

stack commented on HBASE-6228:
--

I've been looking at hbase-6060 as a background task (sorry, it's taking me a 
while, Ram and Rajesh, to get back to you lot).  When I put together multiple 
threads (SSH, HMaster joining cluster, single vs bulk assign/unassign, 
timeout monitor, zk callbacks, etc.) and then try to trace state changes not 
only across multiple state keepers (RegionState, RegionInTransition, 
AM#this.regions and AM#this.servers) in the master process but also 
cross-process (master - regionserver - via zk), I want to throw out what we 
have and start over (smile).  That ain't going to happen though.

Meantime I think we need to identify patterns or practices and broadcast them 
so all can sign on.  For example, I appreciate stuff like Jimmy's small win 
simplifying AM by breaking RegionStates out into a standalone class apart from 
AM.  This at least collects a bunch of in-memory state in the one place.

We also need to have more tests I'd say so we can have some confidence stuff 
still works when we shift things around.

 Fixup daughters twice  cause daughter region assigned twice
 ---

 Key: HBASE-6228
 URL: https://issues.apache.org/jira/browse/HBASE-6228
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: HBASE-6228.patch, HBASE-6228v2.patch, 
 HBASE-6228v2.patch, HBASE-6228v3.patch, HBASE-6228v4.patch


 First, how can fixup of daughters happen twice?
 1. fixupDaughters is called at the end of HMaster#finishInitialization
 2. ServerShutdownHandler also calls fixupDaughters when reassigning regions 
 through ServerShutdownHandler#processDeadRegion
 When fixing up daughters, we add the daughters to .META., but that couldn't 
 prevent the case above, because of how FindDaughterVisitor works.
 In detail:
 Suppose region A is a split parent region, and its daughter region B is 
 missing.
 1. First, the ServerShutdownHandler thread fixes up the daughter: it adds 
 daughter region B to .META. with serverName=null and assigns the daughter.
 2. Then, the Master's initialization thread also finds daughter region B 
 missing and assigns it, because FindDaughterVisitor considers a daughter 
 missing if its serverName=null.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Reopened] (HBASE-6322) Unnecessary creation of finalizers in HTablePool

2012-07-04 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu reopened HBASE-6322:
---


From 
https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/466/testReport/org.apache.hadoop.hbase.rest/TestTableResource/testTableInfoText/:
{code}
java.lang.AssertionError: expected:&lt;500&gt; but was:&lt;200&gt;
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.junit.Assert.assertEquals(Assert.java:456)
at 
org.apache.hadoop.hbase.rest.TestTableResource.testTableInfoText(TestTableResource.java:215)
{code}
The failure is reproducible locally.
In the same test output you can see:
{code}
2012-07-04 18:53:29,338 ERROR [2535725@qtp-29012646-0] log.Slf4jLog(87): 
/TestTableResource/regions
java.lang.ClassCastException: 
org.apache.hadoop.hbase.client.HTablePool$PooledHTable cannot be cast to 
org.apache.hadoop.hbase.client.HTable
{code}

 Unnecessary creation of finalizers in HTablePool
 

 Key: HBASE-6322
 URL: https://issues.apache.org/jira/browse/HBASE-6322
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.92.0, 0.92.1, 0.94.0
Reporter: Ryan Brush
 Fix For: 0.92.2

 Attachments: HBASE-6322-0.92.1.patch, HBASE-6322-trunk.1.patch


 From a mailing list question:
 While generating some load against a library that makes extensive use of 
 HTablePool in 0.92, I noticed that the largest heap consumer was 
 java.lang.ref.Finalizer.  Digging in, I discovered that HTablePool's internal 
 PooledHTable extends HTable, which instantiates a ThreadPoolExecutor and 
 supporting objects every time a pooled HTable is retrieved.  Since 
 ThreadPoolExecutor has a finalizer, it and its dependencies can't get garbage 
 collected until the finalizer runs.  The result is that by using HTablePool, 
 we're 
 creating a ton of objects to be finalized that are stuck on the heap longer 
 than they should be, creating our largest source of pressure on the garbage 
 collector.  It looks like this will also be a problem in 0.94 and trunk.
 The easy fix is just to have PooledHTable implement HTableInterface (rather 
 than subclass HTable), but this does break a unit test that explicitly checks 
 that PooledHTable implements HTable -- I can only assume this test is there 
 for some historical passivity reason.
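The direction described above can be sketched with simplified stand-in types (TableLike, HeavyTable, and PooledTable are illustrative names, not the real HBase API): the pooled wrapper implements the interface and delegates, so checking a table out of the pool no longer constructs a finalizable executor.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Stand-in for HTableInterface.
interface TableLike {
    String getName();
    void close();
}

// Stand-in for HTable: its constructor allocates a ThreadPoolExecutor,
// which on JVMs of this era carries a finalizer.
class HeavyTable implements TableLike {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final String name;
    HeavyTable(String name) { this.name = name; }
    public String getName() { return name; }
    public void close() { pool.shutdownNow(); }
}

// The proposed shape: implement the interface and delegate, rather than
// subclass the heavyweight class. Checking out a PooledTable allocates
// no new executor, so nothing new is queued for finalization.
class PooledTable implements TableLike {
    private final TableLike delegate;
    PooledTable(TableLike delegate) { this.delegate = delegate; }
    public String getName() { return delegate.getName(); }
    public void close() { /* return delegate to the pool instead of closing */ }
}
```

The trade-off noted in the description applies: any caller that downcasts the pooled handle to the concrete class (as the REST RegionResource did) breaks once the wrapper only implements the interface.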

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6328) FSHDFSUtils#recoverFileLease tries to rethrow InterruptedException but actually shallows it

2012-07-04 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6328:
--

Hadoop Flags: Reviewed
  Status: Patch Available  (was: Open)

Looks good to me.

 FSHDFSUtils#recoverFileLease tries to rethrow InterruptedException but 
 actually shallows it
 ---

 Key: HBASE-6328
 URL: https://issues.apache.org/jira/browse/HBASE-6328
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 6328.v1.patch


 Coding error is:
 {noformat}
   try {
 Thread.sleep(1000);
   } catch (InterruptedException ex) {
 new InterruptedIOException().initCause(ex);
   }
 {noformat}
 The exception is created but not thrown...
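A sketch of the fix (method and class names assumed for illustration): the InterruptedIOException must actually be thrown, and it is also good practice to restore the thread's interrupt flag for callers.

```java
import java.io.InterruptedIOException;

class LeaseRecoverySketch {
    // Corrected pattern: construct AND throw, preserving the interrupt status.
    static void sleepPreservingInterrupt(long millis) throws InterruptedIOException {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt(); // restore the flag for callers
            // initCause returns the receiver, so the cast back is safe.
            throw (InterruptedIOException) new InterruptedIOException().initCause(ex);
        }
    }
}
```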

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.

2012-07-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406735#comment-13406735
 ] 

Hadoop QA commented on HBASE-6299:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12535045/HBASE-6299-v2.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 7 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2318//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2318//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2318//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2318//console

This message is automatically generated.

 RS starts region open while fails ack to HMaster.sendRegionOpen() causes 
 inconsistency in HMaster's region state and a series of successive problems.
 -

 Key: HBASE-6299
 URL: https://issues.apache.org/jira/browse/HBASE-6299
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6, 0.94.0
Reporter: Maryann Xue
Assignee: Maryann Xue
Priority: Critical
 Attachments: HBASE-6299-v2.patch, HBASE-6299.patch


 1. HMaster tries to assign a region to an RS.
 2. HMaster creates a RegionState for this region and puts it into 
 regionsInTransition.
 3. In the first assign attempt, HMaster calls RS.openRegion(). The RS 
 receives the open region request and starts to proceed, with success 
 eventually. However, due to network problems, HMaster fails to receive the 
 response for the openRegion() call, and the call times out.
 4. HMaster attempts to assign a second time, choosing another RS. 
 5. But since the HMaster's OpenedRegionHandler has already been triggered by 
 the region open on the previous RS, and the RegionState has already been 
 removed from regionsInTransition, HMaster finds the unassigned ZK node 
 RS_ZK_REGION_OPENING updated by the second attempt invalid and ignores it.
 6. The unassigned ZK node stays, and a later unassign fails because 
 RS_ZK_REGION_CLOSING cannot be created.
 {code}
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for 
 region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.;
  
 plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.,
  src=swbss-hadoop-004,60020,1340890123243, 
 dest=swbss-hadoop-006,60020,1340890678078
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  to swbss-hadoop-006,60020,1340890678078
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:28,882 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,291 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,299 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENED, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,299 DEBUG 

[jira] [Updated] (HBASE-6328) FSHDFSUtils#recoverFileLease tries to rethrow InterruptedException but actually shallows it

2012-07-04 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6328:
--

Fix Version/s: 0.94.1
   0.96.0
   0.92.2

I found the same code in 0.92 and 0.94

 FSHDFSUtils#recoverFileLease tries to rethrow InterruptedException but 
 actually shallows it
 ---

 Key: HBASE-6328
 URL: https://issues.apache.org/jira/browse/HBASE-6328
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: 6328.v1.patch


 Coding error is:
 {noformat}
   try {
 Thread.sleep(1000);
   } catch (InterruptedException ex) {
 new InterruptedIOException().initCause(ex);
   }
 {noformat}
 The exception is created but not thrown...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6296) Refactor EventType to track its own ExecutorService type

2012-07-04 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6296:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to trunk.  I ran tests locally w/ this patch applied and no failures 
or errors.  Thanks for the clean up Jesse.

 Refactor EventType to track its own ExecutorService type
 

 Key: HBASE-6296
 URL: https://issues.apache.org/jira/browse/HBASE-6296
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.96.0
Reporter: Jesse Yates
Assignee: Jesse Yates
Priority: Minor
 Fix For: 0.96.0

 Attachments: 6296v1.txt, 6296v1.txt, java_hbase-6296-v0.patch, 
 java_hbase-6296-v0.patch


 Currently there is a massive switch statement in 
 org.apache.hadoop.hbase.executor.ExecutorService for the ExecutorType for 
 each org.apache.hadoop.hbase.executor.EventHandler.EventType. This means if 
 you add a new event type, you also have to change the ExecutorService file, 
 if for nothing else than to add the executor type. Instead, the EventType 
 should just keep track of which executor it should use.
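The refactor can be sketched like this (simplified constant and type names, not the actual HBase enums): each EventType carries its ExecutorType, so the central switch statement disappears and adding an event means adding one enum constant.

```java
// Stand-in for the executor categories the ExecutorService dispatches to.
enum ExecutorType { MASTER_OPEN_REGION, MASTER_CLOSE_REGION }

enum EventType {
    RS_ZK_REGION_OPENED(ExecutorType.MASTER_OPEN_REGION),
    RS_ZK_REGION_CLOSED(ExecutorType.MASTER_CLOSE_REGION);

    private final ExecutorType executorType;

    EventType(ExecutorType executorType) { this.executorType = executorType; }

    // Dispatch now asks the event itself; no external switch to maintain.
    ExecutorType getExecutorServiceType() { return executorType; }
}
```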

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6228) Fixup daughters twice cause daughter region assigned twice

2012-07-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406753#comment-13406753
 ] 

stack commented on HBASE-6228:
--

For example, on study:

+ RegionState is unreliable for figuring the state of a region in the master's 
memory; you cannot rely on it to answer the bigger question of who a region 
belongs to: master or regionserver.
+ In assign, we are careful with retries.  We actually need to be more careful, 
especially around things like socket timeout (see Maryann's recent issue).  
Bulk assign does none of these checks.  Bulk assign was originally introduced 
to do assigns on cluster start; if anything failed, the contract was we'd just 
crash out and restart the cluster over.  That was how it was originally.  Now 
bulk assign is used all over -- e.g. in SSH -- in spite of its being 
loosey-goosey around failures.

 Fixup daughters twice  cause daughter region assigned twice
 ---

 Key: HBASE-6228
 URL: https://issues.apache.org/jira/browse/HBASE-6228
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: HBASE-6228.patch, HBASE-6228v2.patch, 
 HBASE-6228v2.patch, HBASE-6228v3.patch, HBASE-6228v4.patch


 First, how can fixup of daughters happen twice?
 1. fixupDaughters is called at the end of HMaster#finishInitialization
 2. ServerShutdownHandler also calls fixupDaughters when reassigning regions 
 through ServerShutdownHandler#processDeadRegion
 When fixing up daughters, we add the daughters to .META., but that couldn't 
 prevent the case above, because of how FindDaughterVisitor works.
 In detail:
 Suppose region A is a split parent region, and its daughter region B is 
 missing.
 1. First, the ServerShutdownHandler thread fixes up the daughter: it adds 
 daughter region B to .META. with serverName=null and assigns the daughter.
 2. Then, the Master's initialization thread also finds daughter region B 
 missing and assigns it, because FindDaughterVisitor considers a daughter 
 missing if its serverName=null.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6328) FSHDFSUtils#recoverFileLease tries to rethrow InterruptedException but actually shallows it

2012-07-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406755#comment-13406755
 ] 

stack commented on HBASE-6328:
--

+1 on patch.  Apply it to all branches I'd say Nicolas.

 FSHDFSUtils#recoverFileLease tries to rethrow InterruptedException but 
 actually shallows it
 ---

 Key: HBASE-6328
 URL: https://issues.apache.org/jira/browse/HBASE-6328
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: 6328.v1.patch


 Coding error is:
 {noformat}
   try {
 Thread.sleep(1000);
   } catch (InterruptedException ex) {
 new InterruptedIOException().initCause(ex);
   }
 {noformat}
 The exception is created but not thrown...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6322) Unnecessary creation of finalizers in HTablePool

2012-07-04 Thread Ryan Brush (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406757#comment-13406757
 ] 

Ryan Brush commented on HBASE-6322:
---

My apologies...I had only run the tests around HTablePool since I had run into 
some apparently unrelated test failures in a full build.  (And I didn't expect 
it to be included in the build so quickly. ;)  It looks like we'll need to do 
some refactoring in REST server's RegionResource for this to apply cleanly, 
specifically the call to HTable.getRegionsInfo which requires the downcast (and 
is deprecated, anyway).

I'm not deeply familiar with this part of the code base and probably won't be 
able to dig into it today, but can get back to it in the next couple days, as 
well as making sure there aren't any further regressions caused by this change.

 Unnecessary creation of finalizers in HTablePool
 

 Key: HBASE-6322
 URL: https://issues.apache.org/jira/browse/HBASE-6322
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.92.0, 0.92.1, 0.94.0
Reporter: Ryan Brush
 Fix For: 0.92.2

 Attachments: HBASE-6322-0.92.1.patch, HBASE-6322-trunk.1.patch


 From a mailing list question:
 While generating some load against a library that makes extensive use of 
 HTablePool in 0.92, I noticed that the largest heap consumer was 
 java.lang.ref.Finalizer.  Digging in, I discovered that HTablePool's internal 
 PooledHTable extends HTable, which instantiates a ThreadPoolExecutor and 
 supporting objects every time a pooled HTable is retrieved.  Since 
 ThreadPoolExecutor has a finalizer, it and its dependencies can't get garbage 
 collected until the finalizer runs.  The result is that by using HTablePool, 
 we're 
 creating a ton of objects to be finalized that are stuck on the heap longer 
 than they should be, creating our largest source of pressure on the garbage 
 collector.  It looks like this will also be a problem in 0.94 and trunk.
 The easy fix is just to have PooledHTable implement HTableInterface (rather 
 than subclass HTable), but this does break a unit test that explicitly checks 
 that PooledHTable implements HTable -- I can only assume this test is there 
 for some historical passivity reason.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6322) Unnecessary creation of finalizers in HTablePool

2012-07-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406760#comment-13406760
 ] 

stack commented on HBASE-6322:
--

Thanks Ted.  I reverted the patch for now.

Thank you for your time, Ryan.  The patch looks worth it if you can figure out 
the test failure.  Thanks.

 Unnecessary creation of finalizers in HTablePool
 

 Key: HBASE-6322
 URL: https://issues.apache.org/jira/browse/HBASE-6322
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.92.0, 0.92.1, 0.94.0
Reporter: Ryan Brush
 Fix For: 0.92.2

 Attachments: HBASE-6322-0.92.1.patch, HBASE-6322-trunk.1.patch


 From a mailing list question:
 While generating some load against a library that makes extensive use of 
 HTablePool in 0.92, I noticed that the largest heap consumer was 
 java.lang.ref.Finalizer.  Digging in, I discovered that HTablePool's internal 
 PooledHTable extends HTable, which instantiates a ThreadPoolExecutor and 
 supporting objects every time a pooled HTable is retrieved.  Since 
 ThreadPoolExecutor has a finalizer, it and its dependencies can't get garbage 
 collected until the finalizer runs.  The result is that by using HTablePool, 
 we're 
 creating a ton of objects to be finalized that are stuck on the heap longer 
 than they should be, creating our largest source of pressure on the garbage 
 collector.  It looks like this will also be a problem in 0.94 and trunk.
 The easy fix is just to have PooledHTable implement HTableInterface (rather 
 than subclass HTable), but this does break a unit test that explicitly checks 
 that PooledHTable implements HTable -- I can only assume this test is there 
 for some historical passivity reason.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-07-04 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406763#comment-13406763
 ] 

Himanshu Vashishtha commented on HBASE-5549:


This seems to fix the HBaseTestingUtility#expireSession method, as it introduced 
new logic of creating a monitor and then expiring the session. But it seems 
this fix needs some more work: for example, TestReplicationPeer occasionally 
fails even with this change on trunk, citing improper session termination.
{code}
testResetZooKeeperSession(org.apache.hadoop.hbase.replication.TestReplicationPeer):
 ReplicationPeer ZooKeeper session was not properly expired.
{code}

On another note, I wonder whether this patch can be backported to 0.92/0.94?

 Master can fail if ZooKeeper session expires
 

 Key: HBASE-5549
 URL: https://issues.apache.org/jira/browse/HBASE-5549
 Project: HBase
  Issue Type: Bug
  Components: master, zookeeper
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 5549.v10.patch, 5549.v11.patch, 5549.v6.patch, 
 5549.v7.patch, 5549.v8.patch, 5549.v9.patch, nochange.patch


 There is a retry mechanism in RecoverableZooKeeper, but when the session 
 expires, the whole ZooKeeperWatcher is recreated, hence the retry mechanism 
 does not work in this case. This is why a sleep is needed in 
 TestZooKeeper#testMasterSessionExpired: we need to wait for ZooKeeperWatcher 
 to be recreated before using the connection.
 This can happen in real life; it can happen when:
 - master and zookeeper start
 - the zookeeper connection is cut
 - the master enters the retry loop
 - in the meantime the session expires
 - the network comes back and the session is recreated
 - the retries continue, but on the wrong object, hence fail.
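The failure mode above can be sketched as follows (Watcher and the holder are simplified stand-ins, not the real ZooKeeperWatcher API): if the retry loop captures the watcher once, a session expiry that swaps in a fresh watcher leaves the loop retrying against a dead object; re-reading the reference on every attempt picks up the recreated watcher.

```java
import java.util.concurrent.atomic.AtomicReference;

class RetrySketch {
    static class Watcher {
        volatile boolean expired = false;
        boolean call() { return !expired; } // fails once the session has expired
    }

    // Holder updated when the session is recreated after an expiry.
    static final AtomicReference<Watcher> current =
        new AtomicReference<>(new Watcher());

    static boolean retryingCall(int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            // Re-resolve on every attempt instead of capturing the
            // watcher once before the loop.
            if (current.get().call()) {
                return true;
            }
        }
        return false;
    }
}
```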

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6293) HMaster does not go down while splitting logs even if explicit shutdown is called.

2012-07-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406764#comment-13406764
 ] 

Hadoop QA commented on HBASE-6293:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12534877/6293.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 7 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.security.access.TestZKPermissionsWatcher

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2320//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2320//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2320//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2320//console

This message is automatically generated.

 HMaster does not go down while splitting logs even if explicit shutdown is 
 called.
 --

 Key: HBASE-6293
 URL: https://issues.apache.org/jira/browse/HBASE-6293
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: rajeshbabu
Assignee: rajeshbabu
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: 6293.txt


 When the master starts up and tries to do the split-log, in case of any error 
 we retry infinitely in a loop until it succeeds.
 But now if we get a shutdown call, inside SplitLogManager
 {code}
   if (stopper.isStopped()) {
 LOG.warn("Stopped while waiting for log splits to be completed");
 return;
   }
 {code}
 Here we know that the master has stopped.  But as the task may not be 
 completed yet,
 {code}
  if (batch.done != batch.installed) {
    batch.isDead = true;
    tot_mgr_log_split_batch_err.incrementAndGet();
    LOG.warn("error while splitting logs in " + logDirs +
        " installed = " + batch.installed + " but only " + batch.done + " done");
    throw new IOException("error or interrupt while splitting logs in "
        + logDirs + " Task = " + batch);
  }
 {code} 
 we throw an exception.  In MasterFileSystem.splitLogAfterStartup() we don't 
 check whether the master is stopped, so we keep trying continuously.
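The suggested guard can be sketched like this (Stoppable and the loop are simplified stand-ins for the master's types): the startup split-log retry loop checks the stop flag on each iteration instead of retrying forever.

```java
class SplitLogRetrySketch {
    interface Stoppable { boolean isStopped(); }

    // Returns true if the split succeeded, false if we gave up because
    // the master was asked to stop.
    static boolean splitLogWithRetry(Stoppable stopper,
                                     java.util.function.BooleanSupplier attempt) {
        while (!stopper.isStopped()) {
            try {
                if (attempt.getAsBoolean()) {
                    return true;
                }
            } catch (RuntimeException e) {
                // log and fall through to retry (or stop, if requested)
            }
        }
        return false; // honor shutdown rather than looping forever
    }
}
```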

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6288) In hbase-daemons.sh, description of the default backup-master file path is wrong

2012-07-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406767#comment-13406767
 ] 

stack commented on HBASE-6288:
--

Sounds right Benjamin.  Can you make a patch w/ your fix?

 In hbase-daemons.sh, description of the default backup-master file path is 
 wrong
 

 Key: HBASE-6288
 URL: https://issues.apache.org/jira/browse/HBASE-6288
 Project: HBase
  Issue Type: Task
  Components: master, scripts, shell
Affects Versions: 0.92.0, 0.92.1, 0.94.0
Reporter: Benjamin Kim

 In hbase-daemons.sh, description of the default backup-master file path is 
 wrong
 {code}
 #   HBASE_BACKUP_MASTERS File naming remote hosts.
 # Default is ${HADOOP_CONF_DIR}/backup-masters
 {code}
 it says the default backup-masters file path is under HADOOP_CONF_DIR, but 
 shouldn't this be HBASE_CONF_DIR?
 Also, adding the following lines to conf/hbase-env.sh would be helpful:
 {code}
 # File naming hosts on which backup HMaster will run.  
 $HBASE_HOME/conf/backup-masters by default.
 export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6305) TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds.

2012-07-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406771#comment-13406771
 ] 

Hadoop QA commented on HBASE-6305:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12534998/hbase-6305-94.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2321//console

This message is automatically generated.

 TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds.
 

 Key: HBASE-6305
 URL: https://issues.apache.org/jira/browse/HBASE-6305
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.92.2, 0.94.1
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Fix For: 0.92.2, 0.94.1

 Attachments: hbase-6305-94.patch


 trunk: mvn clean test -Dhadoop.profile=2.0 -Dtest=TestLocalHBaseCluster
 0.94: mvn clean test -Dhadoop.profile=23 -Dtest=TestLocalHBaseCluster
 {code}
 testLocalHBaseCluster(org.apache.hadoop.hbase.TestLocalHBaseCluster)  Time 
 elapsed: 0.022 sec   ERROR!
 java.lang.RuntimeException: Master not initialized after 200 seconds
 at 
 org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:208)
 at 
 org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:424)
 at 
 org.apache.hadoop.hbase.TestLocalHBaseCluster.testLocalHBaseCluster(TestLocalHBaseCluster.java:66)
 ...
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6027) Update the reference guide to reflect the changes in the security profile

2012-07-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406772#comment-13406772
 ] 

Hadoop QA commented on HBASE-6027:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12534996/6027-1.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+0 tests included.  The patch appears to be a documentation patch that 
doesn't require tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2322//console

This message is automatically generated.

 Update the reference guide to reflect the changes in the security profile
 -

 Key: HBASE-6027
 URL: https://issues.apache.org/jira/browse/HBASE-6027
 Project: HBase
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.96.0
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.96.0

 Attachments: 6027-1.patch


 The refguide needs to be updated to reflect the fact that there is no 
 security profile anymore, etc. [Follow up to HBASE-5732]





[jira] [Commented] (HBASE-6328) FSHDFSUtils#recoverFileLease tries to rethrow InterruptedException but actually shallows it

2012-07-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406791#comment-13406791
 ] 

Hadoop QA commented on HBASE-6328:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12535129/6328.v1.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 7 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.catalog.TestMetaReaderEditor

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2323//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2323//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2323//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2323//console

This message is automatically generated.

 FSHDFSUtils#recoverFileLease tries to rethrow InterruptedException but 
 actually shallows it
 ---

 Key: HBASE-6328
 URL: https://issues.apache.org/jira/browse/HBASE-6328
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: 6328.v1.patch


 Coding error is:
 {noformat}
   try {
 Thread.sleep(1000);
   } catch (InterruptedException ex) {
 new InterruptedIOException().initCause(ex);
   }
 {noformat}
 The exception is created but not thrown...
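The fix is simply to throw what gets constructed (and, conventionally, to restore the interrupt flag first). A self-contained sketch of the buggy versus corrected pattern (the class and method names below are illustrative, not the actual FSHDFSUtils code):

```java
import java.io.InterruptedIOException;

public class RethrowDemo {
    // Buggy pattern from the report: the wrapper exception is constructed
    // but its result is discarded, so the interrupt is silently swallowed.
    static void buggySleep() {
        try {
            Thread.sleep(1000);
        } catch (InterruptedException ex) {
            new InterruptedIOException().initCause(ex); // never thrown
        }
    }

    // Corrected pattern: restore the interrupt flag and actually throw.
    static void fixedSleep() throws InterruptedIOException {
        try {
            Thread.sleep(1000);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            InterruptedIOException iioe = new InterruptedIOException();
            iioe.initCause(ex);
            throw iioe;
        }
    }
}
```

With the thread pre-interrupted, fixedSleep() propagates an InterruptedIOException while buggySleep() returns as if nothing happened.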





[jira] [Commented] (HBASE-6322) Unnecessary creation of finalizers in HTablePool

2012-07-04 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406792#comment-13406792
 ] 

Zhihong Ted Yu commented on HBASE-6322:
---

Looks like we can add the following to HTableInterface in trunk:
{code}
  public NavigableMap<HRegionInfo, ServerName> getRegionLocations() throws IOException {
{code}
so that RegionsResource can use it instead of getRegionsInfo().
And we don't need a cast in getTableRegions().
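The shape of that change can be sketched with stand-in types (TableIface, ConcreteTable, and RegionsResourceSketch below are hypothetical; the real types would be HTableInterface, HTable, HRegionInfo, and ServerName):

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Stand-in for HTableInterface: once the accessor lives on the interface,
// callers such as RegionsResource no longer need to cast to the
// concrete table class.
interface TableIface {
    NavigableMap<String, String> getRegionLocations();
}

// Stand-in for the concrete HTable implementation.
class ConcreteTable implements TableIface {
    public NavigableMap<String, String> getRegionLocations() {
        NavigableMap<String, String> m = new TreeMap<>();
        m.put("region-a", "server-1"); // illustrative data only
        return m;
    }
}

class RegionsResourceSketch {
    // Works against the interface; no cast to ConcreteTable required.
    static int countRegions(TableIface t) {
        return t.getRegionLocations().size();
    }
}
```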

 Unnecessary creation of finalizers in HTablePool
 

 Key: HBASE-6322
 URL: https://issues.apache.org/jira/browse/HBASE-6322
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.92.0, 0.92.1, 0.94.0
Reporter: Ryan Brush
 Fix For: 0.92.2

 Attachments: HBASE-6322-0.92.1.patch, HBASE-6322-trunk.1.patch


 From a mailing list question:
 While generating some load against a library that makes extensive use of 
 HTablePool in 0.92, I noticed that the largest heap consumer was 
 java.lang.ref.Finalizer.  Digging in, I discovered that HTablePool's internal 
 PooledHTable extends HTable, which instantiates a ThreadPoolExecutor and 
 supporting objects every time a pooled HTable is retrieved.  Since 
 ThreadPoolExecutor has a finalizer, it and its dependencies can't get garbage 
 collected until the finalizer runs.  The result is by using HTablePool, we're 
 creating a ton of objects to be finalized that are stuck on the heap longer 
 than they should be, creating our largest source of pressure on the garbage 
 collector.  It looks like this will also be a problem in 0.94 and trunk.
 The easy fix is just to have PooledHTable implement HTableInterface (rather 
 than subclass HTable), but this does break a unit test that explicitly checks 
 that PooledHTable implements HTable -- I can only assume this test is there 
 for some historical compatibility reason.
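The delegation idea can be sketched with toy types (HeavyTable and PooledTable below are stand-ins for the real HBase classes, which carry a much larger API):

```java
// Stand-in for the table interface (the real one is HTableInterface).
interface TableLike {
    String name();
    void close();
}

// Heavyweight "real" table: imagine its constructor allocates a
// ThreadPoolExecutor, whose finalizer delays garbage collection.
class HeavyTable implements TableLike {
    private final String name;
    HeavyTable(String name) { this.name = name; /* expensive setup here */ }
    public String name() { return name; }
    public void close() { /* release resources */ }
}

// Pooled wrapper delegates instead of extending HeavyTable, so checking
// a table out of the pool allocates only this thin object, not a fresh
// executor with a finalizer.
class PooledTable implements TableLike {
    private final TableLike delegate;
    PooledTable(TableLike delegate) { this.delegate = delegate; }
    public String name() { return delegate.name(); }
    public void close() { /* return delegate to the pool instead of closing */ }
}
```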





[jira] [Commented] (HBASE-6327) HLog can be null when create table

2012-07-04 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406802#comment-13406802
 ] 

Zhihong Ted Yu commented on HBASE-6327:
---

{code}
 }
-hlog.closeAndDelete();
{code}
If hlog isn't null, we still need to call closeAndDelete().
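In other words, the guard should skip cleanup only when the log was never created; when it exists, it must still be closed and deleted. A minimal sketch with a stub class (HLogStub and finish are hypothetical names, not the HBase API):

```java
// Stub standing in for the real HLog.
class HLogStub {
    boolean closed = false;
    void closeAndDelete() { closed = true; }
}

class CreateRegionSketch {
    // Null-safe cleanup: a null hlog is simply skipped, but an existing
    // one is always closed and deleted. Returns whether cleanup ran.
    static boolean finish(HLogStub hlog) {
        if (hlog != null) {
            hlog.closeAndDelete();
            return true;
        }
        return false;
    }
}
```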

 HLog can be null when create table
 --

 Key: HBASE-6327
 URL: https://issues.apache.org/jira/browse/HBASE-6327
 Project: HBase
  Issue Type: Bug
Reporter: ShiXing
Assignee: ShiXing
 Attachments: HBASE-6327-trunk-V1.patch, createTableFailedMaster.log


 As discussed in HBASE-4010, the HLog can be null.
 We have seen createTable fail because of this unused hlog.
 During createHRegion, the HLog.LogSyncer thread runs sync(), which under the 
 covers calls DFSClient.DFSOutputStream.sync(). 
 Then hlog.closeAndDelete() is called: first, HLog.close() interrupts the 
 LogSyncer, which interrupts DFSClient.DFSOutputStream.sync(). The 
 DFSClient.DFSOutputStream stores the exception and rethrows it when 
 DFSClient.close() is called. 
 HLog.close() calls writer.close()/DFSClient.close() after interrupting the 
 LogSyncer, and there is no catch around that close().
 So the Master throws the exception back to the client. There is no need to 
 throw this exception; moreover, the hlog is not used afterwards.
 Our cluster runs 0.90; the logs are attached. After the hlog writer is 
 closed, there are no further logs for the createTable().
 In trunk, 0.92, and 0.94 we use just one hlog, and if the exception happens 
 the client sees createTable fail, but we expect all the regions of the 
 table to still be assigned.
 I will provide a patch for this later.





[jira] [Comment Edited] (HBASE-6327) HLog can be null when create table

2012-07-04 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406802#comment-13406802
 ] 

Zhihong Ted Yu edited comment on HBASE-6327 at 7/5/12 1:08 AM:
---

@Xing:
Have you tested the change in a real cluster?

  was (Author: zhi...@ebaysf.com):
{code}
 }
-hlog.closeAndDelete();
{code}
If hlog isn't null, we still need to call closeAndDelete().
  
 HLog can be null when create table
 --

 Key: HBASE-6327
 URL: https://issues.apache.org/jira/browse/HBASE-6327
 Project: HBase
  Issue Type: Bug
Reporter: ShiXing
Assignee: ShiXing
 Attachments: HBASE-6327-trunk-V1.patch, createTableFailedMaster.log


 As discussed in HBASE-4010, the HLog can be null.
 We have seen createTable fail because of this unused hlog.
 During createHRegion, the HLog.LogSyncer thread runs sync(), which under the 
 covers calls DFSClient.DFSOutputStream.sync(). 
 Then hlog.closeAndDelete() is called: first, HLog.close() interrupts the 
 LogSyncer, which interrupts DFSClient.DFSOutputStream.sync(). The 
 DFSClient.DFSOutputStream stores the exception and rethrows it when 
 DFSClient.close() is called. 
 HLog.close() calls writer.close()/DFSClient.close() after interrupting the 
 LogSyncer, and there is no catch around that close().
 So the Master throws the exception back to the client. There is no need to 
 throw this exception; moreover, the hlog is not used afterwards.
 Our cluster runs 0.90; the logs are attached. After the hlog writer is 
 closed, there are no further logs for the createTable().
 In trunk, 0.92, and 0.94 we use just one hlog, and if the exception happens 
 the client sees createTable fail, but we expect all the regions of the 
 table to still be assigned.
 I will provide a patch for this later.





[jira] [Created] (HBASE-6329) Stop META regionserver could cause daughter region assign twice

2012-07-04 Thread chunhui shen (JIRA)
chunhui shen created HBASE-6329:
---

 Summary: Stop META regionserver could cause daughter region assign 
twice
 Key: HBASE-6329
 URL: https://issues.apache.org/jira/browse/HBASE-6329
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.94.0
Reporter: chunhui shen
Assignee: chunhui shen


We found this issue in 0.94, first let me describe the case:
Stop META rs when split is in progress

1. Stopping the META rs (Server A).
2. The main thread of the rs closes ZK and deletes the rs's ephemeral node.
3. SplitTransaction is retrying MetaEditor.addDaughter.
4. The Master's ServerShutdownHandler processes the above dead META server.
5. The Master fixes up the daughter and assigns it.
6. The daughter is opened on another server (Server B).
7. Server A's SplitTransaction successfully adds the daughter to .META. with 
serverName=Server A.
8. Now the daughter's region location in .META. is Server A, but it is online 
on Server B.
9. Restart the Master, and the Master will assign the daughter again.


Attaching the logs, daughter region 80f999ea84cb259e20e9a228546f6c8a

Master log:
2012-07-04 13:45:56,493 INFO 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs 
for dw93.kgb.sqa.cm4,60020,1341378224464
2012-07-04 13:45:58,983 INFO 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Fixup; missing 
daughter 
writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
 
2012-07-04 13:45:58,985 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Added 
daughter 
writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.,
 serverName=null 
2012-07-04 13:45:58,988 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Assigning region 
writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
 to dw88.kgb.sqa.cm4,60020,1341379188777 
2012-07-04 13:46:00,201 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
The master has opened the region 
writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
 that was online on dw88.kgb.sqa.cm4,60020,1341379188777 

Master log after restart:
2012-07-04 14:27:05,824 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:6-0x136187d60e34644 Creating (or updating) unassigned node for 
80f999ea84cb259e20e9a228546f6c8a with OFFLINE state 
2012-07-04 14:27:05,851 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Processing region 
writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
 in state M_ZK_REGION_OFFLINE 
2012-07-04 14:27:05,854 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Assigning region 
writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
 to dw93.kgb.sqa.cm4,60020,1341380812020 
2012-07-04 14:27:06,051 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENED, 
server=dw93.kgb.sqa.cm4,60020,1341380812020, 
region=80f999ea84cb259e20e9a228546f6c8a 



Regionserver(META rs) log:
2012-07-04 13:45:56,491 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server 
dw93.kgb.sqa.cm4,60020,1341378224464; zookeeper connection c
losed.
2012-07-04 13:46:11,951 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Added 
daughter 
writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.,
 serverName=dw93.kgb.sqa.cm4,60020,1341378224464 
2012-07-04 13:46:11,952 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Done with post open deploy 
task for 
region=writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.,
 daughter=true 








[jira] [Commented] (HBASE-6329) Stop META regionserver could cause daughter region assign twice

2012-07-04 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406834#comment-13406834
 ] 

chunhui shen commented on HBASE-6329:
-

IMO, the regionserver's main thread should close ZK and delete its ephemeral 
node only after doing join()

 Stop META regionserver could cause daughter region assign twice
 ---

 Key: HBASE-6329
 URL: https://issues.apache.org/jira/browse/HBASE-6329
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.94.0
Reporter: chunhui shen
Assignee: chunhui shen

 We found this issue in 0.94, first let me describe the case:
 Stop META rs when split is in progress
 1. Stopping the META rs (Server A).
 2. The main thread of the rs closes ZK and deletes the rs's ephemeral node.
 3. SplitTransaction is retrying MetaEditor.addDaughter.
 4. The Master's ServerShutdownHandler processes the above dead META server.
 5. The Master fixes up the daughter and assigns it.
 6. The daughter is opened on another server (Server B).
 7. Server A's SplitTransaction successfully adds the daughter to .META. with 
 serverName=Server A.
 8. Now the daughter's region location in .META. is Server A, but it is 
 online on Server B.
 9. Restart the Master, and the Master will assign the daughter again.
 Attaching the logs, daughter region 80f999ea84cb259e20e9a228546f6c8a
 Master log:
 2012-07-04 13:45:56,493 INFO 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs 
 for dw93.kgb.sqa.cm4,60020,1341378224464
 2012-07-04 13:45:58,983 INFO 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Fixup; missing 
 daughter 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
  
 2012-07-04 13:45:58,985 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
 Added daughter 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.,
  serverName=null 
 2012-07-04 13:45:58,988 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
  to dw88.kgb.sqa.cm4,60020,1341379188777 
 2012-07-04 13:46:00,201 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: The master has opened the 
 region 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
  that was online on dw88.kgb.sqa.cm4,60020,1341379188777 
 Master log after restart:
 2012-07-04 14:27:05,824 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x136187d60e34644 Creating (or updating) unassigned node for 
 80f999ea84cb259e20e9a228546f6c8a with OFFLINE state 
 2012-07-04 14:27:05,851 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Processing region 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
  in state M_ZK_REGION_OFFLINE 
 2012-07-04 14:27:05,854 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
  to dw93.kgb.sqa.cm4,60020,1341380812020 
 2012-07-04 14:27:06,051 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENED, server=dw93.kgb.sqa.cm4,60020,1341380812020, 
 region=80f999ea84cb259e20e9a228546f6c8a 
 Regionserver(META rs) log:
 2012-07-04 13:45:56,491 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server 
 dw93.kgb.sqa.cm4,60020,1341378224464; zookeeper connection c
 losed.
 2012-07-04 13:46:11,951 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
 Added daughter 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.,
  serverName=dw93.kgb.sqa.cm4,60020,1341378224464 
 2012-07-04 13:46:11,952 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Done with post open 
 deploy task for 
 region=writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.,
  daughter=true 





[jira] [Updated] (HBASE-6329) Stop META regionserver when splitting region could cause daughter region assign twice

2012-07-04 Thread chunhui shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-6329:


Summary: Stop META regionserver when splitting region could cause daughter 
region assign twice  (was: Stop META regionserver could cause daughter region 
assign twice)

 Stop META regionserver when splitting region could cause daughter region 
 assign twice
 -

 Key: HBASE-6329
 URL: https://issues.apache.org/jira/browse/HBASE-6329
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.94.0
Reporter: chunhui shen
Assignee: chunhui shen

 We found this issue in 0.94, first let me describe the case:
 Stop META rs when split is in progress
 1. Stopping the META rs (Server A).
 2. The main thread of the rs closes ZK and deletes the rs's ephemeral node.
 3. SplitTransaction is retrying MetaEditor.addDaughter.
 4. The Master's ServerShutdownHandler processes the above dead META server.
 5. The Master fixes up the daughter and assigns it.
 6. The daughter is opened on another server (Server B).
 7. Server A's SplitTransaction successfully adds the daughter to .META. with 
 serverName=Server A.
 8. Now the daughter's region location in .META. is Server A, but it is 
 online on Server B.
 9. Restart the Master, and the Master will assign the daughter again.
 Attaching the logs, daughter region 80f999ea84cb259e20e9a228546f6c8a
 Master log:
 2012-07-04 13:45:56,493 INFO 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs 
 for dw93.kgb.sqa.cm4,60020,1341378224464
 2012-07-04 13:45:58,983 INFO 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Fixup; missing 
 daughter 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
  
 2012-07-04 13:45:58,985 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
 Added daughter 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.,
  serverName=null 
 2012-07-04 13:45:58,988 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
  to dw88.kgb.sqa.cm4,60020,1341379188777 
 2012-07-04 13:46:00,201 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: The master has opened the 
 region 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
  that was online on dw88.kgb.sqa.cm4,60020,1341379188777 
 Master log after restart:
 2012-07-04 14:27:05,824 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x136187d60e34644 Creating (or updating) unassigned node for 
 80f999ea84cb259e20e9a228546f6c8a with OFFLINE state 
 2012-07-04 14:27:05,851 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Processing region 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
  in state M_ZK_REGION_OFFLINE 
 2012-07-04 14:27:05,854 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
  to dw93.kgb.sqa.cm4,60020,1341380812020 
 2012-07-04 14:27:06,051 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENED, server=dw93.kgb.sqa.cm4,60020,1341380812020, 
 region=80f999ea84cb259e20e9a228546f6c8a 
 Regionserver(META rs) log:
 2012-07-04 13:45:56,491 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server 
 dw93.kgb.sqa.cm4,60020,1341378224464; zookeeper connection c
 losed.
 2012-07-04 13:46:11,951 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
 Added daughter 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.,
  serverName=dw93.kgb.sqa.cm4,60020,1341378224464 
 2012-07-04 13:46:11,952 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Done with post open 
 deploy task for 
 region=writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.,
  daughter=true 





[jira] [Commented] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.

2012-07-04 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406838#comment-13406838
 ] 

ramkrishna.s.vasudevan commented on HBASE-6299:
---

@Maryann/Stack
I can tell one scenario where this patch will lead to inconsistency.  
In the patch
{code}
else {
+// The destination region server is probably processing the region
+// open, so it might be safer to try this region server again to
+// avoid having two region servers open the same region.
+LOG.warn("Call openRegion() to " + plan.getDestination() +
+" has timed out when trying to assign " +
+region.getRegionNameAsString() +
+". Trying to assign to this region server again; retry=" + i, t);
+state.update(RegionState.State.OFFLINE);
+continue;
+  }
{code}
Now, because the RS is already opening the region, the patch assigns it to the 
same RS and updates the in-memory state to OFFLINE.  By that time the RS may 
have moved the znode from OFFLINE to OPENING, or from OPENING to OPENED.  Now 
there is a check in handleRegion
{code}
  if (regionState == null ||
  (!regionState.isPendingOpen() && !regionState.isOpening())) {
LOG.warn("Received OPENING for region " +
prettyPrintedRegionName +
" from server " + data.getOrigin() + " but region was in " +
"the state " + regionState + " and not " +
"in expected PENDING_OPEN or OPENING states");
return;
  }
{code}

So the master skips the transition.  Now, since we are retrying the 
assignment to the same RS, we will either get RegionAlreadyInTransition or 
sometimes even ALREADY_OPENED.
If we get ALREADY_OPENED we handle it correctly by adding to this.regions.
But if we get RegionAlreadyInTransition we just skip the assign next time.  
By then the region may have been brought online on the RS side, but the 
master is not aware of it.

One more thing is 
{code}
+else if (t instanceof java.net.SocketTimeoutException) {
+  if (this.regionsInTransition.get(region.getEncodedName()) == null
+      && plan.getDestination().equals(getRegionServerOfRegion(region))) {
{code}
Here, couldn't the plan already have been cleared by regionOnline once the RIT 
is cleared?

Over in HBASE-6060 we were trying to evaluate how well the retry option in 
assign works.  Sometimes the retry option and SSH together caused double 
assignments, which we were trying to solve.
Here, can we have an option for the master to shut down the RS in case of a 
socket timeout, so that at least we are sure SSH will take care of the 
assignment?




 RS starts region open while fails ack to HMaster.sendRegionOpen() causes 
 inconsistency in HMaster's region state and a series of successive problems.
 -

 Key: HBASE-6299
 URL: https://issues.apache.org/jira/browse/HBASE-6299
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6, 0.94.0
Reporter: Maryann Xue
Assignee: Maryann Xue
Priority: Critical
 Attachments: HBASE-6299-v2.patch, HBASE-6299.patch


 1. HMaster tries to assign a region to an RS.
 2. HMaster creates a RegionState for this region and puts it into 
 regionsInTransition.
 3. In the first assign attempt, HMaster calls RS.openRegion(). The RS 
 receives the open region request and starts to proceed, with success 
 eventually. However, due to network problems, HMaster fails to receive the 
 response for the openRegion() call, and the call times out.
 4. HMaster attempts to assign a second time, choosing another RS. 
 5. But since the HMaster's OpenedRegionHandler has already been triggered by 
 the region open on the previous RS, and the RegionState has already been 
 removed from regionsInTransition, HMaster considers the unassigned ZK node 
 RS_ZK_REGION_OPENING updated by the second attempt invalid and ignores it.
 6. The unassigned ZK node stays, and a later unassign fails because 
 RS_ZK_REGION_CLOSING cannot be created.
 {code}
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for 
 region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.;
  
 plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.,
  src=swbss-hadoop-004,60020,1340890123243, 
 dest=swbss-hadoop-006,60020,1340890678078
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  to 

[jira] [Commented] (HBASE-6311) Data error after majorCompaction caused by keeping MVCC for opened scanners

2012-07-04 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406839#comment-13406839
 ] 

ramkrishna.s.vasudevan commented on HBASE-6311:
---

@All,
Could someone take a look at this? Seems important wrt MVCC.  


 Data error after majorCompaction caused by keeping MVCC for opened scanners
 ---

 Key: HBASE-6311
 URL: https://issues.apache.org/jira/browse/HBASE-6311
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Blocker
 Attachments: HBASE-6311-test.patch, HBASE-6311v1.patch


 It is a big problem we found in 0.94, and you can reproduce it in trunk 
 using the test case I uploaded.
 When we do a compaction, we use region.getSmallestReadPoint() to keep MVCC 
 for opened scanners.
 However, this corrupts data after a majorCompaction because we skip the 
 delete type KV but keep the put type KV in the compacted storefile.
 The following is the reason, from the code:
 In StoreFileScanner, enforceMVCC is false during compaction, so we can read 
 the delete type KV.
 However, we skip this delete type KV in ScanQueryMatcher because of the 
 following code:
 {code}
 if (kv.isDelete()) {
   ...
   if (includeDeleteMarker
       && kv.getMemstoreTS() <= maxReadPointToTrackVersions) {
     System.out.println("add deletes,maxReadPointToTrackVersions="
         + maxReadPointToTrackVersions);
     this.deletes.add(bytes, offset, qualLength, timestamp, type);
   }
   ...
 }
 {code}
 Here maxReadPointToTrackVersions = region.getSmallestReadPoint();
 and kv.getMemstoreTS() > maxReadPointToTrackVersions, 
 so we won't add this KV to the DeleteTracker.
 Why test case passed if remove the line 
 MultiVersionConsistencyControl.setThreadReadPoint(smallestReadPoint);
 Because in the StoreFileScanner#skipKVsNewerThanReadpoint
 {code}
 if (cur.getMemstoreTS() <= readPoint) {
   cur.setMemstoreTS(0);
 }
 {code}
 So if we remove the line 
 MultiVersionConsistencyControl.setThreadReadPoint(smallestReadPoint);
 then readPoint is Long.MAX_VALUE, so we set the memstore TS to 0 and the KV 
 is added to the DeleteTracker in ScanQueryMatcher.
 Solution:
 Since we use the region's smallestReadPoint during compaction to keep MVCC 
 for opened scanners, we should retain the delete type KV in the output in 
 this case (an already-deleted KV is kept in the output so that old opened 
 scanners can still read it), even if it is a major compaction.
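The memstoreTS handling described above can be modeled in a few lines (KV and normalize below are toy stand-ins for the HBase KeyValue class and the StoreFileScanner#skipKVsNewerThanReadpoint logic, not the real API):

```java
// Toy key-value carrying only the MVCC write number.
class KV {
    long memstoreTS;
    KV(long ts) { memstoreTS = ts; }
}

class ScannerSketch {
    // Mirrors the snippet quoted above: KVs at or below the read point are
    // visible to every open scanner, so their MVCC stamp is cleared to 0.
    // KVs above the read point keep their stamp and are skipped by MVCC.
    static void normalize(KV cur, long readPoint) {
        if (cur.memstoreTS <= readPoint) {
            cur.memstoreTS = 0;
        }
    }
}
```

With readPoint = Long.MAX_VALUE (the case where setThreadReadPoint is not called), every KV's stamp is cleared, which is why the delete marker then reaches the DeleteTracker.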





[jira] [Updated] (HBASE-6329) Stop META regionserver when splitting region could cause daughter region assign twice

2012-07-04 Thread chunhui shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-6329:


Attachment: HBASE-6329v1.patch

 Stop META regionserver when splitting region could cause daughter region 
 assign twice
 -

 Key: HBASE-6329
 URL: https://issues.apache.org/jira/browse/HBASE-6329
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.94.0
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: HBASE-6329v1.patch


 We found this issue in 0.94; first let me describe the case:
 Stop the META rs while a split is in progress.
 1. Stopping the META rs (Server A).
 2. The main thread of the rs closes ZK and deletes the rs's ephemeral node.
 3. SplitTransaction is retrying MetaEditor.addDaughter.
 4. The Master's ServerShutdownHandler processes the above dead META server.
 5. The Master fixes up the daughter and assigns it.
 6. The daughter is opened on another server (Server B).
 7. Server A's splitTransaction successfully adds the daughter to .META. with
 serverName=Server A.
 8. Now, in .META., the daughter's region location is Server A but it is
 online on Server B.
 9. Restart the Master, and the Master will assign the daughter again.
 Attaching the logs, daughter region 80f999ea84cb259e20e9a228546f6c8a
 Master log:
 2012-07-04 13:45:56,493 INFO 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs 
 for dw93.kgb.sqa.cm4,60020,1341378224464
 2012-07-04 13:45:58,983 INFO 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Fixup; missing 
 daughter 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
  
 2012-07-04 13:45:58,985 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
 Added daughter 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.,
  serverName=null 
 2012-07-04 13:45:58,988 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
  to dw88.kgb.sqa.cm4,60020,1341379188777 
 2012-07-04 13:46:00,201 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: The master has opened the 
 region 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
  that was online on dw88.kgb.sqa.cm4,60020,1341379188777 
 Master log after restart:
 2012-07-04 14:27:05,824 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x136187d60e34644 Creating (or updating) unassigned node for 
 80f999ea84cb259e20e9a228546f6c8a with OFFLINE state 
 2012-07-04 14:27:05,851 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Processing region 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
  in state M_ZK_REGION_OFFLINE 
 2012-07-04 14:27:05,854 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
  to dw93.kgb.sqa.cm4,60020,1341380812020 
 2012-07-04 14:27:06,051 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENED, server=dw93.kgb.sqa.cm4,60020,1341380812020, 
 region=80f999ea84cb259e20e9a228546f6c8a 
 Regionserver(META rs) log:
 2012-07-04 13:45:56,491 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server 
 dw93.kgb.sqa.cm4,60020,1341378224464; zookeeper connection c
 losed.
 2012-07-04 13:46:11,951 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
 Added daughter 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.,
  serverName=dw93.kgb.sqa.cm4,60020,1341378224464 
 2012-07-04 13:46:11,952 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Done with post open 
 deploy task for 
 region=writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.,
  daughter=true 





[jira] [Updated] (HBASE-6329) Stop META regionserver when splitting region could cause daughter region assign twice

2012-07-04 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6329:
--

Status: Patch Available  (was: Open)

 Stop META regionserver when splitting region could cause daughter region 
 assign twice
 -

 Key: HBASE-6329
 URL: https://issues.apache.org/jira/browse/HBASE-6329
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.94.0
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: HBASE-6329v1.patch


 We found this issue in 0.94; first let me describe the case:
 Stop the META rs while a split is in progress.
 1. Stopping the META rs (Server A).
 2. The main thread of the rs closes ZK and deletes the rs's ephemeral node.
 3. SplitTransaction is retrying MetaEditor.addDaughter.
 4. The Master's ServerShutdownHandler processes the above dead META server.
 5. The Master fixes up the daughter and assigns it.
 6. The daughter is opened on another server (Server B).
 7. Server A's splitTransaction successfully adds the daughter to .META. with
 serverName=Server A.
 8. Now, in .META., the daughter's region location is Server A but it is
 online on Server B.
 9. Restart the Master, and the Master will assign the daughter again.
 Attaching the logs, daughter region 80f999ea84cb259e20e9a228546f6c8a
 Master log:
 2012-07-04 13:45:56,493 INFO 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs 
 for dw93.kgb.sqa.cm4,60020,1341378224464
 2012-07-04 13:45:58,983 INFO 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Fixup; missing 
 daughter 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
  
 2012-07-04 13:45:58,985 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
 Added daughter 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.,
  serverName=null 
 2012-07-04 13:45:58,988 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
  to dw88.kgb.sqa.cm4,60020,1341379188777 
 2012-07-04 13:46:00,201 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: The master has opened the 
 region 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
  that was online on dw88.kgb.sqa.cm4,60020,1341379188777 
 Master log after restart:
 2012-07-04 14:27:05,824 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x136187d60e34644 Creating (or updating) unassigned node for 
 80f999ea84cb259e20e9a228546f6c8a with OFFLINE state 
 2012-07-04 14:27:05,851 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Processing region 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
  in state M_ZK_REGION_OFFLINE 
 2012-07-04 14:27:05,854 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
  to dw93.kgb.sqa.cm4,60020,1341380812020 
 2012-07-04 14:27:06,051 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENED, server=dw93.kgb.sqa.cm4,60020,1341380812020, 
 region=80f999ea84cb259e20e9a228546f6c8a 
 Regionserver(META rs) log:
 2012-07-04 13:45:56,491 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server 
 dw93.kgb.sqa.cm4,60020,1341378224464; zookeeper connection c
 losed.
 2012-07-04 13:46:11,951 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
 Added daughter 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.,
  serverName=dw93.kgb.sqa.cm4,60020,1341378224464 
 2012-07-04 13:46:11,952 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Done with post open 
 deploy task for 
 region=writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.,
  daughter=true 





[jira] [Updated] (HBASE-6311) Data error after majorCompaction caused by keeping MVCC for opened scanners

2012-07-04 Thread chunhui shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-6311:


Attachment: HBASE-6311v2.patch

@ram
What doubts do you have about patch v2?
I updated the test case to verify MVCC for scanners after majorCompaction.

 Data error after majorCompaction caused by keeping MVCC for opened scanners
 ---

 Key: HBASE-6311
 URL: https://issues.apache.org/jira/browse/HBASE-6311
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Blocker
 Attachments: HBASE-6311-test.patch, HBASE-6311v1.patch, 
 HBASE-6311v2.patch


 It is a big problem we found in 0.94, and you can reproduce it in
 trunk using the test case I uploaded.
 When we do a compaction, we use region.getSmallestReadPoint() to keep MVCC
 for opened scanners;
 however, it corrupts data after a majorCompaction because we skip the
 delete type KV but keep the put type KV in the compacted storefile.
 The reason, from the code:
 In StoreFileScanner, enforceMVCC is false during compaction, so we can read
 the delete type KV.
 However, we skip this delete type KV in ScanQueryMatcher because of the
 following code:
 {code}
 if (kv.isDelete())
 {
 ...
   if (includeDeleteMarker
       && kv.getMemstoreTS() <= maxReadPointToTrackVersions) {
     System.out.println("add deletes, maxReadPointToTrackVersions="
         + maxReadPointToTrackVersions);
     this.deletes.add(bytes, offset, qualLength, timestamp, type);
   }
 ...
 }
 {code}
 Here maxReadPointToTrackVersions = region.getSmallestReadPoint(),
 and kv.getMemstoreTS() > maxReadPointToTrackVersions,
 so we won't add this KV to the DeleteTracker.
 Why did the test case pass if we remove the line
 MultiVersionConsistencyControl.setThreadReadPoint(smallestReadPoint); ?
 Because of this code in StoreFileScanner#skipKVsNewerThanReadpoint:
 {code}
 if (cur.getMemstoreTS() <= readPoint) {
   cur.setMemstoreTS(0);
 }
 {code}
 So if we remove that line, the readPoint here is Long.MAX_VALUE, we set the
 memstore ts to 0, and so we add the KV to the DeleteTracker in ScanQueryMatcher.
 Solution:
 We use the region's smallestReadPoint when compacting to keep MVCC for OPENED
 scanners, so we should retain delete type KVs in the output in this case (an
 already deleted KV is retained in the output so that an old opened scanner can
 still read that KV) even if it is a major compaction.





[jira] [Commented] (HBASE-6311) Data error after majorCompaction caused by keeping MVCC for opened scanners

2012-07-04 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406854#comment-13406854
 ] 

ramkrishna.s.vasudevan commented on HBASE-6311:
---

@Chunhui
I am clear with your patch.  Your patch keeps the MVCC semantics intact, and
that is what is needed.  No problem. Anoop has reviewed it as well.
I just wanted others to review this, because now even on a major compaction we
create a file with a delete marker when kv.getMemstoreTS() >
maxReadPointToTrackVersions.
But in the normal case we do not write delete markers on a major compaction.  Is
this ok?

 Data error after majorCompaction caused by keeping MVCC for opened scanners
 ---

 Key: HBASE-6311
 URL: https://issues.apache.org/jira/browse/HBASE-6311
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Blocker
 Attachments: HBASE-6311-test.patch, HBASE-6311v1.patch, 
 HBASE-6311v2.patch


 It is a big problem we found in 0.94, and you can reproduce it in
 trunk using the test case I uploaded.
 When we do a compaction, we use region.getSmallestReadPoint() to keep MVCC
 for opened scanners;
 however, it corrupts data after a majorCompaction because we skip the
 delete type KV but keep the put type KV in the compacted storefile.
 The reason, from the code:
 In StoreFileScanner, enforceMVCC is false during compaction, so we can read
 the delete type KV.
 However, we skip this delete type KV in ScanQueryMatcher because of the
 following code:
 {code}
 if (kv.isDelete())
 {
 ...
   if (includeDeleteMarker
       && kv.getMemstoreTS() <= maxReadPointToTrackVersions) {
     System.out.println("add deletes, maxReadPointToTrackVersions="
         + maxReadPointToTrackVersions);
     this.deletes.add(bytes, offset, qualLength, timestamp, type);
   }
 ...
 }
 {code}
 Here maxReadPointToTrackVersions = region.getSmallestReadPoint(),
 and kv.getMemstoreTS() > maxReadPointToTrackVersions,
 so we won't add this KV to the DeleteTracker.
 Why did the test case pass if we remove the line
 MultiVersionConsistencyControl.setThreadReadPoint(smallestReadPoint); ?
 Because of this code in StoreFileScanner#skipKVsNewerThanReadpoint:
 {code}
 if (cur.getMemstoreTS() <= readPoint) {
   cur.setMemstoreTS(0);
 }
 {code}
 So if we remove that line, the readPoint here is Long.MAX_VALUE, we set the
 memstore ts to 0, and so we add the KV to the DeleteTracker in ScanQueryMatcher.
 Solution:
 We use the region's smallestReadPoint when compacting to keep MVCC for OPENED
 scanners, so we should retain delete type KVs in the output in this case (an
 already deleted KV is retained in the output so that an old opened scanner can
 still read that KV) even if it is a major compaction.





[jira] [Commented] (HBASE-6329) Stop META regionserver when splitting region could cause daughter region assign twice

2012-07-04 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406858#comment-13406858
 ] 

ramkrishna.s.vasudevan commented on HBASE-6329:
---

Nice one.
One question here 
{code}
// Interrupt catalog tracker here in case any regions being opened out in
// handlers are stuck waiting on meta or root.
if (this.catalogTracker != null) this.catalogTracker.stop();
{code}
Doesn't this impact the thread that is trying to write into META through
SplitTransaction?

Maybe we can add a check: if the RS is already aborting, do not call abort/stop
again.  Sometimes, in the case above, if the META write fails we will hit the
PONR (point of no return), and through the PONR path we will call server.abort.
An abort is already in progress, and one more abort would be triggered; I am not
sure of the implications if both run at the same time.
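The guard suggested above can be sketched as follows (a hypothetical illustration; the class and method names are invented, not the actual HRegionServer code). A second abort, for example from the PONR path in SplitTransaction, becomes a no-op when the server is already stopping:

```java
// Hypothetical sketch -- names are illustrative, not HBase internals.
public class AbortGuard {
    private volatile boolean aborting = false;

    /** Returns true only for the call that actually initiates the abort. */
    public synchronized boolean abortOnce(String reason) {
        if (aborting) {
            // Already aborting: don't run the shutdown work a second time.
            return false;
        }
        aborting = true;
        // ... real abort/stop work would go here ...
        return true;
    }

    public static void main(String[] args) {
        AbortGuard rs = new AbortGuard();
        System.out.println(rs.abortOnce("stop requested"));    // true
        System.out.println(rs.abortOnce("PONR during split")); // false
    }
}
```

Serializing the two abort paths this way sidesteps the question of what happens when both run concurrently.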

 Stop META regionserver when splitting region could cause daughter region 
 assign twice
 -

 Key: HBASE-6329
 URL: https://issues.apache.org/jira/browse/HBASE-6329
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.94.0
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: HBASE-6329v1.patch


 We found this issue in 0.94; first let me describe the case:
 Stop the META rs while a split is in progress.
 1. Stopping the META rs (Server A).
 2. The main thread of the rs closes ZK and deletes the rs's ephemeral node.
 3. SplitTransaction is retrying MetaEditor.addDaughter.
 4. The Master's ServerShutdownHandler processes the above dead META server.
 5. The Master fixes up the daughter and assigns it.
 6. The daughter is opened on another server (Server B).
 7. Server A's splitTransaction successfully adds the daughter to .META. with
 serverName=Server A.
 8. Now, in .META., the daughter's region location is Server A but it is
 online on Server B.
 9. Restart the Master, and the Master will assign the daughter again.
 Attaching the logs, daughter region 80f999ea84cb259e20e9a228546f6c8a
 Master log:
 2012-07-04 13:45:56,493 INFO 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs 
 for dw93.kgb.sqa.cm4,60020,1341378224464
 2012-07-04 13:45:58,983 INFO 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Fixup; missing 
 daughter 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
  
 2012-07-04 13:45:58,985 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
 Added daughter 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.,
  serverName=null 
 2012-07-04 13:45:58,988 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
  to dw88.kgb.sqa.cm4,60020,1341379188777 
 2012-07-04 13:46:00,201 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: The master has opened the 
 region 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
  that was online on dw88.kgb.sqa.cm4,60020,1341379188777 
 Master log after restart:
 2012-07-04 14:27:05,824 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x136187d60e34644 Creating (or updating) unassigned node for 
 80f999ea84cb259e20e9a228546f6c8a with OFFLINE state 
 2012-07-04 14:27:05,851 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Processing region 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
  in state M_ZK_REGION_OFFLINE 
 2012-07-04 14:27:05,854 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
  to dw93.kgb.sqa.cm4,60020,1341380812020 
 2012-07-04 14:27:06,051 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENED, server=dw93.kgb.sqa.cm4,60020,1341380812020, 
 region=80f999ea84cb259e20e9a228546f6c8a 
 Regionserver(META rs) log:
 2012-07-04 13:45:56,491 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server 
 dw93.kgb.sqa.cm4,60020,1341378224464; zookeeper connection c
 losed.
 2012-07-04 13:46:11,951 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
 Added daughter 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.,
  serverName=dw93.kgb.sqa.cm4,60020,1341378224464 
 2012-07-04 13:46:11,952 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Done with post open 
 deploy task for 
 

[jira] [Commented] (HBASE-6311) Data error after majorCompaction caused by keeping MVCC for opened scanners

2012-07-04 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406861#comment-13406861
 ] 

chunhui shen commented on HBASE-6311:
-

bq.But in a normal case we will not write delete marker on major compaction
Yes, that's so.

 Data error after majorCompaction caused by keeping MVCC for opened scanners
 ---

 Key: HBASE-6311
 URL: https://issues.apache.org/jira/browse/HBASE-6311
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Blocker
 Attachments: HBASE-6311-test.patch, HBASE-6311v1.patch, 
 HBASE-6311v2.patch


 It is a big problem we found in 0.94, and you can reproduce it in
 trunk using the test case I uploaded.
 When we do a compaction, we use region.getSmallestReadPoint() to keep MVCC
 for opened scanners;
 however, it corrupts data after a majorCompaction because we skip the
 delete type KV but keep the put type KV in the compacted storefile.
 The reason, from the code:
 In StoreFileScanner, enforceMVCC is false during compaction, so we can read
 the delete type KV.
 However, we skip this delete type KV in ScanQueryMatcher because of the
 following code:
 {code}
 if (kv.isDelete())
 {
 ...
   if (includeDeleteMarker
       && kv.getMemstoreTS() <= maxReadPointToTrackVersions) {
     System.out.println("add deletes, maxReadPointToTrackVersions="
         + maxReadPointToTrackVersions);
     this.deletes.add(bytes, offset, qualLength, timestamp, type);
   }
 ...
 }
 {code}
 Here maxReadPointToTrackVersions = region.getSmallestReadPoint(),
 and kv.getMemstoreTS() > maxReadPointToTrackVersions,
 so we won't add this KV to the DeleteTracker.
 Why did the test case pass if we remove the line
 MultiVersionConsistencyControl.setThreadReadPoint(smallestReadPoint); ?
 Because of this code in StoreFileScanner#skipKVsNewerThanReadpoint:
 {code}
 if (cur.getMemstoreTS() <= readPoint) {
   cur.setMemstoreTS(0);
 }
 {code}
 So if we remove that line, the readPoint here is Long.MAX_VALUE, we set the
 memstore ts to 0, and so we add the KV to the DeleteTracker in ScanQueryMatcher.
 Solution:
 We use the region's smallestReadPoint when compacting to keep MVCC for OPENED
 scanners, so we should retain delete type KVs in the output in this case (an
 already deleted KV is retained in the output so that an old opened scanner can
 still read that KV) even if it is a major compaction.





[jira] [Commented] (HBASE-6329) Stop META regionserver when splitting region could cause daughter region assign twice

2012-07-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406860#comment-13406860
 ] 

Hadoop QA commented on HBASE-6329:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12535146/HBASE-6329v1.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 7 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
  org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol
  org.apache.hadoop.hbase.regionserver.wal.TestHLog

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2324//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2324//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2324//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2324//console

This message is automatically generated.

 Stop META regionserver when splitting region could cause daughter region 
 assign twice
 -

 Key: HBASE-6329
 URL: https://issues.apache.org/jira/browse/HBASE-6329
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.94.0
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: HBASE-6329v1.patch


 We found this issue in 0.94; first let me describe the case:
 Stop the META rs while a split is in progress.
 1. Stopping the META rs (Server A).
 2. The main thread of the rs closes ZK and deletes the rs's ephemeral node.
 3. SplitTransaction is retrying MetaEditor.addDaughter.
 4. The Master's ServerShutdownHandler processes the above dead META server.
 5. The Master fixes up the daughter and assigns it.
 6. The daughter is opened on another server (Server B).
 7. Server A's splitTransaction successfully adds the daughter to .META. with
 serverName=Server A.
 8. Now, in .META., the daughter's region location is Server A but it is
 online on Server B.
 9. Restart the Master, and the Master will assign the daughter again.
 Attaching the logs, daughter region 80f999ea84cb259e20e9a228546f6c8a
 Master log:
 2012-07-04 13:45:56,493 INFO 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs 
 for dw93.kgb.sqa.cm4,60020,1341378224464
 2012-07-04 13:45:58,983 INFO 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Fixup; missing 
 daughter 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
  
 2012-07-04 13:45:58,985 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
 Added daughter 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.,
  serverName=null 
 2012-07-04 13:45:58,988 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
  to dw88.kgb.sqa.cm4,60020,1341379188777 
 2012-07-04 13:46:00,201 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: The master has opened the 
 region 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
  that was online on dw88.kgb.sqa.cm4,60020,1341379188777 
 Master log after restart:
 2012-07-04 14:27:05,824 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x136187d60e34644 Creating (or updating) unassigned node for 
 80f999ea84cb259e20e9a228546f6c8a with OFFLINE state 
 2012-07-04 14:27:05,851 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Processing region 
 writetest,JC\xCA\xC8\xCFQ\xC49OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
  in state M_ZK_REGION_OFFLINE 
 2012-07-04 14:27:05,854 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: 

[jira] [Commented] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.

2012-07-04 Thread Maryann Xue (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406862#comment-13406862
 ] 

Maryann Xue commented on HBASE-6299:


Agree, ramkrishna! You've made a good point here. My original idea was to
return directly in the else branch and leave it to the TimeoutMonitor to
assign this region if the RS did not open it. I changed to the current
version, thinking it would trigger the assign retry earlier. But given the
region-in-transition problem you pointed out, the original return solution
looks better.
{code}
else {
+// The destination region server is probably processing the region
+// open, so it might be safer to try this region server again to avoid
+// having two region servers open the same region.
+LOG.error("Call openRegion() to " + plan.getDestination() +
+" has timed out when trying to assign " +
+region.getRegionNameAsString() +
+".", t);
+return;
+  }
{code}
And if we are considering removing the assign retry in HBASE-6060, problems
like this one and the one in HBASE-5816 can be avoided.
I think triggering SSH in case of a SocketTimeout is a different problem.
There are several places in HMaster where we should consider whether to start
SSH, but currently only RegionServerTracker starts SSH. Shall we open
another JIRA to discuss this issue?

 RS starts region open while fails ack to HMaster.sendRegionOpen() causes 
 inconsistency in HMaster's region state and a series of successive problems.
 -

 Key: HBASE-6299
 URL: https://issues.apache.org/jira/browse/HBASE-6299
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6, 0.94.0
Reporter: Maryann Xue
Assignee: Maryann Xue
Priority: Critical
 Attachments: HBASE-6299-v2.patch, HBASE-6299.patch


 1. HMaster tries to assign a region to an RS.
 2. HMaster creates a RegionState for this region and puts it into 
 regionsInTransition.
 3. In the first assign attempt, HMaster calls RS.openRegion(). The RS 
 receives the open region request and starts to proceed, with success 
 eventually. However, due to network problems, HMaster fails to receive the 
 response for the openRegion() call, and the call times out.
 4. HMaster attempts to assign a second time, choosing another RS.
 5. But since HMaster's OpenedRegionHandler has already been triggered by the
 region open on the previous RS, and the RegionState has already been removed
 from regionsInTransition, HMaster considers the unassigned ZK node
 RS_ZK_REGION_OPENING updated by the second attempt invalid and ignores it.
 6. The unassigned ZK node stays, and a later unassign fails because
 RS_ZK_REGION_CLOSING cannot be created.
 {code}
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for 
 region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.;
  
 plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.,
  src=swbss-hadoop-004,60020,1340890123243, 
 dest=swbss-hadoop-006,60020,1340890678078
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  to swbss-hadoop-006,60020,1340890678078
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:28,882 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,291 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,299 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENED, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,299 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
 event for 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  from serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, 
 regions=575, usedHeap=15282, maxHeap=31301); deleting unassigned 

[jira] [Commented] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.

2012-07-04 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406863#comment-13406863
 ] 

ramkrishna.s.vasudevan commented on HBASE-6299:
---

@Maryann
bq.And if we are considering removing the assign retry in HBASE-6060
Assign retry was a point discussed there, but no conclusion was reached on 
removing it.
bq.Shall we open another JIRA entry to discuss this issue?
Yes, sure. Stack, Jon, and others have started working on assignment-related 
issues recently.

 RS starts region open while fails ack to HMaster.sendRegionOpen() causes 
 inconsistency in HMaster's region state and a series of successive problems.
 -

 Key: HBASE-6299
 URL: https://issues.apache.org/jira/browse/HBASE-6299
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6, 0.94.0
Reporter: Maryann Xue
Assignee: Maryann Xue
Priority: Critical
 Attachments: HBASE-6299-v2.patch, HBASE-6299.patch


 1. HMaster tries to assign a region to an RS.
 2. HMaster creates a RegionState for this region and puts it into 
 regionsInTransition.
 3. In the first assign attempt, HMaster calls RS.openRegion(). The RS 
 receives the open-region request and proceeds, eventually succeeding. 
 However, due to network problems, HMaster fails to receive the response for 
 the openRegion() call, and the call times out.
 4. HMaster attempts to assign a second time, choosing another RS. 
 5. But because the region open on the previous RS has already triggered 
 HMaster's OpenedRegionHandler, which removed the RegionState from 
 regionsInTransition, HMaster considers the unassigned ZK node 
 RS_ZK_REGION_OPENING updated by the second attempt invalid and ignores it.
 6. The unassigned ZK node stays, and a later unassign fails because 
 RS_ZK_REGION_CLOSING cannot be created.
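The sequence in steps 1-6 can be sketched as a standalone simulation. This is hypothetical illustration code, not HBase internals: the class, map, and method names are invented, and the real AssignmentManager logic is far more involved. The point it shows is that once the OpenedRegionHandler removes the region from the in-transition map, a later transition for the same region is rejected.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the race described above (names are illustrative,
// not actual HBase APIs).
public class AssignRaceSketch {
    static Map<String, String> regionsInTransition = new HashMap<>();

    // Mirrors the described behavior: transitions for regions that are no
    // longer in regionsInTransition are treated as invalid and ignored.
    static boolean handleTransition(String region, String transition) {
        if (!regionsInTransition.containsKey(region)) {
            System.out.println("Ignoring " + transition
                + ": region not in transition");
            return false;
        }
        regionsInTransition.put(region, transition);
        return true;
    }

    public static void main(String[] args) {
        String region = "b713fd655fa02395496c5a6e39ddf568";

        regionsInTransition.put(region, "M_ZK_REGION_OFFLINE"); // step 2
        handleTransition(region, "RS_ZK_REGION_OPENED");        // step 3: RS1 opens
        regionsInTransition.remove(region);                     // OpenedRegionHandler cleanup

        // Step 5: the second assign attempt's OPENING update is ignored,
        // leaving the stale unassigned znode behind (step 6).
        boolean accepted = handleTransition(region, "RS_ZK_REGION_OPENING");
        System.out.println("second attempt accepted = " + accepted); // false
    }
}
```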
 {code}
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for 
 region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.;
  
 plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.,
  src=swbss-hadoop-004,60020,1340890123243, 
 dest=swbss-hadoop-006,60020,1340890678078
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  to swbss-hadoop-006,60020,1340890678078
 2012-06-29 07:03:38,870 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:28,882 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,291 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,299 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENED, server=swbss-hadoop-006,60020,1340890678078, 
 region=b713fd655fa02395496c5a6e39ddf568
 2012-06-29 07:06:32,299 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
 event for 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  from serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, 
 regions=575, usedHeap=15282, maxHeap=31301); deleting unassigned node
 2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x2377fee2ae80007 Deleting existing unassigned node for 
 b713fd655fa02395496c5a6e39ddf568 that is in expected state RS_ZK_REGION_OPENED
 2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x2377fee2ae80007 Successfully deleted unassigned node for 
 region b713fd655fa02395496c5a6e39ddf568 in expected state RS_ZK_REGION_OPENED
 2012-06-29 07:06:32,301 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has 
 opened the region 
 CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
  that was online on serverName=swbss-hadoop-006,60020,1340890678078, 
 load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301)
 2012-06-29 07:07:41,140 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Failed 

[jira] [Updated] (HBASE-6311) Data error after majorCompaction caused by keeping MVCC for opened scanners

2012-07-04 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6311:
--

Fix Version/s: 0.94.1
   0.96.0
 Hadoop Flags: Reviewed
   Status: Patch Available  (was: Open)

 Data error after majorCompaction caused by keeping MVCC for opened scanners
 ---

 Key: HBASE-6311
 URL: https://issues.apache.org/jira/browse/HBASE-6311
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Blocker
 Fix For: 0.96.0, 0.94.1

 Attachments: HBASE-6311-test.patch, HBASE-6311v1.patch, 
 HBASE-6311v2.patch


 It is a big problem we found in 0.94, and you can reproduce it in trunk 
 using the test case I uploaded.
 When we do compaction, we use region.getSmallestReadPoint() to preserve MVCC 
 for open scanners.
 However, this causes data errors after a major compaction, because we skip 
 the delete-type KV but keep the put-type KV in the compacted storefile.
 The reason, from the code:
 In StoreFileScanner, enforceMVCC is false during compaction, so we can read 
 the delete-type KV.
 However, we skip this delete-type KV in ScanQueryMatcher because of the 
 following code:
 {code}
 if (kv.isDelete()) {
   ...
   if (includeDeleteMarker
       && kv.getMemstoreTS() <= maxReadPointToTrackVersions) {
     System.out.println("add deletes,maxReadPointToTrackVersions="
         + maxReadPointToTrackVersions);
     this.deletes.add(bytes, offset, qualLength, timestamp, type);
   }
   ...
 }
 {code}
 Here maxReadPointToTrackVersions = region.getSmallestReadPoint(), 
 and kv.getMemstoreTS() > maxReadPointToTrackVersions, 
 so we won't add this delete to the DeleteTracker.
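The effect of this comparison can be shown with a tiny standalone sketch. This is not HBase code; the method name and the timestamp values are invented for illustration. It models only the guard from the snippet above: a delete marker whose memstoreTS is above the region's smallest read point is not tracked, so the matching put survives the compaction.

```java
// Hypothetical sketch of the read-point guard described above
// (illustrative names and values, not HBase internals).
public class DeleteTrackerSketch {
    // Mirrors: includeDeleteMarker && kv.getMemstoreTS() <= maxReadPointToTrackVersions
    static boolean trackDelete(long memstoreTS, long maxReadPointToTrackVersions) {
        return memstoreTS <= maxReadPointToTrackVersions;
    }

    public static void main(String[] args) {
        long smallestReadPoint = 5; // an open scanner pins the read point here
        long deleteMarkerTS = 8;    // delete written after the scanner opened

        // During compaction with the pinned read point: delete is NOT tracked,
        // so the deleted put is kept in the compacted storefile.
        System.out.println(trackDelete(deleteMarkerTS, smallestReadPoint)); // false

        // After skipKVsNewerThanReadpoint resets memstoreTS to 0 (the
        // readPoint = Long.MAX_VALUE case below), the delete IS tracked.
        System.out.println(trackDelete(0, smallestReadPoint)); // true
    }
}
```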
 Why does the test case pass if we remove the line 
 MultiVersionConsistencyControl.setThreadReadPoint(smallestReadPoint)?
 Because in StoreFileScanner#skipKVsNewerThanReadpoint:
 {code}
 if (cur.getMemstoreTS() <= readPoint) {
   cur.setMemstoreTS(0);
 }
 {code}
 So if we remove the line 
 MultiVersionConsistencyControl.setThreadReadPoint(smallestReadPoint), 
 readPoint is Long.MAX_VALUE; we set the memstore TS to 0, and therefore we 
 add the delete to the DeleteTracker in ScanQueryMatcher.
 Solution:
 Since we use the region's smallestReadPoint during compaction to preserve 
 MVCC for open scanners, we should retain the delete-type KV in the output in 
 this case (the already-deleted put-type KV is retained in the output so that 
 old open scanners can still read it), even if it is a major compaction.
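The proposed retention rule can be sketched as a single decision function. This is a hypothetical illustration of the idea, not the actual patch: the method name and parameters are invented. It keeps a delete marker whenever its memstoreTS is newer than the smallest read point, even during a major compaction, so the marker and the put it masks stay consistent for old scanners.

```java
// Hypothetical sketch of the proposed fix (illustrative names, not the
// actual HBASE-6311 patch).
public class RetainDeleteSketch {
    static boolean retainDeleteInOutput(long memstoreTS, long smallestReadPoint,
                                        boolean majorCompaction) {
        if (memstoreTS > smallestReadPoint) {
            // An open scanner may still need the deleted put, so the delete
            // marker must survive the compaction to keep the pair consistent.
            return true;
        }
        // Otherwise only a major compaction may purge the marker, as usual.
        return !majorCompaction;
    }

    public static void main(String[] args) {
        // Delete newer than the pinned read point: retained even in a major compaction.
        System.out.println(retainDeleteInOutput(8, 5, true));  // true
        // Delete older than the read point: safe to purge in a major compaction.
        System.out.println(retainDeleteInOutput(3, 5, true));  // false
    }
}
```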

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira