[jira] [Commented] (HBASE-6309) [MTTR] Do NN operations outside of the ZK EventThread in SplitLogManager

2012-07-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406310#comment-13406310
 ] 

stack commented on HBASE-6309:
--

@Chunhui What about the case where a log split fails... how would the cleanup 
go?  If we split into a tmp dir, it's easy to remove the tmp dir (otherwise, it 
sounds like a fine idea).
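
As an illustration of that cleanup, a minimal sketch assuming a per-log tmp 
directory and the standard Hadoop FileSystem API (the layout and helper are 
assumptions, not the actual patch):

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: split into a per-log tmp dir, promote on success, and on any
// failure recursively delete the tmp dir so a retry starts clean.
public class SplitCleanupSketch {
  public static void splitWithCleanup(FileSystem fs, Path tmpDir, Path finalDir)
      throws IOException {
    try {
      // ... write recovered edits under tmpDir (hypothetical splitting work) ...
      if (!fs.rename(tmpDir, finalDir)) {   // promote the finished output
        throw new IOException("rename failed: " + tmpDir);
      }
    } catch (IOException e) {
      fs.delete(tmpDir, true);              // true = recursive; cheap cleanup
      throw e;
    }
  }
}
{code}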

> [MTTR] Do NN operations outside of the ZK EventThread in SplitLogManager
> 
>
> Key: HBASE-6309
> URL: https://issues.apache.org/jira/browse/HBASE-6309
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.92.1, 0.94.0, 0.96.0
>Reporter: Jean-Daniel Cryans
>Priority: Critical
> Fix For: 0.96.0
>
>
> We found this issue during the leap second cataclysm which prompted a 
> distributed splitting of all our logs.
> I saw that none of the RS were splitting after some time while the master was 
> showing that it wasn't even 30% done. jstack'ing I saw this:
> {noformat}
> "main-EventThread" daemon prio=10 tid=0x7f6ce46d8800 nid=0x5376 in
> Object.wait() [0x7f6ce2ecb000]
>java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:485)
> at org.apache.hadoop.ipc.Client.call(Client.java:1093)
> - locked <0x0005fdd661a0> (a org.apache.hadoop.ipc.Client$Call)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
> at $Proxy9.rename(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> at $Proxy9.rename(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:759)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:253)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.moveRecoveredEditsFromTemp(HLogSplitter.java:553)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.moveRecoveredEditsFromTemp(HLogSplitter.java:519)
> at 
> org.apache.hadoop.hbase.master.SplitLogManager$1.finish(SplitLogManager.java:138)
> at 
> org.apache.hadoop.hbase.master.SplitLogManager.getDataSetWatchSuccess(SplitLogManager.java:431)
> at 
> org.apache.hadoop.hbase.master.SplitLogManager.access$1200(SplitLogManager.java:95)
> at 
> org.apache.hadoop.hbase.master.SplitLogManager$GetDataAsyncCallback.processResult(SplitLogManager.java:1011)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:571)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
> {noformat}
> We are effectively bottlenecking on doing NN operations and whatever else is 
> happening in GetDataAsyncCallback. It was so bad that on our 100 offline 
> cluster it took a few hours for the master to process all the incoming ZK 
> events while the actual splitting took a fraction of that time.
> I'm marking this as critical and against 0.96 but depending on how involved 
> the fix is we might want to backport.
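
To make the fix in the title concrete, here is a minimal sketch, not the actual 
patch, of handing the NN-heavy finish work to a small dedicated pool so the ZK 
EventThread returns immediately (pool size and method names are assumptions):

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.zookeeper.AsyncCallback;
import org.apache.zookeeper.data.Stat;

// Sketch: the ZK callback only enqueues work; the renames that previously
// blocked main-EventThread run on a dedicated executor instead.
public class OffloadingGetDataCallback implements AsyncCallback.DataCallback {
  private final ExecutorService nnOpsPool = Executors.newFixedThreadPool(4); // size assumed

  public void processResult(final int rc, final String path, Object ctx,
      final byte[] data, Stat stat) {
    nnOpsPool.submit(new Runnable() {
      public void run() {
        finishTask(rc, path, data); // would do the moveRecoveredEditsFromTemp renames
      }
    });
  }

  private void finishTask(int rc, String path, byte[] data) {
    // hypothetical: NN operations (renames, deletes) happen here, off the EventThread
  }
}
{code}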





[jira] [Commented] (HBASE-6326) Nested retry loops in HConnectionManager

2012-07-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406308#comment-13406308
 ] 

stack commented on HBASE-6326:
--

+1 Simple but ugly.  Good enough for a 0.94.1.

> Nested retry loops in HConnectionManager
> 
>
> Key: HBASE-6326
> URL: https://issues.apache.org/jira/browse/HBASE-6326
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Priority: Critical
> Fix For: 0.94.1
>
> Attachments: 6326.txt
>
>
> While testing client timeouts when HBase is not available, we found that 
> even with aggressive settings it takes the client 10 minutes or more to 
> finally receive an exception.
> Part of this is due to nested retry loops in locateRegion.
> locateRegion will first try to locate the table in meta (which is retried), 
> then it will try to locate the meta table in root (which is also retried).
> So for each retry of the meta lookup we retry the root lookup as well.
> I have a patch that avoids locateRegion retrying if it is called from code 
> that already has a retry loop.
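
A sketch of one shape the fix could take (names and structure are assumptions, 
not the attached patch): give locateRegion a flag so a caller that already 
retries gets a single attempt from the inner lookup.

{code}
import java.io.IOException;

// Sketch: only the outermost lookup loops; inner lookups run once, so the
// total wait scales with numRetries, not numRetries squared.
public abstract class LocateRegionSketch {
  protected int numRetries = 10;

  public Object locateRegion(byte[] table, byte[] row, boolean useRetries)
      throws IOException {
    int attempts = useRetries ? numRetries : 1;
    IOException last = null;
    for (int tries = 0; tries < attempts; tries++) {
      try {
        // pass false so the nested meta/root lookup does not retry again
        return locateRegionInMeta(table, row, false);
      } catch (IOException e) {
        last = e;
      }
    }
    throw last;
  }

  protected abstract Object locateRegionInMeta(byte[] table, byte[] row,
      boolean useRetries) throws IOException;
}
{code}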





[jira] [Commented] (HBASE-6299) RS starts region open while fails ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems.

2012-07-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406305#comment-13406305
 ] 

stack commented on HBASE-6299:
--

It looks to me like we have the same issue in trunk.  Your suggested fix looks 
right, Maryann.  Put up a patch and I'll have a go at making a unit test for it.

> RS starts region open while fails ack to HMaster.sendRegionOpen() causes 
> inconsistency in HMaster's region state and a series of successive problems.
> -
>
> Key: HBASE-6299
> URL: https://issues.apache.org/jira/browse/HBASE-6299
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.90.6, 0.94.0
>Reporter: Maryann Xue
>Assignee: Maryann Xue
>Priority: Critical
> Attachments: HBASE-6299.patch
>
>
> 1. HMaster tries to assign a region to an RS.
> 2. HMaster creates a RegionState for this region and puts it into 
> regionsInTransition.
> 3. In the first assign attempt, HMaster calls RS.openRegion(). The RS 
> receives the open region request and proceeds, eventually succeeding. 
> However, due to network problems, HMaster fails to receive the response 
> for the openRegion() call, and the call times out.
> 4. HMaster attempts to assign a second time, choosing another RS. 
> 5. But since HMaster's OpenedRegionHandler has already been triggered by the 
> region open on the previous RS, and the RegionState has already been removed 
> from regionsInTransition, HMaster considers invalid and ignores the unassigned 
> ZK node "RS_ZK_REGION_OPENING" updated by the second attempt.
> 6. The unassigned ZK node stays, and a later unassign fails because 
> RS_ZK_REGION_CLOSING cannot be created.
> {code}
> 2012-06-29 07:03:38,870 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for 
> region 
> CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.;
>  
> plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.,
>  src=swbss-hadoop-004,60020,1340890123243, 
> dest=swbss-hadoop-006,60020,1340890678078
> 2012-06-29 07:03:38,870 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
> CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
>  to swbss-hadoop-006,60020,1340890678078
> 2012-06-29 07:03:38,870 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, 
> region=b713fd655fa02395496c5a6e39ddf568
> 2012-06-29 07:06:28,882 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, 
> region=b713fd655fa02395496c5a6e39ddf568
> 2012-06-29 07:06:32,291 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, 
> region=b713fd655fa02395496c5a6e39ddf568
> 2012-06-29 07:06:32,299 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_OPENED, server=swbss-hadoop-006,60020,1340890678078, 
> region=b713fd655fa02395496c5a6e39ddf568
> 2012-06-29 07:06:32,299 DEBUG 
> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
> event for 
> CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
>  from serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, 
> regions=575, usedHeap=15282, maxHeap=31301); deleting unassigned node
> 2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:6-0x2377fee2ae80007 Deleting existing unassigned node for 
> b713fd655fa02395496c5a6e39ddf568 that is in expected state RS_ZK_REGION_OPENED
> 2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:6-0x2377fee2ae80007 Successfully deleted unassigned node for 
> region b713fd655fa02395496c5a6e39ddf568 in expected state RS_ZK_REGION_OPENED
> 2012-06-29 07:06:32,301 DEBUG 
> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has 
> opened the region 
> CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
>  that was online on serverName=swbss-hadoop-006,60020,1340890678078, 
> load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301)
> 2012-06-29 07:07:41,140 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of 
> CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
>  to serverName=swbss-hadoop-006,6

[jira] [Commented] (HBASE-6283) [region_mover.rb] Add option to exclude list of hosts on unload instead of just assuming the source node.

2012-07-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406301#comment-13406301
 ] 

stack commented on HBASE-6283:
--

bq. Thanks for the pointer to Aravind's work – this is the first I've seen the 
blog. Have we encouraged Aravind to contribute his work?

He has contrib'd the non-SU stuff: i.e. the bit where you can register in zk 
which regionservers are being rolled.

> [region_mover.rb] Add option to exclude list of hosts on unload instead of 
> just assuming the source node.
> -
>
> Key: HBASE-6283
> URL: https://issues.apache.org/jira/browse/HBASE-6283
> Project: HBase
>  Issue Type: Improvement
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
>  Labels: jruby
> Attachments: hbase-6283.patch
>
>
> Currently, the region_mover.rb script excludes a single host, the host 
> offloading data, as a region move target.  This essentially limits the number 
> of machines that can be shut down at a time to one.  For larger clusters, it 
> is manageable to have several nodes down at a time, and desirable to get this 
> process done more quickly.
> The proposed patch adds an exclude-file option that allows multiple hosts to 
> be excluded as targets.





[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed log splitting

2012-07-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406299#comment-13406299
 ] 

stack commented on HBASE-6134:
--

Ok.  Then it was a pigment of my emancipation that you had.  You fellas fix so 
much, seemed possible.

> Improvement for split-worker to speed up distributed log splitting
> --
>
> Key: HBASE-6134
> URL: https://issues.apache.org/jira/browse/HBASE-6134
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Fix For: 0.96.0
>
> Attachments: 6134v4.patch, HBASE-6134.patch, HBASE-6134v2.patch, 
> HBASE-6134v3-92.patch, HBASE-6134v3.patch, HBASE-6134v4-94.patch, 
> HBASE-6134v4.patch
>
>
> First, we tested local-master-splitting against distributed-log-splitting.
> Environment: 34 hlog files, 5 regionservers (after killing one, only 4 RS do 
> the splitting work), 400 regions in one hlog file.
> local-master-split: 60s+
> distributed-log-splitting: 165s+
> In fact, in our production environment distributed-log-splitting also took 
> 60s with 30 regionservers for 34 hlog files (the regionservers may be under 
> high load).
> We found a split-worker took about 20s to split one log file
> (30ms~50ms per writer.close(); 10ms per writer creation).
> I think we could improve this by parallelizing the creation and closing of 
> the writers in threads.
> The patch changes the distributed-log-splitting logic to match 
> local-master-splitting and parallelizes the closes in threads.
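
A minimal sketch of the parallel-close idea described above, using only 
java.util.concurrent (Closeable stands in for the recovered-edits writer; the 
pool size is an assumption):

{code}
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// Sketch: close all output writers in parallel instead of one by one,
// turning N * (30ms..50ms) of serial close time into roughly one close time.
public class ParallelCloseSketch {
  public static void closeAll(List<? extends Closeable> writers) throws IOException {
    ExecutorService pool =
        Executors.newFixedThreadPool(Math.max(1, Math.min(writers.size(), 16)));
    List<Future<Void>> pending = new ArrayList<Future<Void>>();
    for (final Closeable w : writers) {
      pending.add(pool.submit(new Callable<Void>() {
        public Void call() throws IOException {
          w.close();                            // flush + close one output file
          return null;
        }
      }));
    }
    pool.shutdown();
    try {
      for (Future<Void> f : pending) f.get();   // surface any close failure
    } catch (InterruptedException e) {
      throw new IOException(e);
    } catch (ExecutionException e) {
      throw new IOException(e.getCause());
    }
  }
}
{code}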





[jira] [Commented] (HBASE-6309) [MTTR] Do NN operations outside of the ZK EventThread in SplitLogManager

2012-07-03 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406285#comment-13406285
 ] 

chunhui shen commented on HBASE-6309:
-

In the current distributed log splitting, we split the logs to a tmp dir.
How about splitting logs directly to the region dir, so there is no need to do 
NN operations when SplitLogManager finishes a task?

> [MTTR] Do NN operations outside of the ZK EventThread in SplitLogManager
> 
>
> Key: HBASE-6309
> URL: https://issues.apache.org/jira/browse/HBASE-6309
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.92.1, 0.94.0, 0.96.0
>Reporter: Jean-Daniel Cryans
>Priority: Critical
> Fix For: 0.96.0
>
>
> We found this issue during the leap second cataclysm which prompted a 
> distributed splitting of all our logs.
> I saw that none of the RS were splitting after some time while the master was 
> showing that it wasn't even 30% done. jstack'ing I saw this:
> {noformat}
> "main-EventThread" daemon prio=10 tid=0x7f6ce46d8800 nid=0x5376 in
> Object.wait() [0x7f6ce2ecb000]
>java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:485)
> at org.apache.hadoop.ipc.Client.call(Client.java:1093)
> - locked <0x0005fdd661a0> (a org.apache.hadoop.ipc.Client$Call)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
> at $Proxy9.rename(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> at $Proxy9.rename(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:759)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:253)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.moveRecoveredEditsFromTemp(HLogSplitter.java:553)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.moveRecoveredEditsFromTemp(HLogSplitter.java:519)
> at 
> org.apache.hadoop.hbase.master.SplitLogManager$1.finish(SplitLogManager.java:138)
> at 
> org.apache.hadoop.hbase.master.SplitLogManager.getDataSetWatchSuccess(SplitLogManager.java:431)
> at 
> org.apache.hadoop.hbase.master.SplitLogManager.access$1200(SplitLogManager.java:95)
> at 
> org.apache.hadoop.hbase.master.SplitLogManager$GetDataAsyncCallback.processResult(SplitLogManager.java:1011)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:571)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
> {noformat}
> We are effectively bottlenecking on doing NN operations and whatever else is 
> happening in GetDataAsyncCallback. It was so bad that on our 100 offline 
> cluster it took a few hours for the master to process all the incoming ZK 
> events while the actual splitting took a fraction of that time.
> I'm marking this as critical and against 0.96 but depending on how involved 
> the fix is we might want to backport.





[jira] [Commented] (HBASE-6311) Data error after majorCompaction caused by keeping MVCC for opened scanners

2012-07-03 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406282#comment-13406282
 ] 

chunhui shen commented on HBASE-6311:
-

bq.Am just saying what happens if the put/delete gets removed and we end up in 
an empty file.
We shouldn't end up in an empty file, because the put type KV should be able to 
read by earlier opened scanner as per MVCC

> Data error after majorCompaction caused by keeping MVCC for opened scanners
> ---
>
> Key: HBASE-6311
> URL: https://issues.apache.org/jira/browse/HBASE-6311
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.0
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Blocker
> Attachments: HBASE-6311-test.patch, HBASE-6311v1.patch
>
>
> It is a big problem we found in 0.94, and you can reproduce it in trunk 
> using the test case I uploaded.
> When we do a compaction, we use region.getSmallestReadPoint() to keep MVCC 
> for opened scanners.
> However, this causes a data error after a majorCompaction, because we skip 
> the delete type KV but keep the put type KV in the compacted storefile.
> The reason, from the code, is as follows:
> In StoreFileScanner, enforceMVCC is false during compaction, so we can read 
> the delete type KV.
> However, we skip this delete type KV in ScanQueryMatcher because of the 
> following code:
> {code}
> if (kv.isDelete())
> {
> ...
>  if (includeDeleteMarker
> && kv.getMemstoreTS() <= maxReadPointToTrackVersions) {
>   System.out.println("add deletes,maxReadPointToTrackVersions="
>   + maxReadPointToTrackVersions);
>   this.deletes.add(bytes, offset, qualLength, timestamp, type);
> }
> ...
> }
> {code}
> Here maxReadPointToTrackVersions = region.getSmallestReadPoint()
> and kv.getMemstoreTS() > maxReadPointToTrackVersions, 
> so we won't add this KV to the DeleteTracker.
> Why does the test case pass if we remove the line 
> MultiVersionConsistencyControl.setThreadReadPoint(smallestReadPoint); ?
> Because in StoreFileScanner#skipKVsNewerThanReadpoint:
> {code}
> if (cur.getMemstoreTS() <= readPoint) {
>   cur.setMemstoreTS(0);
> }
> {code}
> So if we remove the line 
> MultiVersionConsistencyControl.setThreadReadPoint(smallestReadPoint);
> then readPoint is Long.MAX_VALUE, the memstore TS is set to 0, and the KV is 
> added to the DeleteTracker in ScanQueryMatcher.
> Solution:
> We use the region's smallestReadPoint during compaction to keep MVCC for 
> opened scanners, so we should also retain the delete type KV in the output 
> in this case (the already-deleted KV is retained in the output so that an 
> old opened scanner can still read it), even for a major compaction.





[jira] [Commented] (HBASE-6311) Data error after majorCompaction caused by keeping MVCC for opened scanners

2012-07-03 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406281#comment-13406281
 ] 

chunhui shen commented on HBASE-6311:
-

@ram
In the current logic, in order to keep MVCC for the first scanner, we already 
retain deleted KVs in the compacted file output.

What my patch does is also retain the delete type KV in the compacted file 
output when the above happens:
{code}
if (kv.getMemstoreTS() > maxReadPointToTrackVersions)
  return MatchCode.INCLUDE;
{code}
i.e. the delete type KV is included, as in the code above.

> Data error after majorCompaction caused by keeping MVCC for opened scanners
> ---
>
> Key: HBASE-6311
> URL: https://issues.apache.org/jira/browse/HBASE-6311
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.0
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Blocker
> Attachments: HBASE-6311-test.patch, HBASE-6311v1.patch
>
>
> It is a big problem we found in 0.94, and you can reproduce it in trunk 
> using the test case I uploaded.
> When we do a compaction, we use region.getSmallestReadPoint() to keep MVCC 
> for opened scanners.
> However, this causes a data error after a majorCompaction, because we skip 
> the delete type KV but keep the put type KV in the compacted storefile.
> The reason, from the code, is as follows:
> In StoreFileScanner, enforceMVCC is false during compaction, so we can read 
> the delete type KV.
> However, we skip this delete type KV in ScanQueryMatcher because of the 
> following code:
> {code}
> if (kv.isDelete())
> {
> ...
>  if (includeDeleteMarker
> && kv.getMemstoreTS() <= maxReadPointToTrackVersions) {
>   System.out.println("add deletes,maxReadPointToTrackVersions="
>   + maxReadPointToTrackVersions);
>   this.deletes.add(bytes, offset, qualLength, timestamp, type);
> }
> ...
> }
> {code}
> Here maxReadPointToTrackVersions = region.getSmallestReadPoint()
> and kv.getMemstoreTS() > maxReadPointToTrackVersions, 
> so we won't add this KV to the DeleteTracker.
> Why does the test case pass if we remove the line 
> MultiVersionConsistencyControl.setThreadReadPoint(smallestReadPoint); ?
> Because in StoreFileScanner#skipKVsNewerThanReadpoint:
> {code}
> if (cur.getMemstoreTS() <= readPoint) {
>   cur.setMemstoreTS(0);
> }
> {code}
> So if we remove the line 
> MultiVersionConsistencyControl.setThreadReadPoint(smallestReadPoint);
> then readPoint is Long.MAX_VALUE, the memstore TS is set to 0, and the KV is 
> added to the DeleteTracker in ScanQueryMatcher.
> Solution:
> We use the region's smallestReadPoint during compaction to keep MVCC for 
> opened scanners, so we should also retain the delete type KV in the output 
> in this case (the already-deleted KV is retained in the output so that an 
> old opened scanner can still read it), even for a major compaction.





[jira] [Commented] (HBASE-6228) Fixup daughters twice cause daughter region assigned twice

2012-07-03 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406273#comment-13406273
 ] 

ramkrishna.s.vasudevan commented on HBASE-6228:
---

@Jon
Thanks for sharing the list.  
bq.(possibly a ZK "regionlock")
Yes Jon, I agree.  We still keep getting issues in AM, SSH, master restart, etc. 

> Fixup daughters twice  cause daughter region assigned twice
> ---
>
> Key: HBASE-6228
> URL: https://issues.apache.org/jira/browse/HBASE-6228
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.96.0
>
> Attachments: HBASE-6228.patch, HBASE-6228v2.patch, 
> HBASE-6228v2.patch, HBASE-6228v3.patch, HBASE-6228v4.patch
>
>
> First, how does "fixup daughters twice" happen?
> 1. We fixupDaughters at the end of HMaster#finishInitialization.
> 2. ServerShutdownHandler will also fixupDaughters when reassigning regions 
> through ServerShutdownHandler#processDeadRegion.
> When we fixupDaughters, we add the daughters to .META., but that couldn't 
> prevent the above case, because of FindDaughterVisitor.
> The detail is as follows:
> Suppose region A is a split parent region, and its daughter region B is 
> missing.
> 1. First, the ServerShutdownHandler thread fixes up the daughter, so it adds 
> daughter region B to .META. with serverName=null and assigns the daughter.
> 2. Then the Master's initialization thread will also find that daughter 
> region B is missing and assign it. This is because FindDaughterVisitor 
> considers a daughter missing if its serverName=null.
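
Purely for illustration, one way to make the fixup idempotent so the two paths 
cannot both assign daughter B (all helper names are hypothetical, not the 
attached patch):

{code}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: record in-flight daughter fixups so the master-init path
// and the ServerShutdownHandler path cannot both assign the same daughter.
public abstract class DaughterFixupSketch {
  private final Set<String> fixupsInProgress =
      Collections.synchronizedSet(new HashSet<String>());

  public void fixupDaughter(String encodedRegionName) {
    // add() returns false if the other path already claimed this daughter
    if (!fixupsInProgress.add(encodedRegionName)) {
      return;
    }
    try {
      addDaughterToMeta(encodedRegionName);  // row still has serverName=null here
      assign(encodedRegionName);
    } finally {
      fixupsInProgress.remove(encodedRegionName);
    }
  }

  protected abstract void addDaughterToMeta(String encodedRegionName);
  protected abstract void assign(String encodedRegionName);
}
{code}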





[jira] [Commented] (HBASE-6313) Client hangs because the client is not notified

2012-07-03 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406268#comment-13406268
 ] 

Zhihong Ted Yu commented on HBASE-6313:
---

Hadoop QA has been dormant.

Please post test suite result.

Thanks

> Client hangs because the client is not notified 
> 
>
> Key: HBASE-6313
> URL: https://issues.apache.org/jira/browse/HBASE-6313
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.1, 0.94.0
>Reporter: binlijin
> Fix For: 0.94.1
>
> Attachments: HBASE-6313-0.92-2.patch, HBASE-6313-0.92.patch, 
> HBASE-6313-0.94.patch, HBASE-6313-trunk.patch, clienthangthread.out
>
>
> If the call is first removed from the calls map and then some exception 
> happens while reading from the DataInputStream, the call will not be 
> notified, which causes the client to hang.





[jira] [Commented] (HBASE-6311) Data error after majorCompaction caused by keeping MVCC for opened scanners

2012-07-03 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406267#comment-13406267
 ] 

ramkrishna.s.vasudevan commented on HBASE-6311:
---

@Chunhui
I did not mean that your patch does it.  I am just asking what happens if the 
put/delete gets removed and we end up with an empty file.  I was just trying to 
make some changes to your patch and the testcase.

> Data error after majorCompaction caused by keeping MVCC for opened scanners
> ---
>
> Key: HBASE-6311
> URL: https://issues.apache.org/jira/browse/HBASE-6311
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.0
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Blocker
> Attachments: HBASE-6311-test.patch, HBASE-6311v1.patch
>
>
> It is a big problem we found in 0.94, and you can reproduce it in trunk 
> using the test case I uploaded.
> When we do a compaction, we use region.getSmallestReadPoint() to keep MVCC 
> for opened scanners.
> However, this causes a data error after a majorCompaction, because we skip 
> the delete type KV but keep the put type KV in the compacted storefile.
> The reason, from the code, is as follows:
> In StoreFileScanner, enforceMVCC is false during compaction, so we can read 
> the delete type KV.
> However, we skip this delete type KV in ScanQueryMatcher because of the 
> following code:
> {code}
> if (kv.isDelete())
> {
> ...
>  if (includeDeleteMarker
> && kv.getMemstoreTS() <= maxReadPointToTrackVersions) {
>   System.out.println("add deletes,maxReadPointToTrackVersions="
>   + maxReadPointToTrackVersions);
>   this.deletes.add(bytes, offset, qualLength, timestamp, type);
> }
> ...
> }
> {code}
> Here maxReadPointToTrackVersions = region.getSmallestReadPoint()
> and kv.getMemstoreTS() > maxReadPointToTrackVersions, 
> so we won't add this KV to the DeleteTracker.
> Why does the test case pass if we remove the line 
> MultiVersionConsistencyControl.setThreadReadPoint(smallestReadPoint); ?
> Because in StoreFileScanner#skipKVsNewerThanReadpoint:
> {code}
> if (cur.getMemstoreTS() <= readPoint) {
>   cur.setMemstoreTS(0);
> }
> {code}
> So if we remove the line 
> MultiVersionConsistencyControl.setThreadReadPoint(smallestReadPoint);
> then readPoint is Long.MAX_VALUE, the memstore TS is set to 0, and the KV is 
> added to the DeleteTracker in ScanQueryMatcher.
> Solution:
> We use the region's smallestReadPoint during compaction to keep MVCC for 
> opened scanners, so we should also retain the delete type KV in the output 
> in this case (the already-deleted KV is retained in the output so that an 
> old opened scanner can still read it), even for a major compaction.





[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed log splitting

2012-07-03 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406264#comment-13406264
 ] 

ramkrishna.s.vasudevan commented on HBASE-6134:
---

@Stack
We have not fixed HBASE-6140.  Recently JD raised issue HBASE-6309.
@Chunhui
Do you have a patch for HBASE-6140?

> Improvement for split-worker to speed up distributed log splitting
> --
>
> Key: HBASE-6134
> URL: https://issues.apache.org/jira/browse/HBASE-6134
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Fix For: 0.96.0
>
> Attachments: 6134v4.patch, HBASE-6134.patch, HBASE-6134v2.patch, 
> HBASE-6134v3-92.patch, HBASE-6134v3.patch, HBASE-6134v4-94.patch, 
> HBASE-6134v4.patch
>
>
> First, we tested local-master-splitting against distributed-log-splitting.
> Environment: 34 hlog files, 5 regionservers (after killing one, only 4 RS do 
> the splitting work), 400 regions in one hlog file.
> local-master-split: 60s+
> distributed-log-splitting: 165s+
> In fact, in our production environment distributed-log-splitting also took 
> 60s with 30 regionservers for 34 hlog files (the regionservers may be under 
> high load).
> We found a split-worker took about 20s to split one log file
> (30ms~50ms per writer.close(); 10ms per writer creation).
> I think we could improve this by parallelizing the creation and closing of 
> the writers in threads.
> The patch changes the distributed-log-splitting logic to match 
> local-master-splitting and parallelizes the closes in threads.





[jira] [Commented] (HBASE-6027) Update the reference guide to reflect the changes in the security profile

2012-07-03 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406263#comment-13406263
 ] 

Zhihong Ted Yu commented on HBASE-6027:
---

Integrated to trunk.

Thanks for the patch, Devaraj.

> Update the reference guide to reflect the changes in the security profile
> -
>
> Key: HBASE-6027
> URL: https://issues.apache.org/jira/browse/HBASE-6027
> Project: HBase
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.96.0
>Reporter: Devaraj Das
>Assignee: Devaraj Das
> Fix For: 0.96.0
>
> Attachments: 6027-1.patch
>
>
> The refguide needs to be updated to reflect the fact that there is no 
> security profile anymore, etc. [Follow up to HBASE-5732]





[jira] [Commented] (HBASE-6284) Introduce HRegion#doMiniBatchMutation()

2012-07-03 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406256#comment-13406256
 ] 

Zhihong Ted Yu commented on HBASE-6284:
---

Addendum integrated to trunk.

Thanks Anoop.

> Introduce HRegion#doMiniBatchMutation()
> ---
>
> Key: HBASE-6284
> URL: https://issues.apache.org/jira/browse/HBASE-6284
> Project: HBase
>  Issue Type: Bug
>  Components: performance, regionserver
>Reporter: Zhihong Ted Yu
>Assignee: Anoop Sam John
> Fix For: 0.96.0, 0.94.2
>
> Attachments: 6284_Trunk-Addendum.patch, 6284_Trunk-V3.patch, 
> HBASE-6284_94.patch, HBASE-6284_Trunk-V2.patch, HBASE-6284_Trunk-V3.patch, 
> HBASE-6284_Trunk.patch
>
>
> From Anoop under thread 'Can there be a doMiniBatchDelete in HRegion':
> HTable#delete(List<Delete>) groups the Deletes going to the same RS and makes 
> only one network call. But within the RS there will be N delete calls on the 
> region, one by one, which means N HLog writes and syncs. If these can also be 
> grouped, we can get better performance for multi-row deletes.
> I have made a new miniBatchDelete() and made HTable#delete(List<Delete>) call 
> this new batch delete.
> I initially tested with a one-node cluster. Even there I am seeing a very 
> promising performance boost.
> Only one CF and qualifier.
> 10K total rows deleted with a batch of 100 deletes. Only deletes happening on 
> the table, from one thread.
> With the new way, the net time taken is reduced by more than 1/10.
> I will also test on a 4-node cluster. I think it will be worth making this 
> change.
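
For context, a client-side usage sketch of the batched path the description 
refers to; HTable#delete(List<Delete>) is the existing client API, while the 
grouping of WAL writes is what the server-side mini-batch adds:

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;

// Sketch: one batched call instead of a delete-per-row loop. The client groups
// the Deletes per region server; with a mini-batch path on the server, the
// per-region HLog writes and syncs are grouped as well.
public class BatchDeleteSketch {
  public static void deleteRows(HTable table, List<byte[]> rows) throws IOException {
    List<Delete> deletes = new ArrayList<Delete>(rows.size());
    for (byte[] row : rows) {
      deletes.add(new Delete(row));
    }
    table.delete(deletes);  // note: the method modifies the list, leaving failed Deletes in it
  }
}
{code}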





[jira] [Commented] (HBASE-5705) Introduce Protocol Buffer RPC engine

2012-07-03 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406255#comment-13406255
 ] 

Zhihong Ted Yu commented on HBASE-5705:
---

HBASE-6039 has removed HMasterInterface.
Please adjust patch accordingly.

> Introduce Protocol Buffer RPC engine
> 
>
> Key: HBASE-5705
> URL: https://issues.apache.org/jira/browse/HBASE-5705
> Project: HBase
>  Issue Type: Sub-task
>  Components: ipc, master, migration, regionserver
>Reporter: Devaraj Das
>Assignee: Devaraj Das
> Attachments: 5705-1.patch
>
>
> Introduce Protocol Buffer RPC engine in the RPC core. Protocols that are PB 
> aware can be made to go through this RPC engine. The approach, in my current 
> thinking, would be similar to HADOOP-7773.





[jira] [Commented] (HBASE-6284) Introduce HRegion#doMiniBatchMutation()

2012-07-03 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406252#comment-13406252
 ] 

Anoop Sam John commented on HBASE-6284:
---

Attached patch for the 94 version. Please review.

@Ted - There is a minor correction needed in the Javadoc for the trunk patch. 
I attached an addendum for that. Can you take a look and integrate it, please?

> Introduce HRegion#doMiniBatchMutation()
> ---
>
> Key: HBASE-6284
> URL: https://issues.apache.org/jira/browse/HBASE-6284
> Project: HBase
>  Issue Type: Bug
>  Components: performance, regionserver
>Reporter: Zhihong Ted Yu
>Assignee: Anoop Sam John
> Fix For: 0.96.0, 0.94.2
>
> Attachments: 6284_Trunk-Addendum.patch, 6284_Trunk-V3.patch, 
> HBASE-6284_94.patch, HBASE-6284_Trunk-V2.patch, HBASE-6284_Trunk-V3.patch, 
> HBASE-6284_Trunk.patch
>
>
> From Anoop under thread 'Can there be a doMiniBatchDelete in HRegion':
> HTable#delete(List<Delete>) groups the Deletes going to the same RS and makes 
> only one network call. But within the RS there will be N delete calls on the 
> region, one by one, which means N HLog writes and syncs. If these can also be 
> grouped, we can get better performance for multi-row deletes.
> I have made a new miniBatchDelete() and made HTable#delete(List<Delete>) call 
> this new batch delete.
> I initially tested with a one-node cluster. Even there I am seeing a very 
> promising performance boost.
> Only one CF and qualifier.
> 10K total rows deleted with a batch of 100 deletes. Only deletes happening on 
> the table, from one thread.
> With the new way, the net time taken is reduced by more than 1/10.
> I will also test on a 4-node cluster. I think it will be worth making this 
> change.





[jira] [Updated] (HBASE-6284) Introduce HRegion#doMiniBatchMutation()

2012-07-03 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-6284:
--

Attachment: 6284_Trunk-Addendum.patch

> Introduce HRegion#doMiniBatchMutation()
> ---
>
> Key: HBASE-6284
> URL: https://issues.apache.org/jira/browse/HBASE-6284
> Project: HBase
>  Issue Type: Bug
>  Components: performance, regionserver
>Reporter: Zhihong Ted Yu
>Assignee: Anoop Sam John
> Fix For: 0.96.0, 0.94.2
>
> Attachments: 6284_Trunk-Addendum.patch, 6284_Trunk-V3.patch, 
> HBASE-6284_94.patch, HBASE-6284_Trunk-V2.patch, HBASE-6284_Trunk-V3.patch, 
> HBASE-6284_Trunk.patch
>
>
> From Anoop under thread 'Can there be a doMiniBatchDelete in HRegion':
> HTable#delete(List<Delete>) groups the Deletes going to the same RS and makes 
> only one network call. But within the RS there will be N delete calls on the 
> region, one by one, which means N HLog writes and syncs. If these can also be 
> grouped, we can get better performance for multi-row deletes.
> I have made a new miniBatchDelete() and made HTable#delete(List<Delete>) call 
> this new batch delete.
> I initially tested with a one-node cluster. Even there I am seeing a very 
> promising performance boost.
> Only one CF and qualifier.
> 10K total rows deleted with a batch of 100 deletes. Only deletes happening on 
> the table, from one thread.
> With the new way, the net time taken is reduced by more than 1/10.
> I will also test on a 4-node cluster. I think it will be worth making this 
> change.





[jira] [Updated] (HBASE-6284) Introduce HRegion#doMiniBatchMutation()

2012-07-03 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-6284:
--

Attachment: HBASE-6284_94.patch

> Introduce HRegion#doMiniBatchMutation()
> ---
>
> Key: HBASE-6284
> URL: https://issues.apache.org/jira/browse/HBASE-6284
> Project: HBase
>  Issue Type: Bug
>  Components: performance, regionserver
>Reporter: Zhihong Ted Yu
>Assignee: Anoop Sam John
> Fix For: 0.96.0, 0.94.2
>
> Attachments: 6284_Trunk-V3.patch, HBASE-6284_94.patch, 
> HBASE-6284_Trunk-V2.patch, HBASE-6284_Trunk-V3.patch, HBASE-6284_Trunk.patch
>
>
> From Anoop under thread 'Can there be a doMiniBatchDelete in HRegion':
> HTable#delete(List<Delete>) groups the Deletes going to the same RS and makes 
> only one network call. But within the RS there will be N delete calls on the 
> region, one by one, which means N HLog writes and syncs. If these can also be 
> grouped, we can get better performance for multi-row deletes.
> I have made a new miniBatchDelete() and made HTable#delete(List<Delete>) call 
> this new batch delete.
> I initially tested with a one-node cluster. Even there I am seeing a very 
> promising performance boost.
> Only one CF and qualifier.
> 10K total rows deleted with a batch of 100 deletes. Only deletes happening on 
> the table, from one thread.
> With the new way, the net time taken is reduced by more than 1/10.
> I will also test on a 4-node cluster. I think it will be worth making this 
> change.





[jira] [Commented] (HBASE-5450) Support for wire-compatibility in inter-cluster replication (ZK, etc)

2012-07-03 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406245#comment-13406245
 ] 

Chris Trezzo commented on HBASE-5450:
-

Sounds good to me. I will work on a patch.

> Support for wire-compatibility in inter-cluster replication (ZK, etc)
> -
>
> Key: HBASE-5450
> URL: https://issues.apache.org/jira/browse/HBASE-5450
> Project: HBase
>  Issue Type: Sub-task
>  Components: ipc, master, migration, regionserver
>Reporter: Todd Lipcon
>Assignee: Chris Trezzo
>






[jira] [Commented] (HBASE-6313) Client hangs because the client is not notified

2012-07-03 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406229#comment-13406229
 ] 

binlijin commented on HBASE-6313:
-

If status == Status.FATAL, Connection.markClosed(IOException e) will be called, 
shouldCloseConnection will be set to true, and the connection will be closed.
Connection.close() -> Connection.cleanupCalls() -> Call.setException(IOException 
error), so all calls still in the calls map will be notified. 
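
A simplified model of the hang described above (all names are assumptions; this 
is not the real HBaseClient code):

{code}
import java.io.DataInputStream;
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: if the call is removed from `calls` before the response is fully
// read, cleanupCalls() can no longer reach it, so its waiter blocks forever.
public class ClientHangSketch {
  static class Call {
    IOException error;
    boolean done;
    synchronized void setException(IOException e) { error = e; done = true; notifyAll(); }
  }

  final ConcurrentHashMap<Integer, Call> calls = new ConcurrentHashMap<Integer, Call>();

  void receiveResponse(DataInputStream in, int id) {
    Call call = calls.remove(id);   // BUG: removed before the read completes
    try {
      in.readInt();                 // an IOException here...
    } catch (IOException e) {
      cleanupCalls(e);              // ...notifies only calls still in the map,
    }                               // so `call` is never notified
  }

  void cleanupCalls(IOException e) {
    for (Call c : calls.values()) c.setException(e);
    calls.clear();
  }
}
{code}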

> Client hangs because the client is not notified 
> 
>
> Key: HBASE-6313
> URL: https://issues.apache.org/jira/browse/HBASE-6313
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.1, 0.94.0
>Reporter: binlijin
> Fix For: 0.94.1
>
> Attachments: HBASE-6313-0.92-2.patch, HBASE-6313-0.92.patch, 
> HBASE-6313-0.94.patch, HBASE-6313-trunk.patch, clienthangthread.out
>
>
> If the call first remove from the calls, when some exception happened in 
> reading from the DataInputStream, the call will not be notified, cause the 
> client hangs.





[jira] [Commented] (HBASE-6305) TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds.

2012-07-03 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406228#comment-13406228
 ] 

Jonathan Hsieh commented on HBASE-6305:
---

The trunk part of this problem was fixed via HBASE-6506.

> TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds.
> 
>
> Key: HBASE-6305
> URL: https://issues.apache.org/jira/browse/HBASE-6305
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.92.2, 0.94.1
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 0.92.2, 0.94.1
>
> Attachments: hbase-6305-94.patch
>
>
> trunk: mvn clean test -Dhadoop.profile=2.0 -Dtest=TestLocalHBaseCluster
> 0.94: mvn clean test -Dhadoop.profile=23 -Dtest=TestLocalHBaseCluster
> {code}
> testLocalHBaseCluster(org.apache.hadoop.hbase.TestLocalHBaseCluster)  Time 
> elapsed: 0.022 sec  <<< ERROR!
> java.lang.RuntimeException: Master not initialized after 200 seconds
> at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:208)
> at 
> org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:424)
> at 
> org.apache.hadoop.hbase.TestLocalHBaseCluster.testLocalHBaseCluster(TestLocalHBaseCluster.java:66)
> ...
> {code}





[jira] [Updated] (HBASE-6305) TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds.

2012-07-03 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-6305:
--

Fix Version/s: 0.94.1
   0.92.2

> TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds.
> 
>
> Key: HBASE-6305
> URL: https://issues.apache.org/jira/browse/HBASE-6305
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.92.2, 0.94.1
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 0.92.2, 0.94.1
>
> Attachments: hbase-6305-94.patch
>
>
> trunk: mvn clean test -Dhadoop.profile=2.0 -Dtest=TestLocalHBaseCluster
> 0.94: mvn clean test -Dhadoop.profile=23 -Dtest=TestLocalHBaseCluster
> {code}
> testLocalHBaseCluster(org.apache.hadoop.hbase.TestLocalHBaseCluster)  Time 
> elapsed: 0.022 sec  <<< ERROR!
> java.lang.RuntimeException: Master not initialized after 200 seconds
> at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:208)
> at 
> org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:424)
> at 
> org.apache.hadoop.hbase.TestLocalHBaseCluster.testLocalHBaseCluster(TestLocalHBaseCluster.java:66)
> ...
> {code}





[jira] [Commented] (HBASE-6311) Data error after majorCompaction caused by keeping MVCC for opened scanners

2012-07-03 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406226#comment-13406226
 ] 

chunhui shen commented on HBASE-6311:
-

bq.But when the compaction is done and the in this case the put and delete is 
removed
In this case, without the patch, the put is retained and the delete is dropped 
after the compaction, which causes the data error. With the patch, the put and 
the delete are both retained after compaction, so the first scanner can still 
read this row as per MVCC.

> Data error after majorCompaction caused by keeping MVCC for opened scanners
> ---
>
> Key: HBASE-6311
> URL: https://issues.apache.org/jira/browse/HBASE-6311
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.0
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Blocker
> Attachments: HBASE-6311-test.patch, HBASE-6311v1.patch
>
>
> It is a big problem we found in 0.94, and you can reproduce it in trunk 
> using the test case I uploaded.
> When we do a compaction, we use region.getSmallestReadPoint() to keep MVCC 
> for opened scanners.
> However, this causes a data error after a majorCompaction, because we skip 
> the delete type KV but keep the put type KV in the compacted storefile.
> The reason, from the code, is as follows:
> In StoreFileScanner, enforceMVCC is false during compaction, so we can read 
> the delete type KV.
> However, we skip this delete type KV in ScanQueryMatcher because of the 
> following code:
> {code}
> if (kv.isDelete())
> {
> ...
>  if (includeDeleteMarker
> && kv.getMemstoreTS() <= maxReadPointToTrackVersions) {
>   System.out.println("add deletes,maxReadPointToTrackVersions="
>   + maxReadPointToTrackVersions);
>   this.deletes.add(bytes, offset, qualLength, timestamp, type);
> }
> ...
> }
> {code}
> Here maxReadPointToTrackVersions = region.getSmallestReadPoint()
> and kv.getMemstoreTS() > maxReadPointToTrackVersions, 
> so we won't add this KV to the DeleteTracker.
> Why does the test case pass if we remove the line 
> MultiVersionConsistencyControl.setThreadReadPoint(smallestReadPoint); ?
> Because in StoreFileScanner#skipKVsNewerThanReadpoint:
> {code}
> if (cur.getMemstoreTS() <= readPoint) {
>   cur.setMemstoreTS(0);
> }
> {code}
> So if we remove the line 
> MultiVersionConsistencyControl.setThreadReadPoint(smallestReadPoint);
> then readPoint is Long.MAX_VALUE, the memstore TS is set to 0, and the KV is 
> added to the DeleteTracker in ScanQueryMatcher.
> Solution:
> We use the region's smallestReadPoint during compaction to keep MVCC for 
> opened scanners, so we should also retain the delete type KV in the output 
> in this case (the already-deleted KV is retained in the output so that an 
> old opened scanner can still read it), even for a major compaction.





[jira] [Updated] (HBASE-6319) ReplicationSource can call terminate on itself and deadlock

2012-07-03 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-6319:
--

Attachment: HBASE-6319-0.92.patch

Patch that adds a check for whether the current thread is the one we're trying 
to stop and, if so, doesn't wait.
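
The pattern, sketched in isolation (the real ReplicationSource is more involved 
than a bare Thread subclass):

{code}
// Sketch of the check described above: terminate() must not join() when it is
// running on the very thread it is stopping, or it waits on itself forever.
public class SelfTerminateSketch extends Thread {
  private volatile boolean stopping = false;

  public void terminate() {
    stopping = true;
    this.interrupt();
    if (Thread.currentThread() != this) {  // only wait when called from outside
      try {
        this.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
    // else: the source is terminating itself; joining would deadlock
  }

  @Override
  public void run() {
    while (!stopping) {
      // ship replication edits ...
    }
  }
}
{code}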

> ReplicationSource can call terminate on itself and deadlock
> ---
>
> Key: HBASE-6319
> URL: https://issues.apache.org/jira/browse/HBASE-6319
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.6, 0.92.1, 0.94.0
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
> Fix For: 0.90.7, 0.92.2, 0.94.2
>
> Attachments: HBASE-6319-0.92.patch
>
>
> In a few places the ReplicationSource code calls terminate() on itself, which 
> is a problem since in terminate() we wait for that thread to die.





[jira] [Assigned] (HBASE-6325) [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive

2012-07-03 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans reassigned HBASE-6325:
-

Assignee: Jean-Daniel Cryans

> [replication] Race in ReplicationSourceManager.init can initiate a failover 
> even if the node is alive
> -
>
> Key: HBASE-6325
> URL: https://issues.apache.org/jira/browse/HBASE-6325
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.6, 0.92.1, 0.94.0
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
> Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.2
>
> Attachments: HBASE-6325-0.92.patch
>
>
> Yet another bug found during the leap second madness: it's possible to miss 
> the registration of new region servers, so that in 
> ReplicationSourceManager.init we start the failover of a live and replicating 
> region server. I don't think there's data loss, but the RS that's being 
> failed over will die on:
> {noformat}
> 2012-07-01 06:25:15,604 FATAL 
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
> sv4r23s48,10304,1341112194623: Writing replication status
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
> NoNode for 
> /hbase/replication/rs/sv4r23s48,10304,1341112194623/4/sv4r23s48%2C10304%2C1341112194623.1341112195369
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
> at 
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
> at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:655)
> at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:697)
> at 
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:470)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:154)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:607)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:368)
> {noformat}
> It seems to me that just refreshing {{otherRegionServers}} after getting the 
> list of {{currentReplicators}} would be enough to fix this.
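
A sketch of the suggestion in the last paragraph (method names are assumptions):

{code}
import java.util.List;

// Sketch: re-read the live region server list *after* fetching the replicator
// list, so a replicator that registered in between is not treated as dead.
public abstract class FailoverCheckSketch {
  public void checkForDeadReplicators(String ourName) {
    List<String> currentReplicators = getListOfReplicators();    // znodes under replication/rs
    List<String> otherRegionServers = refreshRegionServerList(); // fresh read, not a cached copy
    for (String rs : currentReplicators) {
      if (!rs.equals(ourName) && !otherRegionServers.contains(rs)) {
        transferQueues(rs);  // only fail over replicators that are really gone
      }
    }
  }

  protected abstract List<String> getListOfReplicators();
  protected abstract List<String> refreshRegionServerList();
  protected abstract void transferQueues(String deadRs);
}
{code}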





[jira] [Updated] (HBASE-6325) [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive

2012-07-03 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-6325:
--

Attachment: HBASE-6325-0.92.patch

Moving the refreshing of the list and related refactorings.

> [replication] Race in ReplicationSourceManager.init can initiate a failover 
> even if the node is alive
> -
>
> Key: HBASE-6325
> URL: https://issues.apache.org/jira/browse/HBASE-6325
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.6, 0.92.1, 0.94.0
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
> Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.2
>
> Attachments: HBASE-6325-0.92.patch
>
>
> Yet another bug found during the leap second madness: it's possible to miss 
> the registration of new region servers, so that in 
> ReplicationSourceManager.init we start the failover of a live and replicating 
> region server. I don't think there's data loss, but the RS that's being 
> failed over will die on:
> {noformat}
> 2012-07-01 06:25:15,604 FATAL 
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
> sv4r23s48,10304,1341112194623: Writing replication status
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
> NoNode for 
> /hbase/replication/rs/sv4r23s48,10304,1341112194623/4/sv4r23s48%2C10304%2C1341112194623.1341112195369
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
> at 
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
> at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:655)
> at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:697)
> at 
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:470)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:154)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:607)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:368)
> {noformat}
> It seems to me that just refreshing {{otherRegionServers}} after getting the 
> list of {{currentReplicators}} would be enough to fix this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6326) Nested retry loops in HConnectionManager

2012-07-03 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6326:
-

Attachment: 6326.txt

Simple patch avoiding retries in locateRegion when it is called from 
locateRegionInMeta, which already has a retry loop.
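
As a rough sketch of the approach (the signature and helper below are 
assumptions for illustration and may differ from the attached 6326.txt): 
thread a retry flag down so the lookup makes a single attempt when the caller 
already retries.

{code}
// Sketch with an assumed signature, not necessarily the attached patch:
// callers that already sit inside a retry loop pass retry=false, so this
// method makes a single attempt instead of running its own full retry loop.
// Assumes numRetries >= 1.
HRegionLocation locateRegion(byte[] tableName, byte[] row,
    boolean useCache, boolean retry) throws IOException {
  int attempts = retry ? numRetries : 1; // the outer caller handles retrying
  IOException lastException = null;
  for (int i = 0; i < attempts; i++) {
    try {
      return lookupInMeta(tableName, row, useCache); // placeholder helper
    } catch (IOException e) {
      lastException = e;
    }
  }
  throw lastException;
}
{code}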

> Nested retry loops in HConnectionManager
> 
>
> Key: HBASE-6326
> URL: https://issues.apache.org/jira/browse/HBASE-6326
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Priority: Critical
> Fix For: 0.94.1
>
> Attachments: 6326.txt
>
>
> While testing client timeouts when HBase is not available, we found that 
> even with aggressive settings it takes the client 10 minutes or more to 
> finally receive an exception.
> Part of this is due to nested retry loops in locateRegion.
> locateRegion will first try to locate the table in meta (which is retried), 
> then it will try to locate the meta table in root (which is also retried).
> So for each retry of the meta lookup we retry the root lookup as well.
> I have a patch that avoids locateRegion retrying if it is called from code 
> that already has a retry loop.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6326) Nested retry loops in HConnectionManager

2012-07-03 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406200#comment-13406200
 ] 

Lars Hofhansl commented on HBASE-6326:
--

Patch is against 0.94. 0.96 is quite different.

> Nested retry loops in HConnectionManager
> 
>
> Key: HBASE-6326
> URL: https://issues.apache.org/jira/browse/HBASE-6326
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Priority: Critical
> Fix For: 0.94.1
>
> Attachments: 6326.txt
>
>
> While testing client timeouts when HBase is not available, we found that 
> even with aggressive settings it takes the client 10 minutes or more to 
> finally receive an exception.
> Part of this is due to nested retry loops in locateRegion.
> locateRegion will first try to locate the table in meta (which is retried), 
> then it will try to locate the meta table in root (which is also retried).
> So for each retry of the meta lookup we retry the root lookup as well.
> I have a patch that avoids locateRegion retrying if it is called from code 
> that already has a retry loop.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6325) [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive

2012-07-03 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-6325:
--

Description: 
Yet another bug found during the leap second madness: it's possible to miss the 
registration of new region servers, so that in ReplicationSourceManager.init we 
start the failover of a live and replicating region server. I don't think 
there's data loss, but the RS that's being failed over will die on:

{noformat}
2012-07-01 06:25:15,604 FATAL 
org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
sv4r23s48,10304,1341112194623: Writing replication status
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
for 
/hbase/replication/rs/sv4r23s48,10304,1341112194623/4/sv4r23s48%2C10304%2C1341112194623.1341112195369
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
at 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:655)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:697)
at 
org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:470)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:154)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:607)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:368)
{noformat}

It seems to me that just refreshing {{otherRegionServers}} after getting the 
list of {{currentReplicators}} would be enough to fix this.

  was:
Yet another bug found during the leap second madness: it's possible to miss the 
registration of new region servers, so that in ReplicationSource.init we start 
the failover of a live and replicating region server. I don't think there's 
data loss, but the RS that's being failed over will die on:

{noformat}
2012-07-01 06:25:15,604 FATAL 
org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
sv4r23s48,10304,1341112194623: Writing replication status
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
for 
/hbase/replication/rs/sv4r23s48,10304,1341112194623/4/sv4r23s48%2C10304%2C1341112194623.1341112195369
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
at 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:655)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:697)
at 
org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:470)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:154)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:607)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:368)
{noformat}

It seems to me that just refreshing {{otherRegionServers}} after getting the 
list of {{currentReplicators}} would be enough to fix this.

Summary: [replication] Race in ReplicationSourceManager.init can 
initiate a failover even if the node is alive  (was: [replication] Race in 
ReplicationSource.init can initiate a failover even if the node is alive)

> [replication] Race in ReplicationSourceManager.init can initiate a failover 
> even if the node is alive
> -
>
> Key: HBASE-6325
> URL: https://issues.apache.org/jira/browse/HBASE-6325
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.6, 0.92.1, 0.94.0
>Reporter: Jean-Daniel Cryans
> Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.2
>
>
> Yet another bug found during the leap second madness: it's possible to miss 
> the registration of new region servers, so that in 
> ReplicationSourceManager.init we start the failover of a live and replicating 
> region server. I don't think there's data loss, but the RS that's being 
> failed over will die on:
> {noformat}
> 2012-07-01 06:25:15,604 FATAL 
> org.apache.hadoop.hbase.regionserver.HRegion

[jira] [Commented] (HBASE-5450) Support for wire-compatibility in inter-cluster replication (ZK, etc)

2012-07-03 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406193#comment-13406193
 ] 

Todd Lipcon commented on HBASE-5450:


I think it's still worth doing. Imagine if, in the future, we wanted to add a 
flag so that replication isn't just "TRUE or FALSE" but could also be "LAGGED", 
with a parameter for how long the lag should be. For the HLogs, maybe we need 
to extend it to allow multiple HLogs once we have multi-WAL support.
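
As a purely hypothetical illustration of that point (the names here are 
invented, not an HBase API): a structured status can grow past a boolean 
without breaking existing readers, which a bare TRUE/FALSE znode value cannot.

{code}
// Hypothetical illustration only; not an HBase API.
enum ReplicationState { ENABLED, DISABLED, LAGGED }

class ReplicationStatus {
  ReplicationState state = ReplicationState.LAGGED;
  long lagMillis = 60000L; // only meaningful when state == LAGGED
}
{code}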

> Support for wire-compatibility in inter-cluster replication (ZK, etc)
> -
>
> Key: HBASE-5450
> URL: https://issues.apache.org/jira/browse/HBASE-5450
> Project: HBase
>  Issue Type: Sub-task
>  Components: ipc, master, migration, regionserver
>Reporter: Todd Lipcon
>Assignee: Chris Trezzo
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6283) [region_mover.rb] Add option to exclude list of hosts on unload instead of just assuming the source node.

2012-07-03 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406189#comment-13406189
 ] 

Jonathan Hsieh commented on HBASE-6283:
---

@Stack I'll look into making it throw an exception or at least print a nice 
warning message.

Thanks for the pointer to Aravind's work -- this is the first time I've seen 
the blog.  Have we encouraged Aravind to contribute his work?

> [region_mover.rb] Add option to exclude list of hosts on unload instead of 
> just assuming the source node.
> -
>
> Key: HBASE-6283
> URL: https://issues.apache.org/jira/browse/HBASE-6283
> Project: HBase
>  Issue Type: Improvement
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
>  Labels: jruby
> Attachments: hbase-6283.patch
>
>
> Currently, the region_mover.rb script excludes a single host, the host 
> offloading data, as a region move target.  This essentially limits the number 
> of machines that can be shut down at a time to one.  For larger clusters, it 
> is manageable to have several nodes down at a time and desirable to get this 
> process done more quickly.
> The proposed patch adds an exclude file option, that allows multiple hosts to 
> be excluded as targets.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6326) Nested retry loops in HConnectionManager

2012-07-03 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6326:
-

 Priority: Critical  (was: Major)
Fix Version/s: 0.94.1

For us this is pretty critical.

> Nested retry loops in HConnectionManager
> 
>
> Key: HBASE-6326
> URL: https://issues.apache.org/jira/browse/HBASE-6326
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Priority: Critical
> Fix For: 0.94.1
>
>
> While testing client timeouts when HBase is not available, we found that 
> even with aggressive settings it takes the client 10 minutes or more to 
> finally receive an exception.
> Part of this is due to nested retry loops in locateRegion.
> locateRegion will first try to locate the table in meta (which is retried), 
> then it will try to locate the meta table in root (which is also retried).
> So for each retry of the meta lookup we retry the root lookup as well.
> I have a patch that avoids locateRegion retrying if it is called from code 
> that already has a retry loop.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-6326) Nested retry loops in HConnectionManager

2012-07-03 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-6326:


 Summary: Nested retry loops in HConnectionManager
 Key: HBASE-6326
 URL: https://issues.apache.org/jira/browse/HBASE-6326
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl


While testing client timeouts when HBase is not available, we found that even 
with aggressive settings it takes the client 10 minutes or more to finally 
receive an exception.
Part of this is due to nested retry loops in locateRegion.

locateRegion will first try to locate the table in meta (which is retried), 
then it will try to locate the meta table in root (which is also retried).
So for each retry of the meta lookup we retry the root lookup as well.

I have a patch that avoids locateRegion retrying if it is called from code 
that already has a retry loop.
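
A toy illustration of why the nesting hurts (retry counts assumed for 
illustration): the inner loop runs in full on every iteration of the outer 
loop, so worst-case attempts multiply.

{code}
// Toy illustration with assumed retry counts: every retry of the meta lookup
// re-runs the whole root-lookup retry loop, so the attempts multiply.
public class NestedRetries {
  public static void main(String[] args) {
    int metaRetries = 10; // retries locating the table in .META.
    int rootRetries = 10; // retries locating .META. in -ROOT-
    System.out.println("worst-case root lookups: " + metaRetries * rootRetries);
  }
}
{code}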

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6228) Fixup daughters twice cause daughter region assigned twice

2012-07-03 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406182#comment-13406182
 ] 

Jonathan Hsieh commented on HBASE-6228:
---

@Ram, @Chunhui

I think stack may have started looking at some of these things but let me give 
a list that I've been looking at which I think are related: (there are probably 
more).

HBASE-6012, HBASE-6060, HBASE-5914, HBASE-5882, HBASE-6147, HBASe-5916, 
HBASE-5546, HBASE-5816, HBASE_6160, HBASE-5918, HBASE-6016, HBASE-5927.

I've been gathering the list with the intent of seeing if there is a common 
pattern.  At the highest level, I think that we need something to protect meta 
from changes coming from multiple uncoordinated sources like RS's and HM.   My 
gut is that the long term solution is something like a region lock (possibly a 
ZK "regionlock") that is used to isolate and protect hbase metadata 
modifications.   Something like that would reduce the space of possible 
problems and hopefully make testing easier along the way.
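
For flavor, a rough sketch of what such a ZK "regionlock" could look like (the 
znode layout here is an assumption for illustration, not a design):

{code}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Rough sketch only; the znode path is assumed. An actor takes an ephemeral
// znode for the region before touching its meta row, so uncoordinated writers
// (RSs, HM) exclude each other, and the lock disappears with the owner's
// session if the owner dies.
public class RegionLockSketch {
  static boolean tryLockRegion(ZooKeeper zk, String encodedRegionName,
      byte[] owner) throws KeeperException, InterruptedException {
    try {
      zk.create("/hbase/regionlock/" + encodedRegionName, owner,
          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
      return true;  // held until we delete the znode or our session expires
    } catch (KeeperException.NodeExistsException e) {
      return false; // someone else is mutating this region's metadata
    }
  }
}
{code}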

> Fixup daughters twice  cause daughter region assigned twice
> ---
>
> Key: HBASE-6228
> URL: https://issues.apache.org/jira/browse/HBASE-6228
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.96.0
>
> Attachments: HBASE-6228.patch, HBASE-6228v2.patch, 
> HBASE-6228v2.patch, HBASE-6228v3.patch, HBASE-6228v4.patch
>
>
> First, how does fixing up daughters twice happen?
> 1. We fixupDaughters at the end of HMaster#finishInitialization.
> 2. ServerShutdownHandler will fixupDaughters when reassigning regions through 
> ServerShutdownHandler#processDeadRegion.
> When we fixupDaughters, we add the daughters to .META., but that couldn't 
> prevent the above case, because of FindDaughterVisitor.
> The detail is as follows:
> Suppose region A is a split parent region and its daughter region B is 
> missing.
> 1. First, the ServerShutdownHandler thread fixes up the daughter, so it adds 
> daughter region B to .META. with serverName=null and assigns the daughter.
> 2. Then, the Master's initialization thread will also find that daughter 
> region B is missing and assign it. This is because FindDaughterVisitor 
> considers a daughter missing if its serverName=null.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Comment Edited] (HBASE-6228) Fixup daughters twice cause daughter region assigned twice

2012-07-03 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406182#comment-13406182
 ] 

Jonathan Hsieh edited comment on HBASE-6228 at 7/3/12 11:51 PM:


@Ram, @Chunhui

I think stack may have started looking at some of these things but let me give 
a list that I've been looking at which I think are related: (there are probably 
more).

HBASE-6012, HBASE-6060, HBASE-5914, HBASE-5882, HBASE-6147, HBASE-5916, 
HBASE-5546, HBASE-5816, HBASE-6160, HBASE-5918, HBASE-6016, HBASE-5927.

I've been gathering the list with the intent of seeing if there is a common 
pattern.  At the highest level, I think that we need something to protect meta 
from changes coming from multiple uncoordinated sources like RS's and HM.   My 
gut is that the long term solution is something like a region lock (possibly a 
ZK "regionlock") that is used to isolate and protect hbase metadata 
modifications.   Something like that would reduce the space of possible 
problems and hopefully make testing easier along the way.

  was (Author: jmhsieh):
@Ram, @Chunhui

I think stack may have started looking at some of these things but let me give 
a list that I've been looking at which I think are related: (there are probably 
more).

HBASE-6012, HBASE-6060, HBASE-5914, HBASE-5882, HBASE-6147, HBASe-5916, 
HBASE-5546, HBASE-5816, HBASE_6160, HBASE-5918, HBASE-6016, HBASE-5927.

I've been gathering the list with the intent of seeing if there is a common 
pattern.  At the highest level, I think that we need something to protect meta 
from changes coming from multiple uncoordinated sources like RS's and HM.   My 
gut is that the long term solution is something like a region lock (possibly a 
ZK "regionlock") that is used to isolate and protect hbase metadata 
modifications.   Something like that would reduce the space of possible 
problems and hopefully make testing easier along the way.
  
> Fixup daughters twice  cause daughter region assigned twice
> ---
>
> Key: HBASE-6228
> URL: https://issues.apache.org/jira/browse/HBASE-6228
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.96.0
>
> Attachments: HBASE-6228.patch, HBASE-6228v2.patch, 
> HBASE-6228v2.patch, HBASE-6228v3.patch, HBASE-6228v4.patch
>
>
> First, how does fixing up daughters twice happen?
> 1. We fixupDaughters at the end of HMaster#finishInitialization.
> 2. ServerShutdownHandler will fixupDaughters when reassigning regions through 
> ServerShutdownHandler#processDeadRegion.
> When we fixupDaughters, we add the daughters to .META., but that couldn't 
> prevent the above case, because of FindDaughterVisitor.
> The detail is as follows:
> Suppose region A is a split parent region and its daughter region B is 
> missing.
> 1. First, the ServerShutdownHandler thread fixes up the daughter, so it adds 
> daughter region B to .META. with serverName=null and assigns the daughter.
> 2. Then, the Master's initialization thread will also find that daughter 
> region B is missing and assign it. This is because FindDaughterVisitor 
> considers a daughter missing if its serverName=null.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6306) TestFSUtils fails against hadoop 2.0

2012-07-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406180#comment-13406180
 ] 

Hudson commented on HBASE-6306:
---

Integrated in HBase-TRUNK #3094 (See 
[https://builds.apache.org/job/HBase-TRUNK/3094/])
HBASE-6306 TestFSUtils fails against hadoop 2.0 (Revision 1356954)

 Result = SUCCESS
jmhsieh : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/FSUtils.java


> TestFSUtils fails against hadoop 2.0
> 
>
> Key: HBASE-6306
> URL: https://issues.apache.org/jira/browse/HBASE-6306
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.96.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 0.96.0
>
> Attachments: hbase-6306-trunk.patch
>
>
> trunk: mvn clean test -Dhadoop.profile=2.0 -Dtest=TestFSUtils
> {code}
> java.io.FileNotFoundException: File 
> /home/jon/proj/hbase-trunk/hbase-server/target/test-data/02beb8c8-06c1-47ea-829b-6e7ce0570cf8/hbase.version
>  does not exist
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:315)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1279)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1319)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:557)
> at 
> org.apache.hadoop.fs.FilterFileSystem.listStatus(FilterFileSystem.java:213)
> at org.apache.hadoop.hbase.util.FSUtils.getVersion(FSUtils.java:270)
> at 
> org.apache.hadoop.hbase.util.TestFSUtils.testVersion(TestFSUtils.java:58)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> ... 
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6039) Remove HMasterInterface and replace with something similar to RegionServerStatusProtocol

2012-07-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406179#comment-13406179
 ] 

Hudson commented on HBASE-6039:
---

Integrated in HBase-TRUNK #3094 (See 
[https://builds.apache.org/job/HBase-TRUNK/3094/])
HBASE-6039 Remove HMasterInterface and replace with something similar to 
RegionServerStatusProtocol (Revision 1356920)

 Result = SUCCESS
stack : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/MasterAdminProtocol.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/MasterMonitorProtocol.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/MasterProtocol.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/MasterAdminKeepAliveConnection.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/MasterKeepAliveConnection.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/MasterMonitorKeepAliveConnection.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRpcMetrics.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/Invocation.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/generated/MasterAdminProtos.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/generated/MasterMonitorProtos.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/generated/MasterProtos.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/security/HBasePolicyProvider.java
* /hbase/trunk/hbase-server/src/main/protobuf/Master.proto
* /hbase/trunk/hbase-server/src/main/protobuf/MasterAdmin.proto
* /hbase/trunk/hbase-server/src/main/protobuf/MasterMonitor.proto
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestHMasterRPCException.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestServerCustomProtocol.java


> Remove HMasterInterface and replace with something similar to 
> RegionServerStatusProtocol
> 
>
> Key: HBASE-6039
> URL: https://issues.apache.org/jira/browse/HBASE-6039
> Project: HBase
>  Issue Type: Task
>  Components: ipc, master
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.96.0
>
> Attachments: HBASE-6039-v2.patch, HBASE-6039.patch
>
>
> Me: Once everything in HMasterInterface is converted to use PB, we can either 
> declare a new class for the representation (similar to 
> RegionServerStatusProtocol) or just re-purpose HMasterInterface for that. 
> What is your preference?
> Stack: Lets do what Jimmy did, make a new class and kill the old.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5450) Support for wire-compatibility in inter-cluster replication (ZK, etc)

2012-07-03 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406170#comment-13406170
 ] 

Chris Trezzo commented on HBASE-5450:
-

The replication RPCs for log shipping have already been converted to 
protobufs. In HBASE-5944, Stack mentioned that the znodes for replication 
still needed to be converted, but I don't see a patch for that.

Some examples of the znode values:
- The value of the replication status znode (TRUE or FALSE)
- The cluster key for a peer cluster, containing the zk quorum, zk port and 
hbase base znode (zk1.host.com,zk2.host.com,zk3.host.com:2181:/hbase)
- The peer state znode (ENABLED or DISABLED)
- The HLog position (a long containing the position in the hlog)


All of the replication values are relatively simple. Do we think it is worth 
converting them to protobufs?
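
For context on how ad hoc the current encoding is, here is a small sketch of 
parsing the cluster key above by hand; every reader re-implements this 
splitting today, which is the kind of thing a versioned protobuf would carry 
as named fields instead.

{code}
// Sketch: the cluster key is a bare delimited string today.
public class ClusterKeyParse {
  public static void main(String[] args) {
    String key = "zk1.host.com,zk2.host.com,zk3.host.com:2181:/hbase";
    String[] parts = key.split(":", 3); // quorum : client port : base znode
    System.out.println("quorum = " + parts[0]);
    System.out.println("port   = " + parts[1]);
    System.out.println("znode  = " + parts[2]);
  }
}
{code}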

> Support for wire-compatibility in inter-cluster replication (ZK, etc)
> -
>
> Key: HBASE-5450
> URL: https://issues.apache.org/jira/browse/HBASE-5450
> Project: HBase
>  Issue Type: Sub-task
>  Components: ipc, master, migration, regionserver
>Reporter: Todd Lipcon
>Assignee: Chris Trezzo
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6318) SplitLogWorker exited due to ConcurrentModificationException

2012-07-03 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406167#comment-13406167
 ] 

Jimmy Xiang commented on HBASE-6318:


Wrong patch name.

The map is a Collections.synchronizedMap.  So it protects against races with 
getOutputCounts() and all the other methods like put() and get() on this map.

The attached patch synchronizes on the map for a strongly consistent 
iteration.  There are not many threads here, so performance-wise it should be 
fine.

However, there is still possible leakage.  For example, suppose some thread 
created a writer and, right before putting it into the map, got blocked 
because closeLogWriters() holds the lock.  That writer will not be closed.

Probably it is better to fix the t.join() interruption; or is the leakage 
rare enough to not be a big deal?
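
For reference, the standard idiom involved here (per the java.util.Collections 
javadoc), shown on a stand-in map rather than the actual logWriters field:

{code}
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// A synchronizedMap view synchronizes individual calls like put()/get(), but
// iteration must be manually synchronized on the map itself, or a concurrent
// put() can throw the ConcurrentModificationException seen in this issue.
public class SyncMapIteration {
  public static void main(String[] args) {
    Map<String, String> logWriters =
        Collections.synchronizedMap(new TreeMap<String, String>());
    logWriters.put("region-a", "writer-a");
    synchronized (logWriters) { // holds out concurrent put()/remove()
      for (String writer : logWriters.values()) {
        System.out.println("closing " + writer);
      }
    }
  }
}
{code}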


> SplitLogWorker exited due to ConcurrentModificationException
> 
>
> Key: HBASE-6318
> URL: https://issues.apache.org/jira/browse/HBASE-6318
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.96.0
>Reporter: Jimmy Xiang
> Attachments: 6138.patch, 6318-suggest.txt, 6318.log
>
>
> In playing with 0.96 code on a live cluster, found this issue:
> 2012-07-03 12:13:32,572 ERROR 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: unexpected error
> java.util.ConcurrentModificationException
> at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
> at java.util.TreeMap$ValueIterator.next(TreeMap.java:1145)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$OutputSink.closeLogWriters(HLogSplitter.java:1330)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$OutputSink.finishWritingAndClose(HLogSplitter.java:1221)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:441)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:369)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:113)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:276)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:164)
> at java.lang.Thread.run(Thread.java:662)
> 2012-07-03 12:13:32,575 INFO 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker 
> .cloudera.com,57020,1341335300238 exiting

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-6325) [replication] Race in ReplicationSource.init can initiate a failover even if the node is alive

2012-07-03 Thread Jean-Daniel Cryans (JIRA)
Jean-Daniel Cryans created HBASE-6325:
-

 Summary: [replication] Race in ReplicationSource.init can initiate 
a failover even if the node is alive
 Key: HBASE-6325
 URL: https://issues.apache.org/jira/browse/HBASE-6325
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0, 0.92.1, 0.90.6
Reporter: Jean-Daniel Cryans
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.2


Yet another bug found during the leap second madness, it's possible to miss the 
registration of new region servers so that in ReplicationSource.init we start 
the failover of a live and replicating region server. I don't think there's 
data loss but the RS that's being failed over will die on:

{noformat}
2012-07-01 06:25:15,604 FATAL 
org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
sv4r23s48,10304,1341112194623: Writing replication status
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
for 
/hbase/replication/rs/sv4r23s48,10304,1341112194623/4/sv4r23s48%2C10304%2C1341112194623.1341112195369
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
at 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:655)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:697)
at 
org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:470)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:154)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:607)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:368)
{noformat}

It seems to me that just refreshing {{otherRegionServers}} after getting the 
list of {{currentReplicators}} would be enough to fix this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6318) SplitLogWorker exited due to ConcurrentModificationException

2012-07-03 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406163#comment-13406163
 ] 

Zhihong Ted Yu commented on HBASE-6318:
---

The JIRA number is 6318, not 6138, w.r.t. naming the patch.

Are you trying to protect against a race with getOutputCounts()? That's where 
I saw the other 'synchronized (logWriters)' block.

> SplitLogWorker exited due to ConcurrentModificationException
> 
>
> Key: HBASE-6318
> URL: https://issues.apache.org/jira/browse/HBASE-6318
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.96.0
>Reporter: Jimmy Xiang
> Attachments: 6138.patch, 6318-suggest.txt, 6318.log
>
>
> In playing with 0.96 code on a live cluster, found this issue:
> 2012-07-03 12:13:32,572 ERROR 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: unexpected error
> java.util.ConcurrentModificationException
> at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
> at java.util.TreeMap$ValueIterator.next(TreeMap.java:1145)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$OutputSink.closeLogWriters(HLogSplitter.java:1330)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$OutputSink.finishWritingAndClose(HLogSplitter.java:1221)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:441)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:369)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:113)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:276)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:164)
> at java.lang.Thread.run(Thread.java:662)
> 2012-07-03 12:13:32,575 INFO 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker 
> .cloudera.com,57020,1341335300238 exiting

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6318) SplitLogWorker exited due to ConcurrentModificationException

2012-07-03 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6318:
---

Attachment: 6138.patch

> SplitLogWorker exited due to ConcurrentModificationException
> 
>
> Key: HBASE-6318
> URL: https://issues.apache.org/jira/browse/HBASE-6318
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.96.0
>Reporter: Jimmy Xiang
> Attachments: 6138.patch, 6318-suggest.txt, 6318.log
>
>
> In playing with 0.96 code on a live cluster, found this issue:
> 2012-07-03 12:13:32,572 ERROR 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: unexpected error
> java.util.ConcurrentModificationException
> at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
> at java.util.TreeMap$ValueIterator.next(TreeMap.java:1145)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$OutputSink.closeLogWriters(HLogSplitter.java:1330)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$OutputSink.finishWritingAndClose(HLogSplitter.java:1221)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:441)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:369)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:113)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:276)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:164)
> at java.lang.Thread.run(Thread.java:662)
> 2012-07-03 12:13:32,575 INFO 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker 
> .cloudera.com,57020,1341335300238 exiting

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6318) SplitLogWorker exited due to ConcurrentModificationException

2012-07-03 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406154#comment-13406154
 ] 

Jimmy Xiang commented on HBASE-6318:


I attached the log. ConcurrentSkipListMap is similar to ConcurrentHashMap: its 
.values() iterator is "weakly consistent".

As the last step that closes all the writers, it would be better to use a 
strongly consistent iterator to avoid possible leakage.
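
A small contrast sketch on stand-in types: a weakly consistent iterator never 
throws CME, but it is not guaranteed to see entries added during the 
iteration, which is exactly the leakage concern.

{code}
import java.util.concurrent.ConcurrentSkipListMap;

// Stand-in sketch: no CME here, but an entry inserted behind the iterator's
// position is never visited, so a writer added mid-close could leak.
public class WeaklyConsistentIteration {
  public static void main(String[] args) {
    ConcurrentSkipListMap<String, String> writers =
        new ConcurrentSkipListMap<String, String>();
    writers.put("region-m", "writer-m");
    for (String w : writers.values()) {
      writers.put("region-a", "writer-a"); // sorts before the cursor: skipped
      System.out.println("closing " + w);  // prints only writer-m
    }
  }
}
{code}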



> SplitLogWorker exited due to ConcurrentModificationException
> 
>
> Key: HBASE-6318
> URL: https://issues.apache.org/jira/browse/HBASE-6318
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.96.0
>Reporter: Jimmy Xiang
> Attachments: 6318-suggest.txt, 6318.log
>
>
> In playing with 0.96 code on a live cluster, found this issue:
> 2012-07-03 12:13:32,572 ERROR 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: unexpected error
> java.util.ConcurrentModificationException
> at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
> at java.util.TreeMap$ValueIterator.next(TreeMap.java:1145)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$OutputSink.closeLogWriters(HLogSplitter.java:1330)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$OutputSink.finishWritingAndClose(HLogSplitter.java:1221)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:441)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:369)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:113)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:276)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:164)
> at java.lang.Thread.run(Thread.java:662)
> 2012-07-03 12:13:32,575 INFO 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker 
> .cloudera.com,57020,1341335300238 exiting

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6318) SplitLogWorker exited due to ConcurrentModificationException

2012-07-03 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6318:
---

Attachment: 6318.log

> SplitLogWorker exited due to ConcurrentModificationException
> 
>
> Key: HBASE-6318
> URL: https://issues.apache.org/jira/browse/HBASE-6318
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.96.0
>Reporter: Jimmy Xiang
> Attachments: 6318-suggest.txt, 6318.log
>
>
> In playing with 0.96 code on a live cluster, found this issue:
> 2012-07-03 12:13:32,572 ERROR 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: unexpected error
> java.util.ConcurrentModificationException
> at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
> at java.util.TreeMap$ValueIterator.next(TreeMap.java:1145)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$OutputSink.closeLogWriters(HLogSplitter.java:1330)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$OutputSink.finishWritingAndClose(HLogSplitter.java:1221)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:441)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:369)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:113)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:276)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:164)
> at java.lang.Thread.run(Thread.java:662)
> 2012-07-03 12:13:32,575 INFO 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker 
> .cloudera.com,57020,1341335300238 exiting

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6305) TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds.

2012-07-03 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406148#comment-13406148
 ] 

Jonathan Hsieh commented on HBASE-6305:
---

Just checked 0.92; the problem exists there too, and the same fix seems to 
resolve it.

> TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds.
> 
>
> Key: HBASE-6305
> URL: https://issues.apache.org/jira/browse/HBASE-6305
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.92.2, 0.94.1
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: hbase-6305-94.patch
>
>
> trunk: mvn clean test -Dhadoop.profile=2.0 -Dtest=TestLocalHBaseCluster
> 0.94: mvn clean test -Dhadoop.profile=23 -Dtest=TestLocalHBaseCluster
> {code}
> testLocalHBaseCluster(org.apache.hadoop.hbase.TestLocalHBaseCluster)  Time 
> elapsed: 0.022 sec  <<< ERROR!
> java.lang.RuntimeException: Master not initialized after 200 seconds
> at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:208)
> at 
> org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:424)
> at 
> org.apache.hadoop.hbase.TestLocalHBaseCluster.testLocalHBaseCluster(TestLocalHBaseCluster.java:66)
> ...
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6027) Update the reference guide to reflect the changes in the security profile

2012-07-03 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406145#comment-13406145
 ] 

Zhihong Ted Yu commented on HBASE-6027:
---

Patch looks good.

> Update the reference guide to reflect the changes in the security profile
> -
>
> Key: HBASE-6027
> URL: https://issues.apache.org/jira/browse/HBASE-6027
> Project: HBase
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.96.0
>Reporter: Devaraj Das
>Assignee: Devaraj Das
> Fix For: 0.96.0
>
> Attachments: 6027-1.patch
>
>
> The refguide needs to be updated to reflect the fact that there is no 
> security profile anymore, etc. [Follow up to HBASE-5732]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6318) SplitLogWorker exited due to ConcurrentModificationException

2012-07-03 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406141#comment-13406141
 ] 

Zhihong Ted Yu commented on HBASE-6318:
---

bq. some writer threads are still there
If the above is allowed, using ConcurrentSkipListMap is fine, right?

It would be nice if you could attach a log snippet showing the thread 
interruption.

> SplitLogWorker exited due to ConcurrentModificationException
> 
>
> Key: HBASE-6318
> URL: https://issues.apache.org/jira/browse/HBASE-6318
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.96.0
>Reporter: Jimmy Xiang
> Attachments: 6318-suggest.txt
>
>
> In playing with 0.96 code on a live cluster, found this issue:
> 2012-07-03 12:13:32,572 ERROR 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: unexpected error
> java.util.ConcurrentModificationException
> at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
> at java.util.TreeMap$ValueIterator.next(TreeMap.java:1145)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$OutputSink.closeLogWriters(HLogSplitter.java:1330)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$OutputSink.finishWritingAndClose(HLogSplitter.java:1221)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:441)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:369)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:113)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:276)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:164)
> at java.lang.Thread.run(Thread.java:662)
> 2012-07-03 12:13:32,575 INFO 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker 
> .cloudera.com,57020,1341335300238 exiting

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6305) TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds.

2012-07-03 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-6305:
--

Affects Version/s: 0.92.2

> TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds.
> 
>
> Key: HBASE-6305
> URL: https://issues.apache.org/jira/browse/HBASE-6305
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.92.2, 0.94.1
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: hbase-6305-94.patch
>
>
> trunk: mvn clean test -Dhadoop.profile=2.0 -Dtest=TestLocalHBaseCluster
> 0.94: mvn clean test -Dhadoop.profile=23 -Dtest=TestLocalHBaseCluster
> {code}
> testLocalHBaseCluster(org.apache.hadoop.hbase.TestLocalHBaseCluster)  Time 
> elapsed: 0.022 sec  <<< ERROR!
> java.lang.RuntimeException: Master not initialized after 200 seconds
> at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:208)
> at 
> org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:424)
> at 
> org.apache.hadoop.hbase.TestLocalHBaseCluster.testLocalHBaseCluster(TestLocalHBaseCluster.java:66)
> ...
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6318) SplitLogWorker exited due to ConcurrentModificationException

2012-07-03 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406138#comment-13406138
 ] 

Jimmy Xiang commented on HBASE-6318:


@Ted, I prefer to synchronize at HLogSplitter.java:1330, because we don't want 
some other thread to put more log writers into the map while we are closing it 
down.  With ConcurrentHashMap we won't get that exception any more, but there 
could be some leakage since other threads are still adding more log writers.

I looked into the problem and figured out what's going on.  The worker got a 
task and started to work on it.  However, the task was taken by someone else, 
so the task was preempted and SplitLogWorker.stopTask() was called, which 
interrupted the HLogSplitter.  While cleaning up the writers, some writer 
threads were still there, hence the CME.

Another thought: how should we handle it if t.join() is interrupted?

{code}

  try {
t.join();
  } catch (InterruptedException ie) {
IOException iie = new InterruptedIOException();
iie.initCause(ie);
throw iie;
  }

{code}

I think this part should be fine if we synchronize the loop at 
HLogSplitter.java:1330.
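
One common way to handle the interruption (a sketch of standard Java practice, 
not the committed fix) is to restore the interrupt status before wrapping, so 
code further up the stack can still observe it:

{code}
// Sketch of standard practice, not the committed fix: re-assert the thread's
// interrupt status before converting to InterruptedIOException.
try {
  t.join();
} catch (InterruptedException ie) {
  Thread.currentThread().interrupt(); // preserve the interrupt status
  IOException iie = new InterruptedIOException();
  iie.initCause(ie);
  throw iie;
}
{code}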



> SplitLogWorker exited due to ConcurrentModificationException
> 
>
> Key: HBASE-6318
> URL: https://issues.apache.org/jira/browse/HBASE-6318
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.96.0
>Reporter: Jimmy Xiang
> Attachments: 6318-suggest.txt
>
>
> In playing with 0.96 code on a live cluster, found this issue:
> 2012-07-03 12:13:32,572 ERROR 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: unexpected error
> java.util.ConcurrentModificationException
> at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
> at java.util.TreeMap$ValueIterator.next(TreeMap.java:1145)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$OutputSink.closeLogWriters(HLogSplitter.java:1330)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$OutputSink.finishWritingAndClose(HLogSplitter.java:1221)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:441)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:369)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:113)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:276)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:164)
> at java.lang.Thread.run(Thread.java:662)
> 2012-07-03 12:13:32,575 INFO 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker 
> .cloudera.com,57020,1341335300238 exiting

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6305) TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds.

2012-07-03 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-6305:
--

Affects Version/s: (was: 0.96.0)
   Status: Patch Available  (was: Open)

> TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds.
> 
>
> Key: HBASE-6305
> URL: https://issues.apache.org/jira/browse/HBASE-6305
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.94.1
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: hbase-6305-94.patch
>
>
> trunk: mvn clean test -Dhadoop.profile=2.0 -Dtest=TestLocalHBaseCluster
> 0.94: mvn clean test -Dhadoop.profile=23 -Dtest=TestLocalHBaseCluster
> {code}
> testLocalHBaseCluster(org.apache.hadoop.hbase.TestLocalHBaseCluster)  Time 
> elapsed: 0.022 sec  <<< ERROR!
> java.lang.RuntimeException: Master not initialized after 200 seconds
> at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:208)
> at 
> org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:424)
> at 
> org.apache.hadoop.hbase.TestLocalHBaseCluster.testLocalHBaseCluster(TestLocalHBaseCluster.java:66)
> ...
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism

2012-07-03 Thread Mikhail Bautin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406128#comment-13406128
 ] 

Mikhail Bautin commented on HBASE-5104:
---

Committed.

> Provide a reliable intra-row pagination mechanism
> -
>
> Key: HBASE-5104
> URL: https://issues.apache.org/jira/browse/HBASE-5104
> Project: HBase
>  Issue Type: Bug
>Reporter: Kannan Muthukkaruppan
>Assignee: Madhuwanti Vaidya
> Attachments: D2799.1.patch, D2799.2.patch, D2799.3.patch, 
> D2799.4.patch, D2799.5.patch, D2799.6.patch, 
> jira-HBASE-5104-Provide-a-reliable-intra-row-paginat-2012-04-16_12_39_42.patch,
>  
> jira-HBASE-5104-Provide-a-reliable-intra-row-paginat-2012-06-19_20_12_21.patch,
>  
> jira-HBASE-5104-Provide-a-reliable-intra-row-paginat-2012-07-02_12_43_28.patch,
>  
> jira-HBASE-5104-Provide-a-reliable-intra-row-paginat-2012-07-02_15_15_30.patch,
>  testFilterList.rb
>
>
> Addendum:
> Doing pagination (retrieving at most "limit" number of KVs at a particular 
> "offset") is currently supported via the ColumnPaginationFilter. However, it 
> is not a very clean way of supporting pagination.  Some of the problems with 
> it are:
> * Normally, one would expect a query with (Filter(A) AND Filter(B)) to have 
> the same results as (query with Filter(A)) INTERSECT (query with Filter(B)). This 
> is not the case for ColumnPaginationFilter as its internal state gets updated 
> depending on whether or not Filter(A) returns TRUE/FALSE for a particular 
> cell.
> * When this Filter is used in combination with other filters (e.g., doing AND 
> with another filter using FilterList), the behavior of the query depends on 
> the order of filters in the FilterList. This is not ideal.
> * ColumnPaginationFilter is a stateful filter which ends up counting multiple 
> versions of the cell as separate values even if another filter upstream or 
> the ScanQueryMatcher is going to reject the value for other reasons.
> Seems like we need a reliable way to do pagination. The particular use case 
> that prompted this JIRA is pagination within the same rowKey. For example, 
> for a given row key R, get columns with prefix P, starting at offset X (among 
> columns which have prefix P) and limit Y. Some possible fixes might be:
> 1) enhance ColumnPrefixFilter to support another constructor which supports 
> limit/offset.
> 2) Support pagination (limit/offset) at the Scan/Get API level (rather than 
> as a filter) [Like SQL].
> Original Post:
> Thanks Jiakai Liu for reporting this issue and doing the initial 
> investigation. Email from Jiakai below:
> Assuming that we have an index column family with the following entries:
> "tag0:001:thread1"
> ...
> "tag1:001:thread1"
> "tag1:002:thread2"
> ...
> "tag1:010:thread10"
> ...
> "tag2:001:thread1"
> "tag2:005:thread5"
> ...
> To get threads with "tag1" in range [5, 10), I tried the following code:
> ColumnPrefixFilter filter1 = new 
> ColumnPrefixFilter(Bytes.toBytes("tag1"));
> ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit 
> */, 5 /* offset */);
> FilterList filters = new FilterList(Operator.MUST_PASS_ALL);
> filters.addFilter(filter1);
> filters.addFilter(filter2);
> Get get = new Get(USER);
> get.addFamily(COLUMN_FAMILY);
> get.setMaxVersions(1);
> get.setFilter(filters);
> Somehow it didn't work as expected. It returned the entries as if the filter1 
> were not set.
> Turns out the ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases. 
> The FilterList filter does not handle this return code properly (treat it as 
> INCLUDE).
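
For readers landing here later, a minimal sketch of where option (2) ends up: 
pagination expressed on the Get itself instead of through a stateful filter. The 
setRowOffsetPerColumnFamily/setMaxResultsPerColumnFamily setters are the ones the 
patches on this issue introduce, and USER/COLUMN_FAMILY/table continue the example 
above; treat all names as illustrative.

{code}
// Option (2) in miniature: limit/offset handled by the server's matching
// logic, so it composes predictably with other filters instead of depending
// on filter ordering or internal filter state.
Get get = new Get(USER);
get.addFamily(COLUMN_FAMILY);
get.setMaxVersions(1);
// The prefix restriction stays a filter...
get.setFilter(new ColumnPrefixFilter(Bytes.toBytes("tag1")));
// ...while pagination moves to the Get itself:
get.setRowOffsetPerColumnFamily(5);   // skip the first 5 matching columns
get.setMaxResultsPerColumnFamily(5);  // then return at most 5
Result result = table.get(get);       // intended: tag1:006 .. tag1:010
{code}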

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5104) Provide a reliable intra-row pagination mechanism

2012-07-03 Thread Mikhail Bautin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5104:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Provide a reliable intra-row pagination mechanism
> -
>
> Key: HBASE-5104
> URL: https://issues.apache.org/jira/browse/HBASE-5104
> Project: HBase
>  Issue Type: Bug
>Reporter: Kannan Muthukkaruppan
>Assignee: Madhuwanti Vaidya
> Attachments: D2799.1.patch, D2799.2.patch, D2799.3.patch, 
> D2799.4.patch, D2799.5.patch, D2799.6.patch, 
> jira-HBASE-5104-Provide-a-reliable-intra-row-paginat-2012-04-16_12_39_42.patch,
>  
> jira-HBASE-5104-Provide-a-reliable-intra-row-paginat-2012-06-19_20_12_21.patch,
>  
> jira-HBASE-5104-Provide-a-reliable-intra-row-paginat-2012-07-02_12_43_28.patch,
>  
> jira-HBASE-5104-Provide-a-reliable-intra-row-paginat-2012-07-02_15_15_30.patch,
>  testFilterList.rb
>
>
> Addendum:
> Doing pagination (retrieving at most "limit" number of KVs at a particular 
> "offset") is currently supported via the ColumnPaginationFilter. However, it 
> is not a very clean way of supporting pagination.  Some of the problems with 
> it are:
> * Normally, one would expect a query with (Filter(A) AND Filter(B)) to have 
> same results as (query with Filter(A)) INTERSECT (query with Filter(B)). This 
> is not the case for ColumnPaginationFilter as its internal state gets updated 
> depending on whether or not Filter(A) returns TRUE/FALSE for a particular 
> cell.
> * When this Filter is used in combination with other filters (e.g., doing AND 
> with another filter using FilterList), the behavior of the query depends on 
> the order of filters in the FilterList. This is not ideal.
> * ColumnPaginationFilter is a stateful filter which ends up counting multiple 
> versions of the cell as separate values even if another filter upstream or 
> the ScanQueryMatcher is going to reject the value for other reasons.
> Seems like we need a reliable way to do pagination. The particular use case 
> that prompted this JIRA is pagination within the same rowKey. For example, 
> for a given row key R, get columns with prefix P, starting at offset X (among 
> columns which have prefix P) and limit Y. Some possible fixes might be:
> 1) enhance ColumnPrefixFilter to support another constructor which supports 
> limit/offset.
> 2) Support pagination (limit/offset) at the Scan/Get API level (rather than 
> as a filter) [Like SQL].
> Original Post:
> Thanks Jiakai Liu for reporting this issue and doing the initial 
> investigation. Email from Jiakai below:
> Assuming that we have an index column family with the following entries:
> "tag0:001:thread1"
> ...
> "tag1:001:thread1"
> "tag1:002:thread2"
> ...
> "tag1:010:thread10"
> ...
> "tag2:001:thread1"
> "tag2:005:thread5"
> ...
> To get threads with "tag1" in range [5, 10), I tried the following code:
> ColumnPrefixFilter filter1 = new 
> ColumnPrefixFilter(Bytes.toBytes("tag1"));
> ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit 
> */, 5 /* offset */);
> FilterList filters = new FilterList(Operator.MUST_PASS_ALL);
> filters.addFilter(filter1);
> filters.addFilter(filter2);
> Get get = new Get(USER);
> get.addFamily(COLUMN_FAMILY);
> get.setMaxVersions(1);
> get.setFilter(filters);
> Somehow it didn't work as expected. It returned the entries as if the filter1 
> were not set.
> Turns out the ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases. 
> The FilterList filter does not handle this return code properly (treat it as 
> INCLUDE).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6305) TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds.

2012-07-03 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-6305:
--

Attachment: hbase-6305-94.patch

Attached as patch for 0.94.

> TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds.
> 
>
> Key: HBASE-6305
> URL: https://issues.apache.org/jira/browse/HBASE-6305
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.94.1
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: hbase-6305-94.patch
>
>
> trunk: mvn clean test -Dhadoop.profile=2.0 -Dtest=TestLocalHBaseCluster
> 0.94: mvn clean test -Dhadoop.profile=23 -Dtest=TestLocalHBaseCluster
> {code}
> testLocalHBaseCluster(org.apache.hadoop.hbase.TestLocalHBaseCluster)  Time 
> elapsed: 0.022 sec  <<< ERROR!
> java.lang.RuntimeException: Master not initialized after 200 seconds
> at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:208)
> at 
> org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:424)
> at 
> org.apache.hadoop.hbase.TestLocalHBaseCluster.testLocalHBaseCluster(TestLocalHBaseCluster.java:66)
> ...
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-6320) assembly:single doesn't work after modularization

2012-07-03 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang resolved HBASE-6320.


Resolution: Won't Fix

Good to know the background.  Thanks a lot.  Closed it as Won't Fix.

> assembly:single doesn't work after modularization
> -
>
> Key: HBASE-6320
> URL: https://issues.apache.org/jira/browse/HBASE-6320
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.96.0
>Reporter: Jimmy Xiang
>
> After modularization, the command to build the tarball on wiki:
> http://wiki.apache.org/hadoop/Hbase/HowToRelease
> mvn clean site install assembly:single 
> Doesn't work any more.
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-assembly-plugin:2.3:single (default-cli) on 
> project hbase: Failed to create assembly: Artifact: 
> org.apache.hbase:hbase-common:jar:0.95-SNAPSHOT (included by module) does not 
> have an artifact with a file. Please ensure the package phase is run before 
> the assembly is generated. -> [Help 1]
> Matteo told me we have to use 
> mvn -DskipTests package assembly:assembly
> I think we should make assembly:single work.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6027) Update the reference guide to reflect the changes in the security profile

2012-07-03 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HBASE-6027:
---

  Component/s: documentation
Affects Version/s: 0.96.0
Fix Version/s: 0.96.0
 Assignee: Devaraj Das

> Update the reference guide to reflect the changes in the security profile
> -
>
> Key: HBASE-6027
> URL: https://issues.apache.org/jira/browse/HBASE-6027
> Project: HBase
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.96.0
>Reporter: Devaraj Das
>Assignee: Devaraj Das
> Fix For: 0.96.0
>
> Attachments: 6027-1.patch
>
>
> The refguide needs to be updated to reflect the fact that there is no 
> security profile anymore, etc. [Follow up to HBASE-5732]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6027) Update the reference guide to reflect the changes in the security profile

2012-07-03 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HBASE-6027:
---

Status: Patch Available  (was: Open)

> Update the reference guide to reflect the changes in the security profile
> -
>
> Key: HBASE-6027
> URL: https://issues.apache.org/jira/browse/HBASE-6027
> Project: HBase
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.96.0
>Reporter: Devaraj Das
>Assignee: Devaraj Das
> Fix For: 0.96.0
>
> Attachments: 6027-1.patch
>
>
> The refguide needs to be updated to reflect the fact that there is no 
> security profile anymore, etc. [Follow up to HBASE-5732]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6027) Update the reference guide to reflect the changes in the security profile

2012-07-03 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HBASE-6027:
---

Attachment: 6027-1.patch

Straightforward patch.

> Update the reference guide to reflect the changes in the security profile
> -
>
> Key: HBASE-6027
> URL: https://issues.apache.org/jira/browse/HBASE-6027
> Project: HBase
>  Issue Type: Bug
>Reporter: Devaraj Das
> Attachments: 6027-1.patch
>
>
> The refguide needs to be updated to reflect the fact that there is no 
> security profile anymore, etc. [Follow up to HBASE-5732]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-6324) Direct API calls from embedded Thrift server to regionserver

2012-07-03 Thread Mikhail Bautin (JIRA)
Mikhail Bautin created HBASE-6324:
-

 Summary: Direct API calls from embedded Thrift server to 
regionserver
 Key: HBASE-6324
 URL: https://issues.apache.org/jira/browse/HBASE-6324
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin


When handling Thrift calls in the regionserver we should not go through RPC to 
talk to the local regionserver.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6272) In-memory region state is inconsistent

2012-07-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406110#comment-13406110
 ] 

stack commented on HBASE-6272:
--

I added some comments on RB, Jimmy.

> In-memory region state is inconsistent
> --
>
> Key: HBASE-6272
> URL: https://issues.apache.org/jira/browse/HBASE-6272
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>
> AssignmentManger stores region state related information in several places: 
> regionsInTransition, regions (region info to server name map), and servers 
> (server name to region info set map).  However the access to these places is 
> not coordinated properly.  It leads to inconsistent in-memory region state 
> information.  Sometimes, some region could even be offline, and not in 
> transition.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6323) [replication] most of the source metrics are wrong when there's multiple slaves

2012-07-03 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406107#comment-13406107
 ] 

Elliott Clark commented on HBASE-6323:
--

bq. I'm not sure what's the right way to fix this since we can't have dynamic 
metrics

We can.  They are just annoying to make.  But I think something like what we 
have with regions is probably the right thing. A dynamic set for every source 
and a rollup.
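
To make the "dynamic set for every source and a rollup" idea concrete, here is a 
rough sketch in plain Java types rather than the Hadoop metrics API (the class, 
map, and method names are all made up for illustration):

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// One gauge per replication source, keyed by peer id, plus a rollup that
// aggregates across sources instead of letting the last writer win.
class ReplicationSourceMetricsSketch {
  private final ConcurrentMap<String, AtomicLong> sizeOfLogQueueBySource =
      new ConcurrentHashMap<String, AtomicLong>();

  void setSizeOfLogQueue(String peerId, long size) {
    AtomicLong gauge = sizeOfLogQueueBySource.get(peerId);
    if (gauge == null) {
      AtomicLong fresh = new AtomicLong();
      AtomicLong prev = sizeOfLogQueueBySource.putIfAbsent(peerId, fresh);
      gauge = (prev == null) ? fresh : prev;  // keep whichever won the race
    }
    gauge.set(size);
  }

  // Summing is the right rollup for queue sizes; for ageOfLastShippedOp a
  // max (or per-peer reporting only) would make more sense, as noted above.
  long totalSizeOfLogQueue() {
    long total = 0;
    for (AtomicLong g : sizeOfLogQueueBySource.values()) {
      total += g.get();
    }
    return total;
  }
}
{code}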

> [replication] most of the source metrics are wrong when there's multiple 
> slaves
> ---
>
> Key: HBASE-6323
> URL: https://issues.apache.org/jira/browse/HBASE-6323
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.1, 0.94.0
>Reporter: Jean-Daniel Cryans
>Assignee: Elliott Clark
> Fix For: 0.96.0, 0.94.2
>
>
> Most of the metrics in replication were written with 1 slave in mind but with 
> multiple slaves the issue really shows. Most of the metrics are set directly:
> {code}
> public void enqueueLog(Path log) {
>   this.queue.put(log);
>   this.metrics.sizeOfLogQueue.set(queue.size());
> }
> {code}
> So {{sizeOfLogQueue}} is always showing the size of the queue that updated 
> the metric last.
> I'm not sure what's the right way to fix this since we can't have dynamic 
> metrics. Merging them would work here but it wouldn't work so well with 
> {{ageOfLastShippedOp}} since the age can be different and it definitely 
> cannot be summed.
> Assigning to Elliott since he seems to dig metrics these days. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-6323) [replication] most of the source metrics are wrong when there's multiple slaves

2012-07-03 Thread Jean-Daniel Cryans (JIRA)
Jean-Daniel Cryans created HBASE-6323:
-

 Summary: [replication] most of the source metrics are wrong when 
there's multiple slaves
 Key: HBASE-6323
 URL: https://issues.apache.org/jira/browse/HBASE-6323
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0, 0.92.1
Reporter: Jean-Daniel Cryans
Assignee: Elliott Clark
 Fix For: 0.96.0, 0.94.2


Most of the metrics in replication were written with 1 slave in mind but with 
multiple slaves the issue really shows. Most of the metrics are set directly:

{code}
public void enqueueLog(Path log) {
  this.queue.put(log);
  this.metrics.sizeOfLogQueue.set(queue.size());
}
{code}

So {{sizeOfLogQueue}} is always showing the size of the queue that updated the 
metric last.

I'm not sure what's the right way to fix this since we can't have dynamic 
metrics. Merging them would work here but it wouldn't work so well with 
{{ageOfLastShippedOp}} since the age can be different and it definitely cannot 
be summed.

Assigning to Elliott since he seems to dig metrics these days. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6320) assembly:single doesn't work after modularization

2012-07-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406104#comment-13406104
 ] 

stack commented on HBASE-6320:
--

I updated the wiki page to point at the refguide.  If you think that's ok, want 
to close this, Jimmy?

> assembly:single doesn't work after modularization
> -
>
> Key: HBASE-6320
> URL: https://issues.apache.org/jira/browse/HBASE-6320
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.96.0
>Reporter: Jimmy Xiang
>
> After modularization, the command to build the tarball on wiki:
> http://wiki.apache.org/hadoop/Hbase/HowToRelease
> mvn clean site install assembly:single 
> Doesn't work any more.
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-assembly-plugin:2.3:single (default-cli) on 
> project hbase: Failed to create assembly: Artifact: 
> org.apache.hbase:hbase-common:jar:0.95-SNAPSHOT (included by module) does not 
> have an artifact with a file. Please ensure the package phase is run before 
> the assembly is generated. -> [Help 1]
> Matteo told me we have to use 
> mvn -DskipTests package assembly:assembly
> I think we should make assembly:single work.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6320) assembly:single doesn't work after modularization

2012-07-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406099#comment-13406099
 ] 

stack commented on HBASE-6320:
--

bq. I think we should make assembly:single work.

You can't have that.  mvn won't let you.

Here is the back and forth between myself and Jesse that ended up w/ us going 
with assembly:assembly:

https://issues.apache.org/jira/browse/HBASE-6145?focusedCommentId=13287772&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13287772

If you keep reading, you'll see Jesse pushing for assembly:single and my 
pushing back because, on digging, you'd have to go down a broken path to get it 
working with maven (would need an hbase-assembly module ... 

Here is where I do a bit of detail: 
https://issues.apache.org/jira/browse/HBASE-6145?focusedCommentId=13288037&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13288037

Here we doc that you have to do assembly:assembly: 
http://hbase.apache.org/book.html#build.tgz

Given the above, I think we should close out this issue Jimmy.

> assembly:single doesn't work after modularization
> -
>
> Key: HBASE-6320
> URL: https://issues.apache.org/jira/browse/HBASE-6320
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.96.0
>Reporter: Jimmy Xiang
>
> After modularization, the command to build the tarball on wiki:
> http://wiki.apache.org/hadoop/Hbase/HowToRelease
> mvn clean site install assembly:single 
> Doesn't work any more.
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-assembly-plugin:2.3:single (default-cli) on 
> project hbase: Failed to create assembly: Artifact: 
> org.apache.hbase:hbase-common:jar:0.95-SNAPSHOT (included by module) does not 
> have an artifact with a file. Please ensure the package phase is run before 
> the assembly is generated. -> [Help 1]
> Matteo told me we have to use 
> mvn -DskipTests package assembly:assembly
> I think we should make assembly:single work.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6320) assembly:single doesn't work after modularization

2012-07-03 Thread Jesse Yates (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406100#comment-13406100
 ] 

Jesse Yates commented on HBASE-6320:


+1 on stack's comment - was going to say about the same myself.

We should update the wiki if it's out of date - really it should just point to 
the ref-guide.

> assembly:single doesn't work after modularization
> -
>
> Key: HBASE-6320
> URL: https://issues.apache.org/jira/browse/HBASE-6320
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.96.0
>Reporter: Jimmy Xiang
>
> After modularization, the command to build the tarball on wiki:
> http://wiki.apache.org/hadoop/Hbase/HowToRelease
> mvn clean site install assembly:single 
> Doesn't work any more.
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-assembly-plugin:2.3:single (default-cli) on 
> project hbase: Failed to create assembly: Artifact: 
> org.apache.hbase:hbase-common:jar:0.95-SNAPSHOT (included by module) does not 
> have an artifact with a file. Please ensure the package phase is run before 
> the assembly is generated. -> [Help 1]
> Matteo told me we have to use 
> mvn -DskipTests package assembly:assembly
> I think we should make assembly:single work.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6318) SplitLogWorker exited due to ConcurrentModificationException

2012-07-03 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6318:
--

Attachment: 6318-suggest.txt

How about this change?

> SplitLogWorker exited due to ConcurrentModificationException
> 
>
> Key: HBASE-6318
> URL: https://issues.apache.org/jira/browse/HBASE-6318
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.96.0
>Reporter: Jimmy Xiang
> Attachments: 6318-suggest.txt
>
>
> In playing with 0.96 code on a live cluster, found this issue:
> 2012-07-03 12:13:32,572 ERROR 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: unexpected error
> java.util.ConcurrentModificationException
> at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
> at java.util.TreeMap$ValueIterator.next(TreeMap.java:1145)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$OutputSink.closeLogWriters(HLogSplitter.java:1330)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$OutputSink.finishWritingAndClose(HLogSplitter.java:1221)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:441)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:369)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:113)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:276)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:164)
> at java.lang.Thread.run(Thread.java:662)
> 2012-07-03 12:13:32,575 INFO 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker 
> .cloudera.com,57020,1341335300238 exiting

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6145) Fix site target post modularization

2012-07-03 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6145:
-

   Resolution: Fixed
Fix Version/s: 0.96.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

This was committed a while back.

> Fix site target post modularization
> ---
>
> Key: HBASE-6145
> URL: https://issues.apache.org/jira/browse/HBASE-6145
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: stack
> Fix For: 0.96.0
>
> Attachments: 6145v4.txt, 6145v4.txt, site.txt, site2.txt, sitev3.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6322) Unnecessary creation of finalizers in HTablePool

2012-07-03 Thread Ryan Brush (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Brush updated HBASE-6322:
--

Attachment: HBASE-6322-0.92.1.patch

Simple patch for the 0.92 branch that implements HTableInterface and removes 
the test that checks for HTable.

> Unnecessary creation of finalizers in HTablePool
> 
>
> Key: HBASE-6322
> URL: https://issues.apache.org/jira/browse/HBASE-6322
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.92.0, 0.92.1, 0.94.0
>Reporter: Ryan Brush
> Attachments: HBASE-6322-0.92.1.patch
>
>
> From a mailing list question:
> While generating some load against a library that makes extensive use of 
> HTablePool in 0.92, I noticed that the largest heap consumer was 
> java.lang.ref.Finalizer.  Digging in, I discovered that HTablePool's internal 
> PooledHTable extends HTable, which instantiates a ThreadPoolExecutor and 
> supporting objects every time a pooled HTable is retrieved.  Since 
> ThreadPoolExecutor has a finalizer, it and its dependencies can't get garbage 
> collected until the finalizer runs.  The result is by using HTablePool, we're 
> creating a ton of objects to be finalized that are stuck on the heap longer 
> than they should be, creating our largest source of pressure on the garbage 
> collector.  It looks like this will also be a problem in 0.94 and trunk.
> The easy fix is just to have PooledHTable implement HTableInterface (rather 
> than subclass HTable), but this does break a unit test that explicitly checks 
> that PooledHTable implements HTable -- I can only assume this test is there 
> for some historical passivity reason.
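
The delegation shape of that easy fix, for illustration only (this is not the 
attached patch; returnToPool() is a made-up helper, and the class is declared 
abstract purely so the sketch compiles without spelling out every interface 
method):

{code}
// Implementing HTableInterface and delegating to a pooled HTable avoids the
// HTable constructor path entirely, so no finalizer-bearing
// ThreadPoolExecutor is created on every checkout from the pool.
abstract class PooledHTableSketch implements HTableInterface {
  private final HTableInterface table;  // the real, reused HTable

  PooledHTableSketch(HTableInterface table) {
    this.table = table;
  }

  public Result get(Get get) throws IOException {
    return table.get(get);              // forward the actual work...
  }

  public void close() throws IOException {
    returnToPool(table);                // ...but intercept close() to recycle
  }

  private void returnToPool(HTableInterface t) {
    // hypothetical: hand the underlying table back to the HTablePool
  }

  // the remaining HTableInterface methods delegate the same way (elided)
}
{code}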

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6318) SplitLogWorker exited due to ConcurrentModificationException

2012-07-03 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406076#comment-13406076
 ] 

Zhihong Ted Yu commented on HBASE-6318:
---

Looks like the CME shows a conflict between getWriterAndPath() and 
closeLogWriters().
See the following in getWriterAndPath():
{code}
  logWriters.put(region, ret);
{code}
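
In other words, logWriters is a plain TreeMap mutated by one thread while 
another iterates its values() view, which is exactly what fails fast with a CME. 
One generic fix shape (not necessarily what 6318-suggest.txt does; the 
WriterAndPath type and its w field follow HLogSplitter's naming but are 
illustrative here) is to snapshot under a lock and synchronize the put() the 
same way:

{code}
// Snapshot under the lock, close outside it; getWriterAndPath() would do its
// logWriters.put(region, ret) inside the same synchronized block.
List<WriterAndPath> writers;
synchronized (logWriters) {
  writers = new ArrayList<WriterAndPath>(logWriters.values());
}
for (WriterAndPath wap : writers) {
  wap.w.close();  // closing outside the lock keeps the critical section short
}
{code}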

> SplitLogWorker exited due to ConcurrentModificationException
> 
>
> Key: HBASE-6318
> URL: https://issues.apache.org/jira/browse/HBASE-6318
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.96.0
>Reporter: Jimmy Xiang
>
> In playing with 0.96 code on a live cluster, found this issue:
> 2012-07-03 12:13:32,572 ERROR 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: unexpected error
> java.util.ConcurrentModificationException
> at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
> at java.util.TreeMap$ValueIterator.next(TreeMap.java:1145)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$OutputSink.closeLogWriters(HLogSplitter.java:1330)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$OutputSink.finishWritingAndClose(HLogSplitter.java:1221)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:441)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:369)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:113)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:276)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:164)
> at java.lang.Thread.run(Thread.java:662)
> 2012-07-03 12:13:32,575 INFO 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker 
> .cloudera.com,57020,1341335300238 exiting

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6322) Unnecessary creation of finalizers in HTablePool

2012-07-03 Thread Ryan Brush (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Brush updated HBASE-6322:
--

Description: 
From a mailing list question:

While generating some load against a library that makes extensive use of 
HTablePool in 0.92, I noticed that the largest heap consumer was 
java.lang.ref.Finalizer.  Digging in, I discovered that HTablePool's internal 
PooledHTable extends HTable, which instantiates a ThreadPoolExecutor and 
supporting objects every time a pooled HTable is retrieved.  Since 
ThreadPoolExecutor has a finalizer, it and its dependencies can't get garbage 
collected until the finalizer runs.  The result is by using HTablePool, we're 
creating a ton of objects to be finalized that are stuck on the heap longer 
than they should be, creating our largest source of pressure on the garbage 
collector.  It looks like this will also be a problem in 0.94 and trunk.

The easy fix is just to have PooledHTable implement HTableInterface (rather 
than subclass HTable), but this does break a unit test that explicitly checks 
that PooledHTable implements HTable -- I can only assume this test is there for 
some historical passivity reason.

  was:
From a mailing list question:

While generating some load against a library that makes extensive use of
HTablePool in 0.92, I noticed that the largest heap consumer was
java.lang.ref.Finalizer.  Digging in, I discovered that HTablePool's
internal PooledHTable extends HTable, which instantiates a
ThreadPoolExecutor and supporting objects every time a pooled HTable is
retrieved.  Since ThreadPoolExecutor has a finalizer, it and its
dependencies can't get garbage collected until the finalizer runs.  The
result is by using HTablePool, we're creating a ton of objects to be
finalized that are stuck on the heap longer than they should be, creating
our largest source of pressure on the garbage collector.  It looks like
this will also be a problem in 0.94 and trunk.

The easy fix is just to have PooledHTable implement HTableInterface (rather 
than subclass HTable), but this does break a unit test that explicitly checks 
that PooledHTable implements HTable -- I can only assume this test is there for 
some historical passivity reason.


> Unnecessary creation of finalizers in HTablePool
> 
>
> Key: HBASE-6322
> URL: https://issues.apache.org/jira/browse/HBASE-6322
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.92.0, 0.92.1, 0.94.0
>Reporter: Ryan Brush
>
> From a mailing list question:
> While generating some load against a library that makes extensive use of 
> HTablePool in 0.92, I noticed that the largest heap consumer was 
> java.lang.ref.Finalizer.  Digging in, I discovered that HTablePool's internal 
> PooledHTable extends HTable, which instantiates a ThreadPoolExecutor and 
> supporting objects every time a pooled HTable is retrieved.  Since 
> ThreadPoolExecutor has a finalizer, it and its dependencies can't get garbage 
> collected until the finalizer runs.  The result is by using HTablePool, we're 
> creating a ton of objects to be finalized that are stuck on the heap longer 
> than they should be, creating our largest source of pressure on the garbage 
> collector.  It looks like this will also be a problem in 0.94 and trunk.
> The easy fix is just to have PooledHTable implement HTableInterface (rather 
> than subclass HTable), but this does break a unit test that explicitly checks 
> that PooledHTable implements HTable -- I can only assume this test is there 
> for some historical passivity reason.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-6322) Unnecessary creation of finalizers in HTablePool

2012-07-03 Thread Ryan Brush (JIRA)
Ryan Brush created HBASE-6322:
-

 Summary: Unnecessary creation of finalizers in HTablePool
 Key: HBASE-6322
 URL: https://issues.apache.org/jira/browse/HBASE-6322
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.94.0, 0.92.1, 0.92.0
Reporter: Ryan Brush


From a mailing list question:

While generating some load against a library that makes extensive use of
HTablePool in 0.92, I noticed that the largest heap consumer was
java.lang.ref.Finalizer.  Digging in, I discovered that HTablePool's
internal PooledHTable extends HTable, which instantiates a
ThreadPoolExecutor and supporting objects every time a pooled HTable is
retrieved.  Since ThreadPoolExecutor has a finalizer, it and its
dependencies can't get garbage collected until the finalizer runs.  The
result is by using HTablePool, we're creating a ton of objects to be
finalized that are stuck on the heap longer than they should be, creating
our largest source of pressure on the garbage collector.  It looks like
this will also be a problem in 0.94 and trunk.

The easy fix is just to have PooledHTable implement HTableInterface (rather 
than subclass HTable), but this does break a unit test that explicitly checks 
that PooledHTable implements HTable -- I can only assume this test is there for 
some historical passivity reason.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-6306) TestFSUtils fails against hadoop 2.0

2012-07-03 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh resolved HBASE-6306.
---

   Resolution: Fixed
Fix Version/s: 0.96.0
 Hadoop Flags: Reviewed

Thanks for the review Andrew!

> TestFSUtils fails against hadoop 2.0
> 
>
> Key: HBASE-6306
> URL: https://issues.apache.org/jira/browse/HBASE-6306
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.96.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 0.96.0
>
> Attachments: hbase-6306-trunk.patch
>
>
> trunk: mvn clean test -Dhadoop.profile=2.0 -Dtest=TestFSUtils
> {code}
> java.io.FileNotFoundException: File 
> /home/jon/proj/hbase-trunk/hbase-server/target/test-data/02beb8c8-06c1-47ea-829b-6e7ce0570cf8/hbase.version
>  does not exist
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:315)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1279)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1319)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:557)
> at 
> org.apache.hadoop.fs.FilterFileSystem.listStatus(FilterFileSystem.java:213)
> at org.apache.hadoop.hbase.util.FSUtils.getVersion(FSUtils.java:270)
> at 
> org.apache.hadoop.hbase.util.TestFSUtils.testVersion(TestFSUtils.java:58)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> ... 
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6306) TestFSUtils fails against hadoop 2.0

2012-07-03 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406062#comment-13406062
 ] 

Jonathan Hsieh commented on HBASE-6306:
---

This seems to fix the trunk part of HBASE-6305.

> TestFSUtils fails against hadoop 2.0
> 
>
> Key: HBASE-6306
> URL: https://issues.apache.org/jira/browse/HBASE-6306
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.96.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 0.96.0
>
> Attachments: hbase-6306-trunk.patch
>
>
> trunk: mvn clean test -Dhadoop.profile=2.0 -Dtest=TestFSUtils
> {code}
> java.io.FileNotFoundException: File 
> /home/jon/proj/hbase-trunk/hbase-server/target/test-data/02beb8c8-06c1-47ea-829b-6e7ce0570cf8/hbase.version
>  does not exist
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:315)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1279)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1319)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:557)
> at 
> org.apache.hadoop.fs.FilterFileSystem.listStatus(FilterFileSystem.java:213)
> at org.apache.hadoop.hbase.util.FSUtils.getVersion(FSUtils.java:270)
> at 
> org.apache.hadoop.hbase.util.TestFSUtils.testVersion(TestFSUtils.java:58)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> ... 
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6306) TestFSUtils fails against hadoop 2.0

2012-07-03 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-6306:
--

Attachment: hbase-6306-trunk.patch

> TestFSUtils fails against hadoop 2.0
> 
>
> Key: HBASE-6306
> URL: https://issues.apache.org/jira/browse/HBASE-6306
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.96.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 0.96.0
>
> Attachments: hbase-6306-trunk.patch
>
>
> trunk: mvn clean test -Dhadoop.profile=2.0 -Dtest=TestFSUtils
> {code}
> java.io.FileNotFoundException: File 
> /home/jon/proj/hbase-trunk/hbase-server/target/test-data/02beb8c8-06c1-47ea-829b-6e7ce0570cf8/hbase.version
>  does not exist
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:315)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1279)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1319)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:557)
> at 
> org.apache.hadoop.fs.FilterFileSystem.listStatus(FilterFileSystem.java:213)
> at org.apache.hadoop.hbase.util.FSUtils.getVersion(FSUtils.java:270)
> at 
> org.apache.hadoop.hbase.util.TestFSUtils.testVersion(TestFSUtils.java:58)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> ... 
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5876) TestImportExport has been failing against hadoop 0.23 profile

2012-07-03 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-5876:
--

Status: Patch Available  (was: Open)

> TestImportExport has been failing against hadoop 0.23 profile
> -
>
> Key: HBASE-5876
> URL: https://issues.apache.org/jira/browse/HBASE-5876
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Zhihong Ted Yu
>Assignee: Jonathan Hsieh
> Fix For: 0.96.0, 0.94.1
>
> Attachments: hbase-5876-94-v3.patch, hbase-5876-94.patch, 
> hbase-5876-trunk-v3.patch, hbase-5876-v2.patch, hbase-5876.patch
>
>
> TestImportExport has been failing against hadoop 0.23 profile

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5876) TestImportExport has been failing against hadoop 0.23 profile

2012-07-03 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-5876:
--

Status: Open  (was: Patch Available)

> TestImportExport has been failing against hadoop 0.23 profile
> -
>
> Key: HBASE-5876
> URL: https://issues.apache.org/jira/browse/HBASE-5876
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Zhihong Ted Yu
>Assignee: Jonathan Hsieh
> Fix For: 0.96.0, 0.94.1
>
> Attachments: hbase-5876-94-v3.patch, hbase-5876-94.patch, 
> hbase-5876-trunk-v3.patch, hbase-5876-v2.patch, hbase-5876.patch
>
>
> TestImportExport has been failing against hadoop 0.23 profile

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-6306) TestFSUtils fails against hadoop 2.0

2012-07-03 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh reassigned HBASE-6306:
-

Assignee: Jonathan Hsieh

> TestFSUtils fails against hadoop 2.0
> 
>
> Key: HBASE-6306
> URL: https://issues.apache.org/jira/browse/HBASE-6306
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.96.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
>
> trunk: mvn clean test -Dhadoop.profile=2.0 -Dtest=TestFSUtils
> {code}
> java.io.FileNotFoundException: File 
> /home/jon/proj/hbase-trunk/hbase-server/target/test-data/02beb8c8-06c1-47ea-829b-6e7ce0570cf8/hbase.version
>  does not exist
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:315)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1279)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1319)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:557)
> at 
> org.apache.hadoop.fs.FilterFileSystem.listStatus(FilterFileSystem.java:213)
> at org.apache.hadoop.hbase.util.FSUtils.getVersion(FSUtils.java:270)
> at 
> org.apache.hadoop.hbase.util.TestFSUtils.testVersion(TestFSUtils.java:58)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> ... 
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6306) TestFSUtils fails against hadoop 2.0

2012-07-03 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406053#comment-13406053
 ] 

Jonathan Hsieh commented on HBASE-6306:
---

Review was here: https://reviews.apache.org/r/5723/ and +1'ed.

> TestFSUtils fails against hadoop 2.0
> 
>
> Key: HBASE-6306
> URL: https://issues.apache.org/jira/browse/HBASE-6306
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.96.0
>Reporter: Jonathan Hsieh
>
> trunk: mvn clean test -Dhadoop.profile=2.0 -Dtest=TestFSUtils
> {code}
> java.io.FileNotFoundException: File 
> /home/jon/proj/hbase-trunk/hbase-server/target/test-data/02beb8c8-06c1-47ea-829b-6e7ce0570cf8/hbase.version
>  does not exist
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:315)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1279)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1319)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:557)
> at 
> org.apache.hadoop.fs.FilterFileSystem.listStatus(FilterFileSystem.java:213)
> at org.apache.hadoop.hbase.util.FSUtils.getVersion(FSUtils.java:270)
> at 
> org.apache.hadoop.hbase.util.TestFSUtils.testVersion(TestFSUtils.java:58)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> ... 
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-6321) ReplicationSource dies read the peer's id

2012-07-03 Thread Jean-Daniel Cryans (JIRA)
Jean-Daniel Cryans created HBASE-6321:
-

 Summary: ReplicationSource dies read the peer's id
 Key: HBASE-6321
 URL: https://issues.apache.org/jira/browse/HBASE-6321
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0, 0.92.1
Reporter: Jean-Daniel Cryans
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.2


This is what I saw:

{noformat}
2012-07-01 05:04:01,638 ERROR 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Closing 
source 8 because an error occurred: Could not read peer's cluster id
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = 
Session expired for /va1-backup/hbaseid
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1021)
at 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:259)
at 
org.apache.hadoop.hbase.zookeeper.ClusterId.readClusterIdZNode(ClusterId.java:61)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:253)
{noformat}

The session should just be reopened.
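
A sketch of that intended behavior (illustrative, not a patch: peerZkw and 
reconnectToPeerCluster() are made-up names, and the loop assumes the enclosing 
method declares throws KeeperException):

{code}
// Instead of closing the source on session expiry, rebuild the peer ZK
// connection and retry reading the cluster id.
while (this.isActive()) {
  try {
    peerClusterId = ClusterId.readClusterIdZNode(peerZkw);
    break;  // got it, carry on replicating
  } catch (KeeperException.SessionExpiredException e) {
    LOG.warn("Peer ZK session expired, reconnecting and retrying", e);
    reconnectToPeerCluster();  // hypothetical helper that recreates peerZkw
  }
}
{code}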

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6321) ReplicationSource dies reading the peer's id

2012-07-03 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-6321:
--

Summary: ReplicationSource dies reading the peer's id  (was: 
ReplicationSource dies read the peer's id)

> ReplicationSource dies reading the peer's id
> 
>
> Key: HBASE-6321
> URL: https://issues.apache.org/jira/browse/HBASE-6321
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.1, 0.94.0
>Reporter: Jean-Daniel Cryans
> Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.2
>
>
> This is what I saw:
> {noformat}
> 2012-07-01 05:04:01,638 ERROR 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Closing 
> source 8 because an error occurred: Could not read peer's cluster id
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
> = Session expired for /va1-backup/hbaseid
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1021)
> at 
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
> at 
> org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:259)
> at 
> org.apache.hadoop.hbase.zookeeper.ClusterId.readClusterIdZNode(ClusterId.java:61)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:253)
> {noformat}
> The session should just be reopened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-6320) assembly:single doesn't work after modularization

2012-07-03 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HBASE-6320:
--

 Summary: assembly:single doesn't work after modularization
 Key: HBASE-6320
 URL: https://issues.apache.org/jira/browse/HBASE-6320
 Project: HBase
  Issue Type: Bug
  Components: build
Affects Versions: 0.96.0
Reporter: Jimmy Xiang


After modularization, the command to build the tarball on wiki:
http://wiki.apache.org/hadoop/Hbase/HowToRelease

mvn clean site install assembly:single 

Doesn't work any more.

[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-assembly-plugin:2.3:single (default-cli) on 
project hbase: Failed to create assembly: Artifact: 
org.apache.hbase:hbase-common:jar:0.95-SNAPSHOT (included by module) does not 
have an artifact with a file. Please ensure the package phase is run before the 
assembly is generated. -> [Help 1]


Matteo told me we have to use 

mvn -DskipTests package assembly:assembly


I think we should make assembly:single work.



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-6319) ReplicationSource can call terminate on itself and deadlock

2012-07-03 Thread Jean-Daniel Cryans (JIRA)
Jean-Daniel Cryans created HBASE-6319:
-

 Summary: ReplicationSource can call terminate on itself and 
deadlock
 Key: HBASE-6319
 URL: https://issues.apache.org/jira/browse/HBASE-6319
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0, 0.92.1, 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.90.7, 0.92.2, 0.94.2


In a few places the ReplicationSource code calls terminate() on itself, which 
is a problem since in terminate() we wait on that thread to die.
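
The deadlock in miniature: Thread.join() on your own thread never returns, since 
a thread cannot die while it is blocked waiting for its own death. A guard along 
these lines (illustrative; the worker field name is assumed) avoids the self-join:

{code}
public void terminate(String reason) {
  this.running = false;
  Thread worker = this.worker;  // the thread running this source
  if (worker != null && Thread.currentThread() != worker) {
    try {
      worker.join(30 * 1000);   // only wait for death from another thread
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
{code}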

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed log splitting

2012-07-03 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406032#comment-13406032
 ] 

Jimmy Xiang commented on HBASE-6134:


Distributed log splitting is not working any more. See HBASE-6318 for the 
trace.  Is it related to this patch?

> Improvement for split-worker to speed up distributed log splitting
> --
>
> Key: HBASE-6134
> URL: https://issues.apache.org/jira/browse/HBASE-6134
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Fix For: 0.96.0
>
> Attachments: 6134v4.patch, HBASE-6134.patch, HBASE-6134v2.patch, 
> HBASE-6134v3-92.patch, HBASE-6134v3.patch, HBASE-6134v4-94.patch, 
> HBASE-6134v4.patch
>
>
> First,we do the test between local-master-splitting and 
> distributed-log-splitting
> Environment:34 hlog files, 5 regionservers,(after kill one, only 4 rs do ths 
> splitting work), 400 regions in one hlog file
> local-master-split:60s+
> distributed-log-splitting:165s+
> In fact, in our production environment, distributed-log-splitting also took 
> 60s with 30 regionservers for 34 hlog files (regionserver may be in high load)
> We found split-worker split one log file took about 20s
> (30ms~50ms per writer.close(); 10ms per create writers )
> I think we could do the improvement for this:
> Parallelizing the create and close writers in threads
> In the patch, change the logic for  distributed-log-splitting same as the 
> local-master-splitting and parallelizing the close in threads.
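
For concreteness, the parallel-close half of that idea might look like the 
sketch below (executor wiring and numCloseThreads are illustrative; 
WriterAndPath/w follow HLogSplitter's naming, and the enclosing method is 
assumed to handle InterruptedException/ExecutionException):

{code}
// With ~400 region writers at 30-50ms per close(), serial closing dominates
// split time; fanning the closes out to a small pool lets them overlap.
ExecutorService closePool = Executors.newFixedThreadPool(numCloseThreads);
List<Future<Void>> pending = new ArrayList<Future<Void>>();
for (final WriterAndPath wap : logWriters.values()) {
  pending.add(closePool.submit(new Callable<Void>() {
    public Void call() throws IOException {
      wap.w.close();
      return null;
    }
  }));
}
for (Future<Void> f : pending) {
  f.get();  // rethrows the first close() failure, if any
}
closePool.shutdown();
{code}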

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6117) Revisit default condition added to Switch cases in Trunk

2012-07-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406028#comment-13406028
 ] 

stack commented on HBASE-6117:
--

+1 on patch minus the redundant logging.

> Revisit default condition added to Switch cases in Trunk
> 
>
> Key: HBASE-6117
> URL: https://issues.apache.org/jira/browse/HBASE-6117
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.96.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 0.96.0
>
> Attachments: HBASE-6117.patch
>
>
> We found that in some cases the default case in a switch block was just 
> throwing IllegalArgumentException. There are cases where we may get some 
> other state for which we should not throw IllegalArgumentException.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-6318) SplitLogWorker exited due to ConcurrentModificationException

2012-07-03 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HBASE-6318:
--

 Summary: SplitLogWorker exited due to 
ConcurrentModificationException
 Key: HBASE-6318
 URL: https://issues.apache.org/jira/browse/HBASE-6318
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.96.0
Reporter: Jimmy Xiang


In playing with the 0.96 code on a live cluster, I found this issue:

2012-07-03 12:13:32,572 ERROR 
org.apache.hadoop.hbase.regionserver.SplitLogWorker: unexpected error
java.util.ConcurrentModificationException
at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
at java.util.TreeMap$ValueIterator.next(TreeMap.java:1145)
at 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$OutputSink.closeLogWriters(HLogSplitter.java:1330)
at 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$OutputSink.finishWritingAndClose(HLogSplitter.java:1221)
at 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:441)
at 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:369)
at 
org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:113)
at 
org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:276)
at 
org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
at 
org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:164)
at java.lang.Thread.run(Thread.java:662)
2012-07-03 12:13:32,575 INFO 
org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker 
.cloudera.com,57020,1341335300238 exiting
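
A minimal, self-contained reproduction of the failure mode (simplified from 
the trace above; names are illustrative): one thread iterates the TreeMap of 
log writers while the map is still being modified, and the iterator fails fast.

{code}
import java.util.Map;
import java.util.TreeMap;

public class CmeDemo {
  public static void main(String[] args) {
    Map<String, String> writers = new TreeMap<String, String>();
    writers.put("region-a", "writer-a");
    writers.put("region-b", "writer-b");
    for (String w : writers.values()) {
      // Structural modification while an iterator is live makes the
      // next iteration step throw ConcurrentModificationException.
      writers.put("region-c", "writer-c");
    }
  }
}
{code}

The usual fixes are to synchronize both the writers and the closer on the map, 
or to have the closer iterate over a snapshot of the values.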

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed log splitting

2012-07-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406017#comment-13406017
 ] 

stack commented on HBASE-6134:
--

@Ram or @Chunhui Do you fellas remember an issue that moves a rename out of 
the zk handling path?  I seem to remember such a beast but can't remember the 
issue number.  I'm wondering if you fellas fixed hbase-6140 already and am 
afraid we might let it drop.  Thanks.

> Improvement for split-worker to speed up distributed log splitting
> --
>
> Key: HBASE-6134
> URL: https://issues.apache.org/jira/browse/HBASE-6134
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Fix For: 0.96.0
>
> Attachments: 6134v4.patch, HBASE-6134.patch, HBASE-6134v2.patch, 
> HBASE-6134v3-92.patch, HBASE-6134v3.patch, HBASE-6134v4-94.patch, 
> HBASE-6134v4.patch
>
>
> First, we compared local-master-splitting and distributed-log-splitting.
> Environment: 34 hlog files, 5 regionservers (after killing one, only 4 RSs 
> do the splitting work), 400 regions in one hlog file.
> local-master-split: 60s+
> distributed-log-splitting: 165s+
> In fact, in our production environment, distributed-log-splitting also took 
> 60s with 30 regionservers for 34 hlog files (regionservers may be under 
> high load).
> We found a split-worker took about 20s to split one log file
> (30ms~50ms per writer.close(); 10ms to create each writer).
> I think we could improve this by parallelizing the creation and closing of 
> writers in threads.
> In the patch, we change the logic for distributed-log-splitting to match 
> local-master-splitting and parallelize the closes in threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed log splitting

2012-07-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406014#comment-13406014
 ] 

stack commented on HBASE-6134:
--

The patch did not make it into 0.92 and 0.94?  Is it supposed to?

> Improvement for split-worker to speed up distributed log splitting
> --
>
> Key: HBASE-6134
> URL: https://issues.apache.org/jira/browse/HBASE-6134
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Fix For: 0.96.0
>
> Attachments: 6134v4.patch, HBASE-6134.patch, HBASE-6134v2.patch, 
> HBASE-6134v3-92.patch, HBASE-6134v3.patch, HBASE-6134v4-94.patch, 
> HBASE-6134v4.patch
>
>
> First, we compared local-master-splitting and distributed-log-splitting.
> Environment: 34 hlog files, 5 regionservers (after killing one, only 4 RSs 
> do the splitting work), 400 regions in one hlog file.
> local-master-split: 60s+
> distributed-log-splitting: 165s+
> In fact, in our production environment, distributed-log-splitting also took 
> 60s with 30 regionservers for 34 hlog files (regionservers may be under 
> high load).
> We found a split-worker took about 20s to split one log file
> (30ms~50ms per writer.close(); 10ms to create each writer).
> I think we could improve this by parallelizing the creation and closing of 
> writers in threads.
> In the patch, we change the logic for distributed-log-splitting to match 
> local-master-splitting and parallelize the closes in threads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6039) Remove HMasterInterface and replace with something similar to RegionServerStatusProtocol

2012-07-03 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6039:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to trunk.  Thanks for the fat patch Gregory.

> Remove HMasterInterface and replace with something similar to 
> RegionServerStatusProtocol
> 
>
> Key: HBASE-6039
> URL: https://issues.apache.org/jira/browse/HBASE-6039
> Project: HBase
>  Issue Type: Task
>  Components: ipc, master
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.96.0
>
> Attachments: HBASE-6039-v2.patch, HBASE-6039.patch
>
>
> Me: Once everything in HMasterInterface is converted to use PB, we can either 
> declare a new class for the representation (similar to 
> RegionServerStatusProtocol) or just re-purpose HMasterInterface for that. 
> What is your preference?
> Stack: Let's do what Jimmy did, make a new class and kill the old.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6039) Remove HMasterInterface and replace with something similar to RegionServerStatusProtocol

2012-07-03 Thread Gregory Chanan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gregory Chanan updated HBASE-6039:
--

Attachment: HBASE-6039-v2.patch

* Attached HBASE-6039-v2.patch *

Matches reviewboard.

> Remove HMasterInterface and replace with something similar to 
> RegionServerStatusProtocol
> 
>
> Key: HBASE-6039
> URL: https://issues.apache.org/jira/browse/HBASE-6039
> Project: HBase
>  Issue Type: Task
>  Components: ipc, master
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.96.0
>
> Attachments: HBASE-6039-v2.patch, HBASE-6039.patch
>
>
> Me: Once everything in HMasterInterface is converted to use PB, we can either 
> declare a new class for the representation (similar to 
> RegionServerStatusProtocol) or just re-purpose HMasterInterface for that. 
> What is your preference?
> Stack: Let's do what Jimmy did, make a new class and kill the old.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6293) HMaster does not go down while splitting logs even if explicit shutdown is called.

2012-07-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405991#comment-13405991
 ] 

stack commented on HBASE-6293:
--

+1

> HMaster does not go down while splitting logs even if explicit shutdown is 
> called.
> --
>
> Key: HBASE-6293
> URL: https://issues.apache.org/jira/browse/HBASE-6293
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.1, 0.94.0
>Reporter: rajeshbabu
>Assignee: rajeshbabu
> Fix For: 0.92.2, 0.96.0, 0.94.1
>
> Attachments: 6293.txt
>
>
> When the master starts up and tries to split logs, on any error we retry 
> infinitely in a loop until it succeeds.
> But now if we get a shutdown call, inside SplitLogManager
> {code}
>   if (stopper.isStopped()) {
> LOG.warn("Stopped while waiting for log splits to be completed");
> return;
>   }
> {code}
> Here we know that the master has stopped.  As the task may not be completed 
> now
> {code}
>  if (batch.done != batch.installed) {
>   batch.isDead = true;
>   tot_mgr_log_split_batch_err.incrementAndGet();
>   LOG.warn("error while splitting logs in " + logDirs +
>   " installed = " + batch.installed + " but only " + batch.done + " 
> done");
>   throw new IOException("error or interrupt while splitting logs in "
>   + logDirs + " Task = " + batch);
> }
> {code} 
> we throw an exception.  In MasterFileSystem.splitLogAfterStartup() we don't 
> check if the master is stopped and we keep retrying continuously.
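
A hedged sketch of the fix direction (not the committed 6293.txt; method names 
follow the classes mentioned above but are otherwise assumptions): the retry 
loop around log splitting should observe the stopper so an explicit shutdown 
can break it.

{code}
while (true) {
  try {
    splitLogManager.splitLogDistributed(logDirs);
    return;
  } catch (IOException e) {
    if (master.isStopped()) {
      LOG.warn("Giving up on log splitting; master is stopping", e);
      return; // let shutdown proceed instead of retrying forever
    }
    LOG.warn("Error splitting logs, retrying", e);
  }
}
{code}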

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6309) [MTTR] Do NN operations outside of the ZK EventThread in SplitLogManager

2012-07-03 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405976#comment-13405976
 ] 

nkeywal commented on HBASE-6309:


bq. IMO we should move everything that talks to ZK and NN out of that path.
+1...

> [MTTR] Do NN operations outside of the ZK EventThread in SplitLogManager
> 
>
> Key: HBASE-6309
> URL: https://issues.apache.org/jira/browse/HBASE-6309
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.92.1, 0.94.0, 0.96.0
>Reporter: Jean-Daniel Cryans
>Priority: Critical
> Fix For: 0.96.0
>
>
> We found this issue during the leap second cataclysm which prompted a 
> distributed splitting of all our logs.
> I saw that none of the RS were splitting after some time while the master was 
> showing that it wasn't even 30% done. jstack'ing I saw this:
> {noformat}
> "main-EventThread" daemon prio=10 tid=0x7f6ce46d8800 nid=0x5376 in
> Object.wait() [0x7f6ce2ecb000]
>java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:485)
> at org.apache.hadoop.ipc.Client.call(Client.java:1093)
> - locked <0x0005fdd661a0> (a org.apache.hadoop.ipc.Client$Call)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
> at $Proxy9.rename(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> at $Proxy9.rename(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:759)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:253)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.moveRecoveredEditsFromTemp(HLogSplitter.java:553)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.moveRecoveredEditsFromTemp(HLogSplitter.java:519)
> at 
> org.apache.hadoop.hbase.master.SplitLogManager$1.finish(SplitLogManager.java:138)
> at 
> org.apache.hadoop.hbase.master.SplitLogManager.getDataSetWatchSuccess(SplitLogManager.java:431)
> at 
> org.apache.hadoop.hbase.master.SplitLogManager.access$1200(SplitLogManager.java:95)
> at 
> org.apache.hadoop.hbase.master.SplitLogManager$GetDataAsyncCallback.processResult(SplitLogManager.java:1011)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:571)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
> {noformat}
> We are effectively bottlenecking on doing NN operations and whatever else is 
> happening in GetDataAsyncCallback. It was so bad that on our 100 offline 
> cluster it took a few hours for the master to process all the incoming ZK 
> events while the actual splitting took a fraction of that time.
> I'm marking this as critical and against 0.96 but depending on how involved 
> the fix is we might want to backport.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6312) Make BlockCache eviction thresholds configurable

2012-07-03 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405973#comment-13405973
 ] 

Zhihong Ted Yu commented on HBASE-6312:
---

Pasting from a message on dev@:

Ideally those shouldn't be configurable; we should just set them to a level
that makes more sense. If we do make them configurable and leave it like
that, then we'll get questions like "what acceptable/min factor should
I use?" and we'll spend hours going back and forth on the ML for
minimal results.

Currently, having the acceptable factor set where it is just means
that we're using less memory than configured, e.g. if you need to cache
2GB per machine, set hfile.block.cache.size to ~2.35GB and you'll have
it.

The real issue is the minimum factor. The idea is that we don't want
to overflow the configured maximum size while we're evicting. The
problems:

 - Evicting 10% of the cache (85-75) is pretty hardcore; it means that
if you evict often then you're never close to using 85% of your cache.
 - Evictions are purely CPU-bound and in my tests were almost never
slow enough that you'd reach 100% utilization (whereas loading the
cache usually means you need to read from disk). Even for caches of up
to 32MB with data generated in-memory, loading was too slow to overflow
the cache.
 - Given the previous two problems it seems we should set the
minimum factor close to the acceptable one, but on big caches this
would waste a lot of CPU cycles (I haven't quantified that yet; I'm
just stating this from experience).

So back to HBASE-6312: at the moment I think we should just set the
minimum factor 5% closer to the acceptable one. Jie Huang doesn't
mention whether their customers compared the caching ratio for caches
of the same configured size but with different acceptable factors, or
whether they tried to compare apples to apples. What I'm trying to say,
going back to my earlier example, is that if they compared two caches
with hfile.block.cache.size=0.2 but with different acceptable factors,
then well yes, the one with the bigger acceptable factor will win...
because it's using a bigger cache.

J-D
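
Working through the arithmetic above (using the hardcoded factors the thread 
mentions, 0.85 acceptable and 0.75 minimum):

{code}
float acceptableFactor = 0.85f; // eviction starts above this mark
float minFactor = 0.75f;        // eviction stops below this mark
long wanted = 2L * 1024 * 1024 * 1024;                // ~2GB of hot blocks
long configured = (long) (wanted / acceptableFactor); // ~2.35GB to configure
long evictedPerPass = (long) (configured * (acceptableFactor - minFactor));
// ~241MB dropped per eviction pass -- the "10% of the cache" called
// "pretty hardcore" above
{code}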

> Make BlockCache eviction thresholds configurable
> 
>
> Key: HBASE-6312
> URL: https://issues.apache.org/jira/browse/HBASE-6312
> Project: HBase
>  Issue Type: Improvement
>  Components: io
>Affects Versions: 0.94.0
>Reporter: Jie Huang
>Priority: Minor
> Attachments: hbase-6312.patch
>
>
> Some of our customers found that tuning the BlockCache eviction thresholds 
> made test results different in their test environment. However, those 
> thresholds are not configurable in the current implementation. The only way 
> to change those values is to re-compile the HBase source code. We wonder if 
> it is possible to make them configurable.
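
A hedged sketch of what the request amounts to (the property names here are 
hypothetical, not necessarily the ones the attached patch uses): read the 
factors from the Configuration with the current values as defaults, instead 
of hardcoding them in the block cache implementation.

{code}
float acceptableFactor =
    conf.getFloat("hbase.lru.blockcache.acceptable.factor", 0.85f);
float minFactor =
    conf.getFloat("hbase.lru.blockcache.min.factor", 0.75f);
{code}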

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6309) [MTTR] Do NN operations outside of the ZK EventThread in SplitLogManager

2012-07-03 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405969#comment-13405969
 ] 

Jean-Daniel Cryans commented on HBASE-6309:
---

Hey Ram, yeah it's pretty much the same although I like my description of the 
problem better :)

IMO we should move everything that talks to ZK and NN out of that path.
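
As a hedged sketch of that direction (not the eventual patch; the pool and the 
helper name are assumptions): the ZK callback would only hand the work off, so 
the EventThread never blocks on an NN rename.

{code}
private final ExecutorService logSplitOps = Executors.newFixedThreadPool(4);

public void processResult(int rc, final String path, Object ctx,
    final byte[] data, Stat stat) {
  // Dispatch only: the NN-touching work (renames of recovered edits,
  // task-state handling) runs on the pool, off the ZK EventThread.
  logSplitOps.submit(new Runnable() {
    public void run() {
      finishTask(path, data); // hypothetical helper doing the NN work
    }
  });
}
{code}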

> [MTTR] Do NN operations outside of the ZK EventThread in SplitLogManager
> 
>
> Key: HBASE-6309
> URL: https://issues.apache.org/jira/browse/HBASE-6309
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.92.1, 0.94.0, 0.96.0
>Reporter: Jean-Daniel Cryans
>Priority: Critical
> Fix For: 0.96.0
>
>
> We found this issue during the leap second cataclysm which prompted a 
> distributed splitting of all our logs.
> I saw that none of the RS were splitting after some time while the master was 
> showing that it wasn't even 30% done. jstack'ing I saw this:
> {noformat}
> "main-EventThread" daemon prio=10 tid=0x7f6ce46d8800 nid=0x5376 in
> Object.wait() [0x7f6ce2ecb000]
>java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:485)
> at org.apache.hadoop.ipc.Client.call(Client.java:1093)
> - locked <0x0005fdd661a0> (a org.apache.hadoop.ipc.Client$Call)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
> at $Proxy9.rename(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> at $Proxy9.rename(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:759)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:253)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.moveRecoveredEditsFromTemp(HLogSplitter.java:553)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.moveRecoveredEditsFromTemp(HLogSplitter.java:519)
> at 
> org.apache.hadoop.hbase.master.SplitLogManager$1.finish(SplitLogManager.java:138)
> at 
> org.apache.hadoop.hbase.master.SplitLogManager.getDataSetWatchSuccess(SplitLogManager.java:431)
> at 
> org.apache.hadoop.hbase.master.SplitLogManager.access$1200(SplitLogManager.java:95)
> at 
> org.apache.hadoop.hbase.master.SplitLogManager$GetDataAsyncCallback.processResult(SplitLogManager.java:1011)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:571)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
> {noformat}
> We are effectively bottlenecking on doing NN operations and whatever else is 
> happening in GetDataAsyncCallback. It was so bad that on our 100 offline 
> cluster it took a few hours for the master to process all the incoming ZK 
> events while the actual splitting took a fraction of that time.
> I'm marking this as critical and against 0.96 but depending on how involved 
> the fix is we might want to backport.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6293) HMaster does not go down while splitting logs even if explicit shutdown is called.

2012-07-03 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405954#comment-13405954
 ] 

ramkrishna.s.vasudevan commented on HBASE-6293:
---

+1.

> HMaster does not go down while splitting logs even if explicit shutdown is 
> called.
> --
>
> Key: HBASE-6293
> URL: https://issues.apache.org/jira/browse/HBASE-6293
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.1, 0.94.0
>Reporter: rajeshbabu
>Assignee: rajeshbabu
> Fix For: 0.92.2, 0.96.0, 0.94.1
>
> Attachments: 6293.txt
>
>
> When the master starts up and tries to split logs, on any error we retry 
> infinitely in a loop until it succeeds.
> But now if we get a shutdown call, inside SplitLogManager
> {code}
>   if (stopper.isStopped()) {
> LOG.warn("Stopped while waiting for log splits to be completed");
> return;
>   }
> {code}
> Here we know that the master has stopped.  As the task may not be completed 
> now
> {code}
>  if (batch.done != batch.installed) {
>   batch.isDead = true;
>   tot_mgr_log_split_batch_err.incrementAndGet();
>   LOG.warn("error while splitting logs in " + logDirs +
>   " installed = " + batch.installed + " but only " + batch.done + " 
> done");
>   throw new IOException("error or interrupt while splitting logs in "
>   + logDirs + " Task = " + batch);
> }
> {code} 
> we throw an exception.  In MasterFileSystem.splitLogAfterStartup() we don't 
> check if the master is stopped and we keep retrying continuously.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6293) HMaster does not go down while splitting logs even if explicit shutdown is called.

2012-07-03 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6293:
-

Attachment: 6293.txt

Simple patch.

> HMaster does not go down while splitting logs even if explicit shutdown is 
> called.
> --
>
> Key: HBASE-6293
> URL: https://issues.apache.org/jira/browse/HBASE-6293
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.1, 0.94.0
>Reporter: rajeshbabu
>Assignee: rajeshbabu
> Fix For: 0.92.2, 0.96.0, 0.94.1
>
> Attachments: 6293.txt
>
>
> When the master starts up and tries to split logs, on any error we retry 
> infinitely in a loop until it succeeds.
> But now if we get a shutdown call, inside SplitLogManager
> {code}
>   if (stopper.isStopped()) {
> LOG.warn("Stopped while waiting for log splits to be completed");
> return;
>   }
> {code}
> Here we know that the master has stopped.  As the task may not be completed 
> now
> {code}
>  if (batch.done != batch.installed) {
>   batch.isDead = true;
>   tot_mgr_log_split_batch_err.incrementAndGet();
>   LOG.warn("error while splitting logs in " + logDirs +
>   " installed = " + batch.installed + " but only " + batch.done + " 
> done");
>   throw new IOException("error or interrupt while splitting logs in "
>   + logDirs + " Task = " + batch);
> }
> {code} 
> we throw an exception.  In MasterFileSystem.splitLogAfterStartup() we don't 
> check if the master is stopped and we keep retrying continuously.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6293) HMaster does not go down while splitting logs even if explicit shutdown is called.

2012-07-03 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6293:
-

Status: Patch Available  (was: Open)

> HMaster does not go down while splitting logs even if explicit shutdown is 
> called.
> --
>
> Key: HBASE-6293
> URL: https://issues.apache.org/jira/browse/HBASE-6293
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.0, 0.92.1
>Reporter: rajeshbabu
>Assignee: rajeshbabu
> Fix For: 0.92.2, 0.96.0, 0.94.1
>
> Attachments: 6293.txt
>
>
> When the master starts up and tries to split logs, on any error we retry 
> infinitely in a loop until it succeeds.
> But now if we get a shutdown call, inside SplitLogManager
> {code}
>   if (stopper.isStopped()) {
> LOG.warn("Stopped while waiting for log splits to be completed");
> return;
>   }
> {code}
> Here we know that the master has stopped.  As the task may not be completed 
> now
> {code}
>  if (batch.done != batch.installed) {
>   batch.isDead = true;
>   tot_mgr_log_split_batch_err.incrementAndGet();
>   LOG.warn("error while splitting logs in " + logDirs +
>   " installed = " + batch.installed + " but only " + batch.done + " 
> done");
>   throw new IOException("error or interrupt while splitting logs in "
>   + logDirs + " Task = " + batch);
> }
> {code} 
> we throw an exception.  In MasterFileSystem.splitLogAfterStartup() we don't 
> check if the master is stopped and we keep retrying continuously.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-6317) Master clean start up and Partially enabled tables make region assignment inconsistent.

2012-07-03 Thread ramkrishna.s.vasudevan (JIRA)
ramkrishna.s.vasudevan created HBASE-6317:
-

 Summary: Master clean start up and Partially enabled tables make 
region assignment inconsistent.
 Key: HBASE-6317
 URL: https://issues.apache.org/jira/browse/HBASE-6317
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.96.0, 0.94.1


If we have a table in a partially enabled state (ENABLING), then on HMaster 
restart we treat it as a clean cluster start-up and do a bulk assign.  
Currently in 0.94 bulk assign will not handle ALREADY_OPENED scenarios, and 
this leads to region assignment problems.  Analysing this further, we found 
that we have a better way to handle these scenarios.
{code}
if (false == checkIfRegionBelongsToDisabled(regionInfo)
    && false == checkIfRegionsBelongsToEnabling(regionInfo)) {
  synchronized (this.regions) {
    regions.put(regionInfo, regionLocation);
    addToServers(regionLocation, regionInfo);
  }
}
{code}
We don't add to the regions map so that the enable-table handler can handle 
it.  But as nothing is added to the regions map, we treat it as a clean 
cluster start-up.
Will come up with a patch tomorrow.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6294) Detect leftover data in ZK after a user delete all its HBase data

2012-07-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405945#comment-13405945
 ] 

stack commented on HBASE-6294:
--

That's a bug, Lars.  The aim is that a simple stop/start moves 0.94 to 0.96.  
Most of the znodes are automigrated; I must have missed some.  I made 
HBASE-6316 to make sure this is not necessary going to 0.96.

> Detect leftover data in ZK after a user delete all its HBase data
> -
>
> Key: HBASE-6294
> URL: https://issues.apache.org/jira/browse/HBASE-6294
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.94.0
>Reporter: Jean-Daniel Cryans
>Priority: Critical
> Fix For: 0.96.0, 0.94.1
>
>
> It seems we have a new failure mode when a user deletes the hbase root.dir 
> but doesn't delete the ZK data. For example a user on IRC came with this log:
> {noformat}
> 2012-06-30 09:07:48,017 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open 
> region: kw,,1340981821308.2e8a318837602c9c9961e9d690b7fd02.
> 2012-06-30 09:07:48,017 WARN org.apache.hadoop.hbase.util.FSTableDescriptors: 
> The following folder is in HBase's root directory and doesn't contain a table 
> descriptor, do consider deleting it: kw
> 2012-06-30 09:07:48,018 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> regionserver:34193-0x1383bfe01b70001 Attempting to transition node 
> 2e8a318837602c9c9961e9d690b7fd02 from M_ZK_REGION_OFFLINE to 
> RS_ZK_REGION_OPENING
> 2012-06-30 09:07:48,018 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=M_ZK_REGION_OFFLINE, server=localhost,50890,1341036299694, 
> region=2e8a318837602c9c9961e9d690b7fd02
> 2012-06-30 09:07:48,020 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, server=localhost,34193,1341036300138, 
> region=b254af24c9127b8bb22cb6d24e523dad
> 2012-06-30 09:07:48,020 DEBUG 
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
> event for b254af24c9127b8bb22cb6d24e523dad
> 2012-06-30 09:07:48,020 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
> was=kw_r,,1340981822374.b254af24c9127b8bb22cb6d24e523dad. state=CLOSED, 
> ts=1341036467998, server=localhost,34193,1341036300138
> 2012-06-30 09:07:48,020 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:50890-0x1383bfe01b7 Creating (or updating) unassigned node for 
> b254af24c9127b8bb22cb6d24e523dad with OFFLINE state
> 2012-06-30 09:07:48,028 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> regionserver:34193-0x1383bfe01b70001 Successfully transitioned node 
> 2e8a318837602c9c9961e9d690b7fd02 from M_ZK_REGION_OFFLINE to 
> RS_ZK_REGION_OPENING
> 2012-06-30 09:07:48,028 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> Opening region: {NAME => 
> 'kw,,1340981821308.2e8a318837602c9c9961e9d690b7fd02.', STARTKEY => '', ENDKEY 
> => '', ENCODED => 2e8a318837602c9c9961e9d690b7fd02,}
> 2012-06-30 09:07:48,029 ERROR 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open 
> of region=kw,,1340981821308.2e8a318837602c9c9961e9d690b7fd02., starting to 
> roll back the global memstore size.
> java.lang.IllegalStateException: Could not instantiate a region instance.
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:3490)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3628)
>   at 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:332)
>   at 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:108)
>   at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:169)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>   at java.lang.Thread.run(Thread.java:679)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.GeneratedConstructorAccessor15.newInstance(Unknown 
> Source)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:3487)
>   ... 7 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.loadTableCoprocessors(RegionCoprocessorHost.java:133)
>   at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.(RegionCoprocessorHost.java:125)
>   at org.apache.hadoop.hbase.regionserver.HReg

[jira] [Created] (HBASE-6316) Confirm can upgrade to 0.96 from 0.94 by just stopping and restarting

2012-07-03 Thread stack (JIRA)
stack created HBASE-6316:


 Summary: Confirm can upgrade to 0.96 from 0.94 by just stopping 
and restarting
 Key: HBASE-6316
 URL: https://issues.apache.org/jira/browse/HBASE-6316
 Project: HBase
  Issue Type: Bug
Reporter: stack
Priority: Blocker


Over in HBASE-6294, LarsH says you currently have to clear zk to get a 0.96 
to start over data written by a 0.94.  Need to fix it so we don't have to do 
this -- so that the leftover zk state gets auto-migrated.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6281) Assignment need not be called for disabling table regions during clean cluster start up.

2012-07-03 Thread rajeshbabu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rajeshbabu updated HBASE-6281:
--

Attachment: HBASE-6281_92.patch

Patch for 92. Added TestAssignmentManager to test this scenario.

> Assignment need not be called for disabling table regions during clean 
> cluster start up.
> 
>
> Key: HBASE-6281
> URL: https://issues.apache.org/jira/browse/HBASE-6281
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.1, 0.94.0
>Reporter: rajeshbabu
>Assignee: rajeshbabu
> Fix For: 0.92.2, 0.96.0, 0.94.1
>
> Attachments: 6281-trunk-v2.txt, HBASE-6281_92.patch, 
> HBASE-6281_94.patch, HBASE-6281_94_2.patch, HBASE-6281_trunk.patch
>
>
> Currently during clean cluster start-up, if there are tables in DISABLING 
> state, we do bulk assignment through assignAllUserRegions(), and after a 
> region is OPENED on the RS, the master checks if the table is in 
> DISABLING/DISABLED state (in Am.regionOnline) and calls unassign again.  
> This roundtrip can be avoided by checking even before calling assignment.
> This JIRA is to address the above scenario.
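
A hedged sketch of the avoidance (illustrative only, not the attached patch; 
the collection names are assumptions): filter regions of DISABLING tables out 
of the clean-startup bulk assign, so they are never opened only to be 
immediately unassigned.

{code}
for (HRegionInfo hri : allRegions) {
  if (disablingTables.contains(hri.getTableNameAsString())) {
    continue; // the disable handler owns these; skip the assign round trip
  }
  regionsToAssign.add(hri);
}
{code}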

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-6315) ipc.HBaseClient should support address change as does hdfs

2012-07-03 Thread nkeywal (JIRA)
nkeywal created HBASE-6315:
--

 Summary: ipc.HBaseClient should support address change as does hdfs
 Key: HBASE-6315
 URL: https://issues.apache.org/jira/browse/HBASE-6315
 Project: HBase
  Issue Type: Bug
  Components: ipc
Affects Versions: 0.96.0
Reporter: nkeywal
Priority: Minor


ipc.HBaseClient is a copy-paste of Hadoop's ipc.Client. The Hadoop 
implementation now supports address change; the HBase copy does not.

As a side note, the HBase comment that 'the max number of retries is 45' is 
now wrong.


--- HBaseClient
} catch (SocketTimeoutException toe) {
  /* The max number of retries is 45,
   * which amounts to 20s*45 = 15 minutes retries.
   */
  handleConnectionFailure(timeoutFailures++, maxRetries, toe);
} catch (IOException ie) {
  handleConnectionFailure(ioFailures++, maxRetries, ie);
}

--- Hadoop Client
} catch (SocketTimeoutException toe) {
  /* Check for an address change and update the local reference.
   * Reset the failure counter if the address was changed
   */
  if (updateAddress()) {
timeoutFailures = ioFailures = 0;
  }
  /* The max number of retries is 45,
   * which amounts to 20s*45 = 15 minutes retries.
   */
  handleConnectionFailure(timeoutFailures++, 45, toe);
} catch (IOException ie) {
  if (updateAddress()) {
timeoutFailures = ioFailures = 0;
  }
  handleConnectionFailure(ioFailures++, maxRetries, ie);
}

private synchronized boolean updateAddress() throws IOException {
  // Do a fresh lookup with the old host name.
  InetSocketAddress currentAddr = NetUtils.makeSocketAddr(
   server.getHostName(), server.getPort());

  if (!server.equals(currentAddr)) {
LOG.warn("Address change detected. Old: " + server.toString() +
 " New: " + currentAddr.toString());
server = currentAddr;
return true;
  }
  return false;
}



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



