[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23

2012-02-17 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210741#comment-13210741
 ] 

Hadoop QA commented on HBASE-5317:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12515057/HBASE-5317-v3.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 7 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -136 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 158 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/985//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/985//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/985//console

This message is automatically generated.

 Fix TestHFileOutputFormat to work against hadoop 0.23
 -

 Key: HBASE-5317
 URL: https://issues.apache.org/jira/browse/HBASE-5317
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0, 0.92.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Attachments: HBASE-5317-v0.patch, HBASE-5317-v1.patch, 
 HBASE-5317-v3.patch


 Running
 mvn -Dhadoop.profile=23 test -P localTests -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
 yields this on 0.92:
 Failed tests:
   testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): HFile for column family info-A not found
 Tests in error:
   test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0 (Is a directory)
   testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable
   testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable
 It looks like on trunk, this also results in an error:
   testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable
 I have a patch that fixes testColumnFamilyCompression and test_TIMERANGE, but 
 haven't fixed the other 3 yet.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5347) GC free memory management in Level-1 Block Cache

2012-02-17 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210750#comment-13210750
 ] 

Todd Lipcon commented on HBASE-5347:


Hey folks. I haven't been through the patch yet, but just wanted to throw out 
one idea that I think can make reference-counted systems a little simpler: in 
Cocoa (the OSX development framework) there's a class called NSAutoreleasePool, 
an instance of which is carried around as part of the local thread context. You 
can then call autorelease on any object, which will not immediately decrement 
the ref count, but adds it to the pool. When you release the pool, all 
referenced objects are decremented at that point.

This idea might make it easier to manage references. For example, when 
something is read by a scanner, it could be read with ref count incremented but 
put on the request's autorelease pool. Then, when any IPC handler thread is 
returned to the thread pool, the autorelease pool could be released. This 
ensures that anything we reference is kept around for the whole request 
lifecycle but is still automatically dereferenced at the end.

Do you think such a construct would be useful here?

https://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/MemoryMgmt/Articles/mmAutoreleasePools.html
 has some more info.
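
To make the autorelease idea above concrete, here is a minimal Java sketch. It is not taken from the patch: RefCountedBlock, AutoreleasePool, retainAndAutorelease and drain are hypothetical names invented for illustration, and a thread-local pool is assumed as the per-request context.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical ref-counted block; a stand-in, not an HBase class. */
final class RefCountedBlock {
    // Starts at 1: the reference held by the cache that owns the block.
    private final AtomicInteger refCount = new AtomicInteger(1);

    void retain() {
        refCount.incrementAndGet();
    }

    void release() {
        if (refCount.decrementAndGet() < 0) {
            throw new IllegalStateException("over-released");
        }
    }

    int refCount() {
        return refCount.get();
    }
}

/** Hypothetical per-thread pool that defers release() until drain(). */
final class AutoreleasePool {
    private static final ThreadLocal<AutoreleasePool> CURRENT =
        ThreadLocal.withInitial(AutoreleasePool::new);

    private final List<RefCountedBlock> pending = new ArrayList<>();

    static AutoreleasePool current() {
        return CURRENT.get();
    }

    /** Takes a reference now and schedules its release for drain(). */
    RefCountedBlock retainAndAutorelease(RefCountedBlock block) {
        block.retain();
        pending.add(block);
        return block;
    }

    /** Meant to be called when the handler thread finishes a request. */
    void drain() {
        for (RefCountedBlock block : pending) {
            block.release();
        }
        pending.clear();
    }
}
```

In this sketch a scanner would call AutoreleasePool.current().retainAndAutorelease(block) on every read, and the IPC layer would call drain() once per request, so no individual read path has to remember its own release.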

 GC free memory management in Level-1 Block Cache
 

 Key: HBASE-5347
 URL: https://issues.apache.org/jira/browse/HBASE-5347
 Project: HBase
  Issue Type: Improvement
Reporter: Prakash Khemani
Assignee: Prakash Khemani
 Attachments: D1635.5.patch


 On eviction of a block from the block-cache, instead of waiting for the 
 garbage collector to reuse its memory, reuse the block right away.
 This will require us to keep reference counts on the HFile blocks. Once we 
 have the reference counts in place we can do our own simple 
 blocks-out-of-slab allocation for the block-cache.
 This will help us with
 * reducing gc pressure, especially in the old generation
 * making it possible to have non-java-heap memory backing the HFile blocks





[jira] [Commented] (HBASE-5332) Deterministic Compaction Jitter

2012-02-17 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210751#comment-13210751
 ] 

Lars Hofhansl commented on HBASE-5332:
--

I kinda like the simplicity of the random jitter. Part of the problem seems 
to be that we only get a few random choices with delay + jitter*(1 - 
2*Math.random()).

What if we just change this to delay + jitter*(2 - 4*Math.random()) or delay 
+ jitter*(3 - 6*Math.random()) and decrease jitter accordingly?
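
For reference, a small Java sketch of the two formulas quoted above. The class and method names are hypothetical, and the random draw is passed in as r (standing in for Math.random(), which returns a value in [0, 1)) so the arithmetic can be inspected. One thing the sketch makes visible: for the same draw, scaling the multiplier by k while dividing jitter by k gives the same result as the original form.

```java
/** Hypothetical helper; only the two formulas come from the discussion above. */
final class CompactionJitter {
    /** delay + jitter*(1 - 2*r), the current formula. */
    static double jittered(double delay, double jitter, double r) {
        return delay + jitter * (1 - 2 * r);
    }

    /** delay + jitter*(k - 2k*r); k = 2 and k = 3 are the proposed variants. */
    static double jitteredScaled(double delay, double jitter, int k, double r) {
        return delay + jitter * (k - 2 * k * r);
    }
}
```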


 Deterministic Compaction Jitter
 ---

 Key: HBASE-5332
 URL: https://issues.apache.org/jira/browse/HBASE-5332
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Minor
 Attachments: D1785.1.patch, D1785.2.patch


 Currently, we add jitter to a compaction using delay + jitter*(1 - 
 2*Math.random()).  Since this is non-deterministic, we can get major 
 compaction storms on server restart as half the Stores that were set to 
 delay + jitter will now be set to delay - jitter.  We need a more 
 deterministic way to jitter major compactions so this information can persist 
 across server restarts.
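
 One possible shape for a deterministic jitter, sketched in Java. This is an assumption for illustration and not necessarily what the attached patches do: the random draw is replaced by a fraction derived from a stable per-Store key, so the same Store gets the same offset after a restart. DeterministicJitter, stableFraction and the storeKey parameter are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

/** Hypothetical sketch: jitter derived from a stable key instead of Math.random(). */
final class DeterministicJitter {
    /** Returns a value in [0, 1) that depends only on the key. */
    static double stableFraction(String storeKey) {
        long seed = 0;
        for (byte b : storeKey.getBytes(StandardCharsets.UTF_8)) {
            seed = 31 * seed + b;
        }
        // java.util.Random with a fixed seed produces a fixed sequence.
        return new Random(seed).nextDouble();
    }

    /** Same formula as in the description, with the stable fraction as the draw. */
    static double nextMajorCompactionDelay(double delay, double jitter, String storeKey) {
        return delay + jitter * (1 - 2 * stableFraction(storeKey));
    }
}
```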





[jira] [Updated] (HBASE-5003) If the master is started with a wrong root dir, it gets stuck and can't be killed

2012-02-17 Thread Shaneal Manek (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaneal Manek updated HBASE-5003:
-

Attachment: hbase-5003.patch

 If the master is started with a wrong root dir, it gets stuck and can't be 
 killed
 -

 Key: HBASE-5003
 URL: https://issues.apache.org/jira/browse/HBASE-5003
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Priority: Critical
  Labels: noob
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: hbase-5003.patch


 Reported by a new user on IRC who tried to set hbase.rootdir to 
 file:///~/hbase: the master gets stuck and cannot be killed. I tried 
 something similar on my machine and it spins while logging:
 {quote}
 2011-12-09 16:11:17,002 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to create version file at file:/bin/hbase, retrying: Mkdirs failed to create file:/bin/hbase
 2011-12-09 16:11:27,002 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to create version file at file:/bin/hbase, retrying: Mkdirs failed to create file:/bin/hbase
 2011-12-09 16:11:37,003 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to create version file at file:/bin/hbase, retrying: Mkdirs failed to create file:/bin/hbase
 {quote}
 The reason it cannot be stopped is that the master's main thread is stuck in 
 there and will never be notified:
 {quote}
 Master:0;su-jdcryans-01.local,51116,1323475535684 prio=5 tid=7f92b7a3c000 nid=0x1137ba000 waiting on condition [1137b9000]
    java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hbase.util.FSUtils.setVersion(FSUtils.java:297)
    at org.apache.hadoop.hbase.util.FSUtils.setVersion(FSUtils.java:268)
    at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:339)
    at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:128)
    at org.apache.hadoop.hbase.master.MasterFileSystem.init(MasterFileSystem.java:113)
    at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:435)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:314)
    at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:218)
    at java.lang.Thread.run(Thread.java:680)
 {quote}
 It seems we should handle the exceptions we get in there better, 
 and die if we need to. It would make for a better user experience.
 Maybe also check hbase.rootdir before even starting the master.





[jira] [Updated] (HBASE-5003) If the master is started with a wrong root dir, it gets stuck and can't be killed

2012-02-17 Thread Shaneal Manek (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaneal Manek updated HBASE-5003:
-

Attachment: hbase-5003-v2.patch

 If the master is started with a wrong root dir, it gets stuck and can't be 
 killed
 -

 Key: HBASE-5003
 URL: https://issues.apache.org/jira/browse/HBASE-5003
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Priority: Critical
  Labels: noob
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: hbase-5003-v2.patch, hbase-5003.patch







[jira] [Commented] (HBASE-5423) Regionserver may block forever on waitOnAllRegionsToClose when aborting

2012-02-17 Thread chunhui shen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210760#comment-13210760
 ] 

chunhui shen commented on HBASE-5423:
-

If we fail to close a region, it is still removed from RIT, so close may be 
called multiple times on the same region.
{code}
CloseRegionHandler#process() {
  try {
    ...
    region.close(abort);
    ...
  } finally {
    this.rsServices.getRegionsInTransitionInRS().
      remove(this.regionInfo.getEncodedNameAsBytes());
  }
}
{code}

Therefore, if we can't close some regions because of an exception, we should 
break even though the set of online regions is not yet empty; otherwise, the 
regionserver may block forever in waitOnAllRegionsToClose.
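
A rough Java sketch of that exit condition, under stated assumptions: the regionserver's state is modeled as two plain sets of region names, and the loop gives up once nothing is in transition while regions are still online. The class name, the parameters and the exact condition are hypothetical and are not copied from the patch.

```java
import java.util.Set;

/** Hypothetical model of the wait loop; not the regionserver code. */
final class CloseWaiter {
    /**
     * Returns true once every region has closed. Returns false if we stop
     * waiting because no close is in flight any more, or on interrupt.
     */
    static boolean waitOnAllRegionsToClose(Set<String> onlineRegions,
                                           Set<String> regionsInTransition,
                                           long pollMillis) {
        while (!onlineRegions.isEmpty()) {
            if (regionsInTransition.isEmpty()) {
                // A failed close was already removed from RIT in the finally
                // block, so the remaining online regions will never close.
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```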

 Regionserver may block forever on waitOnAllRegionsToClose when aborting
 ---

 Key: HBASE-5423
 URL: https://issues.apache.org/jira/browse/HBASE-5423
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: hbase-5423.patch


 If closeRegion throws an exception (for example, one caused by the FS) while 
 the RS is aborting, the RS will block forever in waitOnAllRegionsToClose().





[jira] [Updated] (HBASE-5003) If the master is started with a wrong root dir, it gets stuck and can't be killed

2012-02-17 Thread Shaneal Manek (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaneal Manek updated HBASE-5003:
-

Status: Patch Available  (was: Open)

The patch simply has the master retry writing the version file 3 times (by 
default; this is configurable). If it still fails, the master shuts down gracefully.

Please disregard the first patch - it accidentally includes the buggy 
hbase-site.xml I was using to reproduce this issue.
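
The bounded-retry behavior described above could look roughly like the following Java sketch. It is only an illustration and not the patch itself: BoundedRetry, IoAction and tryWithRetries are hypothetical names, and the caller is assumed to shut the master down when false is returned.

```java
import java.io.IOException;

/** Hypothetical sketch of retrying an I/O action a bounded number of times. */
final class BoundedRetry {
    /** Stand-in for the operation that writes the version file. */
    interface IoAction {
        void run() throws IOException;
    }

    /**
     * Runs the action up to maxAttempts times, sleeping between failures.
     * Returns true on success, false once the attempts are exhausted or the
     * thread is interrupted.
     */
    static boolean tryWithRetries(IoAction action, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                action.run();
                return true;
            } catch (IOException e) {
                if (attempt == maxAttempts) {
                    break; // give up instead of retrying forever
                }
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```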

 If the master is started with a wrong root dir, it gets stuck and can't be 
 killed
 -

 Key: HBASE-5003
 URL: https://issues.apache.org/jira/browse/HBASE-5003
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Priority: Critical
  Labels: noob
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: hbase-5003-v2.patch, hbase-5003.patch







[jira] [Commented] (HBASE-5422) StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins)

2012-02-17 Thread chunhui shen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210766#comment-13210766
 ] 

chunhui shen commented on HBASE-5422:
-

Yes, we should add regionPlans, so that when an open comes in from the 
StartupBulkAssign, we can update the timers of the regions in transition that 
have the same assignment destination.

I agree with making an addPlan method that takes a Map of plans.
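
A very rough Java sketch of what that could mean, with everything simplified: a plan is reduced to a region name mapped to a destination server name, and the RIT timer to a timestamp. PlanRegistry, addPlans, onRegionOpened and timestampOf are hypothetical names and not AssignmentManager API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model: bulk plan registration plus timer refresh per destination. */
final class PlanRegistry {
    // region name -> destination server (stand-in for a region plan)
    private final Map<String, String> plans = new HashMap<>();
    // region name -> last-update timestamp of the region in transition
    private final Map<String, Long> ritTimestamps = new HashMap<>();

    /** Registers a whole map of plans in one call. */
    synchronized void addPlans(Map<String, String> newPlans, long now) {
        plans.putAll(newPlans);
        for (String region : newPlans.keySet()) {
            ritTimestamps.put(region, now);
        }
    }

    /** On an open, refresh timers of other regions bound for the same server. */
    synchronized void onRegionOpened(String region, long now) {
        String destination = plans.remove(region);
        ritTimestamps.remove(region);
        if (destination == null) {
            return;
        }
        for (Map.Entry<String, String> entry : plans.entrySet()) {
            if (destination.equals(entry.getValue())) {
                ritTimestamps.put(entry.getKey(), now);
            }
        }
    }

    /** Returns the timestamp for a region, or null if it is not in transition. */
    synchronized Long timestampOf(String region) {
        return ritTimestamps.get(region);
    }
}
```

The intent of the sketch is that a server which is visibly making progress on its bulk-assigned regions keeps the rest of its queue from timing out.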


 StartupBulkAssigner would cause a lot of timeout on RIT when assigning large 
 numbers of regions (timeout = 3 mins)
 --

 Key: HBASE-5422
 URL: https://issues.apache.org/jira/browse/HBASE-5422
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: chunhui shen
 Attachments: 5422-90.patch, hbase-5422.patch


 In our production environment, we see a lot of RIT timeouts when the cluster 
 starts up; there are about 70,000 (7w) regions in the cluster (25 regionservers).
 First, consider the following log (see region 
 33cf229845b1009aa8a3f7b0f85c9bd0).
 Master's log:
 2012-02-13 18:07:41,409 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x348f4a94723da5 Async create of unassigned node for 
 33cf229845b1009aa8a3f7b0f85c9bd0 with OFFLINE state 
 2012-02-13 18:07:42,560 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback:
  rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
 state=OFFLINE, ts=1329127661409, 
 server=r03f11025.yh.aliyun.com,60020,1329127549907 
 2012-02-13 18:07:42,996 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback:
  rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
 state=OFFLINE, ts=1329127661409 
 2012-02-13 18:10:48,072 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
 state=PENDING_OPEN, ts=1329127662996
 2012-02-13 18:10:48,072 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_OPEN for too long, reassigning 
 region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
 2012-02-13 18:11:16,744 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENED, 
 server=r03f11025.yh.aliyun.com,60020,1329127549907, 
 region=33cf229845b1009aa8a3f7b0f85c9bd0 
 2012-02-13 18:38:07,310 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
 event for 33cf229845b1009aa8a3f7b0f85c9bd0; deleting unassigned node 
 2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x348f4a94723da5 Deleting existing unassigned node for 
 33cf229845b1009aa8a3f7b0f85c9bd0 that is in expected state 
 RS_ZK_REGION_OPENED 
 2012-02-13 18:38:07,314 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x348f4a94723da5 Successfully deleted unassigned node for region 
 33cf229845b1009aa8a3f7b0f85c9bd0 in expected state RS_ZK_REGION_OPENED 
 2012-02-13 18:38:07,573 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region 
 item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. on 
 r03f11025.yh.aliyun.com,60020,1329127549907 
 2012-02-13 18:50:54,428 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan 
 was found (or we are ignoring an existing plan) for 
 item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. so 
 generated a random one; 
 hri=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0., 
 src=, dest=r01b05043.yh.aliyun.com,60020,1329127549041; 29 (online=29, 
 exclude=null) available servers 
 2012-02-13 18:50:54,428 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. to 
 r01b05043.yh.aliyun.com,60020,1329127549041 
 2012-02-13 19:31:50,514 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
 state=PENDING_OPEN, ts=1329132528086 
 2012-02-13 19:31:50,514 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_OPEN for too long, reassigning 
 region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
 Regionserver's log
 2012-02-13 18:07:43,537 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open 
 region: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
 2012-02-13 18:11:16,560 DEBUG 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing 
 open of item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
 

[jira] [Commented] (HBASE-5229) Provide basic building blocks for multi-row local transactions.

2012-02-17 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210770#comment-13210770
 ] 

Lars Hofhansl commented on HBASE-5229:
--

Note that Cassandra adds something similar in 1.1: 
http://www.datastax.com/dev/blog/schema-in-cassandra-1-1 (check towards the end 
of that blog post).

 Provide basic building blocks for multi-row local transactions.
 -

 Key: HBASE-5229
 URL: https://issues.apache.org/jira/browse/HBASE-5229
 Project: HBase
  Issue Type: New Feature
  Components: client, regionserver
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 5229-endpoint.txt, 5229-final.txt, 5229-multiRow-v2.txt, 
 5229-multiRow.txt, 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt


 In the final iteration, this issue provides a generalized, public 
 mutateRowsWithLocks method on HRegion, that can be used by coprocessors to 
 implement atomic operations efficiently.
 Coprocessors are already region aware, which makes this a good pairing of 
 APIs. This feature is by design not available to the client via the HTable 
 API.
 It took a long time to arrive at this and I apologize for the public exposure 
 of my (erratic in retrospect) thought processes.
 Was:
 HBase should provide basic building blocks for multi-row local transactions. 
 Local means that we do this by co-locating the data. Global (cross region) 
 transactions are not discussed here.
 After a bit of discussion two solutions have emerged:
 1. Keep the row-key for determining grouping and location and allow efficient 
 intra-row scanning. A client application would then model tables as 
 HBase-rows.
 2. Define a prefix-length in HTableDescriptor that defines a grouping of 
 rows. Regions will then never be split inside a grouping prefix.
 #1 is true to the current storage paradigm of HBase.
 #2 is true to the current client side API.
 I will explore these two with sample patches here.
 
 Was:
 As discussed (at length) on the dev mailing list with the HBASE-3584 and 
 HBASE-5203 committed, supporting atomic cross row transactions within a 
 region becomes simple.
 I am aware of the hesitation about the usefulness of this feature, but we 
 have to start somewhere.
 Let's use this jira for discussion, I'll attach a patch (with tests) 
 momentarily to make this concrete.





[jira] [Commented] (HBASE-5420) TestImportTsv does not shut down MR Cluster correctly (fails against 0.23 hadoop)

2012-02-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210773#comment-13210773
 ] 

Hudson commented on HBASE-5420:
---

Integrated in HBase-0.92 #289 (See 
[https://builds.apache.org/job/HBase-0.92/289/])
HBASE-5420 TestImportTsv does not shut down MR Cluster correctly (fails 
against 0.23 hadoop) (Revision 1245796)

 Result = SUCCESS
stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportTsv.java


 TestImportTsv does not shut down MR Cluster correctly (fails against 0.23 
 hadoop)
 -

 Key: HBASE-5420
 URL: https://issues.apache.org/jira/browse/HBASE-5420
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0, 0.92.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Fix For: 0.94.0, 0.92.1

 Attachments: HBASE-5420-v1.patch, HBASE-5420.patch


 Test calls startMiniMapReduceCluster() but never calls 
 shutdownMiniMapReduceCluster().
 This causes failures with -Dhadoop.profile=23 when both testMROnTable and 
 testMROnTableWithCustomMapper are run, because the cluster cannot start up 
 properly for the second test.
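
 The usual remedy is to pair every start with a shutdown in a finally block (or in test teardown). A minimal Java sketch follows; FakeMiniMrCluster is a hypothetical stand-in used only to model "cannot start while already running", and it is not the HBase test utility.

```java
/** Hypothetical stand-in for a mini MR cluster that refuses a second start. */
final class FakeMiniMrCluster {
    private boolean running;

    void start() {
        if (running) {
            throw new IllegalStateException("already running");
        }
        running = true;
    }

    void shutdown() {
        running = false;
    }

    boolean isRunning() {
        return running;
    }
}

/** Runs a test body with the cluster up and always shuts it down afterwards. */
final class ClusterLifecycle {
    static void runWithCluster(FakeMiniMrCluster cluster, Runnable testBody) {
        cluster.start();
        try {
            testBody.run();
        } finally {
            // Runs even if the test body throws, so the next test can start cleanly.
            cluster.shutdown();
        }
    }
}
```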





[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-17 Thread chunhui shen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210772#comment-13210772
 ] 

chunhui shen commented on HBASE-5270:
-

{code}
+// We set serverLoad with one region, it could differentiate with
+// regionserver which is started just now
+HServerLoad serverLoad = new HServerLoad();
+serverLoad.setNumberOfRegions(1);
How you know it has a region?
{code}
We do this to mark the RS as one that was already running, as opposed to a 
regionserver that has only just started.
(A regionserver that has just started has no regions, so when the master runs 
assignRootAndMeta, we needn't expire it. Only the 0.90 version needs this, because 
rootLocation doesn't contain the startcode, so we can't be sure it is the root 
server based on HServerAddress alone.)

 Handle potential data loss due to concurrent processing of processFaileOver 
 and ServerShutdownHandler
 -

 Key: HBASE-5270
 URL: https://issues.apache.org/jira/browse/HBASE-5270
 Project: HBase
  Issue Type: Sub-task
  Components: master
Reporter: Zhihong Yu
 Fix For: 0.94.0, 0.92.1

 Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 
 5270-90.patch, 5270-90v2.patch, 5270-testcase.patch, 5270-testcasev2.patch, 
 hbase-5270.patch, hbase-5270v2.patch, sampletest.txt


 This JIRA continues the effort from HBASE-5179. Starting with Stack's 
 comments about patches for 0.92 and TRUNK:
 Reviewing 0.92v17
 isDeadServerInProgress is a new public method in ServerManager but it does 
 not seem to be used anywhere.
 Does isDeadRootServerInProgress need to be public? Ditto for meta version.
 This method param names are not right 'definitiveRootServer'; what is meant 
 by definitive? Do they need this qualifier?
 Is there anything in place to stop us expiring a server twice if it's carrying 
 root and meta?
 What is difference between asking assignment manager isCarryingRoot and this 
 variable that is passed in? Should be doc'd at least. Ditto for meta.
 I think I've asked for this a few times - onlineServers needs to be 
 explained... either in javadoc or in comment. This is the param passed into 
 joinCluster. How does it arise? I think I know but am unsure. God love the 
 poor noob that comes awandering this code trying to make sense of it all.
 It looks like we get the list by trawling zk for regionserver znodes that 
 have not checked in. Don't we do this operation earlier in master setup? Are 
 we doing it again here?
 Though distributed split log is configured, we will do in-master, single 
 process splitting under some conditions with this patch. It's not explained in 
 code why we would do this. Why do we think master log splitting is 'high 
 priority' when it could very well be slower? Should we only go this route if 
 distributed splitting is not going on? Do we know if concurrent distributed 
 log splitting and master splitting works?
 Why would we have dead servers in progress here in master startup? Because a 
 servershutdownhandler fired?
 This patch is different to the patch for 0.90. Should go into trunk first 
 with tests, then 0.92. Should it be in this issue? This issue is really hard 
 to follow now. Maybe this issue is for 0.90.x and new issue for more work on 
 this trunk patch?
 This patch needs to have the v18 differences applied.





[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-17 Thread chunhui shen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210776#comment-13210776
 ] 

chunhui shen commented on HBASE-5270:
-

{code}Can you just do + super.nodeDeleted(path); instead of + 
GatedNodeDeleteRegionServerTracker.super.nodeDeleted(path);?
{code}
If we block inside nodeDeleted(path) in GatedNodeDeleteRegionServerTracker, we 
will block all ZK events. 
So I just want to delay handling of the RS node-deleted event by using a 
thread. However, inside that thread's run(), we need to call 
GatedNodeDeleteRegionServerTracker.super.nodeDeleted(path);
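
A minimal sketch of why the qualified form is needed, assuming the delayed call runs inside an anonymous Runnable. The class names BaseTrackerSketch and GatedTrackerSketch and the CountDownLatch gate are hypothetical stand-ins, not the classes from the patch:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for the tracker base class; it only records handled paths.
class BaseTrackerSketch {
  final List<String> handled =
      Collections.synchronizedList(new ArrayList<String>());

  public void nodeDeleted(String path) {
    handled.add(path);
  }
}

// Hypothetical gated subclass: the callback returns at once and the real
// handling is deferred to a worker thread, so the event thread never blocks.
class GatedTrackerSketch extends BaseTrackerSketch {
  // Assumed gate; the caller opens it once deletions may be processed.
  final CountDownLatch gate = new CountDownLatch(1);

  @Override
  public void nodeDeleted(final String path) {
    Thread worker = new Thread(new Runnable() {
      @Override
      public void run() {
        try {
          gate.await();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        // Inside the anonymous Runnable, a plain super.nodeDeleted(path) would
        // resolve against Object (the Runnable's superclass) and not compile,
        // so the enclosing class has to be named explicitly.
        GatedTrackerSketch.super.nodeDeleted(path);
      }
    });
    worker.setDaemon(true);
    worker.start();
  }
}
```

The callback itself never waits on the gate; only the worker thread does, which is the point of delaying the event through a thread.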

 Handle potential data loss due to concurrent processing of processFaileOver 
 and ServerShutdownHandler
 -

 Key: HBASE-5270
 URL: https://issues.apache.org/jira/browse/HBASE-5270
 Project: HBase
  Issue Type: Sub-task
  Components: master
Reporter: Zhihong Yu
 Fix For: 0.94.0, 0.92.1

 Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 
 5270-90.patch, 5270-90v2.patch, 5270-testcase.patch, 5270-testcasev2.patch, 
 hbase-5270.patch, hbase-5270v2.patch, sampletest.txt


 This JIRA continues the effort from HBASE-5179. Starting with Stack's 
 comments about patches for 0.92 and TRUNK:
 Reviewing 0.92v17
 isDeadServerInProgress is a new public method in ServerManager but it does 
 not seem to be used anywhere.
 Does isDeadRootServerInProgress need to be public? Ditto for meta version.
 This method param names are not right 'definitiveRootServer'; what is meant 
 by definitive? Do they need this qualifier?
 Is there anything in place to stop us expiring a server twice if its carrying 
 root and meta?
 What is difference between asking assignment manager isCarryingRoot and this 
 variable that is passed in? Should be doc'd at least. Ditto for meta.
 I think I've asked for this a few times - onlineServers needs to be 
 explained... either in javadoc or in comment. This is the param passed into 
 joinCluster. How does it arise? I think I know but am unsure. God love the 
 poor noob that comes awandering this code trying to make sense of it all.
 It looks like we get the list by trawling zk for regionserver znodes that 
 have not checked in. Don't we do this operation earlier in master setup? Are 
 we doing it again here?
 Though distributed split log is configured, we will do single-process 
 splitting in the master under some conditions with this patch. It's not 
 explained in code why we would do this. Why do we think master log splitting 
 is 'high priority' when it could very well be slower? Should we only go this 
 route if distributed splitting is not going on? Do we know if concurrent 
 distributed log splitting and master splitting works?
 Why would we have dead servers in progress here in master startup? Because a 
 servershutdownhandler fired?
 This patch is different to the patch for 0.90. Should go into trunk first 
 with tests, then 0.92. Should it be in this issue? This issue is really hard 
 to follow now. Maybe this issue is for 0.90.x and new issue for more work on 
 this trunk patch?
 This patch needs to have the v18 differences applied.





[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-17 Thread chunhui shen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210777#comment-13210777
 ] 

chunhui shen commented on HBASE-5270:
-

{code}
Why the need for this timeout:

+Thread.sleep(1 * 2);
+((GatedNodeDeleteRegionServerTracker) master.getRegionServerTracker()).gate
+.set(false);
{code}
Because we sleep 10s after splitLog, we sleep 20s here to make sure that the 
master is assigning RootAndMeta or has already assigned them. After that, we 
start processing the RS node-deleted event.





[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-17 Thread chunhui shen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210778#comment-13210778
 ] 

chunhui shen commented on HBASE-5270:
-

This issue involves a bug where root is never assigned and the master blocks 
waiting for root during initialization, 
so we set a timeout for the test case.





[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-17 Thread chunhui shen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210783#comment-13210783
 ] 

chunhui shen commented on HBASE-5270:
-

{code}
+   * Dead servers under processing by the ServerShutdownHander. 
Whats that mean?  Its while the server is being processed by 
ServerShutdownHandler exclusively -- these are the inProgress servers?
{code}
Yes, these are the inProgress servers.





[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-17 Thread chunhui shen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210785#comment-13210785
 ] 

chunhui shen commented on HBASE-5270:
-

So, what happens if a server had root and meta and its not expired when we do 
failover?  We'll expire it processing root.  Will we expire it a second time 
processing meta?  Perhaps the answer is no because the first expiration will 
clear the meta state in master?
{code}
+if (metaServerLoad != null && metaServerLoad.getNumberOfRegions() > 0
+    && !catalogTracker.getRootLocation().equals(metaServerAddress)) {
+  // If metaServer is online and did not just start, we expire it
+  this.serverManager.expireServer(metaServerInfo);
+}
{code}
If a server had both root and meta, we make sure not to expire it a second 
time via the catalogTracker.getRootLocation().equals(metaServerAddress) check.





[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-17 Thread chunhui shen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210788#comment-13210788
 ] 

chunhui shen commented on HBASE-5270:
-

For the other suggestions, I will make the changes later.
Thanks for the review, Stack!





[jira] [Commented] (HBASE-5003) If the master is started with a wrong root dir, it gets stuck and can't be killed

2012-02-17 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210786#comment-13210786
 ] 

Hadoop QA commented on HBASE-5003:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12515059/hbase-5003-v2.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -136 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 158 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mapreduce.TestImportTsv
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/986//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/986//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/986//console

This message is automatically generated.

 If the master is started with a wrong root dir, it gets stuck and can't be 
 killed
 -

 Key: HBASE-5003
 URL: https://issues.apache.org/jira/browse/HBASE-5003
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Priority: Critical
  Labels: noob
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: hbase-5003-v2.patch, hbase-5003.patch


 Reported by a new user on IRC who tried to set hbase.rootdir to 
 file:///~/hbase, the master gets stuck and cannot be killed. I tried 
 something similar on my machine and it spins while logging:
 {quote}
 2011-12-09 16:11:17,002 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to 
 create version file at file:/bin/hbase, retrying: Mkdirs failed to create 
 file:/bin/hbase
 2011-12-09 16:11:27,002 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to 
 create version file at file:/bin/hbase, retrying: Mkdirs failed to create 
 file:/bin/hbase
 2011-12-09 16:11:37,003 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to 
 create version file at file:/bin/hbase, retrying: Mkdirs failed to create 
 file:/bin/hbase
 {quote}
 The reason it cannot be stopped is that the master's main thread is stuck in 
 there and will never be notified:
 {quote}
 Master:0;su-jdcryans-01.local,51116,1323475535684 prio=5 tid=7f92b7a3c000 
 nid=0x1137ba000 waiting on condition [1137b9000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
   at java.lang.Thread.sleep(Native Method)
   at org.apache.hadoop.hbase.util.FSUtils.setVersion(FSUtils.java:297)
   at org.apache.hadoop.hbase.util.FSUtils.setVersion(FSUtils.java:268)
   at 
 org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:339)
   at 
 org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:128)
   at 
 org.apache.hadoop.hbase.master.MasterFileSystem.init(MasterFileSystem.java:113)
   at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:435)
   at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:314)
   at 
 org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:218)
   at java.lang.Thread.run(Thread.java:680)
 {quote}
 It seems we should do a better handling of the exceptions we get in there, 
 and die if we need to. It would make a better user experience.
 Maybe also do a check on hbase.rootdir before even starting the master.
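
 One way to picture the "die if we need to" suggestion is a retry loop with an upper bound. This is only a sketch under assumptions: BoundedRetrySketch, Attempt, and runWithRetries are hypothetical names, and it is not the code in FSUtils.setVersion or in either attached patch.

```java
import java.io.IOException;

class BoundedRetrySketch {
  /** Hypothetical operation that may fail, for example creating a version file. */
  interface Attempt {
    void run() throws IOException;
  }

  /**
   * Runs the attempt up to maxAttempts times, waiting waitMillis between tries.
   * Returns the number of the attempt that succeeded, or throws once the
   * budget is used up so the caller can abort instead of spinning forever.
   */
  static int runWithRetries(Attempt attempt, int maxAttempts, long waitMillis)
      throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    IOException last = null;
    for (int i = 1; i <= maxAttempts; i++) {
      try {
        attempt.run();
        return i;
      } catch (IOException e) {
        last = e; // remember the failure and retry if attempts remain
      }
      if (i < maxAttempts) {
        try {
          Thread.sleep(waitMillis);
        } catch (InterruptedException ie) {
          // An interrupt also ends the loop, so the thread can be stopped.
          Thread.currentThread().interrupt();
          throw new IOException("Interrupted while retrying", ie);
        }
      }
    }
    throw new IOException("Giving up after " + maxAttempts + " attempts", last);
  }
}
```

 With a bound like this, a bad hbase.rootdir would surface as an exception after a few tries rather than an endless stream of WARN lines.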





[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-17 Thread chunhui shen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210787#comment-13210787
 ] 

chunhui shen commented on HBASE-5270:
-

{code}
+// Remove regions in RIT, they are may being processed by the SSH.
+synchronized (regionsInTransition) {
+  nodes.removeAll(regionsInTransition.keySet());
+}
{code}
Perhaps SSH has put up something in RIT because it's done an assign and here we 
are blanket removing them all?

Yes, SSH and the master's initializing thread may assign the same regions, so 
we need to prevent multiple assignment.
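
A minimal sketch of the filtering step quoted above, assuming the candidate regions and the regions in transition are keyed by region name. RitFilterSketch and its methods are hypothetical, not AssignmentManager code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class RitFilterSketch {
  // Hypothetical stand-in for the regions-in-transition map (region name -> state).
  private final Map<String, String> regionsInTransition =
      new HashMap<String, String>();

  /** Records that another actor (for example SSH) is already handling a region. */
  void markInTransition(String region, String state) {
    synchronized (regionsInTransition) {
      regionsInTransition.put(region, state);
    }
  }

  /** Returns the candidates that are not in transition at the time of the call. */
  Set<String> regionsToAssign(Set<String> candidates) {
    Set<String> result = new TreeSet<String>(candidates);
    // Hold the same lock that writers use, so the key set is not modified
    // while removeAll iterates over it.
    synchronized (regionsInTransition) {
      result.removeAll(regionsInTransition.keySet());
    }
    return result;
  }
}
```

The result is only a snapshot: a region that enters transition after the lock is released is not filtered out, so this step alone does not rule out every double assignment.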





[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-17 Thread chunhui shen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210791#comment-13210791
 ] 

chunhui shen commented on HBASE-5270:
-

{code}So, what happens if a server had root and meta and its not expired when 
we do failover? We'll expire it processing root. Will we expire it a second 
time processing meta? Perhaps the answer is no because the first expiration 
will clear the meta state in master?
{code}
Sorry, my comment above was wrong.

If a server had both root and meta, it is expired when we process root, 
and we will not expire it a second time when processing meta, because 
metaServerInfo == null in the following code:
{code}
+  HServerInfo metaServerInfo = this.serverManager
+      .getHServerInfo(metaServerAddress);
+  if (metaServerInfo != null) {
+    HServerLoad metaServerLoad = metaServerInfo.getLoad();
+    if (metaServerLoad != null && metaServerLoad.getNumberOfRegions() > 0
+        && !catalogTracker.getRootLocation().equals(metaServerAddress)) {
+      // If metaServer is online and did not just start, we expire it
+      this.serverManager.expireServer(metaServerInfo);
+    }
+  }
{code}
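
The reasoning above can be sketched with a toy model. It assumes that expiring a server removes it from the online-server lookup right away, which is what makes the second lookup return null. ExpireOnceSketch and its fields are hypothetical, not ServerManager code:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ExpireOnceSketch {
  // Hypothetical stand-in for the online-server map (address -> region count).
  final Map<String, Integer> onlineServers = new HashMap<String, Integer>();
  // Records every expiration so the number of calls can be inspected.
  final List<String> expired = new ArrayList<String>();

  /** Assumption: expiring a server drops it from the online map immediately. */
  void expireServer(String address) {
    onlineServers.remove(address);
    expired.add(address);
  }

  /** Processes root first, then meta, expiring servers that carry regions. */
  void processFailover(String rootAddress, String metaAddress) {
    Integer rootLoad = onlineServers.get(rootAddress);
    if (rootLoad != null && rootLoad > 0) {
      expireServer(rootAddress);
    }
    // If the same server carried root and meta, it is gone from the map by
    // now, so this lookup returns null and the server is not expired again.
    Integer metaLoad = onlineServers.get(metaAddress);
    if (metaLoad != null && metaLoad > 0) {
      expireServer(metaAddress);
    }
  }
}
```

When root and meta live on different servers, both lookups succeed and each server is expired once.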





[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-17 Thread chunhui shen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210790#comment-13210790
 ] 

chunhui shen commented on HBASE-5270:
-

{code}So, what happens if a server had root and meta and its not expired when 
we do failover? We'll expire it processing root. Will we expire it a second 
time processing meta? Perhaps the answer is no because the first expiration 
will clear the meta state in master?
{code}
Sorry, my earlier comment on the question above was wrong.

If a server had root and meta, it will be expired when processing root, 
and we will not expire it a second time when processing meta, because in the 
following code metaServerInfo will be null by then:
{code}+  HServerInfo metaServerInfo = this.serverManager
+      .getHServerInfo(metaServerAddress);
+  if (metaServerInfo != null) {
+    HServerLoad metaServerLoad = metaServerInfo.getLoad();
+    if (metaServerLoad != null && metaServerLoad.getNumberOfRegions() > 0
+        && !catalogTracker.getRootLocation().equals(metaServerAddress)) {
+      // If metaServer is online && not start just now, we expire it
+      this.serverManager.expireServer(metaServerInfo);
+    }
+  }
{code}
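
To make the argument concrete, here is a minimal, self-contained sketch of the "expire at most once" behavior. It is not the patch itself: ExpireOnceSketch is a hypothetical stand-in for ServerManager, the map of online servers replaces HServerInfo, and the root-location check is left out. It only illustrates that once a server has been expired, the lookup returns null and the second expire is skipped.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for ServerManager; it only tracks online servers.
class ExpireOnceSketch {
  // server address -> number of regions it carries
  private final Map<String, Integer> onlineServers = new HashMap<>();
  int expireCalls = 0;

  void addServer(String address, int regions) {
    onlineServers.put(address, regions);
  }

  // Returns null once the server has been expired (the metaServerInfo == null case).
  Integer getServerInfo(String address) {
    return onlineServers.get(address);
  }

  void expireServer(String address) {
    onlineServers.remove(address);
    expireCalls++;
  }

  // Expire the server only if it is still known and carries regions.
  void expireIfOnline(String address) {
    Integer regions = getServerInfo(address);
    if (regions != null && regions > 0) {
      expireServer(address);
    }
  }

  public static void main(String[] args) {
    ExpireOnceSketch sm = new ExpireOnceSketch();
    sm.addServer("rs1:60020", 2);   // one server carrying both root and meta
    sm.expireIfOnline("rs1:60020"); // processing root: server is expired
    sm.expireIfOnline("rs1:60020"); // processing meta: lookup is null, nothing happens
    System.out.println("expire calls: " + sm.expireCalls);
  }
}
```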

 Handle potential data loss due to concurrent processing of processFaileOver 
 and ServerShutdownHandler
 -

 Key: HBASE-5270
 URL: https://issues.apache.org/jira/browse/HBASE-5270
 Project: HBase
  Issue Type: Sub-task
  Components: master
Reporter: Zhihong Yu
 Fix For: 0.94.0, 0.92.1

 Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 
 5270-90.patch, 5270-90v2.patch, 5270-testcase.patch, 5270-testcasev2.patch, 
 hbase-5270.patch, hbase-5270v2.patch, sampletest.txt






[jira] [Commented] (HBASE-5075) regionserver crashed and failover

2012-02-17 Thread zhiyuan.dai (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210793#comment-13210793
 ] 

zhiyuan.dai commented on HBASE-5075:


@Lars Hofhansl 
I am sorry that I reformatted the existing code; I will do another patch.

There are two versions of this work. We have implemented version 1, which can 
detect a crashed process but cannot detect a dying machine. The next version 
will cover both cases.

Thanks for your reply.

 regionserver crashed and failover
 -

 Key: HBASE-5075
 URL: https://issues.apache.org/jira/browse/HBASE-5075
 Project: HBase
  Issue Type: Improvement
  Components: monitoring, regionserver, replication, zookeeper
Affects Versions: 0.92.1
Reporter: zhiyuan.dai
 Fix For: 0.90.5

 Attachments: 5075.patch


 When a regionserver crashes, it takes too long to notify the HMaster, and once 
 the HMaster knows the regionserver is down, it takes a long time to obtain the 
 HLog's lease.
 HBase is an online DB, so availability is very important.
 I have an idea to improve availability: a monitor node checks the 
 regionserver's pid. If the pid does not exist, I consider the RS down, delete 
 its znode, and force-close the HLog file.
 The detection period could then be around 100ms.
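
The pid-polling idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions: PidWatchdogSketch and its callback are hypothetical names, the liveness probe uses the JDK's ProcessHandle (Java 9+), and the actual znode deletion and HLog close are represented by a plain Runnable rather than real ZooKeeper or HDFS calls.

```java
import java.util.function.LongPredicate;

// Hypothetical watchdog: polls a pid and runs a callback once when the process is gone.
class PidWatchdogSketch {
  private final long pid;
  private final LongPredicate isAlive;
  private final Runnable onDeath; // e.g. delete the znode and force-close the HLog
  private boolean fired = false;

  PidWatchdogSketch(long pid, LongPredicate isAlive, Runnable onDeath) {
    this.pid = pid;
    this.isAlive = isAlive;
    this.onDeath = onDeath;
  }

  // Liveness probe based on the JDK process API (Java 9+).
  static boolean processExists(long pid) {
    return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
  }

  // One polling step; returns true if the process was found dead and the callback ran.
  boolean checkOnce() {
    if (fired || isAlive.test(pid)) {
      return false;
    }
    fired = true;
    onDeath.run();
    return true;
  }

  public static void main(String[] args) {
    long self = ProcessHandle.current().pid();
    PidWatchdogSketch watchdog = new PidWatchdogSketch(
        self, PidWatchdogSketch::processExists,
        () -> System.out.println("would delete znode and close hlog"));
    // The current JVM is alive, so this step does not fire the callback.
    System.out.println("fired: " + watchdog.checkOnce());
  }
}
```

A caller would invoke checkOnce() from a scheduled task at the chosen period (the 100ms mentioned above). Note that a pid check like this can only see a crashed process on the same host, not a machine that has died.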





[jira] [Commented] (HBASE-5075) regionserver crashed and failover

2012-02-17 Thread zhiyuan.dai (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210796#comment-13210796
 ] 

zhiyuan.dai commented on HBASE-5075:


@stack
I am sorry that I reformatted the existing code; I will do another patch.

Thanks for the design points you've mentioned. I am preparing the design 
documents, which include the answers to your questions, and I will upload them 
as soon as possible.





[jira] [Commented] (HBASE-5424) HTable meet NPE when call getRegionInfo()

2012-02-17 Thread zhiyuan.dai (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210797#comment-13210797
 ] 

zhiyuan.dai commented on HBASE-5424:


@stack @Hadoop QA
I will improve this and do another patch that includes a unit test.

 HTable meet NPE when call getRegionInfo()
 -

 Key: HBASE-5424
 URL: https://issues.apache.org/jira/browse/HBASE-5424
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.1, 0.90.5
Reporter: junhua yang
 Attachments: HBASE-5424.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 We hit an NPE when calling getRegionInfo() in our testing environment.
 Exception in thread "main" java.lang.NullPointerException
 at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
 at org.apache.hadoop.hbase.util.Writables.getHRegionInfo(Writables.java:119)
 at org.apache.hadoop.hbase.client.HTable$2.processRow(HTable.java:395)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:190)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:95)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:73)
 at org.apache.hadoop.hbase.client.HTable.getRegionsInfo(HTable.java:418)
 This NPE also prevents table.jsp from showing the region information of this 
 table.
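
One defensive pattern for this kind of failure is to skip meta rows whose region-info bytes are missing instead of passing null to the decoder. The sketch below is only an illustration and is not necessarily what the attached patch does: NullSafeMetaScanSketch and decodeOrNull are hypothetical names, and a UTF-8 string stands in for the real HRegionInfo deserialization.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class NullSafeMetaScanSketch {
  // Hypothetical decoder standing in for the real deserialization step.
  // Returns null for missing or empty bytes instead of throwing.
  static String decodeOrNull(byte[] bytes) {
    if (bytes == null || bytes.length == 0) {
      return null;
    }
    return new String(bytes, StandardCharsets.UTF_8);
  }

  // Collects decoded region info, skipping rows that have no region-info bytes.
  static List<String> collectRegions(List<byte[]> rows) {
    List<String> regions = new ArrayList<>();
    for (byte[] row : rows) {
      String info = decodeOrNull(row);
      if (info == null) {
        continue; // row without region info: skip it rather than fail the scan
      }
      regions.add(info);
    }
    return regions;
  }
}
```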





[jira] [Commented] (HBASE-5424) HTable meet NPE when call getRegionInfo()

2012-02-17 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210799#comment-13210799
 ] 

Lars Hofhansl commented on HBASE-5424:
--

@junhua: Under which circumstances do you see this NPE? Seems strange that we 
have not encountered that before. Are you sure this is not a case of different 
server and client versions?





[jira] [Commented] (HBASE-5294) Make sure javadoc is included in tarball bundle when we release

2012-02-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210803#comment-13210803
 ] 

Hudson commented on HBASE-5294:
---

Integrated in HBase-0.92 #290 (See 
[https://builds.apache.org/job/HBase-0.92/290/])
HBASE-5294 Make sure javadoc is included in tarball bundle when we release 
(Revision 1245826)

 Result = SUCCESS
stack : 
Files : 
* /hbase/branches/0.92/pom.xml


 Make sure javadoc is included in tarball bundle when we release
 ---

 Key: HBASE-5294
 URL: https://issues.apache.org/jira/browse/HBASE-5294
 Project: HBase
  Issue Type: Task
Affects Versions: 0.92.0
Reporter: stack
Assignee: Shaneal Manek
Priority: Critical
 Fix For: 0.94.0, 0.92.1

 Attachments: hbase-5294.patch


 0.92.0 doesn't have javadoc in the tarball.  Fix.





[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent

2012-02-17 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210804#comment-13210804
 ] 

ramkrishna.s.vasudevan commented on HBASE-5200:
---

@Stack
Yes, the CLOSING node is created by the master now.
As I mentioned in my previous comments, in 0.90 the CLOSING node is created by 
the RS. On master failover we first set a watch on the children of the 
unassigned node. If the RS creates the node just after the children watch is 
set, we start getting callbacks that will be missed. If only the master 
creates nodes, this problem can be avoided.

 AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the 
 region assignment inconsistent
 -

 Key: HBASE-5200
 URL: https://issues.apache.org/jira/browse/HBASE-5200
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: 5200-test.txt, 5200-v2.txt, 5200-v3.txt, 5200-v4.txt, 
 5200-v4no-prefix.txt, HBASE-5200.patch, HBASE-5200_1.patch, 
 HBASE-5200_trunk_latest_with_test_2.patch, 
 TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, 
 hbase-5200_90_latest.patch, hbase-5200_90_latest_new.patch


 This is the scenario:
 Consider a case where the balancer is running and is trying to close regions 
 on an RS.
 Before the close completes, a master switch happens.
 On master switch, the set of nodes that are in RIT is collected; we first 
 get the node data and start watching the node.
 After that, the node data is added into RIT.
 If, before the data is added to RIT, the RS on which close was called 
 does a transition, AM.handleRegion() misses the handling, reporting that the 
 RIT state was null.
 {code}
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 a66d281d231dfcaea97c270698b26b6f from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 c12e53bfd48ddc5eec507d66821c4d23 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 59ae13de8c1eb325a0dd51f4902d2052 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 f45bc9614d7575f35244849af85aa078 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 cc3ecd7054fe6cd4a1159ed92fd62641 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 3af40478a17fee96b4a192b22c90d5a2 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 e6096a8466e730463e10d3d61f809b92 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 4806781a1a23066f7baed22b4d237e24 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 d69e104131accaefe21dcc01fddc7629 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 {code}
 In branch the CLOSING node is created by RS thus leading to more 
 inconsistency.





[jira] [Commented] (HBASE-5162) Basic client pushback mechanism

2012-02-17 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210809#comment-13210809
 ] 

jirapos...@reviews.apache.org commented on HBASE-5162:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3930/#review5209
---



src/main/java/org/apache/hadoop/hbase/client/HTable.java
https://reviews.apache.org/r/3930/#comment11397

I find this a bit dubious.
This won't actually slow the client thread down; it will just accumulate more 
data and reduce the number of RPCs. In the end it might lead to more load on 
the server, because we can deliver more puts with fewer but larger batches.

I'd rather just rely on the server sleeping the thread for a bit (as you do 
later).



src/main/java/org/apache/hadoop/hbase/client/HTable.java
https://reviews.apache.org/r/3930/#comment11399

What if the flusher is not null? Should we re-calculate the wait time?



src/main/java/org/apache/hadoop/hbase/client/HTable.java
https://reviews.apache.org/r/3930/#comment11400

If sleepTime is 0 (for example from NoServerBackoffPolicy), we should 
probably not create the thread and flush right here.

(But as I said in the comment above, I'd probably not bother with this 
extra thread to begin with :) )



src/main/java/org/apache/hadoop/hbase/client/HTable.java
https://reviews.apache.org/r/3930/#comment11401

Was this a problem before? Or only now because of the background thread?



src/main/java/org/apache/hadoop/hbase/client/HTable.java
https://reviews.apache.org/r/3930/#comment11398

This will break backwards compatibility, right? (Not saying that's not ok, 
just calling it out)

I'd almost rather have the client not know about this, until we reach a bad 
spot (in which case we can throw back retryable exceptions).



src/main/java/org/apache/hadoop/hbase/regionserver/MemstorePressureMonitor.java
https://reviews.apache.org/r/3930/#comment11402

Ah ok, this is where we gracefully delay the server thread a bit.
Seems this would need to be tweaked carefully to make it effective while 
not slowing normal operations.

Should the serverPauseTime be somehow related to the amount of pressure?
I.e. wait a bit more if the pressure is higher?
Maybe the pausetime calculation should be part of the pluggable policy?

Also in the jira there was some discussion about throwing (presumably 
retryable) exceptions back to the client if the pressure gets too high. That 
would slow the client, without consuming server resources (beyond multiple 
requests).
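
As an illustration of tying the pause time to the amount of pressure, a pluggable policy could look roughly like the sketch below. All names here (PausePolicySketch, LinearPausePolicySketch, the threshold and maximum pause) are hypothetical and not taken from the patch under review; pressure is assumed to be a fraction where 1.0 means the memstore is at its blocking limit.

```java
// Hypothetical pluggable policy: maps memstore pressure to a server-side pause.
interface PausePolicySketch {
  long pauseMillis(double pressure);
}

// Pauses nothing below the threshold, then scales linearly up to maxPauseMillis.
class LinearPausePolicySketch implements PausePolicySketch {
  private final double threshold;
  private final long maxPauseMillis;

  LinearPausePolicySketch(double threshold, long maxPauseMillis) {
    if (threshold < 0.0 || threshold >= 1.0) {
      throw new IllegalArgumentException("threshold must be in [0, 1)");
    }
    this.threshold = threshold;
    this.maxPauseMillis = maxPauseMillis;
  }

  @Override
  public long pauseMillis(double pressure) {
    if (pressure <= threshold) {
      return 0L; // normal operation: do not slow writes at all
    }
    double clamped = Math.min(pressure, 1.0);
    double fraction = (clamped - threshold) / (1.0 - threshold);
    return Math.round(fraction * maxPauseMillis);
  }
}
```

With a threshold of 0.5 and a maximum of 100 ms, writes are not delayed until the memstore is half way to its limit, and the delay then grows with pressure. A policy like this could also return a sentinel above some level to signal that a retryable exception should be thrown instead of sleeping.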



src/main/java/org/apache/hadoop/hbase/regionserver/StoreUtils.java
https://reviews.apache.org/r/3930/#comment11403

General comment: Where are we on putting/documenting these things in 
hbase-defaults.xml?


- Lars


On 2012-02-16 20:45:50, Jesse Yates wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3930/
bq.  ---
bq.  
bq.  (Updated 2012-02-16 20:45:50)
bq.  
bq.  
bq.  Review request for hbase, Michael Stack, Jean-Daniel Cryans, and Lars 
Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Under heavy write load, HBase will create a saw-tooth pattern in accepting 
writes. This is due to the I/O in minor compactions not being able to keep up 
with the write load. Specifically, the memstore is attempting to flush while we 
are attempting to do a minor compaction, leading to blocking _all_ writes. 
Instead, we need to have the option of a graceful degradation mechanism.
bq.  
bq.  This patch supports both a short-term, adjustable server-side write 
blocking as well as client-side back-off to help alleviate temporary memstore 
pressure.
bq.  
bq.  
bq.  This addresses bug HBASE-5162.
bq.  https://issues.apache.org/jira/browse/HBASE-5162
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 763fe89 
bq.    src/main/java/org/apache/hadoop/hbase/client/BackoffPolicy.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/client/HTable.java 57605e6 
bq.    src/main/java/org/apache/hadoop/hbase/client/MonitoredResult.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/client/NoServerBackoffPolicy.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 25cb31d 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 7d7be3c 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/MemstorePressureMonitor.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/OperationStatus.java 1b94ab5 
bq.

[jira] [Created] (HBASE-5431) Improve delete marker handling in Import M/R jobs

2012-02-17 Thread Lars Hofhansl (Created) (JIRA)
Improve delete marker handling in Import M/R jobs
-

 Key: HBASE-5431
 URL: https://issues.apache.org/jira/browse/HBASE-5431
 Project: HBase
  Issue Type: Sub-task
  Components: mapreduce
Affects Versions: 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.94.0


Import currently creates a new Delete object for each delete KV found in a 
result object.
This can be improved with the new Delete API that allows adding a delete KV to 
a Delete object.





[jira] [Updated] (HBASE-5431) Improve delete marker handling in Import M/R jobs

2012-02-17 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5431:
-

Attachment: 5431.txt

Simple patch.
The removed Delete constructor was added in 0.94 just for this case, so it's 
safe to remove it now.





[jira] [Updated] (HBASE-5431) Improve delete marker handling in Import M/R jobs

2012-02-17 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5431:
-

Status: Patch Available  (was: Open)





[jira] [Commented] (HBASE-5255) Use singletons for OperationStatus to save memory

2012-02-17 Thread Benoit Sigoure (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210817#comment-13210817
 ] 

Benoit Sigoure commented on HBASE-5255:
---

A hunk was missed when this patch got merged in the 0.92 branch:

{code}

--- a/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
+++ b/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
@@ -1862,8 +1862,7 @@ public class HRegion implements HeapSize { // , Writable{
   continue;
 }
 addedSize += applyFamilyMapToMemstore(familyMaps[i]);
-batchOp.retCodeDetails[i] = new OperationStatus(
-OperationStatusCode.SUCCESS);
+batchOp.retCodeDetails[i] = OperationStatus.SUCCESS;
   }
 
   // 
{code}

Do you want to file another JIRA about it?

 Use singletons for OperationStatus to save memory
 -

 Key: HBASE-5255
 URL: https://issues.apache.org/jira/browse/HBASE-5255
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.90.5, 0.92.0
Reporter: Benoit Sigoure
Assignee: Benoit Sigoure
Priority: Minor
  Labels: performance
 Fix For: 0.94.0, 0.92.1

 Attachments: 5255-92.txt, 5255-v2.txt, 
 HBASE-5255-0.92-Use-singletons-to-remove-unnecessary-memory-allocati.patch, 
 HBASE-5255-trunk-Use-singletons-to-remove-unnecessary-memory-allocati.patch


 Every single {{Put}} causes the allocation of at least one 
 {{OperationStatus}}, yet {{OperationStatus}} is almost always stateless, so 
 these allocations are unnecessary and could be avoided.  Attached patch adds 
 a few singletons and uses them, with no public API change.  I didn't test the 
 patches, but you get the idea.
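
The idea can be sketched in isolation as follows. This is a simplified stand-in, not the HBase class: OperationStatusSketch and its Code enum are hypothetical names, and the point is only that an immutable, stateless status can be shared as a constant instead of being allocated per operation.

```java
// Simplified stand-in for an immutable status object with shared instances.
final class OperationStatusSketch {
  enum Code { NOT_RUN, SUCCESS, FAILURE }

  // Shared instances for the common stateless cases.
  static final OperationStatusSketch NOT_RUN = new OperationStatusSketch(Code.NOT_RUN, "");
  static final OperationStatusSketch SUCCESS = new OperationStatusSketch(Code.SUCCESS, "");

  private final Code code;
  private final String message;

  // Still available for statuses that carry a message, e.g. failures.
  OperationStatusSketch(Code code, String message) {
    this.code = code;
    this.message = message;
  }

  Code getCode() {
    return code;
  }

  String getMessage() {
    return message;
  }

  // Fills a batch result array without allocating one object per operation.
  static OperationStatusSketch[] allSucceeded(int batchSize) {
    OperationStatusSketch[] results = new OperationStatusSketch[batchSize];
    java.util.Arrays.fill(results, SUCCESS);
    return results;
  }
}
```

Sharing is safe here only because the fields are final; a status with mutable state could not be reused this way.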





[jira] [Commented] (HBASE-5209) HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup

2012-02-17 Thread David S. Wang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210818#comment-13210818
 ] 

David S. Wang commented on HBASE-5209:
--

Stack,

Thanks for the review.

I am a bit confused about your previous comment though: I do not update 
HMasterInterface in my latest patch, nor do I change anything related to 
isActiveMaster. The only version that I update is VERSION for ClusterStatus, 
and I already add the new fields to the end with my current patch. I think 
perhaps that is what you are referring to.

I did two tests with hbase hbck -details to test ClusterStatus:

1. I tested old client (top of trunk 0.92) with new server (0.92 with my patch 
but without bumping VERSION), and things worked fine.

2. I tested new client (0.92 with my patch but without bumping VERSION), with 
old server (top of trunk 0.92), and got the following error.  I'm thinking 
because the new client expects the new fields I added that the old server never 
sends. Is this OK behavior?

INFO zookeeper.ClientCnxn: Session establishment complete on server 
haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x1358ef9f91b000b, 
negotiated timeout = 5000

[... pauses for some time ...]

12/02/17 15:51:46 ERROR io.HbaseObjectWritable: Error in readFields 
java.net.SocketTimeoutException: 6 millis timeout while waiting for channel 
to be ready for read. ch : java.nio.channels.SocketChannel[connected 
local=/172.29.5.33:41223 remote=haus04.sf.cloudera.com/172.29.5.35:31000] 
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.FilterInputStream.read(FilterInputStream.java:116)
at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:311)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.DataInputStream.readByte(DataInputStream.java:248)
at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:299)
at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:320)
at org.apache.hadoop.hbase.util.Bytes.readByteArray(Bytes.java:146)
at org.apache.hadoop.hbase.ClusterStatus.readFields(ClusterStatus.java:334)
at 
org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:647)
at 
org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:311)
at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:593)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:50
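
The timeout in test 2 is consistent with the reader waiting for fields the writer never sent. A common way to avoid that is to gate the new fields on the serialized version, as in the sketch below. This is not the actual ClusterStatus wire format: VersionedStatusSketch, its two fields, and the version numbers are hypothetical, and it assumes the version byte is written first and bumped when fields are appended.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class VersionedStatusSketch {
  // First version that appends the active master field (hypothetical number).
  static final byte VERSION_WITH_MASTER = 2;

  final String hbaseVersion;
  final String activeMaster; // null when the writer was too old to send it

  VersionedStatusSketch(String hbaseVersion, String activeMaster) {
    this.hbaseVersion = hbaseVersion;
    this.activeMaster = activeMaster;
  }

  // Writes the version byte first, then the fields that version defines.
  static byte[] write(byte version, String hbaseVersion, String activeMaster) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeByte(version);
      out.writeUTF(hbaseVersion);
      if (version >= VERSION_WITH_MASTER) {
        out.writeUTF(activeMaster); // new field appended at the end
      }
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Reads the new field only if the writer's version says it is present.
  static VersionedStatusSketch read(byte[] data) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
      byte version = in.readByte();
      String hbaseVersion = in.readUTF();
      String activeMaster = version >= VERSION_WITH_MASTER ? in.readUTF() : null;
      return new VersionedStatusSketch(hbaseVersion, activeMaster);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With this shape, a new reader handed old data returns a null activeMaster instead of blocking on bytes that will never arrive.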

 HConnection/HMasterInterface should allow for way to get hostname of 
 currently active master in multi-master HBase setup
 

 Key: HBASE-5209
 URL: https://issues.apache.org/jira/browse/HBASE-5209
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.94.0, 0.90.5, 0.92.0
Reporter: Aditya Acharya
Assignee: David S. Wang
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: HBASE-5209-v0.diff, HBASE-5209-v1.diff


 I have a multi-master HBase set up, and I'm trying to programmatically 
 determine which of the masters is currently active. But the API does not 
 allow me to do this. There is a getMaster() method in the HConnection class, 
 but it returns an HMasterInterface, whose methods do not allow me to find out 
 which master won the last race. The API should have a 
 getActiveMasterHostname() or something to that effect.





[jira] [Commented] (HBASE-5428) Allow for custom filters to be registered within the Thrift interface

2012-02-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210820#comment-13210820
 ] 

Hudson commented on HBASE-5428:
---

Integrated in HBase-TRUNK-security #114 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/114/])
HBASE-5428 Allow for custom filters to be registered within the Thrift 
interface (Revision 1245774)

 Result = FAILURE
stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/filter/ParseFilter.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift/ThriftServerRunner.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/filter/TestParseFilter.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServer.java


 Allow for custom filters to be registered within the Thrift interface
 -

 Key: HBASE-5428
 URL: https://issues.apache.org/jira/browse/HBASE-5428
 Project: HBase
  Issue Type: Improvement
  Components: thrift
Affects Versions: 0.92.0
Reporter: Robert Roland
  Labels: patch
 Fix For: 0.94.0

 Attachments: ThriftCustomFilters.patch


 Custom filters work within the Java client API, but are not accessible within 
 the Thrift API.  Attempting to use one will generate a "Filter Name x not 
 supported" error.
 Attached patch allows a user to specify a list of custom filters that are 
 registered at Thrift server startup time within the HBase configuration files:
 <property>
   <name>hbase.thrift.filters</name>
   <value>MyFilter:com.foo.Filter,OtherFilter:com.foo.OtherFilter</value>
 </property>
 Patch created off SVN r1245727
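
 The value format described above (comma-separated Name:class pairs) could be parsed roughly as follows. This is only a sketch for illustration, not the code from the attached patch; the class ThriftFilterConfigSketch and its parse method are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of HBase: illustrates the Name:class list format.
final class ThriftFilterConfigSketch {
    // Parses "Name:class,Name:class" into an ordered filter-name -> class-name map.
    static Map<String, String> parse(String value) {
        Map<String, String> filters = new LinkedHashMap<>();
        if (value == null || value.trim().isEmpty()) {
            return filters; // nothing configured
        }
        for (String entry : value.split(",")) {
            // Split on the first ':' only, so the class name is kept whole.
            String[] parts = entry.trim().split(":", 2);
            if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
                throw new IllegalArgumentException("Malformed filter entry: " + entry);
            }
            filters.put(parts[0], parts[1]);
        }
        return filters;
    }
}
```

 A server would presumably look up each class name at startup and register it under the given filter name; that step is left out here.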





[jira] [Commented] (HBASE-5427) Upgrade our zk to 3.4.3

2012-02-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210821#comment-13210821
 ] 

Hudson commented on HBASE-5427:
---

Integrated in HBase-TRUNK-security #114 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/114/])
HBASE-5427 Upgrade our zk to 3.4.3 (Revision 1245759)

 Result = FAILURE
stack : 
Files : 
* /hbase/trunk/pom.xml


 Upgrade our zk to 3.4.3
 ---

 Key: HBASE-5427
 URL: https://issues.apache.org/jira/browse/HBASE-5427
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
 Fix For: 0.94.0, 0.92.1

 Attachments: 5427.txt








[jira] [Commented] (HBASE-5294) Make sure javadoc is included in tarball bundle when we release

2012-02-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210823#comment-13210823
 ] 

Hudson commented on HBASE-5294:
---

Integrated in HBase-TRUNK-security #114 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/114/])
HBASE-5294 Make sure javadoc is included in tarball bundle when we release 
(Revision 1245827)

 Result = FAILURE
stack : 
Files : 
* /hbase/trunk/pom.xml
* /hbase/trunk/src/docbkx/developer.xml


 Make sure javadoc is included in tarball bundle when we release
 ---

 Key: HBASE-5294
 URL: https://issues.apache.org/jira/browse/HBASE-5294
 Project: HBase
  Issue Type: Task
Affects Versions: 0.92.0
Reporter: stack
Assignee: Shaneal Manek
Priority: Critical
 Fix For: 0.94.0, 0.92.1

 Attachments: hbase-5294.patch


 0.92.0 doesn't have javadoc in the tarball.  Fix.





[jira] [Commented] (HBASE-5120) Timeout monitor races with table disable handler

2012-02-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210822#comment-13210822
 ] 

Hudson commented on HBASE-5120:
---

Integrated in HBase-TRUNK-security #114 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/114/])
HBASE-5120 Timeout monitor races with table disable handler (Revision 
1245731)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java


 Timeout monitor races with table disable handler
 

 Key: HBASE-5120
 URL: https://issues.apache.org/jira/browse/HBASE-5120
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Zhihong Yu
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
 Fix For: 0.94.0

 Attachments: HBASE-5120.patch, HBASE-5120_1.patch, 
 HBASE-5120_2.patch, HBASE-5120_3.patch, HBASE-5120_4.patch, 
 HBASE-5120_5.patch, HBASE-5120_5.patch


 Here is what J-D described here:
 https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
 I think I will retract from my statement that it used to be extremely racy 
 and caused more troubles than it fixed, on my first test I got a stuck 
 region in transition instead of being able to recover. The timeout was set to 
 2 minutes to be sure I hit it.
 First the region gets closed
 {quote}
 2012-01-04 00:16:25,811 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 sv4r5s38,62023,1325635980913 for region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 {quote}
 2 minutes later it times out:
 {quote}
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636185810, server=null
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_CLOSE for too long, running forced unassign again on 
 region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,027 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 (offlining)
 {quote}
 100ms later the master finally gets the event:
 {quote}
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
 region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for 1a4b111bcc228043e89f59c4c3f6a791
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
 deleting ZK node and removing from regions in transition, skipping assignment 
 of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Deleting existing unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
 region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
 {quote}
 At this point everything is fine, the region was processed as closed. But 
 wait, remember that line where it said it was going to force an unassign?
 {quote}
 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Creating unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
 2012-01-04 00:18:30,328 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
 java.lang.NullPointerException: Passed server is null for 
 1a4b111bcc228043e89f59c4c3f6a791
 {quote}
 Now the master is confused, it recreated the RIT znode but the region doesn't 
 even exist anymore. It even tries to shut it down but is blocked by NPEs. Now 
 this is what's going on.
 The late ZK notification that the znode was deleted (but it got recreated 
 after):
 {quote}
 2012-01-04 00:19:33,285 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been 
 deleted.
 {quote}
 Then it prints this, and much later tries to unassign it again:
 {quote}
 2012-01-04 00:19:46,607 DEBUG 
 org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
 to clear 

[jira] [Commented] (HBASE-5393) Consider splitting after flushing

2012-02-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210819#comment-13210819
 ] 

Hudson commented on HBASE-5393:
---

Integrated in HBase-TRUNK-security #114 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/114/])
HBASE-5393  Consider splitting after flushing (Revision 1245727)

 Result = FAILURE
jdcryans : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java


 Consider splitting after flushing
 -

 Key: HBASE-5393
 URL: https://issues.apache.org/jira/browse/HBASE-5393
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.5
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.94.0, 0.92.1

 Attachments: HBASE-2375-flush-split.patch


 Spawning this from HBASE-2375, I saw that it was much more efficient 
 compaction-wise to check if we can split right after flushing. Much like the 
 ideas that Jon spelled out in the description of that jira, the window is 
 smaller because you don't have to compact and then split right away to only 
 compact again when the daughters open.
 Another thing it improves is while we're normally waiting for the compaction 
 to happen, data that's still coming in will make us go way past the 
 MAX_FILESIZE to a point where for the first region I was seeing a store size 
 3-4x bigger before it was able to split.
 I targeted this for 0.94, but I'd like to get this into 0.92.1 or .2 too.





[jira] [Commented] (HBASE-4640) Catch ClosedChannelException and document it

2012-02-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210825#comment-13210825
 ] 

Hudson commented on HBASE-4640:
---

Integrated in HBase-TRUNK-security #114 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/114/])
HBASE-4640  Catch ClosedChannelException and document it (Revision 1245730)

 Result = FAILURE
jdcryans : 
Files : 
* /hbase/trunk/src/docbkx/troubleshooting.xml
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java


 Catch ClosedChannelException and document it
 

 Key: HBASE-4640
 URL: https://issues.apache.org/jira/browse/HBASE-4640
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Minor
 Fix For: 0.94.0

 Attachments: HBASE-4640.patch


 ClosedChannelException is a pretty obscure exception for the non-expert and 
 doesn't tell you why you get it. We should instead catch it, print a WARN 
 without a stack trace, and add a line in the book about this.
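
 A minimal sketch of the handling proposed above, assuming the exception surfaces while a response is being written. The class ClosedChannelSketch, the IoAction interface and the wording of the log message are all hypothetical; this is not the code from HBASE-4640.patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.ClosedChannelException;
import java.util.logging.Logger;

// Hypothetical helper: turns a ClosedChannelException into a single WARN line.
final class ClosedChannelSketch {
    private static final Logger LOG = Logger.getLogger("ClosedChannelSketch");

    // Stand-in for "write the response to the client".
    interface IoAction { void run() throws IOException; }

    // Returns true if the action completed, false if the channel was already closed.
    static boolean respond(IoAction action) {
        try {
            action.run();
            return true;
        } catch (ClosedChannelException e) {
            // One WARN line and no stack trace. The message text is an assumption.
            LOG.warning("Channel was closed before the response could be written");
            return false;
        } catch (IOException e) {
            // Any other I/O failure is still surfaced to the caller.
            throw new UncheckedIOException(e);
        }
    }
}
```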





[jira] [Commented] (HBASE-5420) TestImportTsv does not shut down MR Cluster correctly (fails against 0.23 hadoop)

2012-02-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210827#comment-13210827
 ] 

Hudson commented on HBASE-5420:
---

Integrated in HBase-TRUNK-security #114 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/114/])
HBASE-5420 TestImportTsv does not shut down MR Cluster correctly (fails 
against 0.23 hadoop) (Revision 1245795)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportTsv.java


 TestImportTsv does not shut down MR Cluster correctly (fails against 0.23 
 hadoop)
 -

 Key: HBASE-5420
 URL: https://issues.apache.org/jira/browse/HBASE-5420
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0, 0.92.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Fix For: 0.94.0, 0.92.1

 Attachments: HBASE-5420-v1.patch, HBASE-5420.patch


 Test calls startMiniMapReduceCluster() but never calls 
 shutdownMiniMapReduceCluster().
 This causes failures with -Dhadoop.profile=23 when both testMROnTable and 
 testMROnTableWithCustomMapper are run, because the cluster cannot start up 
 properly for the second test.
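
 The start/shutdown pairing the issue asks for can be sketched with a toy cluster. FakeMiniCluster is a hypothetical stand-in (it is not HBaseTestingUtility), and the "only one cluster at a time" rule is an assumption used to mimic the second test failing to start.

```java
// Hypothetical illustration of pairing start with shutdown around each test.
final class MiniClusterLifecycleSketch {
    // Toy cluster: starting it twice without a shutdown in between fails.
    static final class FakeMiniCluster {
        private static boolean running = false;

        static void start() {
            if (running) {
                throw new IllegalStateException("cluster already running");
            }
            running = true;
        }

        static void shutdown() { running = false; }

        static boolean isRunning() { return running; }
    }

    // Runs one test body, always shutting the cluster down afterwards.
    static void runWithCluster(Runnable testBody) {
        FakeMiniCluster.start();
        try {
            testBody.run();
        } finally {
            FakeMiniCluster.shutdown(); // runs even if the test body throws
        }
    }
}
```

 With the shutdown in a finally block, a second call to runWithCluster starts cleanly even when the first test body failed.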





[jira] [Commented] (HBASE-5425) Punt on the timeout doesn't work in BulkEnabler#waitUntilDone (master's EnableTableHandler)

2012-02-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210830#comment-13210830
 ] 

Hudson commented on HBASE-5425:
---

Integrated in HBase-TRUNK-security #114 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/114/])
HBASE-5425 Punt on the timeout doesn't work in BulkEnabler#waitUntilDone 
(master's EnableTableHandler) (Revision 1245674)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java


  Punt on the timeout doesn't work in BulkEnabler#waitUntilDone (master's 
 EnableTableHandler)
 

 Key: HBASE-5425
 URL: https://issues.apache.org/jira/browse/HBASE-5425
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.5, 0.92.0
Reporter: terry zhang
 Fix For: 0.94.0

 Attachments: HBASE-5425.patch


 please take a look at the code below in EnableTableHandler(hbase master):
 {code:title=EnableTableHandler.java|borderStyle=solid}
 protected boolean waitUntilDone(long timeout)
 throws InterruptedException {
 
   .
   int lastNumberOfRegions = this.countOfRegionsInTable;
   while (!server.isStopped() && remaining > 0) {
     Thread.sleep(waitingTimeForEvents);
     regions = assignmentManager.getRegionsOfTable(tableName);
     if (isDone(regions)) break;
     // Punt on the timeout as long we make progress
     if (regions.size() > lastNumberOfRegions) {
       lastNumberOfRegions = regions.size();
       timeout += waitingTimeForEvents;
     }
     remaining = timeout - (System.currentTimeMillis() - startTime);
 
   }
 private boolean isDone(final List<HRegionInfo> regions) {
   return regions != null && regions.size() >= this.countOfRegionsInTable;
 }
 {code} 
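 Because lastNumberOfRegions is initialized to this.countOfRegionsInTable, 
 the punt-on-timeout branch can never execute: isDone() breaks out of the loop 
 as soon as regions.size() reaches countOfRegionsInTable, so regions.size() 
 never exceeds lastNumberOfRegions inside the loop. I think initializing 
 lastNumberOfRegions = 0 can make it work.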
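
 The effect of the initial value can be checked with a small simulation of the loop's branch logic. This is only a sketch: PuntOnTimeoutSketch and countPunts are hypothetical names, the sleep and clock handling are left out, and the observed region counts are made-up inputs.

```java
// Hypothetical simulation of the punt-on-timeout branch in waitUntilDone().
final class PuntOnTimeoutSketch {
    // Counts how often the "punt" branch fires for a sequence of observed
    // region counts, given the initial value of lastNumberOfRegions.
    static int countPunts(int[] observedRegionCounts, int countOfRegionsInTable,
                          int initialLastNumberOfRegions) {
        int lastNumberOfRegions = initialLastNumberOfRegions;
        int punts = 0;
        for (int size : observedRegionCounts) {
            if (size >= countOfRegionsInTable) {
                break; // corresponds to isDone(regions)
            }
            if (size > lastNumberOfRegions) {
                // Progress was made; the real code does timeout += waitingTimeForEvents.
                lastNumberOfRegions = size;
                punts++;
            }
        }
        return punts;
    }
}
```

 For a table of 5 regions observed as 1, 2, 2, 3, 5 regions online, starting from countOfRegionsInTable gives no punts at all, while starting from 0 punts on each of the three steps that made progress.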





[jira] [Commented] (HBASE-5421) use hadoop-client/hadoop-minicluster artifacts for Hadoop 0.23 build

2012-02-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210829#comment-13210829
 ] 

Hudson commented on HBASE-5421:
---

Integrated in HBase-TRUNK-security #114 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/114/])
HBASE-5421 use hadoop-client/hadoop-minicluster artifacts for Hadoop 0.23 
build (Revision 1245743)

 Result = FAILURE
stack : 
Files : 
* /hbase/trunk/pom.xml


 use hadoop-client/hadoop-minicluster artifacts for Hadoop 0.23 build
 

 Key: HBASE-5421
 URL: https://issues.apache.org/jira/browse/HBASE-5421
 Project: HBase
  Issue Type: Improvement
  Components: build
Affects Versions: 0.92.0
Reporter: Shaneal Manek
Assignee: Shaneal Manek
Priority: Minor
  Labels: build
 Fix For: 0.94.0, 0.92.1

 Attachments: hbase-5421.patch


 Hadoop recently added hadoop-client and hadoop-minicluster artifacts for 
 Hadoop 0.23+ that don't export all the internal dependencies (HADOOP-8009).
 Let's use them instead of manually specifying transitive dependency exclusion 
 lists (which is error prone and annoying).





[jira] [Commented] (HBASE-5195) [Coprocessors] preGet hook does not allow overriding or wrapping filter on incoming Get

2012-02-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210828#comment-13210828
 ] 

Hudson commented on HBASE-5195:
---

Integrated in HBase-TRUNK-security #114 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/114/])
HBASE-5195 [Coprocessors] preGet hook does not allow overriding or wrapping 
filter on incoming Get -- SECOND HALF OF THIS COMMIT (Revision 1245773)

 Result = FAILURE
stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java


 [Coprocessors] preGet hook does not allow overriding or wrapping filter on 
 incoming Get
 ---

 Key: HBASE-5195
 URL: https://issues.apache.org/jira/browse/HBASE-5195
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0, 0.92.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.94.0, 0.92.1

 Attachments: HBASE-5195.patch


 Without the ability to wrap the internal Scan on the Get, we can't override 
 (or protect, in the case of access control) Gets as we can Scans. The result 
 is inconsistent behavior.





[jira] [Commented] (HBASE-3584) Allow atomic put/delete in one call

2012-02-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210826#comment-13210826
 ] 

Hudson commented on HBASE-3584:
---

Integrated in HBase-TRUNK-security #114 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/114/])
HBASE-3584 Rename RowMutation to RowMutations (Revision 1245792)

 Result = FAILURE
stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/RowMutation.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/RowMutations.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java


 Allow atomic put/delete in one call
 ---

 Key: HBASE-3584
 URL: https://issues.apache.org/jira/browse/HBASE-3584
 Project: HBase
  Issue Type: New Feature
  Components: client, coprocessors, regionserver
Reporter: ryan rawson
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 3584-final.txt, 3584-v1.txt, 3584-v3.txt


 Right now we have the following calls:
 put(Put)
 delete(Delete)
 increment(Increments)
 But we cannot combine all of the above in a single call, complete with a 
 single row lock.  It would be nice to do that.
 It would also allow us to do a CAS where we could do a put/increment if the 
 check succeeded.
 -
 Amendment:
 Since Increment does not currently support MVCC it cannot be included in an 
 atomic operation.
 So this is for Put and Delete only.
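
 The idea of applying several puts and deletes to one row under a single row lock can be illustrated with a toy in-memory row. Everything here (RowMutationsSketch, Mutation, mutateRow) is a hypothetical model, not the HBase client or region server code; in particular, readers here simply take the same lock, which is a simplification.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Toy model of one row whose puts and deletes are applied as a single unit.
final class RowMutationsSketch {
    // One mutation on a single column: a null value means "delete".
    static final class Mutation {
        final String column;
        final String value;

        Mutation(String column, String value) {
            this.column = column;
            this.value = value;
        }
    }

    private final Map<String, String> columns = new HashMap<>();
    private final ReentrantLock rowLock = new ReentrantLock();

    // Applies every mutation while holding the row lock once, so a reader
    // going through get() never observes a half-applied batch.
    void mutateRow(List<Mutation> mutations) {
        rowLock.lock();
        try {
            for (Mutation m : mutations) {
                if (m.value == null) {
                    columns.remove(m.column);
                } else {
                    columns.put(m.column, m.value);
                }
            }
        } finally {
            rowLock.unlock();
        }
    }

    String get(String column) {
        rowLock.lock();
        try {
            return columns.get(column);
        } finally {
            rowLock.unlock();
        }
    }
}
```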





[jira] [Commented] (HBASE-5279) NPE in Master after upgrading to 0.92.0

2012-02-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210824#comment-13210824
 ] 

Hudson commented on HBASE-5279:
---

Integrated in HBase-TRUNK-security #114 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/114/])
HBASE-5279 NPE in Master after upgrading to 0.92.0 -- REVERT OVERCOMMIT TO 
HREGION (Revision 1245768)
HBASE-5279 NPE in Master after upgrading to 0.92.0 (Revision 1245767)

 Result = FAILURE
stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java


 NPE in Master after upgrading to 0.92.0
 ---

 Key: HBASE-5279
 URL: https://issues.apache.org/jira/browse/HBASE-5279
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.0
Reporter: Tobias Herbert
Priority: Critical
 Fix For: 0.92.1

 Attachments: HBASE-5279-v2.patch, HBASE-5279.patch


 I have upgraded my environment from 0.90.4 to 0.92.0.
 After the table migration I get the following error in the master (permanently):
 {noformat}
 2012-01-25 18:23:48,648 FATAL master-namenode,6,1327512209588 
 org.apache.hadoop.hbase.master.HMaster - Unhandled exception. Starting 
 shutdown.
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.rebuildUserRegions(AssignmentManager.java:2190)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:323)
 at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:501)
 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326)
 at java.lang.Thread.run(Thread.java:662)
 2012-01-25 18:23:48,650 INFO namenode,6,1327512209588 
 org.apache.hadoop.hbase.master.HMaster - Aborting
 {noformat}
 I think that's because I had a hard crash in the cluster a while ago, and I 
 have seen the following WARN since then:
 {noformat}
 2012-01-25 21:20:47,121 WARN namenode,6,1327513078123-CatalogJanitor 
 org.apache.hadoop.hbase.master.CatalogJanitor - REGIONINFO_QUALIFIER is empty 
 in keyvalues={emails,,xxx./info:server/1314336400471/Put/vlen=38, 
 emails,,1314189353300.xxx./info:serverstartcode/1314336400471/Put/vlen=8}
 {noformat}
 My patch simply works around the NPE (in the same way as the surrounding code),
 but I don't know if that's correct.
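
 The kind of guard described above, skipping catalog rows whose region info is missing instead of dereferencing null, might look like this in miniature. This is a hypothetical sketch with plain strings standing in for catalog rows; it is not the code from either attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a null guard while rebuilding the region list.
final class SkipEmptyRegionInfoSketch {
    // rows maps a catalog row key to its region info, or to null when the
    // regioninfo cell is empty (as in the WARN shown above).
    static List<String> rebuildUserRegions(Map<String, String> rows) {
        List<String> regions = new ArrayList<>();
        for (Map.Entry<String, String> row : rows.entrySet()) {
            String regionInfo = row.getValue();
            if (regionInfo == null) {
                continue; // skip the damaged row rather than throwing an NPE
            }
            regions.add(regionInfo);
        }
        return regions;
    }
}
```

 Skipping the row keeps the master from aborting, but it leaves the damaged catalog entry in place, which is the open question raised in the description.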





[jira] [Commented] (HBASE-5431) Improve delete marker handling in Import M/R jobs

2012-02-17 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210832#comment-13210832
 ] 

Hadoop QA commented on HBASE-5431:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12515067/5431.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -136 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 158 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mapreduce.TestImportTsv
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/987//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/987//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/987//console

This message is automatically generated.

 Improve delete marker handling in Import M/R jobs
 -

 Key: HBASE-5431
 URL: https://issues.apache.org/jira/browse/HBASE-5431
 Project: HBase
  Issue Type: Sub-task
  Components: mapreduce
Affects Versions: 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.94.0

 Attachments: 5431.txt


 Import currently creates a new Delete object for each delete KV found in a 
 result object.
 This can be improved with the new Delete API that allows adding a delete KV 
 to a Delete object.
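
 The improvement can be pictured with toy types: all delete KVs from one result end up in a single delete object instead of one object per KV. ToyKv, ToyDelete and addDeleteMarker are hypothetical stand-ins chosen for this sketch; they are not the HBase classes and this is not the code from 5431.txt.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of collecting a result's delete KVs into one delete.
final class ImportDeleteSketch {
    static final class ToyKv {
        final String column;
        final boolean isDelete;

        ToyKv(String column, boolean isDelete) {
            this.column = column;
            this.isDelete = isDelete;
        }
    }

    static final class ToyDelete {
        final String row;
        final List<ToyKv> markers = new ArrayList<>();

        ToyDelete(String row) { this.row = row; }

        void addDeleteMarker(ToyKv kv) { markers.add(kv); }
    }

    // Builds at most one ToyDelete for all delete KVs of a single-row result.
    static ToyDelete collectDeletes(String row, List<ToyKv> result) {
        ToyDelete delete = null;
        for (ToyKv kv : result) {
            if (!kv.isDelete) {
                continue; // puts are handled elsewhere
            }
            if (delete == null) {
                delete = new ToyDelete(row); // created lazily, once per row
            }
            delete.addDeleteMarker(kv);
        }
        return delete; // null when the result holds no delete markers
    }
}
```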





[jira] [Commented] (HBASE-5423) Regionserver may block forever on waitOnAllRegionsToClose when aborting

2012-02-17 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210835#comment-13210835
 ] 

ramkrishna.s.vasudevan commented on HBASE-5423:
---

@Chunhui
Patch makes sense.  Using a set is also fine with me.
+1 on the patch, apart from any name change along the lines suggested by Stack.

 Regionserver may block forever on waitOnAllRegionsToClose when aborting
 ---

 Key: HBASE-5423
 URL: https://issues.apache.org/jira/browse/HBASE-5423
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: hbase-5423.patch


 If closeRegion throws any exception (for example, one caused by the FS) while 
 the RS is aborting, 
 the RS will block forever on waitOnAllRegionsToClose().
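
 One way to picture the hang and a possible remedy: if a failed close leaves the region in the online set, anything waiting for that set to drain never finishes. The sketch below removes the region in a finally block. AbortCloseSketch, Closer and closeAll are hypothetical names, and this is an assumed approach for illustration, not necessarily what hbase-5423.patch does.

```java
import java.util.Set;

// Hypothetical model of closing regions during abort without leaving stragglers.
final class AbortCloseSketch {
    // Stand-in for closeRegion(); may fail, for example on a filesystem error.
    interface Closer { void close(String region) throws Exception; }

    // Closes every online region. A region is dropped from the online set even
    // when its close fails, so a waiter polling for an empty set can finish.
    static int closeAll(Set<String> onlineRegions, Closer closer) {
        int failures = 0;
        // Iterate over a snapshot because the set is modified inside the loop.
        for (String region : onlineRegions.toArray(new String[0])) {
            try {
                closer.close(region);
            } catch (Exception e) {
                failures++; // during abort the failure is only counted
            } finally {
                onlineRegions.remove(region);
            }
        }
        return failures;
    }
}
```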





[jira] [Commented] (HBASE-5424) HTable meet NPE when call getRegionInfo()

2012-02-17 Thread zhiyuan.dai (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210838#comment-13210838
 ] 

zhiyuan.dai commented on HBASE-5424:


@Lars Hofhansl 
Sure, META may have some problem. We found the bug in 0.90.x (0.90.1 and 0.90.5).

 HTable meet NPE when call getRegionInfo()
 -

 Key: HBASE-5424
 URL: https://issues.apache.org/jira/browse/HBASE-5424
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.1, 0.90.5
Reporter: junhua yang
 Attachments: HBASE-5424.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 We hit an NPE when calling getRegionInfo() in our testing environment.
 Exception in thread "main" java.lang.NullPointerException
 at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
 at org.apache.hadoop.hbase.util.Writables.getHRegionInfo(Writables.java:119)
 at org.apache.hadoop.hbase.client.HTable$2.processRow(HTable.java:395)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:190)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:95)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:73)
 at org.apache.hadoop.hbase.client.HTable.getRegionsInfo(HTable.java:418)
 This NPE also prevents table.jsp from showing the region information of this 
 table.





[jira] [Commented] (HBASE-5424) HTable meet NPE when call getRegionInfo()

2012-02-17 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210845#comment-13210845
 ] 

Lars Hofhansl commented on HBASE-5424:
--

Thanks zhiyuan. I'm wondering whether we shouldn't focus on fixing the 
underlying cause rather than papering over it.

 HTable meet NPE when call getRegionInfo()
 -

 Key: HBASE-5424
 URL: https://issues.apache.org/jira/browse/HBASE-5424
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.1, 0.90.5
Reporter: junhua yang
 Attachments: HBASE-5424.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 We hit an NPE when calling getRegionInfo() in our testing environment.
 Exception in thread "main" java.lang.NullPointerException
 at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
 at org.apache.hadoop.hbase.util.Writables.getHRegionInfo(Writables.java:119)
 at org.apache.hadoop.hbase.client.HTable$2.processRow(HTable.java:395)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:190)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:95)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:73)
 at org.apache.hadoop.hbase.client.HTable.getRegionsInfo(HTable.java:418)
 This NPE also prevents table.jsp from showing the region information of this 
 table.





[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent

2012-02-17 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210851#comment-13210851
 ] 

ramkrishna.s.vasudevan commented on HBASE-5200:
---

@Stack and @Ted
I suggest we commit this to 0.92 and trunk.

The creation of the CLOSING node was introduced in HBASE-3789, and even there the 0.90 
patch was left uncommitted as it may affect rolling restarts.



 AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the 
 region assignment inconsistent
 -

 Key: HBASE-5200
 URL: https://issues.apache.org/jira/browse/HBASE-5200
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: 5200-test.txt, 5200-v2.txt, 5200-v3.txt, 5200-v4.txt, 
 5200-v4no-prefix.txt, HBASE-5200.patch, HBASE-5200_1.patch, 
 HBASE-5200_trunk_latest_with_test_2.patch, 
 TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, 
 hbase-5200_90_latest.patch, hbase-5200_90_latest_new.patch


 This is the scenario:
 Consider a case where the balancer is running and is trying to close regions 
 on an RS.
 Before the close completes, a master switch happens.
 On master switch, the set of nodes that are in RIT is collected; we first 
 get the data and start watching the node.
 After that, the node data is added to RIT.
 If, before the node is added to RIT, the RS on which close was called 
 makes a transition, AM.handleRegion() misses the handling and reports that the 
 RIT state was null.
 {code}
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 a66d281d231dfcaea97c270698b26b6f from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 c12e53bfd48ddc5eec507d66821c4d23 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 59ae13de8c1eb325a0dd51f4902d2052 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 f45bc9614d7575f35244849af85aa078 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 cc3ecd7054fe6cd4a1159ed92fd62641 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 3af40478a17fee96b4a192b22c90d5a2 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 e6096a8466e730463e10d3d61f809b92 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 4806781a1a23066f7baed22b4d237e24 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 d69e104131accaefe21dcc01fddc7629 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 {code}
 In branch the CLOSING node is created by RS thus leading to more 
 inconsistency.
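The ordering problem described above can be reduced to a small sketch. It is an illustration only, not the attached patch: the class, the map, and the method names are hypothetical stand-ins for the regions-in-transition (RIT) bookkeeping. The handler finds no state if the CLOSED event arrives before the region has been recorded in RIT, and handles it normally if the region was recorded first.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the RIT ordering problem; not AssignmentManager code. */
final class RitRaceSketch {
  private final Map<String, String> regionsInTransition = new ConcurrentHashMap<>();

  /** Stand-in for recording a region in RIT after its node data was read. */
  void addToRit(String region, String state) {
    regionsInTransition.put(region, state);
  }

  /** Stand-in for the handler: a CLOSED event is only handled if RIT state exists. */
  String handleClosed(String region) {
    String state = regionsInTransition.get(region);
    if (state == null) {
      // Corresponds to the WARN lines in the log: state null, event dropped.
      return "ignored: state null";
    }
    regionsInTransition.remove(region);
    return "handled: was " + state;
  }

  public static void main(String[] args) {
    // Ordering 1: the watch fires before the region is added to RIT.
    RitRaceSketch racy = new RitRaceSketch();
    System.out.println(racy.handleClosed("a66d281d"));
    racy.addToRit("a66d281d", "CLOSING"); // too late, the event is gone

    // Ordering 2: the region is recorded before events can be delivered.
    RitRaceSketch ordered = new RitRaceSketch();
    ordered.addToRit("a66d281d", "CLOSING");
    System.out.println(ordered.handleClosed("a66d281d"));
  }
}
```

In the first ordering the region stays in RIT with a stale state, which matches the inconsistency the issue title describes.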





[jira] [Issue Comment Edited] (HBASE-5424) HTable meet NPE when call getRegionInfo()

2012-02-17 Thread Lars Hofhansl (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210845#comment-13210845
 ] 

Lars Hofhansl edited comment on HBASE-5424 at 2/18/12 7:16 AM:
---

Thanks zhiyuan. I'm wondering whether we shouldn't focus on fixing the problem 
that caused the problem rather than pasting over it.

  was (Author: lhofhansl):
That zhiyuan. I'm wondering whether we shouldn't focus on fixing the 
problem that caused the problem rather than pasting over it.
  
 HTable meet NPE when call getRegionInfo()
 -

 Key: HBASE-5424
 URL: https://issues.apache.org/jira/browse/HBASE-5424
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.1, 0.90.5
Reporter: junhua yang
 Attachments: HBASE-5424.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 We hit an NPE when calling getRegionInfo() in a testing environment.
 Exception in thread "main" java.lang.NullPointerException
 at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
 at org.apache.hadoop.hbase.util.Writables.getHRegionInfo(Writables.java:119)
 at org.apache.hadoop.hbase.client.HTable$2.processRow(HTable.java:395)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:190)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:95)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:73)
 at org.apache.hadoop.hbase.client.HTable.getRegionsInfo(HTable.java:418)
 This NPE also prevents table.jsp from showing this table's region information.





[jira] [Commented] (HBASE-3584) Allow atomic put/delete in one call

2012-02-17 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210863#comment-13210863
 ] 

Lars Hofhansl commented on HBASE-3584:
--

Something went wrong with this. I still see RowMutation, but it is empty. I 
suspect that it was not marked as deleted.

 Allow atomic put/delete in one call
 ---

 Key: HBASE-3584
 URL: https://issues.apache.org/jira/browse/HBASE-3584
 Project: HBase
  Issue Type: New Feature
  Components: client, coprocessors, regionserver
Reporter: ryan rawson
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 3584-final.txt, 3584-v1.txt, 3584-v3.txt


 Right now we have the following calls:
 put(Put)
 delete(Delete)
 increment(Increments)
 But we cannot combine all of the above in a single call, complete with a 
 single row lock.  It would be nice to do that.
 It would also allow us to do a CAS where we could do a put/increment if the 
 check succeeded.
 -
 Amendment:
 Since Increment does not currently support MVCC it cannot be included in an 
 atomic operation.
 So this is for Put and Delete only.
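The request above amounts to applying several Puts and Deletes to one row under a single row lock. A minimal, self-contained sketch of that idea follows. It is not the HBase implementation; the class, the Mutation type, and mutateRow() are hypothetical names chosen for illustration, and a plain Java monitor stands in for the row lock.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical single-row store; illustrates "many mutations, one row lock". */
final class AtomicRowMutationSketch {

  /** One put or delete against a single column; a null value means delete. */
  static final class Mutation {
    final String column;
    final String value;

    private Mutation(String column, String value) {
      this.column = Objects.requireNonNull(column);
      this.value = value;
    }

    static Mutation put(String column, String value) {
      return new Mutation(column, Objects.requireNonNull(value));
    }

    static Mutation delete(String column) {
      return new Mutation(column, null);
    }
  }

  private final Map<String, String> row = new HashMap<>();
  private final Object rowLock = new Object();

  /** Applies all mutations while holding the row lock, so readers never see a partial result. */
  void mutateRow(List<Mutation> mutations) {
    synchronized (rowLock) {
      for (Mutation m : mutations) {
        if (m.value == null) {
          row.remove(m.column);
        } else {
          row.put(m.column, m.value);
        }
      }
    }
  }

  /** Returns a copy of the row taken under the same lock. */
  Map<String, String> snapshot() {
    synchronized (rowLock) {
      return new HashMap<>(row);
    }
  }

  public static void main(String[] args) {
    AtomicRowMutationSketch store = new AtomicRowMutationSketch();
    List<Mutation> batch = new ArrayList<>();
    batch.add(Mutation.put("cf:a", "1"));
    batch.add(Mutation.put("cf:b", "2"));
    batch.add(Mutation.delete("cf:a"));
    store.mutateRow(batch);
    System.out.println(store.snapshot());
  }
}
```

Increment is left out of the sketch for the reason given in the amendment.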





[jira] [Commented] (HBASE-3584) Allow atomic put/delete in one call

2012-02-17 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210864#comment-13210864
 ] 

Lars Hofhansl commented on HBASE-3584:
--

Fixed with addendum.

 Allow atomic put/delete in one call
 ---

 Key: HBASE-3584
 URL: https://issues.apache.org/jira/browse/HBASE-3584
 Project: HBase
  Issue Type: New Feature
  Components: client, coprocessors, regionserver
Reporter: ryan rawson
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 3584-final.txt, 3584-v1.txt, 3584-v3.txt


 Right now we have the following calls:
 put(Put)
 delete(Delete)
 increment(Increments)
 But we cannot combine all of the above in a single call, complete with a 
 single row lock.  It would be nice to do that.
 It would also allow us to do a CAS where we could do a put/increment if the 
 check succeeded.
 -
 Amendment:
 Since Increment does not currently support MVCC it cannot be included in an 
 atomic operation.
 So this is for Put and Delete only.





[jira] [Updated] (HBASE-5075) regionserver crashed and failover

2012-02-17 Thread zhiyuan.dai (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhiyuan.dai updated HBASE-5075:
---

Attachment: 5075.patch

 regionserver crashed and failover
 -

 Key: HBASE-5075
 URL: https://issues.apache.org/jira/browse/HBASE-5075
 Project: HBase
  Issue Type: Improvement
  Components: monitoring, regionserver, replication, zookeeper
Affects Versions: 0.92.1
Reporter: zhiyuan.dai
 Fix For: 0.90.5

 Attachments: 5075.patch


 When a regionserver crashes, it takes too long to notify the HMaster. Once the 
 HMaster knows about the regionserver's shutdown, it takes a long time to fetch 
 the HLog's lease.
 HBase is an online DB, so availability is very important.
 I have an idea to improve availability: a monitor node checks the regionserver's 
 pid. If this pid does not exist, I consider the RS down, delete the znode, and 
 force-close the HLog file.
 So the period may be 100ms.





[jira] [Commented] (HBASE-5075) regionserver crashed and failover

2012-02-17 Thread zhiyuan.dai (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210170#comment-13210170
 ] 

zhiyuan.dai commented on HBASE-5075:


@stack you are right. I am indeed considering a supervisor-like process that 
will remove the regionserver ephemeral node if the pid goes missing and a ping 
fails (new Socket: Connection refused). I am now translating our design documents.
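The two checks mentioned in this comment (pid missing, and a socket connect that is refused) can be sketched as follows. This is a rough illustration under stated assumptions, not the attached patch: the class and method names are hypothetical, ProcessHandle requires Java 9 or later, and the znode removal itself is left out because it depends on the ZooKeeper setup.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical liveness probe for a supervisor-like process. */
final class RegionServerLivenessSketch {

  /** True if the operating system still knows a live process with this pid. */
  static boolean pidAlive(long pid) {
    return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
  }

  /** True if a TCP connection to host:port succeeds within the timeout. */
  static boolean portAccepts(String host, int port, int timeoutMillis) {
    try (Socket socket = new Socket()) {
      socket.connect(new InetSocketAddress(host, port), timeoutMillis);
      return true;
    } catch (IOException e) {
      // Covers "Connection refused" as well as timeouts.
      return false;
    }
  }

  /** Declares the server dead only when both checks fail. */
  static boolean looksDead(long pid, String host, int port, int timeoutMillis) {
    return !pidAlive(pid) && !portAccepts(host, port, timeoutMillis);
  }

  public static void main(String[] args) {
    long ownPid = ProcessHandle.current().pid();
    // The current JVM is alive, so it is never reported dead.
    System.out.println("looksDead=" + looksDead(ownPid, "127.0.0.1", 60020, 200));
  }
}
```

Note that a process stuck in a long GC pause still has a live pid, so a check of this kind would not treat it as dead.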

 regionserver crashed and failover
 -

 Key: HBASE-5075
 URL: https://issues.apache.org/jira/browse/HBASE-5075
 Project: HBase
  Issue Type: Improvement
  Components: monitoring, regionserver, replication, zookeeper
Affects Versions: 0.92.1
Reporter: zhiyuan.dai
 Fix For: 0.90.5

 Attachments: 5075.patch


 When a regionserver crashes, it takes too long to notify the HMaster. Once the 
 HMaster knows about the regionserver's shutdown, it takes a long time to fetch 
 the HLog's lease.
 HBase is an online DB, so availability is very important.
 I have an idea to improve availability: a monitor node checks the regionserver's 
 pid. If this pid does not exist, I consider the RS down, delete the znode, and 
 force-close the HLog file.
 So the period may be 100ms.





[jira] [Commented] (HBASE-5412) HBase book, section 2.6.4, has deficient list of client dependencies

2012-02-17 Thread Mike Spreitzer (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210323#comment-13210323
 ] 

Mike Spreitzer commented on HBASE-5412:
---

Yes, I tested HBase 0.92.0 with Hadoop 1.0.0.  I doubt this is the only 
combination for which the current text is deficient.

 HBase book, section 2.6.4, has deficient list of client dependencies
 

 Key: HBASE-5412
 URL: https://issues.apache.org/jira/browse/HBASE-5412
 Project: HBase
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.92.0
Reporter: Mike Spreitzer
Assignee: Doug Meil
Priority: Minor
  Labels: documentation
   Original Estimate: 1h
  Remaining Estimate: 1h

 The current text in section 2.6.4 of the HBase book says this about client 
 dependencies:
 Minimally, a client of HBase needs the hbase, hadoop, log4j, commons-logging, 
 commons-lang, and ZooKeeper jars in its CLASSPATH connecting to a cluster.
 I tried that, and got an exception due to a class not being found.  I fixed 
 that by searching for that class in the jars in lib/, and tried again.  Got 
 an exception, due to a different class not found.  I iterated until it 
 worked.  When I was done, I found myself using the following JARs:
 commons-configuration-1.6.jar  hadoop-core-1.0.0.jar  slf4j-api-1.5.8.jar
 commons-lang-2.5.jar   hbase-0.92.0.jar   slf4j-log4j12-1.5.8.jar
 commons-logging-1.1.1.jar  log4j-1.2.16.jar   zookeeper-3.4.2.jar
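The trial-and-error loop described above can be shortened by probing for one class per JAR before starting the client. A small sketch, with assumptions stated plainly: the helper class is hypothetical, and the class chosen to represent each JAR is my own pick, not taken from the HBase book.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that reports which expected classes are not on the CLASSPATH. */
final class ClasspathCheckSketch {

  /** Returns the names that cannot be loaded, without initializing the classes. */
  static List<String> missingClasses(List<String> classNames) {
    List<String> missing = new ArrayList<>();
    ClassLoader loader = ClasspathCheckSketch.class.getClassLoader();
    for (String name : classNames) {
      try {
        Class.forName(name, false, loader);
      } catch (ClassNotFoundException | LinkageError e) {
        missing.add(name);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // One representative class per JAR from the list above (assumed mapping).
    List<String> representatives = Arrays.asList(
        "org.apache.hadoop.hbase.client.HTable",         // hbase
        "org.apache.hadoop.conf.Configuration",          // hadoop-core
        "org.apache.commons.configuration.Configuration", // commons-configuration
        "org.apache.commons.lang.StringUtils",           // commons-lang
        "org.apache.commons.logging.Log",                // commons-logging
        "org.apache.log4j.Logger",                       // log4j
        "org.slf4j.Logger",                              // slf4j-api
        "org.slf4j.impl.StaticLoggerBinder",             // slf4j-log4j12
        "org.apache.zookeeper.ZooKeeper");               // zookeeper
    System.out.println("missing: " + missingClasses(representatives));
  }
}
```

Running it with the candidate CLASSPATH lists every missing JAR at once instead of one exception per attempt.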





[jira] [Commented] (HBASE-5075) regionserver crashed and failover

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210367#comment-13210367
 ] 

stack commented on HBASE-5075:
--

Thanks for doing this.  It looks very interesting.

Please do not reformat existing code.  It bloats your patch and makes reviews 
take longer; reviewer attention span is short (at least in this case) and it's a 
shame to spend it going over code reformats.

On the patch, is this necessary: +  public String getRSPidAndRsZknode();

Can't you get the pid from a process listing?  Or do you want us to publish it via 
jmx?   Or it looks like it is already published via jmx.  Can your tool pick it 
up there?  On the znode, can't you get the regionserver servername and then do 
a lookup in zk directly?

Can't you have supervisor do this?  Are there no existing utilities that watch 
a pid and allow you to do stuff when it's gone?  Or is it that you'd kill the 
server on a long GC pause?

Do you have a bit of documentation on how this new utility works?

Thanks.

 regionserver crashed and failover
 -

 Key: HBASE-5075
 URL: https://issues.apache.org/jira/browse/HBASE-5075
 Project: HBase
  Issue Type: Improvement
  Components: monitoring, regionserver, replication, zookeeper
Affects Versions: 0.92.1
Reporter: zhiyuan.dai
 Fix For: 0.90.5

 Attachments: 5075.patch


 When a regionserver crashes, it takes too long to notify the HMaster. Once the 
 HMaster knows about the regionserver's shutdown, it takes a long time to fetch 
 the HLog's lease.
 HBase is an online DB, so availability is very important.
 I have an idea to improve availability: a monitor node checks the regionserver's 
 pid. If this pid does not exist, I consider the RS down, delete the znode, and 
 force-close the HLog file.
 So the period may be 100ms.





[jira] [Created] (HBASE-5426) How to Set Up a Pseudo-Distributed Mode for HBase

2012-02-17 Thread Bing Li (Created) (JIRA)
How to Set Up a Pseudo-Distributed Mode for HBase
-

 Key: HBASE-5426
 URL: https://issues.apache.org/jira/browse/HBASE-5426
 Project: HBase
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.92.0
 Environment: RedHat 7, Ubuntu 10
Reporter: Bing Li
 Fix For: 0.92.0


Hi, all,

I just wrote up a summary of my experience setting up HBase in 
pseudo-distributed mode.

1) RedHat 9 is not suitable for running HBase and Hadoop. I don't know the 
reasons. Now Ubuntu is my choice.

2) After the pseudo-distributed mode of HDFS is configured, hbase-env.sh and 
hbase-site.xml must be configured. The book, HBase: The Definitive 
Guide, does not mention hbase.env.xml.

3) JAVA_HOME, HBASE_CLASSPATH and HBASE_MANAGES_ZK should be set. My 
hbase-env.sh is as follows.

 export JAVA_HOME=/opt/jdk1.6.1/
 export HBASE_CLASSPATH=/opt/hbase-0.92.0/conf
 export HBASE_OPTS=-XX:+UseConcMarkSweepGC
 export HBASE_MANAGES_ZK=true

4) When configuring hbase-site.xml, the property, hbase.cluster.distributed, 
must be set also. The book, HBase the Definitive Guide, does not do that 
either. My hbase-site.xml is as follows.

 <?xml version="1.0"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 <configuration>
   <property>
     <name>hbase.rootdir</name>
     <value>hdfs://localhost:9000/hbase</value>
   </property>
   <property>
     <name>dfs.replication</name>
     <value>1</value>
   </property>
   <property>
     <name>hbase.cluster.distributed</name>
     <value>true</value>
   </property>
 </configuration>

I am a new user of HBase. Your suggestions are highly appreciated.

Best regards,
Bing





[jira] [Commented] (HBASE-5075) regionserver crashed and failover

2012-02-17 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210386#comment-13210386
 ] 

Lars Hofhansl commented on HBASE-5075:
--

I would 2nd Stack's request to create a patch without the format changes, also 
there're some author tags in the javadoc (which we don't do with Apache code).

Is this guarding against just the RegionServer process dying (but its machine 
still up), or also against the machine dying? (I know I could take a closer 
look at the patch, but it's easier if you just tell me) :)


 regionserver crashed and failover
 -

 Key: HBASE-5075
 URL: https://issues.apache.org/jira/browse/HBASE-5075
 Project: HBase
  Issue Type: Improvement
  Components: monitoring, regionserver, replication, zookeeper
Affects Versions: 0.92.1
Reporter: zhiyuan.dai
 Fix For: 0.90.5

 Attachments: 5075.patch


 When a regionserver crashes, it takes too long to notify the HMaster. Once the 
 HMaster knows about the regionserver's shutdown, it takes a long time to fetch 
 the HLog's lease.
 HBase is an online DB, so availability is very important.
 I have an idea to improve availability: a monitor node checks the regionserver's 
 pid. If this pid does not exist, I consider the RS down, delete the znode, and 
 force-close the HLog file.
 So the period may be 100ms.





[jira] [Updated] (HBASE-5425) Punt on the timeout doesn't work in BulkEnabler#waitUntilDone (master's EnableTableHandler)

2012-02-17 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5425:
-

   Resolution: Fixed
Fix Version/s: 0.94.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed to trunk.  Thanks for the patch Terry.

  Punt on the timeout doesn't work in BulkEnabler#waitUntilDone (master's 
 EnableTableHandler)
 

 Key: HBASE-5425
 URL: https://issues.apache.org/jira/browse/HBASE-5425
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.5, 0.92.0
Reporter: terry zhang
 Fix For: 0.94.0

 Attachments: HBASE-5425.patch


 please take a look at the code below in EnableTableHandler(hbase master):
 {code:title=EnableTableHandler.java|borderStyle=solid}
 protected boolean waitUntilDone(long timeout)
 throws InterruptedException {
 
   .
   int lastNumberOfRegions = this.countOfRegionsInTable;
   while (!server.isStopped() && remaining > 0) {
     Thread.sleep(waitingTimeForEvents);
     regions = assignmentManager.getRegionsOfTable(tableName);
     if (isDone(regions)) break;
     // Punt on the timeout as long we make progress
     if (regions.size() > lastNumberOfRegions) {
       lastNumberOfRegions = regions.size();
       timeout += waitingTimeForEvents;
     }
     remaining = timeout - (System.currentTimeMillis() - startTime);
 
 }
 private boolean isDone(final List<HRegionInfo> regions) {
   return regions != null && regions.size() >= this.countOfRegionsInTable;
 }
 {code} 
 As the code shows, if lastNumberOfRegions is initialized to 
 this.countOfRegionsInTable, the punt-on-timeout code will never 
 be executed. I think initializing lastNumberOfRegions = 0 would make it work.
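The argument can be checked with a small simulation. The sketch below is not the HBase code: it keeps only the two comparisons from the loop and replaces the sleeping and region lookup with a fixed sequence of observed region counts, and the class and method names are hypothetical. Inside the loop the observed count is always below countOfRegionsInTable (otherwise isDone() would have ended the loop), so a count that starts at countOfRegionsInTable can never be exceeded.

```java
/** Hypothetical simulation of the "punt on the timeout" branch. */
final class PuntOnTimeoutSketch {

  /**
   * Counts how often the timeout would be extended for the given sequence of
   * observed online-region counts, starting from the given initial value of
   * lastNumberOfRegions.
   */
  static int countTimeoutExtensions(int initialLastNumberOfRegions,
                                    int countOfRegionsInTable,
                                    int[] observedRegionCounts) {
    int lastNumberOfRegions = initialLastNumberOfRegions;
    int extensions = 0;
    for (int online : observedRegionCounts) {
      if (online >= countOfRegionsInTable) {
        break; // same condition as isDone(regions)
      }
      if (online > lastNumberOfRegions) {
        // Progress was made, so the timeout would be pushed out.
        lastNumberOfRegions = online;
        extensions++;
      }
    }
    return extensions;
  }

  public static void main(String[] args) {
    int total = 10;
    int[] observed = {2, 5, 5, 8, 10};
    // Initialized as in the reported code: the branch never fires.
    System.out.println(countTimeoutExtensions(total, total, observed)); // prints 0
    // Initialized to 0 as proposed: one extension per observed increase.
    System.out.println(countTimeoutExtensions(0, total, observed));     // prints 3
  }
}
```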





[jira] [Commented] (HBASE-5425) Punt on the timeout doesn't work in BulkEnabler#waitUntilDone (master's EnableTableHandler)

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210393#comment-13210393
 ] 

stack commented on HBASE-5425:
--

Committed to 0.92 branch too.

  Punt on the timeout doesn't work in BulkEnabler#waitUntilDone (master's 
 EnableTableHandler)
 

 Key: HBASE-5425
 URL: https://issues.apache.org/jira/browse/HBASE-5425
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.5, 0.92.0
Reporter: terry zhang
 Fix For: 0.94.0

 Attachments: HBASE-5425.patch


 please take a look at the code below in EnableTableHandler(hbase master):
 {code:title=EnableTableHandler.java|borderStyle=solid}
 protected boolean waitUntilDone(long timeout)
 throws InterruptedException {
 
   .
   int lastNumberOfRegions = this.countOfRegionsInTable;
   while (!server.isStopped() && remaining > 0) {
     Thread.sleep(waitingTimeForEvents);
     regions = assignmentManager.getRegionsOfTable(tableName);
     if (isDone(regions)) break;
     // Punt on the timeout as long we make progress
     if (regions.size() > lastNumberOfRegions) {
       lastNumberOfRegions = regions.size();
       timeout += waitingTimeForEvents;
     }
     remaining = timeout - (System.currentTimeMillis() - startTime);
 
 }
 private boolean isDone(final List<HRegionInfo> regions) {
   return regions != null && regions.size() >= this.countOfRegionsInTable;
 }
 {code} 
 As the code shows, if lastNumberOfRegions is initialized to 
 this.countOfRegionsInTable, the punt-on-timeout code will never 
 be executed. I think initializing lastNumberOfRegions = 0 would make it work.





[jira] [Commented] (HBASE-5424) HTable meet NPE when call getRegionInfo()

2012-02-17 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210405#comment-13210405
 ] 

Hadoop QA commented on HBASE-5424:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12514961/HBASE-5424.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/980//console

This message is automatically generated.

 HTable meet NPE when call getRegionInfo()
 -

 Key: HBASE-5424
 URL: https://issues.apache.org/jira/browse/HBASE-5424
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.1, 0.90.5
Reporter: junhua yang
 Attachments: HBASE-5424.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 We hit an NPE when calling getRegionInfo() in a testing environment.
 Exception in thread "main" java.lang.NullPointerException
 at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
 at org.apache.hadoop.hbase.util.Writables.getHRegionInfo(Writables.java:119)
 at org.apache.hadoop.hbase.client.HTable$2.processRow(HTable.java:395)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:190)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:95)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:73)
 at org.apache.hadoop.hbase.client.HTable.getRegionsInfo(HTable.java:418)
 This NPE also prevents table.jsp from showing this table's region information.





[jira] [Commented] (HBASE-5426) How to Set Up a Pseudo-Distributed Mode for HBase

2012-02-17 Thread Doug Meil (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210409#comment-13210409
 ] 

Doug Meil commented on HBASE-5426:
--

no problem.

 How to Set Up a Pseudo-Distributed Mode for HBase
 -

 Key: HBASE-5426
 URL: https://issues.apache.org/jira/browse/HBASE-5426
 Project: HBase
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.92.0
 Environment: RedHat 7, Ubuntu 10
Reporter: Bing Li
Assignee: Doug Meil
  Labels: documentation
 Fix For: 0.92.0

   Original Estimate: 48h
  Remaining Estimate: 48h

 Hi, all,
 I just wrote up a summary of my experience setting up HBase in 
 pseudo-distributed mode.
 1) RedHat 9 is not suitable for running HBase and Hadoop. I don't know the 
 reasons. Now Ubuntu is my choice.
 2) After the pseudo-distributed mode of HDFS is configured, hbase-env.sh and 
 hbase-site.xml must be configured. The book, HBase: The Definitive 
 Guide, does not mention hbase.env.xml.
 3) JAVA_HOME, HBASE_CLASSPATH and HBASE_MANAGES_ZK should be set. My 
 hbase-env.sh is as follows.
  export JAVA_HOME=/opt/jdk1.6.1/
  export HBASE_CLASSPATH=/opt/hbase-0.92.0/conf
  export HBASE_OPTS=-XX:+UseConcMarkSweepGC
  export HBASE_MANAGES_ZK=true
 4) When configuring hbase-site.xml, the property, hbase.cluster.distributed, 
 must be set also. The book, HBase the Definitive Guide, does not do that 
 either. My hbase-site.xml is as follows.
  <?xml version="1.0"?>
  <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
  <configuration>
    <property>
      <name>hbase.rootdir</name>
      <value>hdfs://localhost:9000/hbase</value>
    </property>
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>
    <property>
      <name>hbase.cluster.distributed</name>
      <value>true</value>
    </property>
  </configuration>
 I am a new user of HBase. Your suggestions are highly appreciated.
 Best regards,
 Bing





[jira] [Updated] (HBASE-5407) Show the per-region level request/sec count in the web ui

2012-02-17 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-5407:
---

Attachment: D1779.1.patch

Liyin requested code review of [jira][HBASE-5407][89-fb] Show the per-region 
level request/sec count in the web ui.
Reviewers: Kannan, Karthik

  It would be nice to show the per-region level request/sec count in the web 
ui, especially when debugging the hot region problem.

TEST PLAN
  Tested on the dev cluster

REVISION DETAIL
  https://reviews.facebook.net/D1779

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/HServerLoad.java
  src/main/java/org/apache/hadoop/hbase/metrics/RequestMetrics.java
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
  src/main/resources/hbase-webapps/regionserver/regionserver.jsp



 Show the per-region level request/sec count in the web ui
 -

 Key: HBASE-5407
 URL: https://issues.apache.org/jira/browse/HBASE-5407
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D1779.1.patch, D1779.1.patch


 It would be nice to show the per-region level request/sec count in the web 
 ui, especially when debugging the hot region problem.





[jira] [Updated] (HBASE-5407) Show the per-region level request/sec count in the web ui

2012-02-17 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-5407:
---

Attachment: D1779.1.patch

Liyin requested code review of [jira][HBASE-5407][89-fb] Show the per-region 
level request/sec count in the web ui.
Reviewers: Kannan, Karthik

  It would be nice to show the per-region level request/sec count in the web 
ui, especially when debugging the hot region problem.

TEST PLAN
  Tested on the dev cluster

REVISION DETAIL
  https://reviews.facebook.net/D1779

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/HServerLoad.java
  src/main/java/org/apache/hadoop/hbase/metrics/RequestMetrics.java
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
  src/main/resources/hbase-webapps/regionserver/regionserver.jsp



 Show the per-region level request/sec count in the web ui
 -

 Key: HBASE-5407
 URL: https://issues.apache.org/jira/browse/HBASE-5407
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D1779.1.patch, D1779.1.patch, D1779.1.patch


 It would be nice to show the per-region level request/sec count in the web 
 ui, especially when debugging the hot region problem.





[jira] [Commented] (HBASE-5241) Deletes should not mask Puts that come after it.

2012-02-17 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210444#comment-13210444
 ] 

Phabricator commented on HBASE-5241:


aaiyer has commented on the revision HBASE-5241 [jira] Deletes should not mask 
Puts that come after it..

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:1748 This 
will only happen for Deletes (Column and Family). The idea is that the Delete 
shall apply to all the puts, with a lower memstoreTS, regardless of their 
timestamp -- even if it is in future.

  Subsequent Puts etc. will not get masked by the Delete, because they should 
have a memstoreTS that is larger.
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java:155 
This is not yet in production. But, if we decide to go down this route, we will 
definitely test it out for performance.

  Haven't optimised much here, since I don't expect there to be too many 
delete Family markers.

  Will revisit if the assumption turns out to be false.
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java:155 
I'm not sure if we want to put this under ENFORCE_STRICTER_SEMANTICS 

  My understanding was that it would be better to have Puts not be masked by 
previous Deletes, regardless.

  Whether we are willing to pay the extra performance cost for it was the 
trade-off enforced using ENFORCE_STRICTER_SEMANTICS.

  If there is a good reason for clients to expect that the Put will be masked 
by previous Deletes, we can definitely guard this with the flag.
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java:173 
Perhaps, I might rename this class to something different, and we can add a 
flag in ScanQueryMatcher to instantiate the appropriate DeleteTracker.
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java:223 
Agree that this is going to be a performance issue here.

  But this is just a V-1 to get the general idea out. I'm hopeful we can 
optimise the codepath so that we incur the performance penalty only when there 
really is a later KV with a higher memstoreTS.

  We currently do not have a way to tell that. But it can be done: say, dump a 
flag while writing the HFile if there is a memstoreTS inversion, or something 
along those lines.

  Will try to optimise this, if needed, along those lines.
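
  The masking rule described in the first inline comment could be sketched 
roughly as below. This is only an illustration, not code from the patch: the 
class MemstoreTsMaskingSketch, its Entry type and isMasked() are hypothetical 
names, and the sketch assumes the rule is exactly "a Delete masks a Put if and 
only if the Put has a lower memstoreTS", ignoring the cell timestamp.

```java
/** Hypothetical sketch of memstoreTS-based delete masking (not the patch's code). */
final class MemstoreTsMaskingSketch {

    /** Minimal stand-in for a KeyValue: only the two fields the rule looks at. */
    static final class Entry {
        final long timestamp;   // user-visible cell timestamp
        final long memstoreTS;  // write-order number assigned when the edit was applied

        Entry(long timestamp, long memstoreTS) {
            this.timestamp = timestamp;
            this.memstoreTS = memstoreTS;
        }
    }

    /**
     * Assumed rule: the Delete applies to every Put written before it
     * (lower memstoreTS), whatever the Put's timestamp is. Puts written
     * later (higher memstoreTS) are never masked.
     */
    static boolean isMasked(Entry put, Entry delete) {
        return put.memstoreTS < delete.memstoreTS;
    }
}
```

  With this rule a Put carrying a future timestamp is still masked when it was 
written before the Delete, and a Put with an older timestamp survives when it 
was written after it.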

REVISION DETAIL
  https://reviews.facebook.net/D1731


 Deletes should not mask Puts that come after it.
 

 Key: HBASE-5241
 URL: https://issues.apache.org/jira/browse/HBASE-5241
 Project: HBase
  Issue Type: Improvement
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
 Attachments: HBASE-5241.D1731.1.patch, HBASE-5241.D1731.2.patch, 
 HBASE-5241.D1731.3.patch


 Suppose that we have a delete row followed by a put. The delete row can mask 
 the put unless there was a major compaction in between.
 Now that we are flushing the memstoreTS to disk along with the KVs, we should 
 be able to differentiate whether or not the Put happened after the Delete, and 
 offer better delete semantics.
 Couldn't find a pre-existing JIRA that already discusses this, so creating 
 one.
 Seems related to https://issues.apache.org/jira/browse/HBASE-2406, but is not 
 quite the same.





[jira] [Commented] (HBASE-5407) Show the per-region level request/sec count in the web ui

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210450#comment-13210450
 ] 

stack commented on HBASE-5407:
--

Liyin.  Is this a backport for 0.89fb?   If so, is there something you've added 
to your backport that we should have in trunk?  Thanks boss.

 Show the per-region level request/sec count in the web ui
 -

 Key: HBASE-5407
 URL: https://issues.apache.org/jira/browse/HBASE-5407
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D1779.1.patch, D1779.1.patch, D1779.1.patch


 It would be nice to show the per-region level request/sec count in the web 
 ui, especially when debugging the hot region problem.





[jira] [Commented] (HBASE-5407) Show the per-region level request/sec count in the web ui

2012-02-17 Thread Liyin Tang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210455#comment-13210455
 ] 

Liyin Tang commented on HBASE-5407:
---

Hi Stack. 
This patch adds the total read/write request count and the read/write requests 
per second for each region in the 89-fb branch.
For the Apache trunk, I will only need to add the read/write requests per 
second.
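
One way to derive a per-region requests-per-second figure from a running total 
is to sample the counter periodically and divide the delta by the elapsed 
time. The sketch below only illustrates that arithmetic; 
RegionRequestRateSketch and sample() are hypothetical names and are not taken 
from D1779.

```java
/** Hypothetical sketch: turn a monotonically increasing request counter into a rate. */
final class RegionRequestRateSketch {
    private long lastCount;   // counter value at the previous sample
    private long lastTimeMs;  // wall-clock time of the previous sample

    RegionRequestRateSketch(long initialCount, long nowMs) {
        this.lastCount = initialCount;
        this.lastTimeMs = nowMs;
    }

    /**
     * Records a new sample and returns the requests per second observed
     * since the previous sample. Returns 0 if no time has elapsed.
     */
    double sample(long totalCount, long nowMs) {
        long elapsedMs = nowMs - lastTimeMs;
        double rate = elapsedMs <= 0
                ? 0.0
                : (totalCount - lastCount) * 1000.0 / elapsedMs;
        lastCount = totalCount;
        lastTimeMs = nowMs;
        return rate;
    }
}
```

Keeping one such sampler per region for reads and one for writes would give 
the two per-second numbers mentioned above.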

 Show the per-region level request/sec count in the web ui
 -

 Key: HBASE-5407
 URL: https://issues.apache.org/jira/browse/HBASE-5407
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D1779.1.patch, D1779.1.patch, D1779.1.patch


 It would be nice to show the per-region level request/sec count in the web 
 ui, especially when debugging the hot region problem.





[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210460#comment-13210460
 ] 

stack commented on HBASE-5200:
--

Ram wants me to apply the patches here but the version for 0.90 is very 
different to the version for 0.92.  This is removed:

-RS_ZK_REGION_CLOSING  (1),   // RS is in process of closing a region


And this is added:

+M_ZK_REGION_CLOSING   (51),  // Master adds this region as closing in ZK

This looks like a port from 0.92?



 AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the 
 region assignment inconsistent
 -

 Key: HBASE-5200
 URL: https://issues.apache.org/jira/browse/HBASE-5200
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: 5200-test.txt, 5200-v2.txt, 5200-v3.txt, 5200-v4.txt, 
 HBASE-5200.patch, HBASE-5200_1.patch, 
 HBASE-5200_trunk_latest_with_test_2.patch, 
 TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, 
 hbase-5200_90_latest.patch, hbase-5200_90_latest_new.patch


 This is the scenario:
 Consider a case where the balancer is running and is trying to close regions 
 on an RS.
 Before the close completes, a master switch happens.
 On master switch, the set of nodes that are in RIT is collected; we first get 
 the data and start watching the node.
 After that, the node data is added into RIT.
 If, in this window (before the node is added to RIT), the RS on which close 
 was called does a transition, AM.handleRegion() misses the handling and 
 reports that the RIT state was null.
 {code}
 2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region a66d281d231dfcaea97c270698b26b6f from server HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region c12e53bfd48ddc5eec507d66821c4d23 from server HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 59ae13de8c1eb325a0dd51f4902d2052 from server HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region f45bc9614d7575f35244849af85aa078 from server HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region cc3ecd7054fe6cd4a1159ed92fd62641 from server HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 3af40478a17fee96b4a192b22c90d5a2 from server HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region e6096a8466e730463e10d3d61f809b92 from server HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 4806781a1a23066f7baed22b4d237e24 from server HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region d69e104131accaefe21dcc01fddc7629 from server HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and not in expected PENDING_CLOSE or CLOSING states
 {code}
 In branch the CLOSING node is created by RS thus leading to more 
 inconsistency.
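
 The interleaving described above can be modelled in a few lines. This is a 
 single-threaded sketch of the ordering problem only, not the AssignmentManager 
 code: RitRaceSketch, addToRit() and handleClosed() are hypothetical names, and 
 the state check is assumed from the wording of the WARN message.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of "event arrives before the region is recorded in RIT". */
final class RitRaceSketch {
    // region name -> state of the region in transition
    private final Map<String, String> regionsInTransition = new HashMap<>();

    /** Models the step where the node data is added into RIT. */
    void addToRit(String region, String state) {
        regionsInTransition.put(region, state);
    }

    /**
     * Models the CLOSED handling: the event is accepted only if the region
     * is already known to be PENDING_CLOSE or CLOSING. A missing entry
     * (state null) corresponds to the WARN lines in the log above.
     */
    boolean handleClosed(String region) {
        String state = regionsInTransition.get(region);
        if (!"PENDING_CLOSE".equals(state) && !"CLOSING".equals(state)) {
            return false; // event dropped: RIT state was null or unexpected
        }
        regionsInTransition.remove(region);
        return true;
    }
}
```

 If handleClosed() runs between "start watching the node" and addToRit(), the 
 event is dropped; the same call made after addToRit() is handled.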





[jira] [Updated] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent

2012-02-17 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5200:
-

Attachment: 5200-v4no-prefix.txt

v4 for hadoopqa

 AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the 
 region assignment inconsistent
 -

 Key: HBASE-5200
 URL: https://issues.apache.org/jira/browse/HBASE-5200
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: 5200-test.txt, 5200-v2.txt, 5200-v3.txt, 5200-v4.txt, 
 5200-v4no-prefix.txt, HBASE-5200.patch, HBASE-5200_1.patch, 
 HBASE-5200_trunk_latest_with_test_2.patch, 
 TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, 
 hbase-5200_90_latest.patch, hbase-5200_90_latest_new.patch







[jira] [Updated] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent

2012-02-17 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5200:
-

Status: Open  (was: Patch Available)

 AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the 
 region assignment inconsistent
 -

 Key: HBASE-5200
 URL: https://issues.apache.org/jira/browse/HBASE-5200
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: 5200-test.txt, 5200-v2.txt, 5200-v3.txt, 5200-v4.txt, 
 5200-v4no-prefix.txt, HBASE-5200.patch, HBASE-5200_1.patch, 
 HBASE-5200_trunk_latest_with_test_2.patch, 
 TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, 
 hbase-5200_90_latest.patch, hbase-5200_90_latest_new.patch







[jira] [Updated] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent

2012-02-17 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5200:
-

Status: Patch Available  (was: Open)

 AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the 
 region assignment inconsistent
 -

 Key: HBASE-5200
 URL: https://issues.apache.org/jira/browse/HBASE-5200
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: 5200-test.txt, 5200-v2.txt, 5200-v3.txt, 5200-v4.txt, 
 5200-v4no-prefix.txt, HBASE-5200.patch, HBASE-5200_1.patch, 
 HBASE-5200_trunk_latest_with_test_2.patch, 
 TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, 
 hbase-5200_90_latest.patch, hbase-5200_90_latest_new.patch







[jira] [Commented] (HBASE-5423) Regionserver may block forever on waitOnAllRegionsToClose when aborting

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210466#comment-13210466
 ] 

stack commented on HBASE-5423:
--

Patch looks good Chunhui.

Change name of Set from addedRegionsToCallClose to closed.

Why do we have this Set?  Are we calling close multiple times on same region?

So, we'd break even though online regions is not yet empty?

{code}
+  if (this.regionsInTransitionInRS.isEmpty()) {
+    break;
+  }
{code}

Thanks.

 Regionserver may block forever on waitOnAllRegionsToClose when aborting
 ---

 Key: HBASE-5423
 URL: https://issues.apache.org/jira/browse/HBASE-5423
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: hbase-5423.patch


 If closeRegion throws any exception (for example, one caused by the FS) while 
 the RS is aborting, the RS will block forever on waitOnAllRegionsToClose().
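
 A simplified model of the wait loop shows why a break on "nothing left in 
 transition" avoids the hang. This is an assumption-laden sketch, not the 
 attached patch: WaitOnRegionsSketch, RegionCloser and closeAll() are 
 hypothetical names, closes are performed synchronously, and each region is 
 asked to close only once.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of waiting for regions to close while aborting. */
final class WaitOnRegionsSketch {

    /** Stand-in for the close call, which may fail (e.g. on a filesystem error). */
    interface RegionCloser {
        void close(String region) throws IOException;
    }

    /** Returns the regions that are still online when the loop gives up. */
    static Set<String> closeAll(Set<String> onlineRegions, RegionCloser closer) {
        Set<String> stillOnline = new HashSet<>(onlineRegions);
        Set<String> inTransition = new HashSet<>();
        Set<String> closeRequested = new HashSet<>();

        while (!stillOnline.isEmpty()) {
            for (String region : new ArrayList<>(stillOnline)) {
                if (!closeRequested.add(region)) {
                    continue; // close was already requested for this region
                }
                inTransition.add(region);
                try {
                    closer.close(region);
                    stillOnline.remove(region);
                } catch (IOException e) {
                    // Close failed: the region stays online and is not retried.
                } finally {
                    inTransition.remove(region);
                }
            }
            // Without this check a failed close would keep the loop spinning forever.
            if (inTransition.isEmpty()) {
                break;
            }
        }
        return stillOnline;
    }
}
```

 In this model the loop does exit while online regions remain, which is the 
 behaviour the review comment asks about.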





[jira] [Updated] (HBASE-5346) Fix testColumnFamilyCompression and test_TIMERANGE in TestHFileOutputFormat

2012-02-17 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5346:
-

Fix Version/s: 0.92.0

Committed to 0.92 too...

  Fix testColumnFamilyCompression and test_TIMERANGE in TestHFileOutputFormat
 

 Key: HBASE-5346
 URL: https://issues.apache.org/jira/browse/HBASE-5346
 Project: HBase
  Issue Type: Sub-task
  Components: mapreduce, test
Affects Versions: 0.94.0, 0.92.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Fix For: 0.94.0, 0.92.0

 Attachments: HBASE-5346-v0.patch


 Running
 mvn -Dhadoop.profile=23 test -P localTests 
 -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
 yields this on 0.92 (for testColumnFamilyCompression and test_TIMERANGE):
 Failed tests: 
 testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  HFile for column family info-A not found
 Tests in error: 
 test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): 
 /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0
  (Is a directory)
 The problem is that these tests make incorrect assumptions about the output 
 of mapreduce jobs.  Prior to 0.23, temporary data was in, for example:
 ./_temporary/_attempt___r_00_0/b/1979617994050536795
 Now that has changed.  The correct way to get that path is based on 
 getDefaultWorkFile.
 Also, the data is not moved into the outputPath until both the Task and Job 
 are committed.





[jira] [Resolved] (HBASE-5393) Consider splitting after flushing

2012-02-17 Thread Jean-Daniel Cryans (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans resolved HBASE-5393.
---

   Resolution: Fixed
Fix Version/s: 0.92.1
 Hadoop Flags: Reviewed

Committed to trunk and 0.92, thanks for the votes and reviews guys.

 Consider splitting after flushing
 -

 Key: HBASE-5393
 URL: https://issues.apache.org/jira/browse/HBASE-5393
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.5
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.94.0, 0.92.1

 Attachments: HBASE-2375-flush-split.patch


 Spawning this from HBASE-2375, I saw that it was much more efficient 
 compaction-wise to check if we can split right after flushing. Much like the 
 ideas that Jon spelled out in the description of that jira, the window is 
 smaller because you don't have to compact and then split right away to only 
 compact again when the daughters open.
 It also helps in another way: while we would normally be waiting for the 
 compaction to happen, data that is still coming in pushes us way past 
 MAX_FILESIZE, to the point where for the first region I was seeing a store 
 size 3-4x bigger before it was able to split.
 I targeted this for 0.94, but I'd like to get this into 0.92.1 or .2 too.
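
 The idea of checking for a split right after each flush can be illustrated 
 with a toy model. It is not taken from the attached patch: 
 FlushThenSplitSketch and its methods are hypothetical names, and the sketch 
 assumes a split is requested as soon as the store size exceeds the configured 
 maximum file size.

```java
/** Hypothetical model: decide on a split immediately after every flush. */
final class FlushThenSplitSketch {
    private final long maxFileSize; // stand-in for the MAX_FILESIZE setting
    private long storeSize;         // bytes currently held by the store

    FlushThenSplitSketch(long maxFileSize) {
        this.maxFileSize = maxFileSize;
    }

    /**
     * Records a flush of the given size and reports whether a split
     * should be requested now, without waiting for a compaction.
     */
    boolean flush(long flushedBytes) {
        storeSize += flushedBytes;
        return storeSize > maxFileSize;
    }

    long storeSize() {
        return storeSize;
    }
}
```

 Because the check runs on every flush, the store can overshoot the limit by 
 at most one flush before a split is requested.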





[jira] [Commented] (HBASE-4640) Catch ClosedChannelException and document it

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210483#comment-13210483
 ] 

stack commented on HBASE-4640:
--

+1

On commit, add the CCE.getMessage to the LOG.WARN just in case it's got info of 
use (I'm fine with skipping the stack trace).

 Catch ClosedChannelException and document it
 

 Key: HBASE-4640
 URL: https://issues.apache.org/jira/browse/HBASE-4640
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Minor
 Fix For: 0.94.0

 Attachments: HBASE-4640.patch


 ClosedChannelException is a pretty obscure exception for the non-expert and 
 doesn't tell you why you get it. We should instead catch it, print a WARN 
 without a stack trace, and add a line in the book about this.
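
 The handling proposed here might look roughly like the sketch below. It is 
 not the attached patch: ClosedChannelHandlingSketch, Responder and the 
 wording of the warning are hypothetical, and java.util.logging is used only 
 to keep the example self-contained.

```java
import java.nio.channels.ClosedChannelException;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch: log ClosedChannelException as a one-line warning. */
final class ClosedChannelHandlingSketch {
    private static final Logger LOG =
            Logger.getLogger(ClosedChannelHandlingSketch.class.getName());

    /** Stand-in for whatever operation writes to the channel. */
    interface Responder {
        void respond() throws ClosedChannelException;
    }

    /** Builds the warning text, appending the exception message when there is one. */
    static String warnMessage(ClosedChannelException cce) {
        String detail = cce.getMessage(); // often null for this exception
        return "Channel closed while responding" + (detail == null ? "" : ": " + detail);
    }

    /** Runs the operation; on ClosedChannelException logs a WARN and returns false. */
    static boolean respondQuietly(Responder responder) {
        try {
            responder.respond();
            return true;
        } catch (ClosedChannelException cce) {
            // The exception itself is not passed to the logger, so no stack trace is printed.
            LOG.log(Level.WARNING, warnMessage(cce));
            return false;
        }
    }
}
```

 Passing only the message string to the logger keeps the output to a single 
 line while still surfacing whatever detail the exception carries.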





[jira] [Resolved] (HBASE-4640) Catch ClosedChannelException and document it

2012-02-17 Thread Jean-Daniel Cryans (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans resolved HBASE-4640.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to trunk with Stack's commit, thanks for the review.





[jira] [Updated] (HBASE-5119) Set the TimeoutMonitor's timeout back down

2012-02-17 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5119:
-

Fix Version/s: (was: 0.92.1)
   0.92.2

Moving to 0.92.2

 Set the TimeoutMonitor's timeout back down
 --

 Key: HBASE-5119
 URL: https://issues.apache.org/jira/browse/HBASE-5119
 Project: HBase
  Issue Type: Task
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.94.0, 0.92.2


 The TimeoutMonitor used to be extremely racy and caused more trouble than it 
 fixed, but I believe most of this has been fixed in the context of 0.92, so I 
 think we should set it back down to a useful level. Currently it's 30 
 minutes; what should the new value be?
 I think 5 minutes should be good; will do some testing.
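
 To make the difference between the two values concrete, here is a small 
 sketch of a timeout check. It is illustrative only: the class name, the 
 constants and the strict "greater than" comparison are assumptions, not taken 
 from the TimeoutMonitor source.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a timeout check; names are illustrative, not HBase's.
class TimeoutMonitorSketch {
    static final long CURRENT_TIMEOUT_MS = TimeUnit.MINUTES.toMillis(30);
    static final long PROPOSED_TIMEOUT_MS = TimeUnit.MINUTES.toMillis(5);

    /** True if a region has stayed in its current transition state longer than the timeout. */
    static boolean timedOut(long stateTimestampMs, long nowMs, long timeoutMs) {
        return nowMs - stateTimestampMs > timeoutMs;
    }

    public static void main(String[] args) {
        long enteredStateAt = 0L;
        long now = TimeUnit.MINUTES.toMillis(6); // region stuck for six minutes
        System.out.println("30 min timeout fires: "
            + timedOut(enteredStateAt, now, CURRENT_TIMEOUT_MS));
        System.out.println("5 min timeout fires:  "
            + timedOut(enteredStateAt, now, PROPOSED_TIMEOUT_MS));
    }
}
```

 With the 30 minute value a region stuck for six minutes is left alone; with 
 the proposed 5 minute value it would already be picked up for recovery.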





[jira] [Commented] (HBASE-5120) Timeout monitor races with table disable handler

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210491#comment-13210491
 ] 

stack commented on HBASE-5120:
--

I committed to trunk.  Will not commit to 0.92.  Not important enough of a bug 
I'd say.

 Timeout monitor races with table disable handler
 

 Key: HBASE-5120
 URL: https://issues.apache.org/jira/browse/HBASE-5120
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Zhihong Yu
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
 Fix For: 0.94.0, 0.92.1

 Attachments: HBASE-5120.patch, HBASE-5120_1.patch, 
 HBASE-5120_2.patch, HBASE-5120_3.patch, HBASE-5120_4.patch, 
 HBASE-5120_5.patch, HBASE-5120_5.patch


 Here is what J-D described here:
 https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
 I think I will retract my statement that it "used to be extremely racy 
 and caused more troubles than it fixed"; on my first test I got a stuck 
 region in transition instead of being able to recover. The timeout was set to 
 2 minutes to be sure I hit it.
 First the region gets closed:
 {quote}
 2012-01-04 00:16:25,811 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 sv4r5s38,62023,1325635980913 for region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 {quote}
 2 minutes later it times out:
 {quote}
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636185810, server=null
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_CLOSE for too long, running forced unassign again on 
 region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,027 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 (offlining)
 {quote}
 100ms later the master finally gets the event:
 {quote}
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
 region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for 1a4b111bcc228043e89f59c4c3f6a791
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
 deleting ZK node and removing from regions in transition, skipping assignment 
 of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Deleting existing unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
 region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
 {quote}
 At this point everything is fine, the region was processed as closed. But 
 wait, remember that line where it said it was going to force an unassign?
 {quote}
 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Creating unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
 2012-01-04 00:18:30,328 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
 java.lang.NullPointerException: Passed server is null for 
 1a4b111bcc228043e89f59c4c3f6a791
 {quote}
 Now the master is confused, it recreated the RIT znode but the region doesn't 
 even exist anymore. It even tries to shut it down but is blocked by NPEs. Now 
 this is what's going on.
 The late ZK notification that the znode was deleted (but it got recreated 
 after):
 {quote}
 2012-01-04 00:19:33,285 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been 
 deleted.
 {quote}
 Then it prints this, and much later tries to unassign it again:
 {quote}
 2012-01-04 00:19:46,607 DEBUG 
 org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
 to clear regions in transition; 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636310328, server=null
 ...
 2012-01-04 00:20:39,623 DEBUG 
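
 The sequence in the logs above can be modeled in a few lines. This is a 
 single-process sketch, not HBase code and not the fix from the attached 
 patches: the class, its two collections (stand-ins for the regions in 
 transition map and the unassigned znodes) and the guard in 
 forceUnassignGuarded are all hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the race; not HBase code and not the HBASE-5120 patch.
class ForcedUnassignSketch {
    private final Map<String, String> regionsInTransition = new HashMap<>(); // region -> state
    private final Set<String> unassignedZnodes = new HashSet<>();           // stand-in for ZK

    /** Master sent CLOSE: the region is now PENDING_CLOSE. */
    synchronized void sentClose(String region) {
        regionsInTransition.put(region, "PENDING_CLOSE");
    }

    /** CLOSED event for a table being disabled: delete the znode and leave RIT. */
    synchronized void handleClosed(String region) {
        unassignedZnodes.remove(region);
        regionsInTransition.remove(region);
    }

    /** What the logs show: the forced unassign recreates the znode unconditionally. */
    synchronized void forceUnassignUnguarded(String region) {
        unassignedZnodes.add(region);
        regionsInTransition.put(region, "PENDING_CLOSE");
    }

    /** One possible guard: re-check RIT membership under the same lock first. */
    synchronized boolean forceUnassignGuarded(String region) {
        if (!regionsInTransition.containsKey(region)) {
            return false; // already processed as closed; nothing to retry
        }
        unassignedZnodes.add(region);
        return true;
    }

    /** A region with a znode and a RIT entry but no server to close it is stuck. */
    synchronized boolean isStuck(String region) {
        return unassignedZnodes.contains(region) && regionsInTransition.containsKey(region);
    }

    public static void main(String[] args) {
        ForcedUnassignSketch am = new ForcedUnassignSketch();
        am.sentClose("1a4b111b");
        am.handleClosed("1a4b111b");           // late CLOSED event arrives first
        am.forceUnassignUnguarded("1a4b111b"); // timeout monitor's retry runs afterwards
        System.out.println("stuck without guard: " + am.isStuck("1a4b111b"));
    }
}
```

 The point of the sketch is only the ordering: the decision to retry is made 
 before the CLOSED event is handled, but acted on after it.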
 

[jira] [Commented] (HBASE-5120) Timeout monitor races with table disable handler

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210492#comment-13210492
 ] 

stack commented on HBASE-5120:
--

I did not commit to 0.90 either.

 Timeout monitor races with table disable handler
 

 Key: HBASE-5120
 URL: https://issues.apache.org/jira/browse/HBASE-5120
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Zhihong Yu
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
 Fix For: 0.94.0

 Attachments: HBASE-5120.patch, HBASE-5120_1.patch, 
 HBASE-5120_2.patch, HBASE-5120_3.patch, HBASE-5120_4.patch, 
 HBASE-5120_5.patch, HBASE-5120_5.patch



[jira] [Updated] (HBASE-4298) Support to drain RS nodes through ZK

2012-02-17 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4298:
-

Fix Version/s: (was: 0.90.7)

Removed 0.90.7 as a fix version.

 Support to drain RS nodes through ZK
 

 Key: HBASE-4298
 URL: https://issues.apache.org/jira/browse/HBASE-4298
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.90.4
 Environment: all
Reporter: Aravind Gottipati
Priority: Critical
  Labels: patch
 Fix For: 0.92.0

 Attachments: 4298-trunk-v2.txt, 4298-trunk-v3.txt, 90_hbase.patch, 
 drainingservertest-v2.txt, drainingservertest.txt, trunk_hbase.patch, 
 trunk_with_test.txt


 HDFS currently has a way to exclude certain datanodes and prevent them from 
 getting new blocks.  HDFS goes one step further and even drains these nodes 
 for you.  This enhancement is a step in that direction.
 The idea is that we mark nodes in zookeeper as draining nodes.  This means 
 that they don't get any more new regions.  These draining nodes look exactly 
 the same as the corresponding nodes in /rs, except they live under /draining.
 Eventually, support for draining them can be added.  I am submitting two 
 patches for review - one for the 0.90 branch and one for trunk (in git).
 Here are the two patches
 0.90 - 
 https://github.com/aravind/hbase/commit/181041e72e7ffe6a4da6d82b431ef7f8c99e62d2
 trunk - 
 https://github.com/aravind/hbase/commit/e127b25ae3b4034103b185d8380f3b7267bc67d5
 I have tested both these patches and they work as advertised.
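
 The idea of the marker nodes can be sketched as a simple filter. This is an 
 illustration under assumptions, not code from either patch: the class and 
 method names are hypothetical, and the two collections stand in for the 
 children of /rs and /draining as read from ZooKeeper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; the real patches read these lists from ZooKeeper.
class DrainingServersSketch {
    /** Returns the servers listed under /rs that are not also listed under /draining. */
    static List<String> assignable(List<String> rsChildren, Set<String> drainingChildren) {
        List<String> candidates = new ArrayList<>();
        for (String server : rsChildren) {
            if (!drainingChildren.contains(server)) {
                candidates.add(server); // still allowed to receive new regions
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // Made-up server names in host,port,startcode form.
        List<String> rs = Arrays.asList("host1,60020,1", "host2,60020,1", "host3,60020,1");
        Set<String> draining = new HashSet<>(Arrays.asList("host2,60020,1"));
        System.out.println(assignable(rs, draining));
    }
}
```

 Because a draining node uses the same name as its /rs node, membership in 
 the draining set is a plain string comparison.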





[jira] [Updated] (HBASE-5120) Timeout monitor races with table disable handler

2012-02-17 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5120:
-

Fix Version/s: (was: 0.92.1)

 Timeout monitor races with table disable handler
 

 Key: HBASE-5120
 URL: https://issues.apache.org/jira/browse/HBASE-5120
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Zhihong Yu
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
 Fix For: 0.94.0

 Attachments: HBASE-5120.patch, HBASE-5120_1.patch, 
 HBASE-5120_2.patch, HBASE-5120_3.patch, HBASE-5120_4.patch, 
 HBASE-5120_5.patch, HBASE-5120_5.patch



[jira] [Updated] (HBASE-5279) NPE in Master after upgrading to 0.92.0

2012-02-17 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5279:
-

Attachment: HBASE-5279-v2.patch

Version of patch that will work w/ hadoopqa

 NPE in Master after upgrading to 0.92.0
 ---

 Key: HBASE-5279
 URL: https://issues.apache.org/jira/browse/HBASE-5279
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.0
Reporter: Tobias Herbert
Priority: Critical
 Fix For: 0.92.1

 Attachments: HBASE-5279-v2.patch, HBASE-5279.patch


 I have upgraded my environment from 0.90.4 to 0.92.0.
 After the table migration I get the following error in the master (permanently):
 {noformat}
 2012-01-25 18:23:48,648 FATAL master-namenode,6,1327512209588 
 org.apache.hadoop.hbase.master.HMaster - Unhandled exception. Starting 
 shutdown.
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.rebuildUserRegions(AssignmentManager.java:2190)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:323)
 at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:501)
 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326)
 at java.lang.Thread.run(Thread.java:662)
 2012-01-25 18:23:48,650 INFO namenode,6,1327512209588 
 org.apache.hadoop.hbase.master.HMaster - Aborting
 {noformat}
 I think that's because I had a hard crash in the cluster a while ago, and 
 I have seen the following WARN since then:
 {noformat}
 2012-01-25 21:20:47,121 WARN namenode,6,1327513078123-CatalogJanitor 
 org.apache.hadoop.hbase.master.CatalogJanitor - REGIONINFO_QUALIFIER is empty 
 in keyvalues={emails,,xxx./info:server/1314336400471/Put/vlen=38, 
 emails,,1314189353300.xxx./info:serverstartcode/1314336400471/Put/vlen=8}
 {noformat}
 My patch simply works around the NPE (like the other code around those lines), 
 but I don't know if that's correct.
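
 A rough sketch of what "going around the NPE" could look like: skip catalog 
 rows that have no region info instead of dereferencing a null. This is not 
 the attached patch. The class name, the map-based stand-in for catalog rows 
 and the column name info:regioninfo are assumptions made for illustration; 
 only info:server and info:serverstartcode appear in the WARN above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the code from HBASE-5279.patch.
class RebuildRegionsSketch {
    /** catalogRows maps a row key to its columns (column name -> value). */
    static List<String> rebuild(Map<String, Map<String, String>> catalogRows) {
        List<String> regions = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : catalogRows.entrySet()) {
            String regionInfo = row.getValue().get("info:regioninfo");
            if (regionInfo == null) {
                // Leftover row (for example after a crash): warn and skip instead of failing.
                System.err.println("WARN: no region info in row " + row.getKey() + ", skipping");
                continue;
            }
            regions.add(regionInfo);
        }
        return regions;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        Map<String, String> good = new LinkedHashMap<>();
        good.put("info:regioninfo", "emails,,1");
        good.put("info:server", "host1:60020");
        Map<String, String> broken = new LinkedHashMap<>();
        broken.put("info:server", "host2:60020"); // region info missing
        rows.put("row-good", good);
        rows.put("row-broken", broken);
        System.out.println(rebuild(rows));
    }
}
```

 Whether skipping such rows is the right behaviour, or whether they should be 
 repaired or removed, is exactly the open question in the description.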





[jira] [Updated] (HBASE-5279) NPE in Master after upgrading to 0.92.0

2012-02-17 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5279:
-

Status: Patch Available  (was: Open)





[jira] [Updated] (HBASE-5271) Result.getValue and Result.getColumnLatest return the wrong column.

2012-02-17 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5271:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Was committed a while back.  Thanks for the patch Ghais.

 Result.getValue and Result.getColumnLatest return the wrong column.
 ---

 Key: HBASE-5271
 URL: https://issues.apache.org/jira/browse/HBASE-5271
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.5
Reporter: Ghais Issa
Assignee: Ghais Issa
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: 5271-90.txt, 5271-v2.txt, 
 fixKeyValueMatchingColumn.diff, testGetValue.diff


 In the following example result.getValue returns the wrong column:
 KeyValue kv = new KeyValue(Bytes.toBytes("r"), Bytes.toBytes(24), 
 Bytes.toBytes(2), Bytes.toBytes(7L));
 Result result = new Result(new KeyValue[] { kv });
 System.out.println(Bytes.toLong(result.getValue(Bytes.toBytes(2), 
 Bytes.toBytes(2)))); // prints 7.
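
 For comparison, the behaviour one would expect can be stated without HBase 
 on the classpath: a lookup matches only if both the family and the qualifier 
 are byte-for-byte equal. The sketch below is a stand-in, not the fix from the 
 attached diffs; the class and method names are hypothetical, and intBytes 
 assumes a 4-byte big-endian encoding of the int.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical stand-in for the expected matching rule; not HBase code.
class ColumnMatchSketch {
    /** Encodes an int as 4 big-endian bytes. */
    static byte[] intBytes(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    /** A stored column matches only if family and qualifier are both identical. */
    static boolean matchingColumn(byte[] storedFamily, byte[] storedQualifier,
                                  byte[] family, byte[] qualifier) {
        return Arrays.equals(storedFamily, family)
            && Arrays.equals(storedQualifier, qualifier);
    }

    public static void main(String[] args) {
        byte[] storedFamily = intBytes(24);
        byte[] storedQualifier = intBytes(2);
        // Same lookup as in the report: family 2, qualifier 2.
        System.out.println("family 2 matches:  "
            + matchingColumn(storedFamily, storedQualifier, intBytes(2), intBytes(2)));
        System.out.println("family 24 matches: "
            + matchingColumn(storedFamily, storedQualifier, intBytes(24), intBytes(2)));
    }
}
```

 Under this rule the lookup from the report would find nothing, since the 
 stored family is 24 and not 2.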





[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent

2012-02-17 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210505#comment-13210505
 ] 

Hadoop QA commented on HBASE-5200:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12515010/5200-v4no-prefix.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -136 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 158 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.coprocessor.TestMasterObserver
  org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks
  org.apache.hadoop.hbase.mapreduce.TestImportTsv
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/981//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/981//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/981//console

This message is automatically generated.

 AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the 
 region assignment inconsistent
 -

 Key: HBASE-5200
 URL: https://issues.apache.org/jira/browse/HBASE-5200
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: 5200-test.txt, 5200-v2.txt, 5200-v3.txt, 5200-v4.txt, 
 5200-v4no-prefix.txt, HBASE-5200.patch, HBASE-5200_1.patch, 
 HBASE-5200_trunk_latest_with_test_2.patch, 
 TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, 
 hbase-5200_90_latest.patch, hbase-5200_90_latest_new.patch


 This is the scenario:
 Consider a case where the balancer is running and is trying to close regions 
 on an RS.
 Before the close completes, a master switch happens.
 On master switch, the set of nodes that are in RIT is collected; we first 
 get the node data and start watching the node.
 After that, the node data is added to RIT.
 If, before the node is added to RIT, the RS that was asked to close the region 
 performs a transition, AM.handleRegion() skips handling it and reports that 
 the RIT state was null.
 {code}
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 a66d281d231dfcaea97c270698b26b6f from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 c12e53bfd48ddc5eec507d66821c4d23 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 59ae13de8c1eb325a0dd51f4902d2052 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 f45bc9614d7575f35244849af85aa078 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 cc3ecd7054fe6cd4a1159ed92fd62641 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 3af40478a17fee96b4a192b22c90d5a2 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 e6096a8466e730463e10d3d61f809b92 from server 
 HOST-192-168-47-204,20020,1326342744518 but region 

[jira] [Updated] (HBASE-5421) use hadoop-client/hadoop-minicluster artifacts for Hadoop 0.23 build

2012-02-17 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5421:
-

   Resolution: Fixed
Fix Version/s: 0.94.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed trunk and branch.  Thanks for patch Shaneal.

 use hadoop-client/hadoop-minicluster artifacts for Hadoop 0.23 build
 

 Key: HBASE-5421
 URL: https://issues.apache.org/jira/browse/HBASE-5421
 Project: HBase
  Issue Type: Improvement
  Components: build
Affects Versions: 0.92.0
Reporter: Shaneal Manek
Assignee: Shaneal Manek
Priority: Minor
  Labels: build
 Fix For: 0.94.0, 0.92.1

 Attachments: hbase-5421.patch


 Hadoop recently added hadoop-client and hadoop-minicluster artifacts for 
 Hadoop 0.23+ that don't export all the internal dependencies (HADOOP-8009).
 Let's use them instead of manually specifying transitive dependency exclusion 
 lists (which is error prone and annoying).





[jira] [Updated] (HBASE-5399) Cut the link between the client and the zookeeper ensemble

2012-02-17 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5399:
---

Attachment: 5399_inprogress.v3.patch

 Cut the link between the client and the zookeeper ensemble
 --

 Key: HBASE-5399
 URL: https://issues.apache.org/jira/browse/HBASE-5399
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5399_inprogress.patch, 5399_inprogress.v3.patch


 The link is often considered an issue, for various reasons, one of them 
 being that there is a limit on the number of connections that ZK can manage. 
 Stack also suggested removing the link to the master from HConnection.
 There are choices to be made considering the existing API (which we don't want 
 to break).
 The first patches I submit to hadoop-qa should not be committed: they 
 are here to show progress in the direction taken.
 ZooKeeper is used for:
 - public getter, to let the client do whatever he wants, and close ZooKeeper 
 when closing the connection = we have to deprecate this but keep it.
 - read get master address to create a master = now done with a temporary 
 zookeeper connection
 - read root location = now done with a temporary zookeeper connection, but 
 questionable. Used in public function locateRegion. To be reworked.
 - read cluster id = now done once with a temporary zookeeper connection.
 - check if base done is available = now done once with a zookeeper 
 connection given as a parameter
 - isTableDisabled/isTableAvailable = public functions, now done with a 
 temporary zookeeper connection.
  - Called internally from HBaseAdmin and HTable
 - getCurrentNrHRS(): public function to get the number of region servers and 
 create a pool of thread = now done with a temporary zookeeper connection
 -
 Master is used for:
 - getMaster public getter, as for ZooKeeper = we have to deprecate this but 
 keep it.
 - isMasterRunning(): public function, used internally by HMerge  HBaseAdmin
 - getHTableDescriptor*: public functions offering access to the master.  = 
 we could make them using a temporary master connection as well.
 Main points are:
 - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a 
 strongly coupled architecture ;-). This can be changed, but requires a lot of 
 modifications in these classes (likely adding a class in the middle of the 
 hierarchy, something like that). Anyway, non connected client will always be 
 really slower, because it's a tcp connection, and establishing a tcp 
 connection is slow.
 - having a link between ZK and all the client seems to make sense for some 
 Use Cases. However, it won't scale if a TCP connection is required for every 
 client
 - if we move the table descriptor part away from the client, we need to find 
 a new place for it.
 - we will have the same issue if HBaseAdmin (for both ZK  Master), may be we 
 can put a timeout on the connection. That would make the whole system less 
 deterministic however.





[jira] [Commented] (HBASE-5279) NPE in Master after upgrading to 0.92.0

2012-02-17 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210552#comment-13210552
 ] 

Hadoop QA commented on HBASE-5279:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12515016/HBASE-5279-v2.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -136 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 158 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mapreduce.TestImportTsv
   org.apache.hadoop.hbase.mapred.TestTableMapReduce
   org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/982//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/982//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/982//console

This message is automatically generated.

 NPE in Master after upgrading to 0.92.0
 ---

 Key: HBASE-5279
 URL: https://issues.apache.org/jira/browse/HBASE-5279
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.0
Reporter: Tobias Herbert
Priority: Critical
 Fix For: 0.92.1

 Attachments: HBASE-5279-v2.patch, HBASE-5279.patch


 I upgraded my environment from 0.90.4 to 0.92.0.
 After the table migration, I permanently get the following error in the master:
 {noformat}
 2012-01-25 18:23:48,648 FATAL master-namenode,6,1327512209588 
 org.apache.hadoop.hbase.master.HMaster - Unhandled exception. Starting 
 shutdown.
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.rebuildUserRegions(AssignmentManager.java:2190)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:323)
 at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:501)
 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326)
 at java.lang.Thread.run(Thread.java:662)
 2012-01-25 18:23:48,650 INFO namenode,6,1327512209588 
 org.apache.hadoop.hbase.master.HMaster - Aborting
 {noformat}
 I think that's because I had a hard crash in the cluster a while ago, and 
 I have been seeing the following WARN since then:
 {noformat}
 2012-01-25 21:20:47,121 WARN namenode,6,1327513078123-CatalogJanitor 
 org.apache.hadoop.hbase.master.CatalogJanitor - REGIONINFO_QUALIFIER is empty 
 in keyvalues={emails,,xxx./info:server/1314336400471/Put/vlen=38, 
 emails,,1314189353300.xxx./info:serverstartcode/1314336400471/Put/vlen=8}
 {noformat}
 My patch simply works around the NPE (as the surrounding code does),
 but I don't know if that's correct.





[jira] [Created] (HBASE-5427) Upgrade our zk to 3.4.3

2012-02-17 Thread stack (Created) (JIRA)
Upgrade our zk to 3.4.3
---

 Key: HBASE-5427
 URL: https://issues.apache.org/jira/browse/HBASE-5427
 Project: HBase
  Issue Type: Task
Reporter: stack
 Fix For: 0.94.0, 0.92.1
 Attachments: 5427.txt







[jira] [Updated] (HBASE-5427) Upgrade our zk to 3.4.3

2012-02-17 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5427:
-

Attachment: 5427.txt

 Upgrade our zk to 3.4.3
 ---

 Key: HBASE-5427
 URL: https://issues.apache.org/jira/browse/HBASE-5427
 Project: HBase
  Issue Type: Task
Reporter: stack
 Fix For: 0.94.0, 0.92.1

 Attachments: 5427.txt








[jira] [Commented] (HBASE-5425) Punt on the timeout doesn't work in BulkEnabler#waitUntilDone (master's EnableTableHandler)

2012-02-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210562#comment-13210562
 ] 

Hudson commented on HBASE-5425:
---

Integrated in HBase-0.92 #286 (See 
[https://builds.apache.org/job/HBase-0.92/286/])
HBASE-5425 Punt on the timeout doesn't work in BulkEnabler#waitUntilDone 
(master's EnableTableHandler) (Revision 1245676)

 Result = SUCCESS
stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java


  Punt on the timeout doesn't work in BulkEnabler#waitUntilDone (master's 
 EnableTableHandler)
 

 Key: HBASE-5425
 URL: https://issues.apache.org/jira/browse/HBASE-5425
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.5, 0.92.0
Reporter: terry zhang
 Fix For: 0.94.0

 Attachments: HBASE-5425.patch


 Please take a look at the code below in EnableTableHandler (HBase master):
 {code:title=EnableTableHandler.java|borderStyle=solid}
 protected boolean waitUntilDone(long timeout)
 throws InterruptedException {
 
   ...
   int lastNumberOfRegions = this.countOfRegionsInTable;
   while (!server.isStopped() && remaining > 0) {
     Thread.sleep(waitingTimeForEvents);
     regions = assignmentManager.getRegionsOfTable(tableName);
     if (isDone(regions)) break;
     // Punt on the timeout as long we make progress
     if (regions.size() > lastNumberOfRegions) {
       lastNumberOfRegions = regions.size();
       timeout += waitingTimeForEvents;
     }
     remaining = timeout - (System.currentTimeMillis() - startTime);
 
   }
 private boolean isDone(final List<HRegionInfo> regions) {
   return regions != null && regions.size() >= this.countOfRegionsInTable;
 }
 {code} 
 It is easy to see that if lastNumberOfRegions is initialized to 
 this.countOfRegionsInTable, the punt-on-timeout code will never be executed, 
 because isDone() breaks out of the loop before regions.size() can exceed that 
 value. I think initializing lastNumberOfRegions = 0 would make it work.
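
 The effect of the initial value can be illustrated with a small, self-contained
 simulation. This is only a sketch: PuntOnTimeoutSketch and its parameters are
 hypothetical names, the clock is simulated rather than read from
 System.currentTimeMillis(), and it is not the EnableTableHandler code or the
 attached patch.

```java
import java.util.function.IntSupplier;

/** Hypothetical, simplified stand-in for the wait loop quoted above. */
class PuntOnTimeoutSketch {
    /**
     * Polls onlineRegions until it reports expectedRegions or time runs out.
     * Each poll advances a simulated clock by pollMillis instead of sleeping.
     * Returns the simulated elapsed time when done, or -1 on timeout.
     */
    static long waitUntilDone(IntSupplier onlineRegions, int expectedRegions,
                              long timeout, long pollMillis, int initialLastCount) {
        long elapsed = 0;
        long remaining = timeout;
        int lastNumberOfRegions = initialLastCount;
        while (remaining > 0) {
            elapsed += pollMillis;                 // stands in for Thread.sleep(...)
            int online = onlineRegions.getAsInt();
            if (online >= expectedRegions) {
                return elapsed;                    // all regions are online
            }
            // Extend the timeout only when more regions came online since the last poll.
            if (online > lastNumberOfRegions) {
                lastNumberOfRegions = online;
                timeout += pollMillis;
            }
            remaining = timeout - elapsed;
        }
        return -1;
    }
}
```

 With one region coming online per poll, five expected regions, and a timeout
 that covers only three polls, passing 0 as initialLastCount lets the loop keep
 extending the timeout until it finishes, while passing the expected count (the
 behavior described in this report) makes it time out.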





[jira] [Resolved] (HBASE-5427) Upgrade our zk to 3.4.3

2012-02-17 Thread stack (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-5427.
--

Resolution: Fixed
  Assignee: stack

Committed 0.92 branch and trunk.

 Upgrade our zk to 3.4.3
 ---

 Key: HBASE-5427
 URL: https://issues.apache.org/jira/browse/HBASE-5427
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
 Fix For: 0.94.0, 0.92.1

 Attachments: 5427.txt








[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble

2012-02-17 Thread nkeywal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210561#comment-13210561
 ] 

nkeywal commented on HBASE-5399:


It's not the final version (it needs more comments, unit tests and likely bug 
fixes), but there is already a lot. Master & ZooKeeper connections are now 
created only when necessary, and are closed if not used for 5 minutes.
I added the keep-alive stuff. It's not just a nice-to-have; without it the unit 
tests take twice as long.
There is an issue with the masterCheck part; the previous behavior was strange. 
I need to review it in detail.
The patch is based on Monday's trunk. I will make it compatible with the current 
trunk this weekend.
I will move isTableEnabled & so on into another patch; this one is already too 
big...
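
The keep-alive behavior described above (create on first use, share while in
use, close after an idle period) could look roughly like the following. This is
a minimal sketch under stated assumptions: KeepAliveSketch, acquire, release and
closeIfIdle are hypothetical names, the caller supplies the current time, and
nothing here is taken from the attached patch.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical lazily created, shared resource that is closed after an idle period. */
class KeepAliveSketch<T> {
    private final Supplier<T> factory;   // opens the underlying connection
    private final Consumer<T> closer;    // closes it
    private final long keepAliveMillis;
    private T resource;                  // null until first use, and again after closing
    private int users;                   // callers currently holding the resource
    private long idleSince;              // time at which the last user released it

    KeepAliveSketch(Supplier<T> factory, Consumer<T> closer, long keepAliveMillis) {
        this.factory = factory;
        this.closer = closer;
        this.keepAliveMillis = keepAliveMillis;
    }

    /** Returns the shared resource, creating it only when necessary. */
    synchronized T acquire() {
        if (resource == null) {
            resource = factory.get();
        }
        users++;
        return resource;
    }

    /** Marks one user as finished; the resource stays open for later reuse. */
    synchronized void release(long nowMillis) {
        if (users > 0 && --users == 0) {
            idleSince = nowMillis;
        }
    }

    /** Closes the resource if nobody has used it for keepAliveMillis; meant to be called periodically. */
    synchronized boolean closeIfIdle(long nowMillis) {
        if (resource == null || users > 0 || nowMillis - idleSince < keepAliveMillis) {
            return false;
        }
        closer.accept(resource);
        resource = null;
        return true;
    }
}
```

Reusing the open resource between calls avoids paying the connection setup cost
each time, which is the cost the comment above attributes the slower unit tests to.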

 Cut the link between the client and the zookeeper ensemble
 --

 Key: HBASE-5399
 URL: https://issues.apache.org/jira/browse/HBASE-5399
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5399_inprogress.patch, 5399_inprogress.v3.patch







[jira] [Commented] (HBASE-5195) [Coprocessors] preGet hook does not allow overriding or wrapping filter on incoming Get

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210565#comment-13210565
 ] 

stack commented on HBASE-5195:
--

This looks like a pretty important fix.  Should it be more than major priority? 
 Should it go into 0.92.1?

 [Coprocessors] preGet hook does not allow overriding or wrapping filter on 
 incoming Get
 ---

 Key: HBASE-5195
 URL: https://issues.apache.org/jira/browse/HBASE-5195
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0, 0.92.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.94.0, 0.92.1

 Attachments: HBASE-5195.patch


 Without the ability to wrap the internal Scan on the Get, we can't override 
 (or protect, in the case of access control) Gets as we can Scans. The result 
 is inconsistent behavior.





[jira] [Commented] (HBASE-5209) HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210567#comment-13210567
 ] 

stack commented on HBASE-5209:
--

Patch looks excellent.

One issue is the upping of the HMasterInterface version.  It's the 'right' thing to 
do, but it means I can't apply this to 0.92.1, and it breaks a 0.92 talking to a 
0.94, which currently is possible.  Can you try adding isActiveMaster to the 
end of the Interface and NOT update the version?  See if you can connect to a 
0.92.1 server from a 0.92.0 client and see if you can do basic 
HMasterInterface operations such as isLoadBalancer running.



 HConnection/HMasterInterface should allow for way to get hostname of 
 currently active master in multi-master HBase setup
 

 Key: HBASE-5209
 URL: https://issues.apache.org/jira/browse/HBASE-5209
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.94.0, 0.90.5, 0.92.0
Reporter: Aditya Acharya
Assignee: David S. Wang
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: HBASE-5209-v0.diff, HBASE-5209-v1.diff


 I have a multi-master HBase set up, and I'm trying to programmatically 
 determine which of the masters is currently active. But the API does not 
 allow me to do this. There is a getMaster() method in the HConnection class, 
 but it returns an HMasterInterface, whose methods do not allow me to find out 
 which master won the last race. The API should have a 
 getActiveMasterHostname() or something to that effect.





[jira] [Commented] (HBASE-5195) [Coprocessors] preGet hook does not allow overriding or wrapping filter on incoming Get

2012-02-17 Thread Andrew Purtell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210569#comment-13210569
 ] 

Andrew Purtell commented on HBASE-5195:
---

+1 for including in 0.92.1


 [Coprocessors] preGet hook does not allow overriding or wrapping filter on 
 incoming Get
 ---

 Key: HBASE-5195
 URL: https://issues.apache.org/jira/browse/HBASE-5195
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0, 0.92.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.94.0, 0.92.1

 Attachments: HBASE-5195.patch


 Without the ability to wrap the internal Scan on the Get, we can't override 
 (or protect, in the case of access control) Gets as we can Scans. The result 
 is inconsistent behavior.





[jira] [Commented] (HBASE-5265) Fix 'revoke' shell command

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210570#comment-13210570
 ] 

stack commented on HBASE-5265:
--

Is this important for 0.92.1 lads?

 Fix 'revoke' shell command
 --

 Key: HBASE-5265
 URL: https://issues.apache.org/jira/browse/HBASE-5265
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Andrew Purtell
Assignee: Eugene Koontz
 Fix For: 0.94.0, 0.92.1


 The 'revoke' shell command needs to be reworked for the AccessControlProtocol 
 implementation that was finalized for 0.92. The permissions being removed 
 must exactly match what was previously granted. No wildcard matching is done 
 server side.
 Allow two forms of the command in the shell for convenience:
 Revocation of a specific grant:
 {code}
 revoke <user>, <table>, <column family> [ , <column_qualifier> ]
 {code}
 Have the shell automatically do so for all permissions on a table for a given 
 user:
 {code}
 revoke <user>, <table>
 {code}
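
 The difference between the two forms can be sketched with a toy in-memory
 grant store. This is only an illustration under stated assumptions: RevokeSketch
 and its methods are hypothetical names and do not reflect the
 AccessControlProtocol API or the shell implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical in-memory grant store; each grant is (user, table, family, qualifier). */
class RevokeSketch {
    private final Set<List<String>> grants = new HashSet<>();

    void grant(String user, String table, String family, String qualifier) {
        grants.add(Arrays.asList(user, table, family, qualifier));
    }

    /** Exact match only, no wildcards: returns false when no identical grant exists. */
    boolean revoke(String user, String table, String family, String qualifier) {
        return grants.remove(Arrays.asList(user, table, family, qualifier));
    }

    /** Convenience form: looks up every grant the user holds on the table and revokes each one. */
    int revokeAll(String user, String table) {
        int removed = 0;
        for (List<String> g : new ArrayList<>(grants)) {
            if (g.get(0).equals(user) && g.get(1).equals(table)) {
                grants.remove(g);
                removed++;
            }
        }
        return removed;
    }
}
```

 In this sketch the second form does the enumeration on the caller's side and
 then issues exact revocations, which matches the idea of keeping wildcard
 handling out of the server.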





[jira] [Created] (HBASE-5428) Allow for custom filters to be registered within the Thrift interface

2012-02-17 Thread Robert Roland (Created) (JIRA)
Allow for custom filters to be registered within the Thrift interface
-

 Key: HBASE-5428
 URL: https://issues.apache.org/jira/browse/HBASE-5428
 Project: HBase
  Issue Type: Improvement
  Components: thrift
Affects Versions: 0.92.0
Reporter: Robert Roland


Custom filters work within the Java client API, but are not accessible within 
the Thrift API.  Attempting to use one will generate a "Filter Name x not 
supported" error.

Attached patch allows a user to specify a list of custom filters that are 
registered at Thrift server startup time within the HBase configuration files:

<property>
  <name>hbase.thrift.filters</name>
  <value>MyFilter:com.foo.Filter,OtherFilter:com.foo.OtherFilter</value>
</property>

Patch created off SVN r1245727
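
For illustration, a value in the format shown above could be split into a
name-to-class mapping as follows. This is a minimal sketch: the class
ThriftFilterConfigSketch and the method parseFilters are hypothetical, and the
parsing actually done by the attached patch is not shown in this message.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for a "Name:fully.qualified.Class,Other:pkg.Other" property value. */
class ThriftFilterConfigSketch {
    static Map<String, String> parseFilters(String value) {
        Map<String, String> filters = new LinkedHashMap<>();
        if (value == null || value.trim().isEmpty()) {
            return filters;                       // property unset: no custom filters
        }
        for (String entry : value.split(",")) {
            // Split on the first ':' only; the left side is the filter name.
            String[] parts = entry.trim().split(":", 2);
            if (parts.length != 2 || parts[0].trim().isEmpty() || parts[1].trim().isEmpty()) {
                throw new IllegalArgumentException("Bad filter entry: " + entry);
            }
            filters.put(parts[0].trim(), parts[1].trim());
        }
        return filters;
    }
}
```

A server could then register each entry at startup, keyed by the filter name on
the left of the colon.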






[jira] [Updated] (HBASE-5428) Allow for custom filters to be registered within the Thrift interface

2012-02-17 Thread Robert Roland (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Roland updated HBASE-5428:
-

Attachment: ThriftCustomFilters.patch

 Allow for custom filters to be registered within the Thrift interface
 -

 Key: HBASE-5428
 URL: https://issues.apache.org/jira/browse/HBASE-5428
 Project: HBase
  Issue Type: Improvement
  Components: thrift
Affects Versions: 0.92.0
Reporter: Robert Roland
  Labels: patch
 Fix For: 0.94.0

 Attachments: ThriftCustomFilters.patch


 Custom filters work within the Java client API, but are not accessible within 
 the Thrift API.  Attempting to use one will generate a "Filter Name x not 
 supported" error.
 Attached patch allows a user to specify a list of custom filters that are 
 registered at Thrift server startup time within the HBase configuration files:
 <property>
   <name>hbase.thrift.filters</name>
   <value>MyFilter:com.foo.Filter,OtherFilter:com.foo.OtherFilter</value>
 </property>
 Patch created off SVN r1245727




