[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210741#comment-13210741 ]

Hadoop QA commented on HBASE-5317:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12515057/HBASE-5317-v3.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 7 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated -136 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 158 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/985//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/985//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/985//console

This message is automatically generated.

Fix TestHFileOutputFormat to work against hadoop 0.23
Key: HBASE-5317
URL: https://issues.apache.org/jira/browse/HBASE-5317
Project: HBase
Issue Type: Bug
Components: test
Affects Versions: 0.94.0, 0.92.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Attachments: HBASE-5317-v0.patch, HBASE-5317-v1.patch, HBASE-5317-v3.patch

Running
mvn -Dhadoop.profile=23 test -P localTests -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
yields this on 0.92:

Failed tests:
testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): HFile for column family info-A not found

Tests in error:
test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0 (Is a directory)
testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable
testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable

It looks like on trunk, this also results in an error:
testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable

I have a patch that fixes testColumnFamilyCompression and test_TIMERANGE, but haven't fixed the other 3 yet.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5347) GC free memory management in Level-1 Block Cache
[ https://issues.apache.org/jira/browse/HBASE-5347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210750#comment-13210750 ]

Todd Lipcon commented on HBASE-5347:

Hey folks. I haven't been through the patch yet, but just wanted to throw out one idea that I think can make reference-counted systems a little simpler: in Cocoa (the OSX development framework) there's a class called NSAutoreleasePool, an instance of which is carried around as part of the local thread context. You can then call autorelease on any object, which will not immediately decrement the ref count, but adds it to the pool. When you release the pool, all referenced objects are decremented at that point.

This idea might make it easier to manage references. For example, when something is read by a scanner, it could be read with ref count incremented but put on the request's autorelease pool. Then, when any IPC handler thread is returned to the thread pool, the autorelease pool could be released. This ensures that any stuff we reference is kept around for the whole request lifecycle but still automatically dereffed at the end.

Do you think such a construct would be useful here?

https://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/MemoryMgmt/Articles/mmAutoreleasePools.html has some more info.

GC free memory management in Level-1 Block Cache
Key: HBASE-5347
URL: https://issues.apache.org/jira/browse/HBASE-5347
Project: HBase
Issue Type: Improvement
Reporter: Prakash Khemani
Assignee: Prakash Khemani
Attachments: D1635.5.patch

On eviction of a block from the block-cache, instead of waiting for the garbage collector to reuse its memory, reuse the block right away. This will require us to keep reference counts on the HFile blocks. Once we have the reference counts in place we can do our own simple blocks-out-of-slab allocation for the block-cache.

This will help us with
* reducing gc pressure, especially in the old generation
* making it possible to have non-java-heap memory backing the HFile blocks
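The autorelease idea in the comment above could be sketched in Java roughly as follows. This is only an illustration under stated assumptions: RefCountedBlock and AutoreleasePool are hypothetical names that do not come from the patch, and the pool is kept in a ThreadLocal to mirror the "carried around as part of the local thread context" description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical reference-counted block; not a class from the patch. */
final class RefCountedBlock {
    private final AtomicInteger refCount = new AtomicInteger(0);

    void retain() {
        refCount.incrementAndGet();
    }

    void release() {
        if (refCount.decrementAndGet() < 0) {
            throw new IllegalStateException("block released more often than retained");
        }
    }

    int refCount() {
        return refCount.get();
    }
}

/** Hypothetical per-thread pool that defers release() calls until drain(). */
final class AutoreleasePool {
    private static final ThreadLocal<AutoreleasePool> CURRENT =
        ThreadLocal.withInitial(AutoreleasePool::new);

    private final List<RefCountedBlock> pending = new ArrayList<>();

    /** Returns the pool bound to the calling thread. */
    static AutoreleasePool current() {
        return CURRENT.get();
    }

    /** Increments the ref count now and schedules the matching decrement for drain(). */
    RefCountedBlock retainAndAutorelease(RefCountedBlock block) {
        block.retain();
        pending.add(block);
        return block;
    }

    /** Releases everything registered since the last drain, e.g. at the end of a request. */
    void drain() {
        for (RefCountedBlock block : pending) {
            block.release();
        }
        pending.clear();
    }
}
```

In this sketch a handler thread would call AutoreleasePool.current().drain() once per request, just before it goes back to the thread pool, so code in the read path never has to pair retain and release by hand.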
[jira] [Commented] (HBASE-5332) Deterministic Compaction Jitter
[ https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210751#comment-13210751 ]

Lars Hofhansl commented on HBASE-5332:

I kinda like the simplicity of the random jitter. Part of the problem seems to be that we only get a few random choices with
delay + jitter*(1 - 2*Math.random())
What if we just change this to
delay + jitter*(2 - 4*Math.random())
or
delay + jitter*(3 - 6*Math.random())
and decrease jitter accordingly?

Deterministic Compaction Jitter
Key: HBASE-5332
URL: https://issues.apache.org/jira/browse/HBASE-5332
Project: HBase
Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Minor
Attachments: D1785.1.patch, D1785.2.patch

Currently, we add jitter to a compaction using delay + jitter*(1 - 2*Math.random()). Since this is non-deterministic, we can get major compaction storms on server restart as half the Stores that were set to delay + jitter will now be set to delay - jitter. We need a more deterministic way to jitter major compactions so this information can persist across server restarts.
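To make the difference between the two schemes concrete, here is a minimal Java sketch. It assumes the random factor is derived from a stable per-store string; the class name, the method names and the choice of String.hashCode() as the seed are illustrative assumptions, not what the attached patches do.

```java
import java.util.Random;

/** Sketch only: contrasts the random jitter from the issue with a seeded variant. */
final class CompactionJitter {
    private CompactionJitter() {}

    /** Scheme quoted in the issue: a fresh random draw, so the value changes on every restart. */
    static long randomDelay(long delay, long jitter, Random rng) {
        return delay + (long) (jitter * (1.0 - 2.0 * rng.nextDouble()));
    }

    /**
     * Deterministic variant: the factor is drawn from a generator seeded with a
     * stable identifier (assumed here to be a store name), so the same store
     * gets the same offset after a restart.
     */
    static long deterministicDelay(long delay, long jitter, String storeName) {
        Random seeded = new Random(storeName.hashCode());
        return delay + (long) (jitter * (1.0 - 2.0 * seeded.nextDouble()));
    }
}
```

Both methods return a value between delay - jitter and delay + jitter; only the second one returns the same value every time it is called for a given store.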
[jira] [Updated] (HBASE-5003) If the master is started with a wrong root dir, it gets stuck and can't be killed
[ https://issues.apache.org/jira/browse/HBASE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shaneal Manek updated HBASE-5003:
Attachment: hbase-5003.patch

If the master is started with a wrong root dir, it gets stuck and can't be killed
Key: HBASE-5003
URL: https://issues.apache.org/jira/browse/HBASE-5003
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Priority: Critical
Labels: noob
Fix For: 0.94.0, 0.90.7, 0.92.1
Attachments: hbase-5003.patch

Reported by a new user on IRC who tried to set hbase.rootdir to file:///~/hbase, the master gets stuck and cannot be killed. I tried something similar on my machine and it spins while logging:
{quote}
2011-12-09 16:11:17,002 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to create version file at file:/bin/hbase, retrying: Mkdirs failed to create file:/bin/hbase
2011-12-09 16:11:27,002 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to create version file at file:/bin/hbase, retrying: Mkdirs failed to create file:/bin/hbase
2011-12-09 16:11:37,003 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to create version file at file:/bin/hbase, retrying: Mkdirs failed to create file:/bin/hbase
{quote}
The reason it cannot be stopped is that the master's main thread is stuck in there and will never be notified:
{quote}
Master:0;su-jdcryans-01.local,51116,1323475535684 prio=5 tid=7f92b7a3c000 nid=0x1137ba000 waiting on condition [1137b9000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hbase.util.FSUtils.setVersion(FSUtils.java:297)
at org.apache.hadoop.hbase.util.FSUtils.setVersion(FSUtils.java:268)
at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:339)
at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:128)
at org.apache.hadoop.hbase.master.MasterFileSystem.init(MasterFileSystem.java:113)
at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:435)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:314)
at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:218)
at java.lang.Thread.run(Thread.java:680)
{quote}
It seems we should do a better handling of the exceptions we get in there, and die if we need to. It would make a better user experience.
Maybe also do a check on hbase.rootdir before even starting the master.
[jira] [Updated] (HBASE-5003) If the master is started with a wrong root dir, it gets stuck and can't be killed
[ https://issues.apache.org/jira/browse/HBASE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shaneal Manek updated HBASE-5003:
Attachment: hbase-5003-v2.patch
[jira] [Commented] (HBASE-5423) Regionserver may block forever on waitOnAllRegionsToClose when aborting
[ https://issues.apache.org/jira/browse/HBASE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210760#comment-13210760 ]

chunhui shen commented on HBASE-5423:

If closing a region fails, we still remove it from RIT, so close may be called multiple times on the same region.
{code}
CloseRegionHandler#process(){
  try{
    ...
    region.close(abort)
    ...
  }finally{
    this.rsServices.getRegionsInTransitionInRS().
      remove(this.regionInfo.getEncodedNameAsBytes());
  }
}
{code}
Therefore, if some regions cannot be closed because of an exception, we should break out even though the set of online regions is not yet empty; otherwise we may block forever in waitOnAllRegionsToClose.

Regionserver may block forever on waitOnAllRegionsToClose when aborting
Key: HBASE-5423
URL: https://issues.apache.org/jira/browse/HBASE-5423
Project: HBase
Issue Type: Bug
Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
Attachments: hbase-5423.patch

If closeRegion throws any exception (it could be caused by the FS) while the RS is aborting, the RS will block forever in waitOnAllRegionsToClose().
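The break condition described in the comment above might look something like the following Java sketch. It is an assumption-laden illustration, not the attached patch: RegionCloseWaiter is a hypothetical class, regions are represented by plain strings, and "no close in flight" is modelled as an empty regions-in-transition set.

```java
import java.util.Set;

/** Sketch only: a wait loop that gives up when failed closes can never complete. */
final class RegionCloseWaiter {
    private RegionCloseWaiter() {}

    /**
     * Returns true once no regions are online. Returns false if regions are
     * still online but nothing is in transition any more (so no close handler
     * is left to take them offline), or if the waiting thread is interrupted.
     */
    static boolean waitOnAllRegionsToClose(Set<String> onlineRegions,
                                           Set<String> regionsInTransition,
                                           long pollMillis) {
        while (!onlineRegions.isEmpty()) {
            if (regionsInTransition.isEmpty()) {
                // Every close handler has finished, yet regions remain online:
                // those closes failed, so waiting longer cannot help.
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

The sketch relies on the behaviour shown in the quoted finally block: a region leaves the in-transition set whether or not its close succeeded, so an empty in-transition set with regions still online indicates failed closes.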
[jira] [Updated] (HBASE-5003) If the master is started with a wrong root dir, it gets stuck and can't be killed
[ https://issues.apache.org/jira/browse/HBASE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shaneal Manek updated HBASE-5003:
Status: Patch Available (was: Open)

Simply has the master retry writing the version file 3 times (by default - but configurable). If it fails, the master shuts down gracefully.

Please disregard the first patch - it accidentally includes the buggy hbase-site.xml I was using to reproduce this issue.
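A bounded retry of the kind described for the version file could be sketched as below. This is a generic Java illustration under assumptions: VersionFileRetry and IoAction are hypothetical names, and the attempt count and wait time are plain parameters rather than the configuration the patch actually reads.

```java
import java.io.IOException;

/** Sketch only: retry a failing filesystem write a fixed number of times, then give up. */
final class VersionFileRetry {
    private VersionFileRetry() {}

    /** Hypothetical stand-in for the write that may fail, e.g. creating the version file. */
    interface IoAction {
        void run() throws IOException;
    }

    /**
     * Runs the action up to maxAttempts times, sleeping waitMillis between failures.
     * Returns true on success, false once the attempts are exhausted or the thread
     * is interrupted, so the caller can shut down instead of spinning forever.
     */
    static boolean tryWithRetries(IoAction action, int maxAttempts, long waitMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                action.run();
                return true;
            } catch (IOException e) {
                System.err.println("Attempt " + attempt + " of " + maxAttempts
                    + " failed: " + e.getMessage());
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(waitMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

A caller would treat a false return as fatal and start a clean shutdown, which is the behaviour the comment describes for the master.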
[jira] [Commented] (HBASE-5422) StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins)
[ https://issues.apache.org/jira/browse/HBASE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210766#comment-13210766 ]

chunhui shen commented on HBASE-5422:

Yes, we should add the regionPlans, so that when an open comes in from the StartupBulkAssigner we can update the timers of the regions in transition that share the same assignment destination.
I agree with making an addPlan method that takes a Map of plans.

StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins)
Key: HBASE-5422
URL: https://issues.apache.org/jira/browse/HBASE-5422
Project: HBase
Issue Type: Bug
Components: master
Reporter: chunhui shen
Attachments: 5422-90.patch, hbase-5422.patch

In our production environment we see many RIT timeouts when the cluster starts up; there are about 70,000 regions in the cluster (25 regionservers).
First, we could see the following log (see the region 33cf229845b1009aa8a3f7b0f85c9bd0):

master's log
2012-02-13 18:07:41,409 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Async create of unassigned node for 33cf229845b1009aa8a3f7b0f85c9bd0 with OFFLINE state
2012-02-13 18:07:42,560 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=OFFLINE, ts=1329127661409, server=r03f11025.yh.aliyun.com,60020,1329127549907
2012-02-13 18:07:42,996 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=OFFLINE, ts=1329127661409
2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=PENDING_OPEN, ts=1329127662996
2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0.
2012-02-13 18:11:16,744 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=r03f11025.yh.aliyun.com,60020,1329127549907, region=33cf229845b1009aa8a3f7b0f85c9bd0
2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 33cf229845b1009aa8a3f7b0f85c9bd0; deleting unassigned node
2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Deleting existing unassigned node for 33cf229845b1009aa8a3f7b0f85c9bd0 that is in expected state RS_ZK_REGION_OPENED
2012-02-13 18:38:07,314 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Successfully deleted unassigned node for region 33cf229845b1009aa8a3f7b0f85c9bd0 in expected state RS_ZK_REGION_OPENED
2012-02-13 18:38:07,573 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. on r03f11025.yh.aliyun.com,60020,1329127549907
2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. so generated a random one; hri=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0., src=, dest=r01b05043.yh.aliyun.com,60020,1329127549041; 29 (online=29, exclude=null) available servers
2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. to r01b05043.yh.aliyun.com,60020,1329127549041
2012-02-13 19:31:50,514 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=PENDING_OPEN, ts=1329132528086
2012-02-13 19:31:50,514 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0.

Regionserver's log
2012-02-13 18:07:43,537 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0.
2012-02-13 18:11:16,560 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing open of item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0.
[jira] [Commented] (HBASE-5229) Provide basic building blocks for multi-row local transactions.
[ https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210770#comment-13210770 ]

Lars Hofhansl commented on HBASE-5229:

Note that Cassandra adds something similar in 1.1: http://www.datastax.com/dev/blog/schema-in-cassandra-1-1 (check towards the end of that blog post).

Provide basic building blocks for multi-row local transactions.
Key: HBASE-5229
URL: https://issues.apache.org/jira/browse/HBASE-5229
Project: HBase
Issue Type: New Feature
Components: client, regionserver
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Fix For: 0.94.0
Attachments: 5229-endpoint.txt, 5229-final.txt, 5229-multiRow-v2.txt, 5229-multiRow.txt, 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt

In the final iteration, this issue provides a generalized, public mutateRowsWithLocks method on HRegion, that can be used by coprocessors to implement atomic operations efficiently.
Coprocessors are already region aware, which makes this a good pairing of APIs. This feature is by design not available to the client via the HTable API.
It took a long time to arrive at this and I apologize for the public exposure of my (erratic in retrospect) thought processes.

Was:
HBase should provide basic building blocks for multi-row local transactions. Local means that we do this by co-locating the data. Global (cross region) transactions are not discussed here.
After a bit of discussion two solutions have emerged:
1. Keep the row-key for determining grouping and location and allow efficient intra-row scanning. A client application would then model tables as HBase-rows.
2. Define a prefix-length in HTableDescriptor that defines a grouping of rows. Regions will then never be split inside a grouping prefix.
#1 is true to the current storage paradigm of HBase.
#2 is true to the current client side API.
I will explore these two with sample patches here.

Was:
As discussed (at length) on the dev mailing list with the HBASE-3584 and HBASE-5203 committed, supporting atomic cross row transactions within a region becomes simple.
I am aware of the hesitation about the usefulness of this feature, but we have to start somewhere.
Let's use this jira for discussion, I'll attach a patch (with tests) momentarily to make this concrete.
[jira] [Commented] (HBASE-5420) TestImportTsv does not shut down MR Cluster correctly (fails against 0.23 hadoop)
[ https://issues.apache.org/jira/browse/HBASE-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210773#comment-13210773 ]

Hudson commented on HBASE-5420:

Integrated in HBase-0.92 #289 (See [https://builds.apache.org/job/HBase-0.92/289/])
HBASE-5420 TestImportTsv does not shut down MR Cluster correctly (fails against 0.23 hadoop) (Revision 1245796)

Result = SUCCESS
stack :
Files :
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportTsv.java

TestImportTsv does not shut down MR Cluster correctly (fails against 0.23 hadoop)
Key: HBASE-5420
URL: https://issues.apache.org/jira/browse/HBASE-5420
Project: HBase
Issue Type: Bug
Components: test
Affects Versions: 0.94.0, 0.92.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Fix For: 0.94.0, 0.92.1
Attachments: HBASE-5420-v1.patch, HBASE-5420.patch

Test calls startMiniMapReduceCluster() but never calls shutdownMiniMapReduceCluster().
This causes failures with -Dhadoop.profile=23 when both testMROnTable and testMROnTableWithCustomMapper are run, because the cluster cannot start up properly for the second test.
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210772#comment-13210772 ]

chunhui shen commented on HBASE-5270:

{code}
+// We set serverLoad with one region, it could differentiate with
+// regionserver which is started just now
+HServerLoad serverLoad = new HServerLoad();
+serverLoad.setNumberOfRegions(1);
How you know it has a region?
{code}
We do this to mark the RS as one that was already running, as opposed to a regionserver that has only just started.
(A regionserver that has just started has no regions, so the master need not expire it during assignRootAndMeta. Only the 0.90 version needs this: there rootLocation doesn't contain the startcode, so we can't tell from the HServerAddress alone whether it is the root server.)

Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
Key: HBASE-5270
URL: https://issues.apache.org/jira/browse/HBASE-5270
Project: HBase
Issue Type: Sub-task
Components: master
Reporter: Zhihong Yu
Fix For: 0.94.0, 0.92.1
Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 5270-90.patch, 5270-90v2.patch, 5270-testcase.patch, 5270-testcasev2.patch, hbase-5270.patch, hbase-5270v2.patch, sampletest.txt

This JIRA continues the effort from HBASE-5179. Starting with Stack's comments about patches for 0.92 and TRUNK:

Reviewing 0.92v17

isDeadServerInProgress is a new public method in ServerManager but it does not seem to be used anywhere.

Does isDeadRootServerInProgress need to be public? Ditto for meta version.

This method param names are not right 'definitiveRootServer'; what is meant by definitive? Do they need this qualifier?

Is there anything in place to stop us expiring a server twice if it's carrying root and meta?

What is difference between asking assignment manager isCarryingRoot and this variable that is passed in? Should be doc'd at least. Ditto for meta.

I think I've asked for this a few times - onlineServers needs to be explained... either in javadoc or in comment. This is the param passed into joinCluster. How does it arise? I think I know but am unsure. God love the poor noob that comes awandering this code trying to make sense of it all.

It looks like we get the list by trawling zk for regionserver znodes that have not checked in. Don't we do this operation earlier in master setup? Are we doing it again here?

Though distributed split log is configured, we will do in master single process splitting under some conditions with this patch. It's not explained in code why we would do this. Why do we think master log splitting 'high priority' when it could very well be slower. Should we only go this route if distributed splitting is not going on. Do we know if concurrent distributed log splitting and master splitting works?

Why would we have dead servers in progress here in master startup? Because a servershutdownhandler fired?

This patch is different to the patch for 0.90. Should go into trunk first with tests, then 0.92. Should it be in this issue? This issue is really hard to follow now. Maybe this issue is for 0.90.x and new issue for more work on this trunk patch?

This patch needs to have the v18 differences applied.
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210776#comment-13210776 ]

chunhui shen commented on HBASE-5270:

{code}
Can you just do
+ super.nodeDeleted(path);
instead of
+ GatedNodeDeleteRegionServerTracker.super.nodeDeleted(path);?
{code}
If we block inside nodeDeleted(path) in GatedNodeDeleteRegionServerTracker, it will block all ZK events, so I delay only the RS node-deleted event by handing it to a thread.
Inside that thread's run(), however, we need to call GatedNodeDeleteRegionServerTracker.super.nodeDeleted(path);
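The reason the qualified form is needed can be shown with a small self-contained Java sketch. BaseTracker and GatedTracker below are hypothetical stand-ins for the tracker classes discussed in the comment, and the polling gate is an assumption made only to keep the example short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the tracker that normally handles the event. */
class BaseTracker {
    final List<String> handled = Collections.synchronizedList(new ArrayList<String>());

    void nodeDeleted(String path) {
        handled.add(path);
    }
}

/** Delays delivery of nodeDeleted until the gate opens, without blocking the caller. */
class GatedTracker extends BaseTracker {
    final AtomicBoolean gate = new AtomicBoolean(true);
    private volatile Thread lastDelivery;

    @Override
    void nodeDeleted(final String path) {
        Thread delivery = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    while (gate.get()) {
                        Thread.sleep(10L);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                // Inside the anonymous Runnable, a plain super.nodeDeleted(path)
                // would refer to the Runnable's own superclass (Object) and not
                // compile, so the enclosing class has to be named explicitly.
                GatedTracker.super.nodeDeleted(path);
            }
        });
        lastDelivery = delivery;
        delivery.start();
    }

    /** Demo helper: waits for the most recently started delivery thread. */
    void awaitLastDelivery() {
        Thread delivery = lastDelivery;
        if (delivery == null) {
            return;
        }
        try {
            delivery.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The caller of nodeDeleted returns immediately, so other events keep flowing; the superclass handler runs only after gate.set(false) is called.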
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210777#comment-13210777 ]

chunhui shen commented on HBASE-5270:

{code}
Why the need for this timeout:
+Thread.sleep(1 * 2);
+((GatedNodeDeleteRegionServerTracker) master.getRegionServerTracker()).gate
+.set(false);
{code}
Because we sleep 10s after splitLog, we sleep 20s here to make sure that the master is assigning RootAndMeta or has already assigned them. After that we start processing the RS-node-deleted event.
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210778#comment-13210778 ]

chunhui shen commented on HBASE-5270:

Because this issue contains a bug where root is never assigned and the master blocks waiting for root during initialization, we set a timeout for the testcase.
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210783#comment-13210783 ]

chunhui shen commented on HBASE-5270:

{code}
+ * Dead servers under processing by the ServerShutdownHander.
Whats that mean? Its while the server is being processed by ServerShutdownHandler exclusively -- these are the inProgress servers?
{code}
Yes, these are the inProgress servers.
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210785#comment-13210785 ]

chunhui shen commented on HBASE-5270:

So, what happens if a server had root and meta and its not expired when we do failover? We'll expire it processing root. Will we expire it a second time processing meta? Perhaps the answer is no because the first expiration will clear the meta state in master?
{code}
+ if (metaServerLoad != null && metaServerLoad.getNumberOfRegions() > 0
+     && !catalogTracker.getRootLocation().equals(metaServerAddress)) {
+   // If metaServer is online not start just now, we expire it
+   this.serverManager.expireServer(metaServerInfo);
+ }
{code}
If a server had root and meta, we make sure it is not expired a second time through the catalogTracker.getRootLocation().equals(metaServerAddress) check.
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210788#comment-13210788 ]

chunhui shen commented on HBASE-5270:

As for the other suggestions, I will make the changes later.
Thanks to Stack for the review!
[jira] [Commented] (HBASE-5003) If the master is started with a wrong root dir, it gets stuck and can't be killed
[ https://issues.apache.org/jira/browse/HBASE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210786#comment-13210786 ]

Hadoop QA commented on HBASE-5003:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12515059/hbase-5003-v2.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

-1 javadoc. The javadoc tool appears to have generated -136 warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

-1 findbugs. The patch appears to introduce 158 new Findbugs (version 1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.mapreduce.TestImportTsv
org.apache.hadoop.hbase.mapred.TestTableMapReduce
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/986//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/986//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/986//console

This message is automatically generated.
If the master is started with a wrong root dir, it gets stuck and can't be killed

Key: HBASE-5003
URL: https://issues.apache.org/jira/browse/HBASE-5003
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Priority: Critical
Labels: noob
Fix For: 0.94.0, 0.90.7, 0.92.1
Attachments: hbase-5003-v2.patch, hbase-5003.patch

Reported by a new user on IRC who tried to set hbase.rootdir to file:///~/hbase, the master gets stuck and cannot be killed. I tried something similar on my machine and it spins while logging:
{quote}
2011-12-09 16:11:17,002 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to create version file at file:/bin/hbase, retrying: Mkdirs failed to create file:/bin/hbase
2011-12-09 16:11:27,002 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to create version file at file:/bin/hbase, retrying: Mkdirs failed to create file:/bin/hbase
2011-12-09 16:11:37,003 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to create version file at file:/bin/hbase, retrying: Mkdirs failed to create file:/bin/hbase
{quote}
The reason it cannot be stopped is that the master's main thread is stuck in there and will never be notified:
{quote}
Master:0;su-jdcryans-01.local,51116,1323475535684 prio=5 tid=7f92b7a3c000 nid=0x1137ba000 waiting on condition [1137b9000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hbase.util.FSUtils.setVersion(FSUtils.java:297)
at org.apache.hadoop.hbase.util.FSUtils.setVersion(FSUtils.java:268)
at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:339)
at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:128)
at org.apache.hadoop.hbase.master.MasterFileSystem.init(MasterFileSystem.java:113)
at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:435)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:314)
at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:218)
at java.lang.Thread.run(Thread.java:680)
{quote}
It seems we should do a better handling of the exceptions we get in there, and die if we need to. It would make a better user experience.
Maybe also do a check on hbase.rootdir before even starting the master.
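The "die if we need to" suggestion in the description can be sketched as a retry loop that is bounded and that honors interrupts, instead of sleeping and retrying forever. This is only an illustration under assumptions: BoundedRetry, Action and the parameter names are hypothetical, and this is not the code in FSUtils.setVersion or in either attached patch.

```java
import java.io.IOException;

// Hypothetical helper: retry an I/O action a limited number of times.
class BoundedRetry {
    // The action to retry, e.g. creating the version file under hbase.rootdir.
    interface Action {
        void run() throws IOException;
    }

    // Returns true on success. Returns false when every attempt failed or the
    // thread was interrupted, so the caller can abort startup instead of spinning.
    static boolean run(Action action, int maxAttempts, long waitMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                action.run();
                return true;
            } catch (IOException e) {
                System.err.println("attempt " + attempt + " of " + maxAttempts
                        + " failed: " + e.getMessage());
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(waitMillis);
                } catch (InterruptedException ie) {
                    // Restore the flag and stop, so a shutdown request is honored.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

A caller would treat a false return as fatal and stop the master with a clear message about the root dir, which is the behavior the description asks for.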
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210787#comment-13210787 ]

chunhui shen commented on HBASE-5270:

{code}
+// Remove regions in RIT, they are may being processed by the SSH.
+synchronized (regionsInTransition) {
+  nodes.removeAll(regionsInTransition.keySet());
+}
{code}
Perhaps SSH has put up something in RIT because its done an assign and here we are blanket removing them all?

Yes, SSH and the master's initializing thread may assign the same regions, so we need to prevent multiple assignment.
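The filtering step quoted in the comment above can be sketched in isolation as follows. RitFilter and its method name are hypothetical, region names are plain strings here, and the map value type is left open; only the synchronized removeAll over regionsInTransition.keySet() mirrors the quoted snippet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not the AssignmentManager code from the patch.
class RitFilter {
    // Returns the candidate regions that are not already in transition, so a
    // second assigner does not pick up a region another thread is handling.
    static List<String> withoutRegionsInTransition(List<String> nodes,
                                                   Map<String, ?> regionsInTransition) {
        List<String> remaining = new ArrayList<>(nodes);
        // Hold the map's lock while reading its key set, as the quoted snippet
        // does, so entries cannot change in the middle of removeAll.
        synchronized (regionsInTransition) {
            remaining.removeAll(regionsInTransition.keySet());
        }
        return remaining;
    }
}
```

This version copies the input list rather than mutating it, which is a choice made for the sketch; the quoted code removes from nodes in place.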
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210791#comment-13210791 ]

chunhui shen commented on HBASE-5270:

{code}
So, what happens if a server had root and meta and its not expired when we do failover? We'll expire it processing root. Will we expire it a second time processing meta? Perhaps the answer is no because the first expiration will clear the meta state in master?
{code}
Sorry, my comment above was wrong.
If a server had root and meta, it will be expired when processing root, and we will not expire it a second time when processing meta, because in the following code metaServerInfo == null:
{code}
+ HServerInfo metaServerInfo = this.serverManager
+     .getHServerInfo(metaServerAddress);
+ if (metaServerInfo != null) {
+   HServerLoad metaServerLoad = metaServerInfo.getLoad();
+   if (metaServerLoad != null && metaServerLoad.getNumberOfRegions() > 0
+       && !catalogTracker.getRootLocation().equals(metaServerAddress)) {
+     // If metaServer is online not start just now, we expire it
+     this.serverManager.expireServer(metaServerInfo);
+   }
+ }
{code}
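The argument in the comment above (a server carrying both root and meta is expired only once because the second lookup finds nothing) can be modelled with a toy example. It assumes, as the comment states, that the lookup returns null once the server has been expired; OnlineServers and its methods are hypothetical and much simpler than ServerManager.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified model of the master's online-server bookkeeping.
class OnlineServers {
    private final Map<String, Integer> regionCountByAddress = new HashMap<>();
    int expirations = 0;

    void register(String address, int regions) {
        regionCountByAddress.put(address, regions);
    }

    // Returns null for unknown or already-expired servers.
    Integer getServerInfo(String address) {
        return regionCountByAddress.get(address);
    }

    // Unconditional on purpose: the guard lives in the caller below.
    void expireServer(String address) {
        regionCountByAddress.remove(address);
        expirations++;
    }

    // Mirrors the shape of the quoted check: look the server up first and
    // expire it only if it is still known and carries regions.
    void expireIfCarryingRegions(String address) {
        Integer regions = getServerInfo(address);
        if (regions != null && regions > 0) {
            expireServer(address);
        }
    }
}
```

Calling expireIfCarryingRegions once while processing root and again while processing meta for the same address leaves expirations at 1 in this model.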
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210790#comment-13210790 ]

chunhui shen commented on HBASE-5270:

{code}
So, what happens if a server had root and meta and its not expired when we do failover? We'll expire it processing root. Will we expire it a second time processing meta? Perhaps the answer is no because the first expiration will clear the meta state in master?
{code}
Sorry, my comment above was wrong.
If a server had root and meta, it will be expired when processing root, and we will not expire it a second time when processing meta, because in the following code metaServerInfo == null:
{code}
+ HServerInfo metaServerInfo = this.serverManager
+     .getHServerInfo(metaServerAddress);
+ if (metaServerInfo != null) {
+   HServerLoad metaServerLoad = metaServerInfo.getLoad();
+   if (metaServerLoad != null && metaServerLoad.getNumberOfRegions() > 0
+       && !catalogTracker.getRootLocation().equals(metaServerAddress)) {
+     // If metaServer is online not start just now, we expire it
+     this.serverManager.expireServer(metaServerInfo);
+   }
+ }
{code}
[jira] [Commented] (HBASE-5075) regionserver crashed and failover
[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210793#comment-13210793 ]

zhiyuan.dai commented on HBASE-5075:

@Lars Hofhansl I am sorry that I reformatted the existing code; I will do another patch. There are two versions of this work. We have implemented version 1, which can detect a crashed process but cannot detect a dying machine. The next version will cover both cases. Thanks for your reply.

regionserver crashed and failover - Key: HBASE-5075 URL: https://issues.apache.org/jira/browse/HBASE-5075 Project: HBase Issue Type: Improvement Components: monitoring, regionserver, replication, zookeeper Affects Versions: 0.92.1 Reporter: zhiyuan.dai Fix For: 0.90.5 Attachments: 5075.patch

When a regionserver crashes, it takes too long to notify the HMaster, and once the HMaster knows about the regionserver's shutdown, it takes a long time to fetch the HLog's lease. HBase is an online DB, so availability is very important. I have an idea to improve availability: a monitor node checks the regionserver's pid. If this pid does not exist, I consider the RS down, delete the znode, and force-close the HLog file. The detection period could then be around 100ms.
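The pid-polling idea in the HBASE-5075 description can be sketched in a few lines. This is a minimal illustration and not the attached 5075.patch: `PidWatchdog` is a hypothetical name, the cleanup action (deleting the znode, force-closing the HLog) is passed in as an opaque `Runnable`, and the sketch assumes Java 9 or later for `ProcessHandle` and that the monitor runs on the same host as the regionserver.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper: polls a local pid and runs a cleanup action once it disappears.
final class PidWatchdog {
    // True if a process with this pid currently exists and is alive on this host.
    static boolean isAlive(long pid) {
        return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
    }

    // Checks the pid every periodMillis (the description suggests about 100 ms).
    // onDeath runs at most once; the caller cancels the returned future when done.
    static ScheduledFuture<?> watch(ScheduledExecutorService scheduler, long pid,
                                    long periodMillis, Runnable onDeath) {
        AtomicBoolean fired = new AtomicBoolean(false);
        return scheduler.scheduleAtFixedRate(() -> {
            if (!isAlive(pid) && fired.compareAndSet(false, true)) {
                onDeath.run(); // e.g. delete the znode and force-close the HLog
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }
}
```

As the comment above notes for version 1, a check of this kind only sees a crashed process; it cannot tell that the whole machine has died, since the monitor dies with it.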
[jira] [Commented] (HBASE-5075) regionserver crashed and failover
[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210796#comment-13210796 ]

zhiyuan.dai commented on HBASE-5075:

@stack I am sorry that I reformatted the existing code; I will do another patch. Thanks for the design points you mentioned. I am preparing design documents that answer your questions, and I will upload them as soon as possible.

regionserver crashed and failover - Key: HBASE-5075 URL: https://issues.apache.org/jira/browse/HBASE-5075 Project: HBase Issue Type: Improvement Components: monitoring, regionserver, replication, zookeeper Affects Versions: 0.92.1 Reporter: zhiyuan.dai Fix For: 0.90.5 Attachments: 5075.patch

When a regionserver crashes, it takes too long to notify the HMaster, and once the HMaster knows about the regionserver's shutdown, it takes a long time to fetch the HLog's lease. HBase is an online DB, so availability is very important. I have an idea to improve availability: a monitor node checks the regionserver's pid. If this pid does not exist, I consider the RS down, delete the znode, and force-close the HLog file. The detection period could then be around 100ms.
[jira] [Commented] (HBASE-5424) HTable meet NPE when call getRegionInfo()
[ https://issues.apache.org/jira/browse/HBASE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210797#comment-13210797 ]

zhiyuan.dai commented on HBASE-5424:

@stack @Hadoop QA I will improve it and do another patch that includes a unit test.

HTable meet NPE when call getRegionInfo() - Key: HBASE-5424 URL: https://issues.apache.org/jira/browse/HBASE-5424 Project: HBase Issue Type: Bug Affects Versions: 0.90.1, 0.90.5 Reporter: junhua yang Attachments: HBASE-5424.patch Original Estimate: 48h Remaining Estimate: 48h

We hit an NPE when calling getRegionInfo() in our testing environment.

Exception in thread "main" java.lang.NullPointerException
 at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
 at org.apache.hadoop.hbase.util.Writables.getHRegionInfo(Writables.java:119)
 at org.apache.hadoop.hbase.client.HTable$2.processRow(HTable.java:395)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:190)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:95)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:73)
 at org.apache.hadoop.hbase.client.HTable.getRegionsInfo(HTable.java:418)

This NPE also prevents table.jsp from showing the region information of this table.
[jira] [Commented] (HBASE-5424) HTable meet NPE when call getRegionInfo()
[ https://issues.apache.org/jira/browse/HBASE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210799#comment-13210799 ]

Lars Hofhansl commented on HBASE-5424:

@junhua: Under which circumstances do you see this NPE? Seems strange that we have not encountered that before. Are you sure this is not a case of different server and client versions?

HTable meet NPE when call getRegionInfo() - Key: HBASE-5424 URL: https://issues.apache.org/jira/browse/HBASE-5424 Project: HBase Issue Type: Bug Affects Versions: 0.90.1, 0.90.5 Reporter: junhua yang Attachments: HBASE-5424.patch Original Estimate: 48h Remaining Estimate: 48h

We hit an NPE when calling getRegionInfo() in our testing environment.

Exception in thread "main" java.lang.NullPointerException
 at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
 at org.apache.hadoop.hbase.util.Writables.getHRegionInfo(Writables.java:119)
 at org.apache.hadoop.hbase.client.HTable$2.processRow(HTable.java:395)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:190)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:95)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:73)
 at org.apache.hadoop.hbase.client.HTable.getRegionsInfo(HTable.java:418)

This NPE also prevents table.jsp from showing the region information of this table.
[jira] [Commented] (HBASE-5294) Make sure javadoc is included in tarball bundle when we release
[ https://issues.apache.org/jira/browse/HBASE-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210803#comment-13210803 ] Hudson commented on HBASE-5294: --- Integrated in HBase-0.92 #290 (See [https://builds.apache.org/job/HBase-0.92/290/]) HBASE-5294 Make sure javadoc is included in tarball bundle when we release (Revision 1245826) Result = SUCCESS stack : Files : * /hbase/branches/0.92/pom.xml Make sure javadoc is included in tarball bundle when we release --- Key: HBASE-5294 URL: https://issues.apache.org/jira/browse/HBASE-5294 Project: HBase Issue Type: Task Affects Versions: 0.92.0 Reporter: stack Assignee: Shaneal Manek Priority: Critical Fix For: 0.94.0, 0.92.1 Attachments: hbase-5294.patch 0.92.0 doesn't have javadoc in the tarball. Fix. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
[ https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210804#comment-13210804 ]

ramkrishna.s.vasudevan commented on HBASE-5200:

@Stack Yes, the closing node is created by the master now. As I mentioned in my previous comments, in 0.90, if the closing node is created by the RS, then on master failover we first set a watch on the children of the unassigned node. If the RS creates the node just after the children watch is set, we start getting callbacks, which will be missed. If only the master creates nodes, this problem can be avoided.

AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent - Key: HBASE-5200 URL: https://issues.apache.org/jira/browse/HBASE-5200 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.94.0, 0.90.7, 0.92.1 Attachments: 5200-test.txt, 5200-v2.txt, 5200-v3.txt, 5200-v4.txt, 5200-v4no-prefix.txt, HBASE-5200.patch, HBASE-5200_1.patch, HBASE-5200_trunk_latest_with_test_2.patch, TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, hbase-5200_90_latest.patch, hbase-5200_90_latest_new.patch

This is the scenario: Consider a case where the balancer is running and trying to close regions on a RS. Before the close completes, a master switch happens. On master switch, the set of nodes that are in RIT is collected; we first get the data and start watching the node, and only after that is the node data added into RIT. If, in this window (before adding to RIT), the RS on which close was called makes a transition, we miss handling it in AM.handleRegion(), which reports that the RIT state was null.
{code}
2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region a66d281d231dfcaea97c270698b26b6f from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region c12e53bfd48ddc5eec507d66821c4d23 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 59ae13de8c1eb325a0dd51f4902d2052 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region f45bc9614d7575f35244849af85aa078 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region cc3ecd7054fe6cd4a1159ed92fd62641 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 3af40478a17fee96b4a192b22c90d5a2 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region e6096a8466e730463e10d3d61f809b92 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 4806781a1a23066f7baed22b4d237e24 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region d69e104131accaefe21dcc01fddc7629 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
{code}

In branch the CLOSING node is created by RS thus leading to more inconsistency.
[jira] [Commented] (HBASE-5162) Basic client pushback mechanism
[ https://issues.apache.org/jira/browse/HBASE-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210809#comment-13210809 ]

jirapos...@reviews.apache.org commented on HBASE-5162:

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3930/#review5209
---

src/main/java/org/apache/hadoop/hbase/client/HTable.java
https://reviews.apache.org/r/3930/#comment11397
I find this a bit dubious. This won't actually slow the client thread down, but just accumulate more data and reduce the number of RPCs. In the end it might lead to more load on the server, because we can deliver more puts with fewer but larger batches. I'd rather just rely on the server sleeping the thread for a bit (as you do later).

src/main/java/org/apache/hadoop/hbase/client/HTable.java
https://reviews.apache.org/r/3930/#comment11399
What if the flusher is not null? Should we re-calculate the wait time?

src/main/java/org/apache/hadoop/hbase/client/HTable.java
https://reviews.apache.org/r/3930/#comment11400
If sleepTime is 0 (for example from NoServerBackoffPolicy), we should probably not create the thread and instead flush right here. (But as I said in the comment above, I'd probably not bother with this extra thread to begin with :) )

src/main/java/org/apache/hadoop/hbase/client/HTable.java
https://reviews.apache.org/r/3930/#comment11401
Was this a problem before? Or only now because of the background thread?

src/main/java/org/apache/hadoop/hbase/client/HTable.java
https://reviews.apache.org/r/3930/#comment11398
This will break backwards compatibility, right? (Not saying that's not ok, just calling it out) I'd almost rather have the client not know about this, until we reach a bad spot (in which case we can throw back retryable exceptions).

src/main/java/org/apache/hadoop/hbase/regionserver/MemstorePressureMonitor.java
https://reviews.apache.org/r/3930/#comment11402
Ah ok, this is where we gracefully delay the server thread a bit. Seems this would need to be tweaked carefully to make it effective while not slowing normal operations. Should the serverPauseTime be somehow related to the amount of pressure, i.e. wait a bit more if the pressure is higher? Maybe the pause time calculation should be part of the pluggable policy? Also, in the jira there was some discussion about throwing (presumably retryable) exceptions back to the client if the pressure gets too high. That would slow the client without consuming server resources (beyond multiple requests).

src/main/java/org/apache/hadoop/hbase/regionserver/StoreUtils.java
https://reviews.apache.org/r/3930/#comment11403
General comment: Where are we on putting/documenting these things in hbase-defaults.xml?

- Lars

On 2012-02-16 20:45:50, Jesse Yates wrote:
bq.
bq. ---
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/3930/
bq. ---
bq.
bq. (Updated 2012-02-16 20:45:50)
bq.
bq. Review request for hbase, Michael Stack, Jean-Daniel Cryans, and Lars Hofhansl.
bq.
bq. Summary
bq. ---
bq.
bq. Under heavy write load, HBase will create a saw-tooth pattern in accepting writes. This is due to the I/O in minor compactions not being able to keep up with the write load. Specifically, the memstore is attempting to flush while we are attempting to do a minor compaction, leading to blocking _all_ writes. Instead, we need to have the option of a graceful degradation mechanism.
bq.
bq. This patch supports both a short-term, adjustable server-side write blocking as well as client-side back-off to help alleviate temporary memstore pressure.
bq.
bq. This addresses bug HBASE-5162.
bq. https://issues.apache.org/jira/browse/HBASE-5162
bq.
bq. Diffs
bq. -----
bq.
bq.src/main/java/org/apache/hadoop/hbase/HConstants.java 763fe89 bq.src/main/java/org/apache/hadoop/hbase/client/BackoffPolicy.java PRE-CREATION bq.src/main/java/org/apache/hadoop/hbase/client/HTable.java 57605e6 bq.src/main/java/org/apache/hadoop/hbase/client/MonitoredResult.java PRE-CREATION bq.src/main/java/org/apache/hadoop/hbase/client/NoServerBackoffPolicy.java PRE-CREATION bq.src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 25cb31d bq.src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 7d7be3c bq. src/main/java/org/apache/hadoop/hbase/regionserver/MemstorePressureMonitor.java PRE-CREATION bq.src/main/java/org/apache/hadoop/hbase/regionserver/OperationStatus.java 1b94ab5 bq.
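Lars's question in the review, whether the server pause time should grow with memstore pressure and whether that calculation belongs in the pluggable policy, can be made concrete with a small sketch. Everything below is hypothetical: `PausePolicy` and `LinearPausePolicy` are illustrative names and are not part of the patch under review, and pressure is assumed to be a fraction where 1.0 means the memstore limit has been reached.

```java
// Hypothetical policy interface: maps observed memstore pressure to a server-side pause.
interface PausePolicy {
    long pauseMillis(double pressure);
}

// Pauses nothing below a threshold, then scales linearly up to a configured maximum.
final class LinearPausePolicy implements PausePolicy {
    private final double threshold;    // pressure below this never pauses, in [0, 1)
    private final long maxPauseMillis; // pause applied at pressure >= 1.0

    LinearPausePolicy(double threshold, long maxPauseMillis) {
        if (threshold < 0.0 || threshold >= 1.0) {
            throw new IllegalArgumentException("threshold must be in [0, 1)");
        }
        if (maxPauseMillis < 0) {
            throw new IllegalArgumentException("maxPauseMillis must be >= 0");
        }
        this.threshold = threshold;
        this.maxPauseMillis = maxPauseMillis;
    }

    @Override
    public long pauseMillis(double pressure) {
        if (pressure <= threshold) {
            return 0L; // normal operation is not slowed down
        }
        // Fraction of the way from the threshold to the limit, capped at 1.0.
        double over = Math.min(1.0, (pressure - threshold) / (1.0 - threshold));
        return Math.round(over * maxPauseMillis);
    }
}
```

Keeping the calculation behind an interface leaves room for other shapes (stepped, exponential) without touching the code that performs the pause, which is one reading of the "part of the pluggable policy" suggestion.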
[jira] [Created] (HBASE-5431) Improve delete marker handling in Import M/R jobs
Improve delete marker handling in Import M/R jobs - Key: HBASE-5431 URL: https://issues.apache.org/jira/browse/HBASE-5431 Project: HBase Issue Type: Sub-task Components: mapreduce Affects Versions: 0.94.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.94.0

Import currently creates a new Delete object for each delete KV found in a result object. This can be improved with the new Delete API that allows adding a delete KV to a Delete object.
[jira] [Updated] (HBASE-5431) Improve delete marker handling in Import M/R jobs
[ https://issues.apache.org/jira/browse/HBASE-5431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5431:

Attachment: 5431.txt

Simple patch. The removed Delete constructor was added in 0.94 just for this case, so it's safe to remove it now.

Improve delete marker handling in Import M/R jobs - Key: HBASE-5431 URL: https://issues.apache.org/jira/browse/HBASE-5431 Project: HBase Issue Type: Sub-task Components: mapreduce Affects Versions: 0.94.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.94.0 Attachments: 5431.txt

Import currently creates a new Delete object for each delete KV found in a result object. This can be improved with the new Delete API that allows adding a delete KV to a Delete object.
[jira] [Updated] (HBASE-5431) Improve delete marker handling in Import M/R jobs
[ https://issues.apache.org/jira/browse/HBASE-5431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5431:

Status: Patch Available (was: Open)

Improve delete marker handling in Import M/R jobs - Key: HBASE-5431 URL: https://issues.apache.org/jira/browse/HBASE-5431 Project: HBase Issue Type: Sub-task Components: mapreduce Affects Versions: 0.94.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.94.0 Attachments: 5431.txt

Import currently creates a new Delete object for each delete KV found in a result object. This can be improved with the new Delete API that allows adding a delete KV to a Delete object.
[jira] [Commented] (HBASE-5255) Use singletons for OperationStatus to save memory
[ https://issues.apache.org/jira/browse/HBASE-5255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210817#comment-13210817 ]

Benoit Sigoure commented on HBASE-5255:

A hunk was missed when this patch got merged in the 0.92 branch:

{code}
--- a/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
+++ b/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
@@ -1862,8 +1862,7 @@ public class HRegion implements HeapSize { // , Writable{
           continue;
         }
         addedSize += applyFamilyMapToMemstore(familyMaps[i]);
-        batchOp.retCodeDetails[i] = new OperationStatus(
-            OperationStatusCode.SUCCESS);
+        batchOp.retCodeDetails[i] = OperationStatus.SUCCESS;
       }
       //
{code}

Do you want to file another JIRA about it?

Use singletons for OperationStatus to save memory - Key: HBASE-5255 URL: https://issues.apache.org/jira/browse/HBASE-5255 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.90.5, 0.92.0 Reporter: Benoit Sigoure Assignee: Benoit Sigoure Priority: Minor Labels: performance Fix For: 0.94.0, 0.92.1 Attachments: 5255-92.txt, 5255-v2.txt, HBASE-5255-0.92-Use-singletons-to-remove-unnecessary-memory-allocati.patch, HBASE-5255-trunk-Use-singletons-to-remove-unnecessary-memory-allocati.patch

Every single {{Put}} causes the allocation of at least one {{OperationStatus}}, yet {{OperationStatus}} is almost always stateless, so these allocations are unnecessary and could be avoided. Attached patch adds a few singletons and uses them, with no public API change. I didn't test the patches, but you get the idea.
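The idea behind HBASE-5255, reusing one shared immutable instance for the common stateless outcomes instead of allocating a new status object per operation, can be sketched as follows. The classes below are simplified, hypothetical stand-ins (`Status`, `StatusCode`, `SingletonStatusDemo`) and not the actual `OperationStatus` source; only the shape of the change matches the hunk quoted above.

```java
// Hypothetical stand-in for an operation status code.
enum StatusCode { NOT_RUN, SUCCESS, FAILURE }

// Immutable status object. Stateless outcomes are shared singletons, so the
// common success path allocates nothing per operation.
final class Status {
    static final Status NOT_RUN = new Status(StatusCode.NOT_RUN, "");
    static final Status SUCCESS = new Status(StatusCode.SUCCESS, "");

    private final StatusCode code;
    private final String message;

    // Still available for outcomes that carry a message, such as a failure reason.
    Status(StatusCode code, String message) {
        this.code = code;
        this.message = message;
    }

    StatusCode getCode() { return code; }
    String getMessage() { return message; }
}

final class SingletonStatusDemo {
    // Fills a batch result array the way the patched line does:
    // every slot points at the same shared SUCCESS instance.
    static Status[] applyBatch(int size) {
        Status[] results = new Status[size];
        for (int i = 0; i < size; i++) {
            results[i] = Status.SUCCESS; // instead of new Status(StatusCode.SUCCESS, "")
        }
        return results;
    }
}
```

Sharing is safe here only because the object is immutable; a status that carries per-operation detail still needs its own instance.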
[jira] [Commented] (HBASE-5209) HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup
[ https://issues.apache.org/jira/browse/HBASE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210818#comment-13210818 ] David S. Wang commented on HBASE-5209: -- Stack, Thanks for the review. I am a bit confused about your previous comment though: I do not update HMasterInterface in my latest patch, nor do I change anything related to isActiveMaster. The only version that I update is VERSION for ClusterStatus, and I already add the new fields to the end with my current patch. I think perhaps that is what you are referring to. I did two tests with hbase hbck -details to test ClusterStatus: 1. I tested old client (top of trunk 0.92) with new server (0.92 with my patch but without bumping VERSION), and things worked fine. 2. I tested new client (0.92 with my patch but without bumping VERSION), with old server (top of trunk 0.92), and got the following error. I'm thinking because the new client expects the new fields I added that the old server never sends. Is this OK behavior? INFO zookeeper.ClientCnxn: Session establishment complete on server haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x1358ef9f91b000b, negotiated timeout = 5000 [... pauses for some time ...] 12/02/17 15:51:46 ERROR io.HbaseObjectWritable: Error in readFields java.net.SocketTimeoutException: 6 millis timeout while waiting for channel to be ready for read. 
ch : java.nio.channels.SocketChannel[connected local=/172.29.5.33:41223 remote=haus04.sf.cloudera.com/172.29.5.35:31000] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128) at java.io.FilterInputStream.read(FilterInputStream.java:116) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:311) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at java.io.DataInputStream.readByte(DataInputStream.java:248) at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:299) at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:320) at org.apache.hadoop.hbase.util.Bytes.readByteArray(Bytes.java:146) at org.apache.hadoop.hbase.ClusterStatus.readFields(ClusterStatus.java:334) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:647) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:311) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:593) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:50 HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup Key: HBASE-5209 URL: https://issues.apache.org/jira/browse/HBASE-5209 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.94.0, 0.90.5, 0.92.0 Reporter: Aditya Acharya Assignee: David S. Wang Fix For: 0.94.0, 0.90.7, 0.92.1 Attachments: HBASE-5209-v0.diff, HBASE-5209-v1.diff I have a multi-master HBase set up, and I'm trying to programmatically determine which of the masters is currently active. But the API does not allow me to do this. 
There is a getMaster() method in the HConnection class, but it returns an HMasterInterface, whose methods do not allow me to find out which master won the last race. The API should have a getActiveMasterHostname() or something to that effect. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
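The failure David describes on HBASE-5209 (a new client waiting for fields that an old server never sends) is the usual hazard of appending fields to a serialized object without a version check. The following is a generic sketch of version-guarded reading, not the actual `ClusterStatus` code: `VersionedStatus`, its fields, and the helper methods are hypothetical, and the wire layout (one version byte followed by UTF strings) is an assumption made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical status object whose second field was appended in version 2.
final class VersionedStatus {
    static final byte VERSION = 2;

    String clusterId;
    String activeMaster; // added in version 2; null when the peer speaks version 1

    void write(DataOutput out) throws IOException {
        out.writeByte(VERSION);
        out.writeUTF(clusterId);
        out.writeUTF(activeMaster); // new fields go at the end
    }

    void readFields(DataInput in) throws IOException {
        byte version = in.readByte();
        clusterId = in.readUTF();
        // Only read the appended field if the sender actually wrote it;
        // otherwise the reader would block waiting for bytes that never arrive.
        activeMaster = (version >= 2) ? in.readUTF() : null;
    }

    // What a version 1 sender would have put on the wire (for demonstration).
    static byte[] encodeV1(String clusterId) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(1);
            out.writeUTF(clusterId);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static byte[] encode(String clusterId, String activeMaster) {
        try {
            VersionedStatus s = new VersionedStatus();
            s.clusterId = clusterId;
            s.activeMaster = activeMaster;
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            s.write(new DataOutputStream(bytes));
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static VersionedStatus decode(byte[] data) {
        try {
            VersionedStatus s = new VersionedStatus();
            s.readFields(new DataInputStream(new ByteArrayInputStream(data)));
            return s;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This covers the new-reader, old-writer direction from the second test in the comment. The opposite direction works in this sketch only because an old reader ignores trailing bytes it does not know about, which matches the first test but is an assumption about the transport framing.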
[jira] [Commented] (HBASE-5428) Allow for custom filters to be registered within the Thrift interface
[ https://issues.apache.org/jira/browse/HBASE-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210820#comment-13210820 ]

Hudson commented on HBASE-5428:

Integrated in HBase-TRUNK-security #114 (See [https://builds.apache.org/job/HBase-TRUNK-security/114/]) HBASE-5428 Allow for custom filters to be registered within the Thrift interface (Revision 1245774) Result = FAILURE stack : Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/filter/ParseFilter.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift/ThriftServerRunner.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/filter/TestParseFilter.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServer.java

Allow for custom filters to be registered within the Thrift interface - Key: HBASE-5428 URL: https://issues.apache.org/jira/browse/HBASE-5428 Project: HBase Issue Type: Improvement Components: thrift Affects Versions: 0.92.0 Reporter: Robert Roland Labels: patch Fix For: 0.94.0 Attachments: ThriftCustomFilters.patch

Custom filters work within the Java client API, but are not accessible within the Thrift API. Attempting to use one will generate a "Filter Name x not supported" error. Attached patch allows a user to specify a list of custom filters that are registered at Thrift server startup time within the HBase configuration files:

<property>
  <name>hbase.thrift.filters</name>
  <value>MyFilter:com.foo.Filter,OtherFilter:com.foo.OtherFilter</value>
</property>

Patch created off SVN r1245727
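The `hbase.thrift.filters` value in the HBASE-5428 description is a comma-separated list of `Name:class` pairs. A minimal sketch of how such a value could be turned into a name-to-class map is shown below. `ThriftFilterConfig` is a hypothetical helper, not the code in ThriftCustomFilters.patch, and the strictness about malformed entries is a design choice of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for a "Name:class,Name:class" configuration value.
final class ThriftFilterConfig {
    // Returns filter name -> fully qualified class name, in declaration order.
    static Map<String, String> parse(String value) {
        Map<String, String> filters = new LinkedHashMap<>();
        if (value == null || value.trim().isEmpty()) {
            return filters; // property not set: no custom filters
        }
        for (String entry : value.split(",")) {
            // Split on the first ':' only, so the class name is taken verbatim.
            String[] parts = entry.trim().split(":", 2);
            if (parts.length != 2 || parts[0].trim().isEmpty() || parts[1].trim().isEmpty()) {
                throw new IllegalArgumentException(
                    "Expected Name:class but got '" + entry + "'");
            }
            filters.put(parts[0].trim(), parts[1].trim());
        }
        return filters;
    }
}
```

Failing fast on a malformed entry surfaces a configuration typo at startup rather than as a "not supported" error on the first request that uses the filter.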
[jira] [Commented] (HBASE-5427) Upgrade our zk to 3.4.3
[ https://issues.apache.org/jira/browse/HBASE-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210821#comment-13210821 ] Hudson commented on HBASE-5427: --- Integrated in HBase-TRUNK-security #114 (See [https://builds.apache.org/job/HBase-TRUNK-security/114/]) HBASE-5427 Upgrade our zk to 3.4.3 (Revision 1245759) Result = FAILURE stack : Files : * /hbase/trunk/pom.xml Upgrade our zk to 3.4.3 --- Key: HBASE-5427 URL: https://issues.apache.org/jira/browse/HBASE-5427 Project: HBase Issue Type: Task Reporter: stack Assignee: stack Fix For: 0.94.0, 0.92.1 Attachments: 5427.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5294) Make sure javadoc is included in tarball bundle when we release
[ https://issues.apache.org/jira/browse/HBASE-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210823#comment-13210823 ] Hudson commented on HBASE-5294: --- Integrated in HBase-TRUNK-security #114 (See [https://builds.apache.org/job/HBase-TRUNK-security/114/]) HBASE-5294 Make sure javadoc is included in tarball bundle when we release (Revision 1245827) Result = FAILURE stack : Files : * /hbase/trunk/pom.xml * /hbase/trunk/src/docbkx/developer.xml Make sure javadoc is included in tarball bundle when we release --- Key: HBASE-5294 URL: https://issues.apache.org/jira/browse/HBASE-5294 Project: HBase Issue Type: Task Affects Versions: 0.92.0 Reporter: stack Assignee: Shaneal Manek Priority: Critical Fix For: 0.94.0, 0.92.1 Attachments: hbase-5294.patch 0.92.0 doesn't have javadoc in the tarball. Fix. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5120) Timeout monitor races with table disable handler
[ https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210822#comment-13210822 ] Hudson commented on HBASE-5120: --- Integrated in HBase-TRUNK-security #114 (See [https://builds.apache.org/job/HBase-TRUNK-security/114/]) HBASE-5120 Timeout monitor races with table disable handler (Revision 1245731) Result = FAILURE stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java Timeout monitor races with table disable handler Key: HBASE-5120 URL: https://issues.apache.org/jira/browse/HBASE-5120 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Zhihong Yu Assignee: ramkrishna.s.vasudevan Priority: Blocker Fix For: 0.94.0 Attachments: HBASE-5120.patch, HBASE-5120_1.patch, HBASE-5120_2.patch, HBASE-5120_3.patch, HBASE-5120_4.patch, HBASE-5120_5.patch, HBASE-5120_5.patch Here is what J-D described here: https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176 I think I will retract from my statement that it used to be extremely racy and caused more troubles than it fixed, on my first test I got a stuck region in transition instead of being able to recover. The timeout was set to 2 minutes to be sure I hit it. First the region gets closed {quote} 2012-01-04 00:16:25,811 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to sv4r5s38,62023,1325635980913 for region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. {quote} 2 minutes later it times out: {quote} 2012-01-04 00:18:30,026 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
state=PENDING_CLOSE, ts=1325636185810, server=null 2012-01-04 00:18:30,026 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 2012-01-04 00:18:30,027 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. (offlining) {quote} 100ms later the master finally gets the event: {quote} 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for 1a4b111bcc228043e89f59c4c3f6a791 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so deleting ZK node and removing from regions in transition, skipping assignment of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x134589d3db03587 Deleting existing unassigned node for 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x134589d3db03587 Successfully deleted unassigned node for region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED {quote} At this point everything is fine, the region was processed as closed. But wait, remember that line where it said it was going to force an unassign? 
{quote} 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x134589d3db03587 Creating unassigned node for 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state 2012-01-04 00:18:30,328 INFO org.apache.hadoop.hbase.master.AssignmentManager: Server null returned java.lang.NullPointerException: Passed server is null for 1a4b111bcc228043e89f59c4c3f6a791 {quote} Now the master is confused, it recreated the RIT znode but the region doesn't even exist anymore. It even tries to shut it down but is blocked by NPEs. Now this is what's going on. The late ZK notification that the znode was deleted (but it got recreated after): {quote} 2012-01-04 00:19:33,285 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: The znode of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been deleted. {quote} Then it prints this, and much later tries to unassign it again: {quote} 2012-01-04 00:19:46,607 DEBUG org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on region to clear
[jira] [Commented] (HBASE-5393) Consider splitting after flushing
[ https://issues.apache.org/jira/browse/HBASE-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210819#comment-13210819 ] Hudson commented on HBASE-5393: --- Integrated in HBase-TRUNK-security #114 (See [https://builds.apache.org/job/HBase-TRUNK-security/114/]) HBASE-5393 Consider splitting after flushing (Revision 1245727) Result = FAILURE jdcryans : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java Consider splitting after flushing - Key: HBASE-5393 URL: https://issues.apache.org/jira/browse/HBASE-5393 Project: HBase Issue Type: Improvement Affects Versions: 0.90.5 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.94.0, 0.92.1 Attachments: HBASE-2375-flush-split.patch Spawning this from HBASE-2375, I saw that it was much more efficient compaction-wise to check if we can split right after flushing. Much like the ideas that Jon spelled out in the description of that jira, the window is smaller because you don't have to compact and then split right away to only compact again when the daughters open. Another thing it improves is while we're normally waiting for the compaction to happen, data that's still coming in will make us go way past the MAX_FILESIZE to a point where for the first region I was seeing a store size 3-4x bigger before it was able to split. I targeted this for 0.94, but I'd like to get this into 0.92.1 or .2 too. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4640) Catch ClosedChannelException and document it
[ https://issues.apache.org/jira/browse/HBASE-4640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210825#comment-13210825 ] Hudson commented on HBASE-4640: --- Integrated in HBase-TRUNK-security #114 (See [https://builds.apache.org/job/HBase-TRUNK-security/114/]) HBASE-4640 Catch ClosedChannelException and document it (Revision 1245730) Result = FAILURE jdcryans : Files : * /hbase/trunk/src/docbkx/troubleshooting.xml * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java Catch ClosedChannelException and document it Key: HBASE-4640 URL: https://issues.apache.org/jira/browse/HBASE-4640 Project: HBase Issue Type: Improvement Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Priority: Minor Fix For: 0.94.0 Attachments: HBASE-4640.patch ClosedChannelException is a pretty obscure exception for the non-expert and doesn't tell you why you get it. We should instead catch it, print a WARN, don't print a stack trace, and add a line in the book about this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5420) TestImportTsv does not shut down MR Cluster correctly (fails against 0.23 hadoop)
[ https://issues.apache.org/jira/browse/HBASE-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210827#comment-13210827 ] Hudson commented on HBASE-5420: --- Integrated in HBase-TRUNK-security #114 (See [https://builds.apache.org/job/HBase-TRUNK-security/114/]) HBASE-5420 TestImportTsv does not shut down MR Cluster correctly (fails against 0.23 hadoop) (Revision 1245795) Result = FAILURE stack : Files : * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportTsv.java TestImportTsv does not shut down MR Cluster correctly (fails against 0.23 hadoop) - Key: HBASE-5420 URL: https://issues.apache.org/jira/browse/HBASE-5420 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.0, 0.92.0 Reporter: Gregory Chanan Assignee: Gregory Chanan Fix For: 0.94.0, 0.92.1 Attachments: HBASE-5420-v1.patch, HBASE-5420.patch Test calls startMiniMapReduceCluster() but never calls shutdownMiniMapReduceCluster(). This causes failures with -Dhadoop.profile=23 when both testMROnTable and testMROnTableWithCustomMapper are run, because the cluster cannot start up properly for the second test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
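The failure mode described in HBASE-5420 (a cluster started in one test and never shut down, so the next test cannot start its own) can be sketched with a toy fixture. This is only an illustration under stated assumptions: `FakeMiniMRCluster` and `runTwoTests` are hypothetical names, not part of the HBase test utilities, and the report does not say how exactly the second start-up fails against hadoop 0.23; the exception below merely stands in for "cannot start up properly".

```java
// Toy model of a test fixture; FakeMiniMRCluster is hypothetical, not the HBase test utility.
class MiniClusterLifecycleSketch {
    /** Stands in for a cluster that holds an exclusive resource while it is running. */
    static class FakeMiniMRCluster {
        private boolean running;

        void start() {
            if (running) {
                throw new IllegalStateException("previous cluster was never shut down");
            }
            running = true;
        }

        void shutdown() {
            running = false;
        }
    }

    /** Runs two "tests" back to back and returns how many of them could start the cluster. */
    static int runTwoTests(boolean shutdownAfterEach) {
        FakeMiniMRCluster cluster = new FakeMiniMRCluster();
        int started = 0;
        for (int i = 0; i < 2; i++) {
            try {
                cluster.start();
                started++;
                // ... the test body would run here ...
            } catch (IllegalStateException e) {
                // The start fails because an earlier test leaked the cluster.
            } finally {
                if (shutdownAfterEach) {
                    cluster.shutdown();
                }
            }
        }
        return started;
    }

    public static void main(String[] args) {
        System.out.println("without shutdown: " + runTwoTests(false));
        System.out.println("with shutdown:    " + runTwoTests(true));
    }
}
```

Pairing every start with a shutdown in a `finally` block (or in a test teardown method) keeps the two tests independent of the order in which they run.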
[jira] [Commented] (HBASE-5425) Punt on the timeout doesn't work in BulkEnabler#waitUntilDone (master's EnableTableHandler)
[ https://issues.apache.org/jira/browse/HBASE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210830#comment-13210830 ]

Hudson commented on HBASE-5425:
---

Integrated in HBase-TRUNK-security #114 (See [https://builds.apache.org/job/HBase-TRUNK-security/114/])
HBASE-5425 Punt on the timeout doesn't work in BulkEnabler#waitUntilDone (master's EnableTableHandler) (Revision 1245674)

Result = FAILURE
stack :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java

Punt on the timeout doesn't work in BulkEnabler#waitUntilDone (master's EnableTableHandler)

Key: HBASE-5425
URL: https://issues.apache.org/jira/browse/HBASE-5425
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.5, 0.92.0
Reporter: terry zhang
Fix For: 0.94.0
Attachments: HBASE-5425.patch

Please take a look at the code below in EnableTableHandler (HBase master):

{code:title=EnableTableHandler.java|borderStyle=solid}
protected boolean waitUntilDone(long timeout) throws InterruptedException {
  .
  int lastNumberOfRegions = this.countOfRegionsInTable;
  while (!server.isStopped() && remaining > 0) {
    Thread.sleep(waitingTimeForEvents);
    regions = assignmentManager.getRegionsOfTable(tableName);
    if (isDone(regions)) break;
    // Punt on the timeout as long we make progress
    if (regions.size() > lastNumberOfRegions) {
      lastNumberOfRegions = regions.size();
      timeout += waitingTimeForEvents;
    }
    remaining = timeout - (System.currentTimeMillis() - startTime);
  }

private boolean isDone(final List<HRegionInfo> regions) {
  return regions != null && regions.size() >= this.countOfRegionsInTable;
}
{code}

It is easy to see that if lastNumberOfRegions is initialized to this.countOfRegionsInTable, the punt-on-timeout code will never be executed: the loop exits through isDone() as soon as regions.size() reaches that count, so regions.size() can never exceed lastNumberOfRegions inside the loop. I think initializing lastNumberOfRegions = 0 would make it work.
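The behavior reported in HBASE-5425 can be reproduced with a small standalone simulation of the loop. This is a sketch under stated assumptions, not the HBase code: `countExtensions` and its parameters are hypothetical, the polled region counts are made-up inputs, and the counter stands in for the `timeout += waitingTimeForEvents` step.

```java
import java.util.Arrays;
import java.util.List;

// Standalone simulation of the punt-on-timeout check; not HBase code.
class PuntOnTimeoutSketch {
    /**
     * Counts how often the timeout would be extended.
     *
     * @param initialLast    starting value of lastNumberOfRegions
     * @param totalRegions   stands in for countOfRegionsInTable
     * @param observedCounts regions seen online at each poll (hypothetical input)
     */
    static int countExtensions(int initialLast, int totalRegions, List<Integer> observedCounts) {
        int lastNumberOfRegions = initialLast;
        int extensions = 0;
        for (int seen : observedCounts) {
            if (seen >= totalRegions) {
                break; // isDone(): every region is online
            }
            if (seen > lastNumberOfRegions) { // progress since the previous poll
                lastNumberOfRegions = seen;
                extensions++; // stands in for: timeout += waitingTimeForEvents
            }
        }
        return extensions;
    }

    public static void main(String[] args) {
        List<Integer> polls = Arrays.asList(2, 5, 7, 9);
        int total = 10;
        // Initialized to the table's region count: the progress branch is unreachable.
        System.out.println(countExtensions(total, total, polls));
        // Initialized to 0: every poll that shows progress extends the timeout.
        System.out.println(countExtensions(0, total, polls));
    }
}
```

With the first initialization the progress branch can never fire, because any count large enough to exceed `lastNumberOfRegions` already satisfies the done check; starting from 0 lets each poll that shows progress extend the timeout.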
[jira] [Commented] (HBASE-5421) use hadoop-client/hadoop-minicluster artifacts for Hadoop 0.23 build
[ https://issues.apache.org/jira/browse/HBASE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210829#comment-13210829 ] Hudson commented on HBASE-5421: --- Integrated in HBase-TRUNK-security #114 (See [https://builds.apache.org/job/HBase-TRUNK-security/114/]) HBASE-5421 use hadoop-client/hadoop-minicluster artifacts for Hadoop 0.23 build (Revision 1245743) Result = FAILURE stack : Files : * /hbase/trunk/pom.xml use hadoop-client/hadoop-minicluster artifacts for Hadoop 0.23 build Key: HBASE-5421 URL: https://issues.apache.org/jira/browse/HBASE-5421 Project: HBase Issue Type: Improvement Components: build Affects Versions: 0.92.0 Reporter: Shaneal Manek Assignee: Shaneal Manek Priority: Minor Labels: build Fix For: 0.94.0, 0.92.1 Attachments: hbase-5421.patch Hadoop recently added hadoop-client and hadoop-minicluster artifacts for Hadoop 0.23+ that don't export all the internal dependencies (HADOOP-8009). Let's use them instead of manually specifying transitive dependency exclusion lists (which is error prone and annoying). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5195) [Coprocessors] preGet hook does not allow overriding or wrapping filter on incoming Get
[ https://issues.apache.org/jira/browse/HBASE-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210828#comment-13210828 ] Hudson commented on HBASE-5195: --- Integrated in HBase-TRUNK-security #114 (See [https://builds.apache.org/job/HBase-TRUNK-security/114/]) HBASE-5195 [Coprocessors] preGet hook does not allow overriding or wrapping filter on incoming Get -- SECOND HALF OF THIS COMMIT (Revision 1245773) Result = FAILURE stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java [Coprocessors] preGet hook does not allow overriding or wrapping filter on incoming Get --- Key: HBASE-5195 URL: https://issues.apache.org/jira/browse/HBASE-5195 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.92.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.94.0, 0.92.1 Attachments: HBASE-5195.patch Without the ability to wrap the internal Scan on the Get, we can't override (or protect, in the case of access control) Gets as we can Scans. The result is inconsistent behavior. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3584) Allow atomic put/delete in one call
[ https://issues.apache.org/jira/browse/HBASE-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210826#comment-13210826 ] Hudson commented on HBASE-3584: --- Integrated in HBase-TRUNK-security #114 (See [https://builds.apache.org/job/HBase-TRUNK-security/114/]) HBASE-3584 Rename RowMutation to RowMutations (Revision 1245792) Result = FAILURE stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/RowMutation.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/RowMutations.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java Allow atomic put/delete in one call --- Key: HBASE-3584 URL: https://issues.apache.org/jira/browse/HBASE-3584 Project: HBase Issue Type: New Feature Components: client, coprocessors, regionserver Reporter: ryan rawson Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 3584-final.txt, 3584-v1.txt, 3584-v3.txt Right now we have the following calls: put(Put) delete(Delete) increment(Increments) But we cannot combine all of the above in a single call, complete with a single row lock. It would be nice to do that. 
It would also allow us to do a CAS where we could do a put/increment if the check succeeded. - Amendment: Since Increment does not currently support MVCC it cannot be included in an atomic operation. So this for Put and Delete only. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
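The idea in HBASE-3584 of applying a put and a delete to the same row under a single row lock can be illustrated with a toy in-memory row. This is a minimal sketch under stated assumptions: `Row`, `Mutation` and `demo` are hypothetical stand-ins rather than HBase classes, and a plain `ReentrantLock` replaces the region server's row lock and MVCC machinery.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Toy in-memory row; none of these types are HBase classes.
class AtomicRowMutationSketch {
    /** One mutation on a column; a null value means "delete the column". */
    static class Mutation {
        final String column;
        final String value;

        private Mutation(String column, String value) {
            this.column = column;
            this.value = value;
        }

        static Mutation put(String column, String value) {
            return new Mutation(column, value);
        }

        static Mutation delete(String column) {
            return new Mutation(column, null);
        }
    }

    static class Row {
        private final Map<String, String> cells = new HashMap<>();
        private final ReentrantLock rowLock = new ReentrantLock();

        /** Applies all mutations while holding the row lock once, so readers never see a partial result. */
        void mutate(List<Mutation> mutations) {
            rowLock.lock();
            try {
                for (Mutation m : mutations) {
                    if (m.value == null) {
                        cells.remove(m.column);
                    } else {
                        cells.put(m.column, m.value);
                    }
                }
            } finally {
                rowLock.unlock();
            }
        }

        /** Returns a copy of the cells, taken under the same lock. */
        Map<String, String> snapshot() {
            rowLock.lock();
            try {
                return new HashMap<>(cells);
            } finally {
                rowLock.unlock();
            }
        }
    }

    /** Writes column "a", then replaces it with column "b" in one atomic call. */
    static Map<String, String> demo() {
        Row row = new Row();
        row.mutate(Arrays.asList(Mutation.put("a", "1")));
        row.mutate(Arrays.asList(Mutation.put("b", "2"), Mutation.delete("a")));
        return row.snapshot();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because `snapshot()` takes the same lock as `mutate()`, a reader sees the row either before or after the combined put and delete, never in between.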
[jira] [Commented] (HBASE-5279) NPE in Master after upgrading to 0.92.0
[ https://issues.apache.org/jira/browse/HBASE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210824#comment-13210824 ]

Hudson commented on HBASE-5279:
---

Integrated in HBase-TRUNK-security #114 (See [https://builds.apache.org/job/HBase-TRUNK-security/114/])
HBASE-5279 NPE in Master after upgrading to 0.92.0 -- REVERT OVERCOMMIT TO HREGION (Revision 1245768)
HBASE-5279 NPE in Master after upgrading to 0.92.0 (Revision 1245767)

Result = FAILURE
stack :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java

stack :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java

NPE in Master after upgrading to 0.92.0
---

Key: HBASE-5279
URL: https://issues.apache.org/jira/browse/HBASE-5279
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.92.0
Reporter: Tobias Herbert
Priority: Critical
Fix For: 0.92.1
Attachments: HBASE-5279-v2.patch, HBASE-5279.patch

I have upgraded my environment from 0.90.4 to 0.92.0. After the table migration I get the following error in the master (permanently):

{noformat}
2012-01-25 18:23:48,648 FATAL master-namenode,6,1327512209588 org.apache.hadoop.hbase.master.HMaster - Unhandled exception. Starting shutdown.
java.lang.NullPointerException
at org.apache.hadoop.hbase.master.AssignmentManager.rebuildUserRegions(AssignmentManager.java:2190)
at org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:323)
at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:501)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326)
at java.lang.Thread.run(Thread.java:662)
2012-01-25 18:23:48,650 INFO namenode,6,1327512209588 org.apache.hadoop.hbase.master.HMaster - Aborting
{noformat}

I think that's because I had a hard crash in the cluster a while ago, and I have been seeing the following WARN since then:

{noformat}
2012-01-25 21:20:47,121 WARN namenode,6,1327513078123-CatalogJanitor org.apache.hadoop.hbase.master.CatalogJanitor - REGIONINFO_QUALIFIER is empty in keyvalues={emails,,xxx./info:server/1314336400471/Put/vlen=38, emails,,1314189353300.xxx./info:serverstartcode/1314336400471/Put/vlen=8}
{noformat}

My patch simply works around the NPE (like the other code around those lines), but I don't know if that's correct.
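The kind of workaround the reporter of HBASE-5279 describes (going around the NPE when a catalog row has no region info) might look roughly like the guard below. This is a hedged sketch only: `CatalogRow` and this `rebuildUserRegions` are hypothetical stand-ins, not the AssignmentManager code or the attached patch, and whether silently skipping such rows is the right fix is exactly the open question raised in the report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of scanning catalog rows; not the actual AssignmentManager code or patch.
class SkipEmptyRegionInfoSketch {
    /** Hypothetical catalog row; regionInfo is null when the region info qualifier is empty. */
    static class CatalogRow {
        final String regionInfo;
        final String server;

        CatalogRow(String regionInfo, String server) {
            this.regionInfo = regionInfo;
            this.server = server;
        }
    }

    /** Collects "region@server" entries, skipping rows that carry no region info. */
    static List<String> rebuildUserRegions(List<CatalogRow> rows) {
        List<String> regions = new ArrayList<>();
        for (CatalogRow row : rows) {
            if (row.regionInfo == null) {
                // Without this guard, the trim() call below would throw a NullPointerException.
                continue;
            }
            regions.add(row.regionInfo.trim() + "@" + row.server);
        }
        return regions;
    }

    public static void main(String[] args) {
        List<CatalogRow> rows = Arrays.asList(
                new CatalogRow("r1", "s1"),
                new CatalogRow(null, "s2"));
        System.out.println(rebuildUserRegions(rows));
    }
}
```

A guard like this keeps the master from aborting, but the damaged catalog row is still there afterwards, so logging it or repairing it is a separate concern.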
[jira] [Commented] (HBASE-5431) Improve delete marker handling in Import M/R jobs
[ https://issues.apache.org/jira/browse/HBASE-5431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210832#comment-13210832 ]

Hadoop QA commented on HBASE-5431:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12515067/5431.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

-1 javadoc. The javadoc tool appears to have generated -136 warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

-1 findbugs. The patch appears to introduce 158 new Findbugs (version 1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.mapreduce.TestImportTsv
org.apache.hadoop.hbase.mapred.TestTableMapReduce
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/987//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/987//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/987//console

This message is automatically generated.

Improve delete marker handling in Import M/R jobs
-

Key: HBASE-5431
URL: https://issues.apache.org/jira/browse/HBASE-5431
Project: HBase
Issue Type: Sub-task
Components: mapreduce
Affects Versions: 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
Fix For: 0.94.0
Attachments: 5431.txt

Import currently creates a new Delete object for each delete KV found in a result object. This can be improved with the new Delete API that allows adding a delete KV to a Delete object.
[jira] [Commented] (HBASE-5423) Regionserver may block forever on waitOnAllRegionsToClose when aborting
[ https://issues.apache.org/jira/browse/HBASE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210835#comment-13210835 ]

ramkrishna.s.vasudevan commented on HBASE-5423:
---

@Chunhui
Patch makes sense. Using a set is also fine with me.
+1 on the patch, apart from any rename to the name suggested by Stack.

Regionserver may block forever on waitOnAllRegionsToClose when aborting
---

Key: HBASE-5423
URL: https://issues.apache.org/jira/browse/HBASE-5423
Project: HBase
Issue Type: Bug
Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
Attachments: hbase-5423.patch

If closeRegion throws an exception (which could be caused by the FS) while the RS is aborting, the RS will block forever on waitOnAllRegionsToClose().
[jira] [Commented] (HBASE-5424) HTable meet NPE when call getRegionInfo()
[ https://issues.apache.org/jira/browse/HBASE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210838#comment-13210838 ]

zhiyuan.dai commented on HBASE-5424:

@Lars Hofhansl
Sure, meta may have some problem. We found the bug in 0.90.x (0.90.1, 0.90.5).

HTable meet NPE when call getRegionInfo()
-

Key: HBASE-5424
URL: https://issues.apache.org/jira/browse/HBASE-5424
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.1, 0.90.5
Reporter: junhua yang
Attachments: HBASE-5424.patch
Original Estimate: 48h
Remaining Estimate: 48h

We hit an NPE when calling getRegionInfo() in our testing environment:

Exception in thread "main" java.lang.NullPointerException
at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
at org.apache.hadoop.hbase.util.Writables.getHRegionInfo(Writables.java:119)
at org.apache.hadoop.hbase.client.HTable$2.processRow(HTable.java:395)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:190)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:95)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:73)
at org.apache.hadoop.hbase.client.HTable.getRegionsInfo(HTable.java:418)

This NPE also prevents table.jsp from showing the region information of this table.
[jira] [Commented] (HBASE-5424) HTable meet NPE when call getRegionInfo()
[ https://issues.apache.org/jira/browse/HBASE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210845#comment-13210845 ]

Lars Hofhansl commented on HBASE-5424:
--

Thanks zhiyuan. I'm wondering whether we shouldn't focus on fixing the problem that caused the problem rather than pasting over it.
[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
[ https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210851#comment-13210851 ]

ramkrishna.s.vasudevan commented on HBASE-5200:
---

@Stack and @Ted
I suggest we commit this to 0.92 and trunk. The creation of the CLOSING node was introduced in HBASE-3789, and even there the 0.90 patch was left uncommitted because it may affect rolling restarts.

AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
-

Key: HBASE-5200
URL: https://issues.apache.org/jira/browse/HBASE-5200
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.94.0, 0.90.7, 0.92.1
Attachments: 5200-test.txt, 5200-v2.txt, 5200-v3.txt, 5200-v4.txt, 5200-v4no-prefix.txt, HBASE-5200.patch, HBASE-5200_1.patch, HBASE-5200_trunk_latest_with_test_2.patch, TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, hbase-5200_90_latest.patch, hbase-5200_90_latest_new.patch

This is the scenario:
Consider a case where the balancer is running and is therefore trying to close regions on an RS. Before the close completes, a master switch happens. On master switch, the set of nodes that are in RIT is collected; we first get the data and start watching the node. After that, the node data is added into RIT. If, by this time (before the data is added to RIT), the RS on which close was called makes a transition, AM.handleRegion() misses the handling, saying the RIT state was null.
{code} 2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region a66d281d231dfcaea97c270698b26b6f from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region c12e53bfd48ddc5eec507d66821c4d23 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 59ae13de8c1eb325a0dd51f4902d2052 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region f45bc9614d7575f35244849af85aa078 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region cc3ecd7054fe6cd4a1159ed92fd62641 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 3af40478a17fee96b4a192b22c90d5a2 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region e6096a8466e730463e10d3d61f809b92 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN 
org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 4806781a1a23066f7baed22b4d237e24 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region d69e104131accaefe21dcc01fddc7629 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states {code} In branch the CLOSING node is created by RS thus leading to more inconsistency. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (HBASE-5424) HTable meet NPE when call getRegionInfo()
[ https://issues.apache.org/jira/browse/HBASE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210845#comment-13210845 ] Lars Hofhansl edited comment on HBASE-5424 at 2/18/12 7:16 AM: --- Thanks zhiyuan. I'm wondering whether we shouldn't focus on fixing the problem that caused the problem rather than pasting over it. was (Author: lhofhansl): That zhiyuan. I'm wondering whether we shouldn't focus on fixing the problem that caused the problem rather than pasting over it.
[jira] [Commented] (HBASE-3584) Allow atomic put/delete in one call
[ https://issues.apache.org/jira/browse/HBASE-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210863#comment-13210863 ] Lars Hofhansl commented on HBASE-3584: -- Something went wrong with this. I still see RowMutation, but it is empty. I suspect that it was not marked as deleted.
[jira] [Commented] (HBASE-3584) Allow atomic put/delete in one call
[ https://issues.apache.org/jira/browse/HBASE-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210864#comment-13210864 ] Lars Hofhansl commented on HBASE-3584: -- Fixed with addendum.
[jira] [Updated] (HBASE-5075) regionserver crashed and failover
[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhiyuan.dai updated HBASE-5075: --- Attachment: 5075.patch regionserver crashed and failover - Key: HBASE-5075 URL: https://issues.apache.org/jira/browse/HBASE-5075 Project: HBase Issue Type: Improvement Components: monitoring, regionserver, replication, zookeeper Affects Versions: 0.92.1 Reporter: zhiyuan.dai Fix For: 0.90.5 Attachments: 5075.patch When a regionserver crashes, it takes too long to notify the HMaster. Once the HMaster knows the regionserver is down, it also takes a long time to obtain the HLog's lease. HBase is an online database, so availability is very important. I have an idea to improve availability: a monitor on the node checks the regionserver's pid. If the pid does not exist, I consider the regionserver down, delete its znode, and force-close the HLog file. The check period could be around 100 ms. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5075) regionserver crashed and failover
[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210170#comment-13210170 ] zhiyuan.dai commented on HBASE-5075: @stack you are right. I am indeed considering a supervisor-like process that removes the regionserver's ephemeral node if the pid goes missing and a ping fails (new Socket: Connection refused). I am now translating our design documents. regionserver crashed and failover - Key: HBASE-5075 URL: https://issues.apache.org/jira/browse/HBASE-5075 Project: HBase Issue Type: Improvement Components: monitoring, regionserver, replication, zookeeper Affects Versions: 0.92.1 Reporter: zhiyuan.dai Fix For: 0.90.5 Attachments: 5075.patch When a regionserver crashes, it takes too long to notify the HMaster. Once the HMaster knows the regionserver is down, it also takes a long time to obtain the HLog's lease. HBase is an online database, so availability is very important. I have an idea to improve availability: a monitor on the node checks the regionserver's pid. If the pid does not exist, I consider the regionserver down, delete its znode, and force-close the HLog file. The check period could be around 100 ms. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5412) HBase book, section 2.6.4, has deficient list of client dependencies
[ https://issues.apache.org/jira/browse/HBASE-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210323#comment-13210323 ] Mike Spreitzer commented on HBASE-5412: --- Yes, I tested HBase 0.92.0 with Hadoop 1.0.0. I doubt this is the only combination for which the current text is deficient. HBase book, section 2.6.4, has deficient list of client dependencies Key: HBASE-5412 URL: https://issues.apache.org/jira/browse/HBASE-5412 Project: HBase Issue Type: Bug Components: documentation Affects Versions: 0.92.0 Reporter: Mike Spreitzer Assignee: Doug Meil Priority: Minor Labels: documentation Original Estimate: 1h Remaining Estimate: 1h The current text in section 2.6.4 of the HBase book says this about client dependencies: Minimally, a client of HBase needs the hbase, hadoop, log4j, commons-logging, commons-lang, and ZooKeeper jars in its CLASSPATH connecting to a cluster. I tried that, and got an exception due to a class not being found. I fixed that by searching for that class in the jars in lib/, and tried again. Got an exception, due to a different class not found. I iterated until it worked. When I was done, I found myself using the following JARs: commons-configuration-1.6.jar hadoop-core-1.0.0.jar slf4j-api-1.5.8.jar commons-lang-2.5.jar hbase-0.92.0.jar slf4j-log4j12-1.5.8.jar commons-logging-1.1.1.jar log4j-1.2.16.jar zookeeper-3.4.2.jar -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
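The trial-and-error loop described in the report (run, hit a missing class, add a jar, repeat) can be shortened by probing the classpath up front. The sketch below is an assumption-laden illustration, not part of the HBase book: `ClasspathCheck` is a hypothetical helper, and the one probe class per jar is my own choice for the HBase 0.92.0 / Hadoop 1.0.0 combination the reporter tested.

```java
import java.util.ArrayList;
import java.util.List;

final class ClasspathCheck {
    /** Returns the class names that cannot be loaded from the current classpath. */
    static List<String> missing(List<String> classNames) {
        List<String> notFound = new ArrayList<>();
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=false: only locate the class, do not run static initializers.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                notFound.add(name);
            }
        }
        return notFound;
    }

    public static void main(String[] args) {
        // One representative class per jar in the reporter's list (probe choice is an assumption).
        List<String> probes = List.of(
            "org.apache.hadoop.hbase.client.HTable",          // hbase
            "org.apache.hadoop.conf.Configuration",           // hadoop-core
            "org.apache.zookeeper.ZooKeeper",                 // zookeeper
            "org.apache.commons.configuration.Configuration", // commons-configuration
            "org.apache.commons.lang.StringUtils",            // commons-lang
            "org.apache.commons.logging.LogFactory",          // commons-logging
            "org.apache.log4j.Logger",                        // log4j
            "org.slf4j.LoggerFactory",                        // slf4j-api
            "org.slf4j.impl.StaticLoggerBinder");             // slf4j-log4j12
        System.out.println("Missing: " + missing(probes));
    }
}
```

Run it with the same CLASSPATH as the client; an empty list only means the probed classes resolve, not that every transitive dependency is present.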
[jira] [Commented] (HBASE-5075) regionserver crashed and failover
[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210367#comment-13210367 ] stack commented on HBASE-5075: -- Thanks for doing this. It looks very interesting. Please do not reformat existing code. It bloats your patch and makes reviews take longer; reviewer attention span is short (at least in this case) and it's a shame to spend it going over code reformats. On the patch, is this necessary: + public String getRSPidAndRsZknode(); Can't you get the pid from a process listing? Or do you want us to publish it via jmx? Or it looks like it is already published via jmx. Can your tool pick it up there? On the znode, can't you get the regionserver servername and then do a lookup in zk directly? Can't you have supervisor do this? Are there not existing utilities that watch a pid and allow you to do stuff when it's gone? Or is it that you'd kill the server if there is a long GC pause? Do you have a bit of documentation on how this new utility works? Thanks. regionserver crashed and failover - Key: HBASE-5075 URL: https://issues.apache.org/jira/browse/HBASE-5075 Project: HBase Issue Type: Improvement Components: monitoring, regionserver, replication, zookeeper Affects Versions: 0.92.1 Reporter: zhiyuan.dai Fix For: 0.90.5 Attachments: 5075.patch When a regionserver crashes, it takes too long to notify the HMaster. Once the HMaster knows the regionserver is down, it also takes a long time to obtain the HLog's lease. HBase is an online database, so availability is very important. I have an idea to improve availability: a monitor on the node checks the regionserver's pid. If the pid does not exist, I consider the regionserver down, delete its znode, and force-close the HLog file. The check period could be around 100 ms. -- This message is automatically generated by JIRA.
[jira] [Created] (HBASE-5426) How to Set Up a Pseudo-Distributed Mode for HBase
How to Set Up a Pseudo-Distributed Mode for HBase - Key: HBASE-5426 URL: https://issues.apache.org/jira/browse/HBASE-5426 Project: HBase Issue Type: Improvement Components: documentation Affects Versions: 0.92.0 Environment: RedHat 7, Ubuntu 10 Reporter: Bing Li Fix For: 0.92.0 Hi, all, I just made a summary of my experiences setting up HBase in pseudo-distributed mode. 1) RedHat 9 is not suitable for running HBase and Hadoop. I don't know the reasons. Now Ubuntu is my choice. 2) After the pseudo-distributed mode of HDFS is configured, it is required to configure hbase-env.sh and hbase-site.xml. The book, HBase the Definitive Guide, does not mention hbase.env.xml. 3) JAVA_HOME, HBASE_CLASSPATH and HBASE_MANAGES_ZK should be set. My hbase-env.sh is as follows. export JAVA_HOME=/opt/jdk1.6.1/ export HBASE_CLASSPATH=/opt/hbase-0.92.0/conf export HBASE_OPTS=-XX:+UseConcMarkSweepGC export HBASE_MANAGES_ZK=true 4) When configuring hbase-site.xml, the property hbase.cluster.distributed must also be set. The book, HBase the Definitive Guide, does not do that either. My hbase-site.xml is as follows. <?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <property> <name>hbase.rootdir</name> <value>hdfs://localhost:9000/hbase</value> </property> <property> <name>dfs.replication</name> <value>1</value> </property> <property> <name>hbase.cluster.distributed</name> <value>true</value> </property> </configuration> I am a new user of HBase. Your suggestions are highly appreciated. Best regards, Bing -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
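For readers who want the three properties from step 4 as a well-formed file, the following sketch renders them in the Hadoop-style `<configuration>` layout. `HBaseSiteXml` is a hypothetical helper written for this note, using only the JDK; the property names and values are the ones given in the report.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class HBaseSiteXml {
    /** Renders name/value pairs as a Hadoop-style configuration document. */
    static String render(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\"?>\n");
        sb.append("<?xml-stylesheet type=\"text/xsl\" href=\"configuration.xsl\"?>\n");
        sb.append("<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("  <property>\n")
              .append("    <name>").append(escape(e.getKey())).append("</name>\n")
              .append("    <value>").append(escape(e.getValue())).append("</value>\n")
              .append("  </property>\n");
        }
        sb.append("</configuration>\n");
        return sb.toString();
    }

    /** Minimal escaping for element text. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the properties in the order the report lists them.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("hbase.rootdir", "hdfs://localhost:9000/hbase");
        props.put("dfs.replication", "1");
        props.put("hbase.cluster.distributed", "true");
        System.out.print(render(props));
    }
}
```

Redirect the output to conf/hbase-site.xml, adjusting the HDFS host and port to match the local NameNode.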
[jira] [Commented] (HBASE-5075) regionserver crashed and failover
[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210386#comment-13210386 ] Lars Hofhansl commented on HBASE-5075: -- I would second Stack's request to create a patch without the format changes. Also, there are some author tags in the javadoc (which we don't do with Apache code). Is this guarding against just the RegionServer process dying (but its machine still up), or also against the machine dying? (I know I could take a closer look at the patch, but it's easier if you just tell me) :) regionserver crashed and failover - Key: HBASE-5075 URL: https://issues.apache.org/jira/browse/HBASE-5075 Project: HBase Issue Type: Improvement Components: monitoring, regionserver, replication, zookeeper Affects Versions: 0.92.1 Reporter: zhiyuan.dai Fix For: 0.90.5 Attachments: 5075.patch When a regionserver crashes, it takes too long to notify the HMaster. Once the HMaster knows the regionserver is down, it also takes a long time to obtain the HLog's lease. HBase is an online database, so availability is very important. I have an idea to improve availability: a monitor on the node checks the regionserver's pid. If the pid does not exist, I consider the regionserver down, delete its znode, and force-close the HLog file. The check period could be around 100 ms. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
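The pid-polling idea under discussion can be sketched with the JDK alone. This is not the attached 5075.patch: `PidWatcher` is a hypothetical name, the default liveness check relies on `ProcessHandle` (Java 9 or later, so newer than the JDKs of this thread), and the `onDeath` callback merely stands in for "delete the znode and force-close the HLog".

```java
import java.util.function.LongPredicate;

final class PidWatcher {
    /** Default liveness check: asks the OS whether a process with this pid exists. */
    static boolean isAlive(long pid) {
        return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
    }

    /**
     * Polls the liveness check every periodMillis until it fails, then runs onDeath once.
     * Returns the number of polls performed, or -1 if the thread was interrupted first.
     */
    static int watch(long pid, LongPredicate alive, long periodMillis, Runnable onDeath) {
        int polls = 1;
        while (alive.test(pid)) {
            try {
                Thread.sleep(periodMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return -1;
            }
            polls++;
        }
        onDeath.run(); // placeholder for the cleanup the proposal describes
        return polls;
    }

    public static void main(String[] args) {
        long pid = Long.parseLong(args[0]);
        // 100 ms is the period suggested in the issue description.
        watch(pid, PidWatcher::isAlive, 100L,
              () -> System.out.println("pid " + pid + " is gone; cleanup would run here"));
    }
}
```

A check like this only observes the process disappearing on a machine that is still up. It does not detect the machine itself dying, and a regionserver stuck in a long GC pause still has a live pid, which bears on the questions raised in the comments above.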
[jira] [Updated] (HBASE-5425) Punt on the timeout doesn't work in BulkEnabler#waitUntilDone (master's EnableTableHandler)
[ https://issues.apache.org/jira/browse/HBASE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5425: - Resolution: Fixed Fix Version/s: 0.94.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk. Thanks for the patch Terry. Punt on the timeout doesn't work in BulkEnabler#waitUntilDone (master's EnableTableHandler) Key: HBASE-5425 URL: https://issues.apache.org/jira/browse/HBASE-5425 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.5, 0.92.0 Reporter: terry zhang Fix For: 0.94.0 Attachments: HBASE-5425.patch Please take a look at the code below in EnableTableHandler (hbase master): {code:title=EnableTableHandler.java|borderStyle=solid} protected boolean waitUntilDone(long timeout) throws InterruptedException { . int lastNumberOfRegions = this.countOfRegionsInTable; while (!server.isStopped() && remaining > 0) { Thread.sleep(waitingTimeForEvents); regions = assignmentManager.getRegionsOfTable(tableName); if (isDone(regions)) break; // Punt on the timeout as long we make progress if (regions.size() > lastNumberOfRegions) { lastNumberOfRegions = regions.size(); timeout += waitingTimeForEvents; } remaining = timeout - (System.currentTimeMillis() - startTime); } private boolean isDone(final List<HRegionInfo> regions) { return regions != null && regions.size() >= this.countOfRegionsInTable; } {code} It is easy to see that if lastNumberOfRegions is initialized to this.countOfRegionsInTable, the punt-on-timeout code is never executed. I think initializing lastNumberOfRegions = 0 can make it work. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5425) Punt on the timeout doesn't work in BulkEnabler#waitUntilDone (master's EnableTableHandler)
[ https://issues.apache.org/jira/browse/HBASE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210393#comment-13210393 ] stack commented on HBASE-5425: -- Committed to 0.92 branch too. Punt on the timeout doesn't work in BulkEnabler#waitUntilDone (master's EnableTableHandler) Key: HBASE-5425 URL: https://issues.apache.org/jira/browse/HBASE-5425 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.5, 0.92.0 Reporter: terry zhang Fix For: 0.94.0 Attachments: HBASE-5425.patch Please take a look at the code below in EnableTableHandler (hbase master): {code:title=EnableTableHandler.java|borderStyle=solid} protected boolean waitUntilDone(long timeout) throws InterruptedException { . int lastNumberOfRegions = this.countOfRegionsInTable; while (!server.isStopped() && remaining > 0) { Thread.sleep(waitingTimeForEvents); regions = assignmentManager.getRegionsOfTable(tableName); if (isDone(regions)) break; // Punt on the timeout as long we make progress if (regions.size() > lastNumberOfRegions) { lastNumberOfRegions = regions.size(); timeout += waitingTimeForEvents; } remaining = timeout - (System.currentTimeMillis() - startTime); } private boolean isDone(final List<HRegionInfo> regions) { return regions != null && regions.size() >= this.countOfRegionsInTable; } {code} It is easy to see that if lastNumberOfRegions is initialized to this.countOfRegionsInTable, the punt-on-timeout code is never executed. I think initializing lastNumberOfRegions = 0 can make it work. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
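The reporter's argument can be checked with a small replay of the progress test from waitUntilDone. The sketch below is a simplified model, not the EnableTableHandler code: `PuntOnTimeout` and `extensions` are hypothetical names, the sleeps and clock are removed, and the series of observed region counts is supplied by the caller.

```java
final class PuntOnTimeout {
    /**
     * Replays the progress check over a series of observed online-region counts
     * and returns how many times the timeout would have been extended.
     */
    static int extensions(int[] observedCounts, int countOfRegionsInTable, int initialLast) {
        int lastNumberOfRegions = initialLast;
        int extended = 0;
        for (int size : observedCounts) {
            if (size >= countOfRegionsInTable) {
                break;                       // corresponds to isDone(regions)
            }
            if (size > lastNumberOfRegions) { // progress since the previous poll
                lastNumberOfRegions = size;
                extended++;                  // corresponds to timeout += waitingTimeForEvents
            }
        }
        return extended;
    }

    public static void main(String[] args) {
        int[] observed = {2, 5, 5, 8};       // regions online at each poll, table has 10
        System.out.println("initialized to table size: " + extensions(observed, 10, 10));
        System.out.println("initialized to zero:       " + extensions(observed, 10, 0));
    }
}
```

With the original initialization the count can never exceed lastNumberOfRegions before isDone returns true, so the timeout is never extended; starting from 0 it is extended on each poll that shows progress.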
[jira] [Commented] (HBASE-5424) HTable meet NPE when call getRegionInfo()
[ https://issues.apache.org/jira/browse/HBASE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210405#comment-13210405 ] Hadoop QA commented on HBASE-5424: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12514961/HBASE-5424.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/980//console This message is automatically generated. HTable meet NPE when call getRegionInfo() - Key: HBASE-5424 URL: https://issues.apache.org/jira/browse/HBASE-5424 Project: HBase Issue Type: Bug Affects Versions: 0.90.1, 0.90.5 Reporter: junhua yang Attachments: HBASE-5424.patch Original Estimate: 48h Remaining Estimate: 48h We meet NPE when call getRegionInfo() in testing environment. Exception in thread main java.lang.NullPointerException at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75) at org.apache.hadoop.hbase.util.Writables.getHRegionInfo(Writables.java:119) at org.apache.hadoop.hbase.client.HTable$2.processRow(HTable.java:395) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:190) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:95) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:73) at org.apache.hadoop.hbase.client.HTable.getRegionsInfo(HTable.java:418) This NPE also make the table.jsp can't show the region information of this table. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HBASE-5426) How to Set Up a Pseudo-Distributed Mode for HBase
[ https://issues.apache.org/jira/browse/HBASE-5426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210409#comment-13210409 ] Doug Meil commented on HBASE-5426: -- no problem. How to Set Up a Pseudo-Distributed Mode for HBase - Key: HBASE-5426 URL: https://issues.apache.org/jira/browse/HBASE-5426 Project: HBase Issue Type: Improvement Components: documentation Affects Versions: 0.92.0 Environment: RedHat 7, Ubuntu 10 Reporter: Bing Li Assignee: Doug Meil Labels: documentation Fix For: 0.92.0 Original Estimate: 48h Remaining Estimate: 48h Hi, all, I just made a summary of my experiences setting up HBase in pseudo-distributed mode. 1) RedHat 9 is not suitable for running HBase and Hadoop. I don't know the reasons. Now Ubuntu is my choice. 2) After the pseudo-distributed mode of HDFS is configured, it is required to configure hbase-env.sh and hbase-site.xml. The book, HBase the Definitive Guide, does not mention hbase.env.xml. 3) JAVA_HOME, HBASE_CLASSPATH and HBASE_MANAGES_ZK should be set. My hbase-env.sh is as follows. export JAVA_HOME=/opt/jdk1.6.1/ export HBASE_CLASSPATH=/opt/hbase-0.92.0/conf export HBASE_OPTS=-XX:+UseConcMarkSweepGC export HBASE_MANAGES_ZK=true 4) When configuring hbase-site.xml, the property hbase.cluster.distributed must also be set. The book, HBase the Definitive Guide, does not do that either. My hbase-site.xml is as follows. <?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <property> <name>hbase.rootdir</name> <value>hdfs://localhost:9000/hbase</value> </property> <property> <name>dfs.replication</name> <value>1</value> </property> <property> <name>hbase.cluster.distributed</name> <value>true</value> </property> </configuration> I am a new user of HBase. Your suggestions are highly appreciated. Best regards, Bing -- This message is automatically generated by JIRA.
[jira] [Updated] (HBASE-5407) Show the per-region level request/sec count in the web ui
[ https://issues.apache.org/jira/browse/HBASE-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-5407: --- Attachment: D1779.1.patch Liyin requested code review of [jira][HBASE-5407][89-fb] Show the per-region level request/sec count in the web ui. Reviewers: Kannan, Karthik It would be nice to show the per-region level request/sec count in the web ui, especially when debugging the hot region problem. TEST PLAN Tested on the dev cluster REVISION DETAIL https://reviews.facebook.net/D1779 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/HServerLoad.java src/main/java/org/apache/hadoop/hbase/metrics/RequestMetrics.java src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java src/main/resources/hbase-webapps/regionserver/regionserver.jsp MANAGE HERALD DIFFERENTIAL RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/3795/ Tip: use the X-Herald-Rules header to filter Herald messages in your client. Show the per-region level request/sec count in the web ui - Key: HBASE-5407 URL: https://issues.apache.org/jira/browse/HBASE-5407 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D1779.1.patch, D1779.1.patch It would be nice to show the per-region level request/sec count in the web ui, especially when debugging the hot region problem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5407) Show the per-region level request/sec count in the web ui
[ https://issues.apache.org/jira/browse/HBASE-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-5407: --- Attachment: D1779.1.patch Liyin requested code review of [jira][HBASE-5407][89-fb] Show the per-region level request/sec count in the web ui. Reviewers: Kannan, Karthik It would be nice to show the per-region level request/sec count in the web ui, especially when debugging the hot region problem. TEST PLAN Tested on the dev cluster REVISION DETAIL https://reviews.facebook.net/D1779 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/HServerLoad.java src/main/java/org/apache/hadoop/hbase/metrics/RequestMetrics.java src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java src/main/resources/hbase-webapps/regionserver/regionserver.jsp MANAGE HERALD DIFFERENTIAL RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/3795/ Tip: use the X-Herald-Rules header to filter Herald messages in your client. Show the per-region level request/sec count in the web ui - Key: HBASE-5407 URL: https://issues.apache.org/jira/browse/HBASE-5407 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D1779.1.patch, D1779.1.patch, D1779.1.patch It would be nice to show the per-region level request/sec count in the web ui, especially when debugging the hot region problem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5241) Deletes should not mask Puts that come after it.
[ https://issues.apache.org/jira/browse/HBASE-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210444#comment-13210444 ] Phabricator commented on HBASE-5241: aaiyer has commented on the revision HBASE-5241 [jira] Deletes should not mask Puts that come after it.. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:1748 This will only happen for Deletes (Column and Family). The idea is that the Delete shall apply to all the puts with a lower memstoreTS, regardless of their timestamp, even if it is in the future. Subsequent Puts etc. will not get masked by the Delete, because they should have a memstoreTS that is larger. src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java:155 This is not yet in production. But, if we decide to go down this route, we will definitely test it out for performance. Haven't optimised much here, since I don't expect there to be too many delete Family. Will revisit if the assumption turns out to be false. src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java:155 I'm not sure if we want to put this under ENFORCE_STRICTER_SEMANTICS. My understanding was that it would be better to have Puts not be masked by previous Deletes regardless; whether we are willing to pay the extra performance cost for it was the trade-off enforced using ENFORCE_STRICTER_SEMANTICS. If there is a good reason for clients to expect that the Put will be masked by previous Deletes, we can definitely guard this with the flag. src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java:173 Perhaps I might rename this class to something different, and we can add a flag in ScanQueryMatcher to instantiate the appropriate DeleteTracker. src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java:223 Agree that this is going to be a performance issue here. But, this is just a V-1 to get the general idea out. 
I'm hopeful, we can optimise the codepath so that we incur the performance penalty only when there is really a later KV with a higher memstoreTS. We currently, do not have a way to tell that. But, it can be done, say dump a flag while writing the HFile, if there is a memstoreTS inversion. Or something along that lines Will try to optimise this, if needed, along those lines. REVISION DETAIL https://reviews.facebook.net/D1731 Deletes should not mask Puts that come after it. Key: HBASE-5241 URL: https://issues.apache.org/jira/browse/HBASE-5241 Project: HBase Issue Type: Improvement Reporter: Amitanand Aiyer Assignee: Amitanand Aiyer Attachments: HBASE-5241.D1731.1.patch, HBASE-5241.D1731.2.patch, HBASE-5241.D1731.3.patch Suppose that we have a delete row, and then followed by the put. The delete row can mask the put, unless there was a major compaction in between. Now that we are flushing the memstoreTS to disk, along with the KVs, we should be able to differentiate whether or not the Put happened after the Delete and offer better delete semantics. Couldn't find a pre-existing JIRA that already discusses this, so creating one. Seems related to https://issues.apache.org/jira/browse/HBASE-2406, but is not quite the same. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
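The difference between masking by cell timestamp and masking by write order can be shown with a toy model. This sketch follows the wording of the inline comment on HRegion.java (a Delete applies to Puts with a lower memstoreTS, regardless of timestamp); `DeleteMasking` and `Cell` are hypothetical names, and none of this is the ScanDeleteTracker code from the patch.

```java
final class DeleteMasking {
    /** Toy cell: a user-visible timestamp plus the sequence in which it was written. */
    static final class Cell {
        final long timestamp;
        final long memstoreTS;

        Cell(long timestamp, long memstoreTS) {
            this.timestamp = timestamp;
            this.memstoreTS = memstoreTS;
        }
    }

    /** Timestamp-only rule: the delete hides any put at or below its timestamp. */
    static boolean maskedByTimestamp(Cell put, Cell delete) {
        return put.timestamp <= delete.timestamp;
    }

    /** Write-order rule from the review comment: only puts written before the delete are hidden. */
    static boolean maskedByWriteOrder(Cell put, Cell delete) {
        return put.memstoreTS < delete.memstoreTS;
    }

    public static void main(String[] args) {
        Cell delete = new Cell(100L, 5L);
        Cell laterPut = new Cell(50L, 6L);   // written after the delete, older timestamp
        System.out.println("timestamp rule masks later put:   " + maskedByTimestamp(laterPut, delete));
        System.out.println("write-order rule masks later put: " + maskedByWriteOrder(laterPut, delete));
    }
}
```

The later Put is hidden under the timestamp rule but survives under the write-order rule, which is the behaviour the issue title asks for.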
[jira] [Commented] (HBASE-5407) Show the per-region level request/sec count in the web ui
[ https://issues.apache.org/jira/browse/HBASE-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210450#comment-13210450 ] stack commented on HBASE-5407: -- Liyin. Is this a backport for 0.89fb? If so, is there something you've added to your backport that we should have in trunk? Thanks boss. Show the per-region level request/sec count in the web ui - Key: HBASE-5407 URL: https://issues.apache.org/jira/browse/HBASE-5407 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D1779.1.patch, D1779.1.patch, D1779.1.patch It would be nice to show the per-region level request/sec count in the web ui, especially when debugging the hot region problem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5407) Show the per-region level request/sec count in the web ui
[ https://issues.apache.org/jira/browse/HBASE-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210455#comment-13210455 ] Liyin Tang commented on HBASE-5407: --- Hi Stack. This patch is to add total read/write request number and read/write request per second for each region in 89-fb branch. For the apache trunk, I will also need to add the read/write request per second only. Show the per-region level request/sec count in the web ui - Key: HBASE-5407 URL: https://issues.apache.org/jira/browse/HBASE-5407 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D1779.1.patch, D1779.1.patch, D1779.1.patch It would be nice to show the per-region level request/sec count in the web ui, especially when debugging the hot region problem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
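A per-region requests-per-second figure can be derived from two samples of a running request counter. The sketch below only illustrates that arithmetic; `RequestRate` is a hypothetical helper and is not the RequestMetrics class touched by the patch.

```java
final class RequestRate {
    /** Requests per second between two samples of a monotonically increasing counter. */
    static double perSecond(long previousCount, long currentCount, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            return 0.0; // no time has passed, so no meaningful rate
        }
        return (currentCount - previousCount) * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) {
        // 600 requests over 3 seconds
        System.out.println(perSecond(1000L, 1600L, 3000L));
    }
}
```

Keeping separate counters for reads and writes gives the read/write split described in the comment.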
[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
[ https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210460#comment-13210460 ] stack commented on HBASE-5200: -- Ram wants me to apply the patches here but the version for 0.90 is very different to the version for 0.92. This is removed: -RS_ZK_REGION_CLOSING (1), // RS is in process of closing a region And this is added: +M_ZK_REGION_CLOSING (51), // Master adds this region as closing in ZK This looks like a port from 0.92? AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent - Key: HBASE-5200 URL: https://issues.apache.org/jira/browse/HBASE-5200 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.94.0, 0.90.7, 0.92.1 Attachments: 5200-test.txt, 5200-v2.txt, 5200-v3.txt, 5200-v4.txt, HBASE-5200.patch, HBASE-5200_1.patch, HBASE-5200_trunk_latest_with_test_2.patch, TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, hbase-5200_90_latest.patch, hbase-5200_90_latest_new.patch This is the scenario Consider a case where the balancer is going on thus trying to close regions in a RS. Before we could close a master switch happens. On Master switch the set of nodes that are in RIT is collected and we first get Data and start watching the node After that the node data is added into RIT. Now by this time (before adding to RIT) if the RS to which close was called does a transition in AM.handleRegion() we miss the handling saying RIT state was null. 
{code} 2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region a66d281d231dfcaea97c270698b26b6f from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region c12e53bfd48ddc5eec507d66821c4d23 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 59ae13de8c1eb325a0dd51f4902d2052 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region f45bc9614d7575f35244849af85aa078 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region cc3ecd7054fe6cd4a1159ed92fd62641 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 3af40478a17fee96b4a192b22c90d5a2 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region e6096a8466e730463e10d3d61f809b92 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN 
org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 4806781a1a23066f7baed22b4d237e24 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region d69e104131accaefe21dcc01fddc7629 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states {code} In branch the CLOSING node is created by RS thus leading to more inconsistency. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
[ https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5200:

Attachment: 5200-v4no-prefix.txt

v4 for hadoopqa

AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent

Key: HBASE-5200
URL: https://issues.apache.org/jira/browse/HBASE-5200
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.94.0, 0.90.7, 0.92.1
Attachments: 5200-test.txt, 5200-v2.txt, 5200-v3.txt, 5200-v4.txt, 5200-v4no-prefix.txt, HBASE-5200.patch, HBASE-5200_1.patch, HBASE-5200_trunk_latest_with_test_2.patch, TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, hbase-5200_90_latest.patch, hbase-5200_90_latest_new.patch

This is the scenario. Consider a case where the balancer is running and is trying to close regions on an RS. Before the close completes, a master switch happens. On master switch, the set of nodes that are in RIT is collected; for each node we first get its data and start watching the node, and only after that is the node data added into RIT. If, in that window (before the node is added to RIT), the RS to which the close was sent makes a transition, AM.handleRegion() skips the handling because the RIT state is null.

{code}
2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region a66d281d231dfcaea97c270698b26b6f from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region c12e53bfd48ddc5eec507d66821c4d23 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 59ae13de8c1eb325a0dd51f4902d2052 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region f45bc9614d7575f35244849af85aa078 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region cc3ecd7054fe6cd4a1159ed92fd62641 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 3af40478a17fee96b4a192b22c90d5a2 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region e6096a8466e730463e10d3d61f809b92 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 4806781a1a23066f7baed22b4d237e24 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region d69e104131accaefe21dcc01fddc7629 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
{code}

In branch the CLOSING node is created by RS thus leading to more inconsistency.
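The window described in HBASE-5200 can be modeled with a small in-memory sketch. This is not the AssignmentManager code: `RitRaceSketch`, `addToRit` and `handleClosed` are hypothetical names, and the map only stands in for the master's regions-in-transition state. It shows why a CLOSED event that arrives before the region is recorded in RIT is dropped, while the same event arriving afterwards is handled.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the master's in-memory RIT map; not the real AssignmentManager.
final class RitRaceSketch {
    private final Map<String, String> regionsInTransition = new ConcurrentHashMap<>();

    /** Failover step: record the region's state in RIT (in the scenario, this happens after the watch is set). */
    void addToRit(String region, String state) {
        regionsInTransition.put(region, state);
    }

    /** Models handling a CLOSED event: it is acted on only if RIT holds PENDING_CLOSE or CLOSING. */
    boolean handleClosed(String region) {
        String state = regionsInTransition.get(region);
        if (!"PENDING_CLOSE".equals(state) && !"CLOSING".equals(state)) {
            // A null state is the case reported in the WARN lines: the event is dropped.
            return false;
        }
        regionsInTransition.remove(region);
        return true;
    }
}
```

In this model, calling `handleClosed` before `addToRit` returns false, which corresponds to the "state null" warnings; recording the region first makes the same call succeed.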
[jira] [Updated] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
[ https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5200:

Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
[ https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5200:

Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-5423) Regionserver may block forever on waitOnAllRegionsToClose when aborting
[ https://issues.apache.org/jira/browse/HBASE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210466#comment-13210466 ]

stack commented on HBASE-5423:

Patch looks good Chunhui. Change name of Set from addedRegionsToCallClose to closed. Why do we have this Set? Are we calling close multiple times on same region?

So, we'd break even though online regions is not yet empty?

{code}
+ if (this.regionsInTransitionInRS.isEmpty()) {
+   break;
+ }
{code}

Thanks.

Regionserver may block forever on waitOnAllRegionsToClose when aborting

Key: HBASE-5423
URL: https://issues.apache.org/jira/browse/HBASE-5423
Project: HBase
Issue Type: Bug
Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
Attachments: hbase-5423.patch

If closeRegion throws an exception (for example, one caused by the FS) while the RS is aborting, the RS will block forever on waitOnAllRegionsToClose().
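For readers following the question about the early break, here is a minimal sketch of the loop shape being discussed. It is not the patch: `CloseWaiter` and its parameters are hypothetical, and the two collections stand in for the region server's online regions and `regionsInTransitionInRS`. The idea it illustrates is that once nothing is in transition, no further close can complete, so waiting on a non-empty online set would never end.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the region server's wait loop; names are illustrative only.
final class CloseWaiter {
    /**
     * Returns true if every online region closed, false if the loop gave up
     * because nothing was in transition (or the thread was interrupted).
     */
    static boolean waitOnAllRegionsToClose(Map<String, ?> onlineRegions,
                                           Set<String> regionsInTransition,
                                           long sleepMillis) {
        while (!onlineRegions.isEmpty()) {
            if (regionsInTransition.isEmpty()) {
                // No close is in flight, so the online set cannot shrink any further.
                return false;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

A false return here means regions were left online, which is the situation the review comment asks about; a caller that is aborting anyway can treat it as "stop waiting".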
[jira] [Updated] (HBASE-5346) Fix testColumnFamilyCompression and test_TIMERANGE in TestHFileOutputFormat
[ https://issues.apache.org/jira/browse/HBASE-5346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5346:

Fix Version/s: 0.92.0

Committed to 0.92 too...

Fix testColumnFamilyCompression and test_TIMERANGE in TestHFileOutputFormat

Key: HBASE-5346
URL: https://issues.apache.org/jira/browse/HBASE-5346
Project: HBase
Issue Type: Sub-task
Components: mapreduce, test
Affects Versions: 0.94.0, 0.92.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Fix For: 0.94.0, 0.92.0
Attachments: HBASE-5346-v0.patch

Running
mvn -Dhadoop.profile=23 test -P localTests -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
yields this on 0.92 (for testColumnFamilyCompression and test_TIMERANGE):

Failed tests:
testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): HFile for column family info-A not found

Tests in error:
test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0 (Is a directory)

The problem is that these tests make incorrect assumptions about the output of mapreduce jobs. Prior to 0.23, temporary data was in, for example:
./_temporary/_attempt___r_00_0/b/1979617994050536795
Now that has changed. The correct way to get that path is based on getDefaultWorkFile. Also, the data is not moved into the outputPath until both the Task and Job are committed.
[jira] [Resolved] (HBASE-5393) Consider splitting after flushing
[ https://issues.apache.org/jira/browse/HBASE-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans resolved HBASE-5393.

Resolution: Fixed
Fix Version/s: 0.92.1
Hadoop Flags: Reviewed

Committed to trunk and 0.92, thanks for the votes and reviews guys.

Consider splitting after flushing

Key: HBASE-5393
URL: https://issues.apache.org/jira/browse/HBASE-5393
Project: HBase
Issue Type: Improvement
Affects Versions: 0.90.5
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Fix For: 0.94.0, 0.92.1
Attachments: HBASE-2375-flush-split.patch

Spawning this from HBASE-2375, I saw that it was much more efficient compaction-wise to check if we can split right after flushing. Much like the ideas that Jon spelled out in the description of that jira, the window is smaller because you don't have to compact and then split right away, only to compact again when the daughters open.

Another thing it improves: while we're normally waiting for the compaction to happen, data that's still coming in makes us go way past MAX_FILESIZE, to the point where for the first region I was seeing a store size 3-4x bigger before it was able to split.

I targeted this for 0.94, but I'd like to get this into 0.92.1 or .2 too.
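The ordering argued for in HBASE-5393 can be sketched as a single decision taken right after a flush. This is an illustration under assumptions, not the committed change: `FlushThenSplit`, `afterFlush` and the two thresholds are hypothetical, and the real split and compaction criteria in HBase are more involved than a pair of comparisons.

```java
// Hypothetical post-flush decision; the names and thresholds are illustrative only.
final class FlushThenSplit {
    enum Next { SPLIT, COMPACT, NONE }

    /**
     * Checks the split condition first, so an oversized store is split
     * without compacting it beforehand.
     */
    static Next afterFlush(long storeSizeBytes, long maxFileSizeBytes,
                           int storeFileCount, int compactionThreshold) {
        if (storeSizeBytes > maxFileSizeBytes) {
            return Next.SPLIT;
        }
        if (storeFileCount >= compactionThreshold) {
            return Next.COMPACT;
        }
        return Next.NONE;
    }
}
```

Checking the split condition before the compaction condition is the whole point of the sketch: the store is not compacted just before being split, and it gets less time to grow past the size limit while a compaction is pending.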
[jira] [Commented] (HBASE-4640) Catch ClosedChannelException and document it
[ https://issues.apache.org/jira/browse/HBASE-4640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210483#comment-13210483 ]

stack commented on HBASE-4640:

+1

On commit, add the CCE.getMessage to the LOG.WARN just in case it's got info of use (I'm fine on skipping stack trace).

Catch ClosedChannelException and document it

Key: HBASE-4640
URL: https://issues.apache.org/jira/browse/HBASE-4640
Project: HBase
Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Minor
Fix For: 0.94.0
Attachments: HBASE-4640.patch

ClosedChannelException is a pretty obscure exception for the non-expert and doesn't tell you why you get it. We should instead catch it, print a WARN, don't print a stack trace, and add a line in the book about this.
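A minimal sketch of the handling proposed in HBASE-4640, using only JDK classes. `ChannelWriteGuard`, `Write` and `tryWrite` are hypothetical names rather than anything from the patch, and `java.util.logging` stands in for the logger HBase actually uses. It catches ClosedChannelException, logs one WARN line that includes getMessage(), and prints no stack trace.

```java
import java.nio.channels.ClosedChannelException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical wrapper; shows the catch-and-WARN pattern only.
final class ChannelWriteGuard {
    private static final Logger LOG = Logger.getLogger(ChannelWriteGuard.class.getName());

    /** Stand-in for whatever I/O step can hit an already-closed channel. */
    interface Write {
        void run() throws ClosedChannelException;
    }

    /** Runs the write; returns false and logs a single WARN line if the channel was closed. */
    static boolean tryWrite(String callDescription, Write write) {
        try {
            write.run();
            return true;
        } catch (ClosedChannelException cce) {
            // Message only, no stack trace. getMessage() is usually null for this exception.
            LOG.log(Level.WARNING, "{0}: caught a ClosedChannelException ({1})",
                    new Object[] {callDescription, cce.getMessage()});
            return false;
        }
    }
}
```

Because ClosedChannelException normally carries no message, the description passed in by the caller is what makes the WARN line useful to a non-expert.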
[jira] [Resolved] (HBASE-4640) Catch ClosedChannelException and document it
[ https://issues.apache.org/jira/browse/HBASE-4640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans resolved HBASE-4640.

Resolution: Fixed
Hadoop Flags: Reviewed

Committed to trunk with Stack's commit, thanks for the review.
[jira] [Updated] (HBASE-5119) Set the TimeoutMonitor's timeout back down
[ https://issues.apache.org/jira/browse/HBASE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5119:

Fix Version/s: (was: 0.92.1) 0.92.2

Moving to 0.92.2

Set the TimeoutMonitor's timeout back down

Key: HBASE-5119
URL: https://issues.apache.org/jira/browse/HBASE-5119
Project: HBase
Issue Type: Task
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Priority: Blocker
Fix For: 0.94.0, 0.92.2

The TimeoutMonitor used to be extremely racy and caused more troubles than it fixed, but I believe most of this has been fixed in the context of 0.92, so I think we should set it back down to a useful level. Currently it's 30 minutes; what should the new value be? I think 5 minutes should be good, will do some testing.
[jira] [Commented] (HBASE-5120) Timeout monitor races with table disable handler
[ https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210491#comment-13210491 ]

stack commented on HBASE-5120:

I committed to trunk. Will not commit to 0.92. Not important enough of a bug I'd say.

Timeout monitor races with table disable handler

Key: HBASE-5120
URL: https://issues.apache.org/jira/browse/HBASE-5120
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Zhihong Yu
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
Fix For: 0.94.0, 0.92.1
Attachments: HBASE-5120.patch, HBASE-5120_1.patch, HBASE-5120_2.patch, HBASE-5120_3.patch, HBASE-5120_4.patch, HBASE-5120_5.patch, HBASE-5120_5.patch

Here is what J-D described here:
https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176

I think I will retract from my statement that it used to be extremely racy and caused more troubles than it fixed, on my first test I got a stuck region in transition instead of being able to recover. The timeout was set to 2 minutes to be sure I hit it.

First the region gets closed
{quote}
2012-01-04 00:16:25,811 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to sv4r5s38,62023,1325635980913 for region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
{quote}

2 minutes later it times out:
{quote}
2012-01-04 00:18:30,026 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. state=PENDING_CLOSE, ts=1325636185810, server=null
2012-01-04 00:18:30,026 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
2012-01-04 00:18:30,027 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. (offlining)
{quote}

100ms later the master finally gets the event:
{quote}
2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for 1a4b111bcc228043e89f59c4c3f6a791
2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so deleting ZK node and removing from regions in transition, skipping assignment of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x134589d3db03587 Deleting existing unassigned node for 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x134589d3db03587 Successfully deleted unassigned node for region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
{quote}

At this point everything is fine, the region was processed as closed. But wait, remember that line where it said it was going to force an unassign?
{quote}
2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x134589d3db03587 Creating unassigned node for 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
2012-01-04 00:18:30,328 INFO org.apache.hadoop.hbase.master.AssignmentManager: Server null returned java.lang.NullPointerException: Passed server is null for 1a4b111bcc228043e89f59c4c3f6a791
{quote}

Now the master is confused, it recreated the RIT znode but the region doesn't even exist anymore. It even tries to shut it down but is blocked by NPEs. Now this is what's going on.

The late ZK notification that the znode was deleted (but it got recreated after):
{quote}
2012-01-04 00:19:33,285 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: The znode of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been deleted.
{quote}

Then it prints this, and much later tries to unassign it again:
{quote}
2012-01-04 00:19:46,607 DEBUG org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on region to clear regions in transition; test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. state=PENDING_CLOSE, ts=1325636310328, server=null
...
2012-01-04 00:20:39,623 DEBUG
[jira] [Commented] (HBASE-5120) Timeout monitor races with table disable handler
[ https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210492#comment-13210492 ]

stack commented on HBASE-5120:

I did not commit to 0.90 either.

Timeout monitor races with table disable handler

Key: HBASE-5120
URL: https://issues.apache.org/jira/browse/HBASE-5120
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Zhihong Yu
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
Fix For: 0.94.0
Attachments: HBASE-5120.patch, HBASE-5120_1.patch, HBASE-5120_2.patch, HBASE-5120_3.patch, HBASE-5120_4.patch, HBASE-5120_5.patch, HBASE-5120_5.patch
[jira] [Updated] (HBASE-4298) Support to drain RS nodes through ZK
[ https://issues.apache.org/jira/browse/HBASE-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4298:

Fix Version/s: (was: 0.90.7)

Removed 0.90.7 as a fix version.

Support to drain RS nodes through ZK

Key: HBASE-4298
URL: https://issues.apache.org/jira/browse/HBASE-4298
Project: HBase
Issue Type: Improvement
Components: master
Affects Versions: 0.90.4
Environment: all
Reporter: Aravind Gottipati
Priority: Critical
Labels: patch
Fix For: 0.92.0
Attachments: 4298-trunk-v2.txt, 4298-trunk-v3.txt, 90_hbase.patch, drainingservertest-v2.txt, drainingservertest.txt, trunk_hbase.patch, trunk_with_test.txt

HDFS currently has a way to exclude certain datanodes and prevent them from getting new blocks. HDFS goes one step further and even drains these nodes for you. This enhancement is a step in that direction. The idea is that we mark nodes in zookeeper as draining nodes. This means that they don't get any more new regions. These draining nodes look exactly the same as the corresponding nodes in /rs, except they live under /draining.

Eventually, support for draining them can be added. I am submitting two patches for review - one for the 0.90 branch and one for trunk (in git).

Here are the two patches:
0.90 - https://github.com/aravind/hbase/commit/181041e72e7ffe6a4da6d82b431ef7f8c99e62d2
trunk - https://github.com/aravind/hbase/commit/e127b25ae3b4034103b185d8380f3b7267bc67d5

I have tested both these patches and they work as advertised.
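The effect of the /draining marker on assignment can be sketched without ZooKeeper. This is an illustration, not either patch: `DrainingFilter` and `assignable` are hypothetical names, and the two arguments stand in for the child names read from /rs and /draining.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper; models "draining servers get no new regions" on plain collections.
final class DrainingFilter {
    /** Returns the servers under /rs that are not also listed under /draining. */
    static List<String> assignable(List<String> rsChildren, Set<String> drainingChildren) {
        List<String> candidates = new ArrayList<>();
        for (String server : rsChildren) {
            if (!drainingChildren.contains(server)) {
                candidates.add(server);
            }
        }
        return candidates;
    }
}
```

The lookup by name works because, as the description says, a draining node looks exactly the same as its counterpart in /rs. Regions already hosted on a draining server are untouched in this sketch, which matches the description's note that actually draining them would come later.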
[jira] [Updated] (HBASE-5120) Timeout monitor races with table disable handler
[ https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5120: - Fix Version/s: (was: 0.92.1) Timeout monitor races with table disable handler Key: HBASE-5120 URL: https://issues.apache.org/jira/browse/HBASE-5120 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Zhihong Yu Assignee: ramkrishna.s.vasudevan Priority: Blocker Fix For: 0.94.0 Attachments: HBASE-5120.patch, HBASE-5120_1.patch, HBASE-5120_2.patch, HBASE-5120_3.patch, HBASE-5120_4.patch, HBASE-5120_5.patch, HBASE-5120_5.patch Here is what J-D described here: https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176 I think I will retract from my statement that it used to be extremely racy and caused more troubles than it fixed, on my first test I got a stuck region in transition instead of being able to recover. The timeout was set to 2 minutes to be sure I hit it. First the region gets closed {quote} 2012-01-04 00:16:25,811 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to sv4r5s38,62023,1325635980913 for region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. {quote} 2 minutes later it times out: {quote} 2012-01-04 00:18:30,026 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. state=PENDING_CLOSE, ts=1325636185810, server=null 2012-01-04 00:18:30,026 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 2012-01-04 00:18:30,027 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
(offlining) {quote} 100ms later the master finally gets the event: {quote} 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for 1a4b111bcc228043e89f59c4c3f6a791 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so deleting ZK node and removing from regions in transition, skipping assignment of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x134589d3db03587 Deleting existing unassigned node for 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x134589d3db03587 Successfully deleted unassigned node for region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED {quote} At this point everything is fine, the region was processed as closed. But wait, remember that line where it said it was going to force an unassign? {quote} 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x134589d3db03587 Creating unassigned node for 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state 2012-01-04 00:18:30,328 INFO org.apache.hadoop.hbase.master.AssignmentManager: Server null returned java.lang.NullPointerException: Passed server is null for 1a4b111bcc228043e89f59c4c3f6a791 {quote} Now the master is confused, it recreated the RIT znode but the region doesn't even exist anymore. It even tries to shut it down but is blocked by NPEs. Now this is what's going on. 
The late ZK notification that the znode was deleted (but it got recreated after): {quote} 2012-01-04 00:19:33,285 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: The znode of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been deleted. {quote} Then it prints this, and much later tries to unassign it again: {quote} 2012-01-04 00:19:46,607 DEBUG org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on region to clear regions in transition; test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. state=PENDING_CLOSE, ts=1325636310328, server=null ... 2012-01-04 00:20:39,623 DEBUG org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on region to clear regions in transition;
[jira] [Updated] (HBASE-5279) NPE in Master after upgrading to 0.92.0
[ https://issues.apache.org/jira/browse/HBASE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5279: - Attachment: HBASE-5279-v2.patch Version of patch that will work w/ hadoopqa NPE in Master after upgrading to 0.92.0 --- Key: HBASE-5279 URL: https://issues.apache.org/jira/browse/HBASE-5279 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.0 Reporter: Tobias Herbert Priority: Critical Fix For: 0.92.1 Attachments: HBASE-5279-v2.patch, HBASE-5279.patch I have upgraded my environment from 0.90.4 to 0.92.0 after the table migration I get the following error in the master (permanent) {noformat} 2012-01-25 18:23:48,648 FATAL master-namenode,6,1327512209588 org.apache.hadoop.hbase.master.HMaster - Unhandled exception. Starting shutdown. java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.rebuildUserRegions(AssignmentManager.java:2190) at org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:323) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:501) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326) at java.lang.Thread.run(Thread.java:662) 2012-01-25 18:23:48,650 INFO namenode,6,1327512209588 org.apache.hadoop.hbase.master.HMaster - Aborting {noformat} I think that's because I had a hard crash in the cluster a while ago - and the following WARN since then {noformat} 2012-01-25 21:20:47,121 WARN namenode,6,1327513078123-CatalogJanitor org.apache.hadoop.hbase.master.CatalogJanitor - REGIONINFO_QUALIFIER is empty in keyvalues={emails,,xxx./info:server/1314336400471/Put/vlen=38, emails,,1314189353300.xxx./info:serverstartcode/1314336400471/Put/vlen=8} {noformat} my patch was simple to go around the NPE (as the other code around the lines) but I don't know if that's correct -- This message is automatically generated by JIRA. 
[jira] [Updated] (HBASE-5279) NPE in Master after upgrading to 0.92.0
[ https://issues.apache.org/jira/browse/HBASE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5279: - Status: Patch Available (was: Open) NPE in Master after upgrading to 0.92.0 --- Key: HBASE-5279 URL: https://issues.apache.org/jira/browse/HBASE-5279 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.0 Reporter: Tobias Herbert Priority: Critical Fix For: 0.92.1 Attachments: HBASE-5279-v2.patch, HBASE-5279.patch I have upgraded my environment from 0.90.4 to 0.92.0 after the table migration I get the following error in the master (permanent) {noformat} 2012-01-25 18:23:48,648 FATAL master-namenode,6,1327512209588 org.apache.hadoop.hbase.master.HMaster - Unhandled exception. Starting shutdown. java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.rebuildUserRegions(AssignmentManager.java:2190) at org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:323) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:501) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326) at java.lang.Thread.run(Thread.java:662) 2012-01-25 18:23:48,650 INFO namenode,6,1327512209588 org.apache.hadoop.hbase.master.HMaster - Aborting {noformat} I think that's because I had a hard crash in the cluster a while ago - and the following WARN since then {noformat} 2012-01-25 21:20:47,121 WARN namenode,6,1327513078123-CatalogJanitor org.apache.hadoop.hbase.master.CatalogJanitor - REGIONINFO_QUALIFIER is empty in keyvalues={emails,,xxx./info:server/1314336400471/Put/vlen=38, emails,,1314189353300.xxx./info:serverstartcode/1314336400471/Put/vlen=8} {noformat} my patch was simple to go around the NPE (as the other code around the lines) but I don't know if that's correct -- This message is automatically generated by JIRA. 
[jira] [Updated] (HBASE-5271) Result.getValue and Result.getColumnLatest return the wrong column.
[ https://issues.apache.org/jira/browse/HBASE-5271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5271: - Resolution: Fixed Status: Resolved (was: Patch Available) Was committed a while back. Thanks for the patch Ghais. Result.getValue and Result.getColumnLatest return the wrong column. --- Key: HBASE-5271 URL: https://issues.apache.org/jira/browse/HBASE-5271 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.5 Reporter: Ghais Issa Assignee: Ghais Issa Fix For: 0.94.0, 0.90.7, 0.92.1 Attachments: 5271-90.txt, 5271-v2.txt, fixKeyValueMatchingColumn.diff, testGetValue.diff In the following example result.getValue returns the wrong column: KeyValue kv = new KeyValue(Bytes.toBytes("r"), Bytes.toBytes("24"), Bytes.toBytes("2"), Bytes.toBytes(7L)); Result result = new Result(new KeyValue[] { kv }); System.out.println(Bytes.toLong(result.getValue(Bytes.toBytes("2"), Bytes.toBytes("2")))); //prints 7.
[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
[ https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210505#comment-13210505 ] Hadoop QA commented on HBASE-5200: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12515010/5200-v4no-prefix.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 158 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.coprocessor.TestMasterObserver org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/981//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/981//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/981//console This message is automatically generated. 
AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent - Key: HBASE-5200 URL: https://issues.apache.org/jira/browse/HBASE-5200 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.94.0, 0.90.7, 0.92.1 Attachments: 5200-test.txt, 5200-v2.txt, 5200-v3.txt, 5200-v4.txt, 5200-v4no-prefix.txt, HBASE-5200.patch, HBASE-5200_1.patch, HBASE-5200_trunk_latest_with_test_2.patch, TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, hbase-5200_90_latest.patch, hbase-5200_90_latest_new.patch This is the scenario Consider a case where the balancer is going on thus trying to close regions in a RS. Before we could close a master switch happens. On Master switch the set of nodes that are in RIT is collected and we first get Data and start watching the node After that the node data is added into RIT. Now by this time (before adding to RIT) if the RS to which close was called does a transition in AM.handleRegion() we miss the handling saying RIT state was null. 
{code} 2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region a66d281d231dfcaea97c270698b26b6f from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region c12e53bfd48ddc5eec507d66821c4d23 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 59ae13de8c1eb325a0dd51f4902d2052 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region f45bc9614d7575f35244849af85aa078 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region cc3ecd7054fe6cd4a1159ed92fd62641 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 3af40478a17fee96b4a192b22c90d5a2 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region e6096a8466e730463e10d3d61f809b92 from server HOST-192-168-47-204,20020,1326342744518 but region
[jira] [Updated] (HBASE-5421) use hadoop-client/hadoop-minicluster artifacts for Hadoop 0.23 build
[ https://issues.apache.org/jira/browse/HBASE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5421: - Resolution: Fixed Fix Version/s: 0.94.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed trunk and branch. Thanks for patch Shaneal. use hadoop-client/hadoop-minicluster artifacts for Hadoop 0.23 build Key: HBASE-5421 URL: https://issues.apache.org/jira/browse/HBASE-5421 Project: HBase Issue Type: Improvement Components: build Affects Versions: 0.92.0 Reporter: Shaneal Manek Assignee: Shaneal Manek Priority: Minor Labels: build Fix For: 0.94.0, 0.92.1 Attachments: hbase-5421.patch Hadoop recently added hadoop-client and hadoop-minicluster artifacts for Hadoop 0.23+ that don't export all the internal dependencies (HADOOP-8009). Let's use them instead of manually specifying transitive dependency exclusion lists (which is error prone and annoying).
[jira] [Updated] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5399: --- Attachment: 5399_inprogress.v3.patch Cut the link between the client and the zookeeper ensemble -- Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5399_inprogress.patch, 5399_inprogress.v3.patch The link is often considered as an issue, for various reasons. One of them being that there is a limit on the number of connections that ZK can manage. Stack was suggesting as well to remove the link to master from HConnection. There are choices to be made considering the existing API (that we don't want to break). The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken. ZooKeeper is used for: - public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection => we have to deprecate this but keep it. - read get master address to create a master => now done with a temporary zookeeper connection - read root location => now done with a temporary zookeeper connection, but questionable. Used in public function locateRegion. To be reworked. - read cluster id => now done once with a temporary zookeeper connection. - check if base node is available => now done once with a zookeeper connection given as a parameter - isTableDisabled/isTableAvailable => public functions, now done with a temporary zookeeper connection. - Called internally from HBaseAdmin and HTable - getCurrentNrHRS(): public function to get the number of region servers and create a pool of threads => now done with a temporary zookeeper connection Master is used for: - getMaster public getter, as for ZooKeeper => we have to deprecate this but keep it.
- isMasterRunning(): public function, used internally by HMerge & HBaseAdmin - getHTableDescriptor*: public functions offering access to the master. => we could make them using a temporary master connection as well. Main points are: - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, a non-connected client will always be noticeably slower, because it needs a tcp connection, and establishing a tcp connection is slow. - having a link between ZK and all the clients seems to make sense for some use cases. However, it won't scale if a TCP connection is required for every client - if we move the table descriptor part away from the client, we need to find a new place for it. - we will have the same issue in HBaseAdmin (for both ZK & Master); maybe we can put a timeout on the connection. That would make the whole system less deterministic however.
[jira] [Commented] (HBASE-5279) NPE in Master after upgrading to 0.92.0
[ https://issues.apache.org/jira/browse/HBASE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210552#comment-13210552 ] Hadoop QA commented on HBASE-5279: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12515016/HBASE-5279-v2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 158 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/982//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/982//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/982//console This message is automatically generated. 
NPE in Master after upgrading to 0.92.0 --- Key: HBASE-5279 URL: https://issues.apache.org/jira/browse/HBASE-5279 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.0 Reporter: Tobias Herbert Priority: Critical Fix For: 0.92.1 Attachments: HBASE-5279-v2.patch, HBASE-5279.patch I have upgraded my environment from 0.90.4 to 0.92.0 after the table migration I get the following error in the master (permanent) {noformat} 2012-01-25 18:23:48,648 FATAL master-namenode,6,1327512209588 org.apache.hadoop.hbase.master.HMaster - Unhandled exception. Starting shutdown. java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.rebuildUserRegions(AssignmentManager.java:2190) at org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:323) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:501) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326) at java.lang.Thread.run(Thread.java:662) 2012-01-25 18:23:48,650 INFO namenode,6,1327512209588 org.apache.hadoop.hbase.master.HMaster - Aborting {noformat} I think that's because I had a hard crash in the cluster a while ago - and the following WARN since then {noformat} 2012-01-25 21:20:47,121 WARN namenode,6,1327513078123-CatalogJanitor org.apache.hadoop.hbase.master.CatalogJanitor - REGIONINFO_QUALIFIER is empty in keyvalues={emails,,xxx./info:server/1314336400471/Put/vlen=38, emails,,1314189353300.xxx./info:serverstartcode/1314336400471/Put/vlen=8} {noformat} my patch was simple to go around the NPE (as the other code around the lines) but I don't know if that's correct -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-5427) Upgrade our zk to 3.4.3
Upgrade our zk to 3.4.3 --- Key: HBASE-5427 URL: https://issues.apache.org/jira/browse/HBASE-5427 Project: HBase Issue Type: Task Reporter: stack Fix For: 0.94.0, 0.92.1 Attachments: 5427.txt
[jira] [Updated] (HBASE-5427) Upgrade our zk to 3.4.3
[ https://issues.apache.org/jira/browse/HBASE-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5427: - Attachment: 5427.txt Upgrade our zk to 3.4.3 --- Key: HBASE-5427 URL: https://issues.apache.org/jira/browse/HBASE-5427 Project: HBase Issue Type: Task Reporter: stack Fix For: 0.94.0, 0.92.1 Attachments: 5427.txt
[jira] [Commented] (HBASE-5425) Punt on the timeout doesn't work in BulkEnabler#waitUntilDone (master's EnableTableHandler)
[ https://issues.apache.org/jira/browse/HBASE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210562#comment-13210562 ] Hudson commented on HBASE-5425: --- Integrated in HBase-0.92 #286 (See [https://builds.apache.org/job/HBase-0.92/286/]) HBASE-5425 Punt on the timeout doesn't work in BulkEnabler#waitUntilDone (master's EnableTableHandler) (Revision 1245676) Result = SUCCESS stack : Files : * /hbase/branches/0.92/CHANGES.txt * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java Punt on the timeout doesn't work in BulkEnabler#waitUntilDone (master's EnableTableHandler) Key: HBASE-5425 URL: https://issues.apache.org/jira/browse/HBASE-5425 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.5, 0.92.0 Reporter: terry zhang Fix For: 0.94.0 Attachments: HBASE-5425.patch Please take a look at the code below in EnableTableHandler (hbase master): {code:title=EnableTableHandler.java|borderStyle=solid} protected boolean waitUntilDone(long timeout) throws InterruptedException { . int lastNumberOfRegions = this.countOfRegionsInTable; while (!server.isStopped() && remaining > 0) { Thread.sleep(waitingTimeForEvents); regions = assignmentManager.getRegionsOfTable(tableName); if (isDone(regions)) break; // Punt on the timeout as long we make progress if (regions.size() > lastNumberOfRegions) { lastNumberOfRegions = regions.size(); timeout += waitingTimeForEvents; } remaining = timeout - (System.currentTimeMillis() - startTime); } private boolean isDone(final List<HRegionInfo> regions) { return regions != null && regions.size() >= this.countOfRegionsInTable; } {code} It is easy to see that if lastNumberOfRegions is initialized to this.countOfRegionsInTable, the punt-on-timeout code will never be executed: isDone() breaks out of the loop as soon as regions.size() reaches countOfRegionsInTable, so regions.size() > lastNumberOfRegions can never hold. I think initializing lastNumberOfRegions = 0 can make it work.
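The argument in the HBASE-5425 report can be checked with a small model of the loop. The sketch below is a simplified stand-in, not the EnableTableHandler code: it drops the sleeping and the clock, replays a made-up sequence of per-poll region counts, and only counts how often the timeout would have been extended. The class name, method name and sample numbers are assumptions for illustration.

```java
/** Hypothetical, self-contained model of the progress check; not the HBase class itself. */
final class PuntOnTimeoutSketch {

    /**
     * Replays a sequence of observed online-region counts (one per poll) and
     * returns how many times the timeout would have been extended.
     */
    static int countTimeoutExtensions(int initialLastNumberOfRegions,
                                      int countOfRegionsInTable,
                                      int... observedRegionCounts) {
        int lastNumberOfRegions = initialLastNumberOfRegions;
        int extensions = 0;
        for (int online : observedRegionCounts) {
            // Mirrors isDone(): stop once every region of the table is online.
            if (online >= countOfRegionsInTable) {
                break;
            }
            // Mirrors "punt on the timeout as long as we make progress".
            if (online > lastNumberOfRegions) {
                lastNumberOfRegions = online;
                extensions++;
            }
        }
        return extensions;
    }

    public static void main(String[] args) {
        int total = 10;
        int[] polls = {2, 5, 5, 8, 10};
        // Initialized as in the reported code: progress is never detected.
        System.out.println(countTimeoutExtensions(total, total, polls));
        // Initialized to 0 as the reporter suggests: each increase extends the timeout.
        System.out.println(countTimeoutExtensions(0, total, polls));
    }
}
```

With these sample polls the first call reports 0 extensions and the second reports 3, which matches the reporter's reasoning that the starting value decides whether the progress branch is reachable at all.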
[jira] [Resolved] (HBASE-5427) Upgrade our zk to 3.4.3
[ https://issues.apache.org/jira/browse/HBASE-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-5427. -- Resolution: Fixed Assignee: stack Committed 0.92 branch and trunk. Upgrade our zk to 3.4.3 --- Key: HBASE-5427 URL: https://issues.apache.org/jira/browse/HBASE-5427 Project: HBase Issue Type: Task Reporter: stack Assignee: stack Fix For: 0.94.0, 0.92.1 Attachments: 5427.txt
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210561#comment-13210561 ] nkeywal commented on HBASE-5399: It's not the last version (it needs more comments, unit tests and likely bug fixes), but there is already a lot. Master & ZooKeeper connections are now created only when necessary, and are closed if not used for 5 minutes. I added the keep alive stuff. It's not a nice-to-have: without it the unit tests take twice as long. There is an issue with the masterCheck part, the previous behavior was strange. I need to review it in detail. The patch is against Monday's trunk. I will make it compatible with current trunk this weekend. I will move isTableEnabled & so on in another patch, this one is already too big... Cut the link between the client and the zookeeper ensemble -- Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5399_inprogress.patch, 5399_inprogress.v3.patch The link is often considered as an issue, for various reasons. One of them being that there is a limit on the number of connections that ZK can manage. Stack was suggesting as well to remove the link to master from HConnection. There are choices to be made considering the existing API (that we don't want to break). The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken. ZooKeeper is used for: - public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection => we have to deprecate this but keep it. - read get master address to create a master => now done with a temporary zookeeper connection - read root location => now done with a temporary zookeeper connection, but questionable. Used in public function locateRegion. To be reworked.
- read cluster id => now done once with a temporary zookeeper connection. - check if base node is available => now done once with a zookeeper connection given as a parameter - isTableDisabled/isTableAvailable => public functions, now done with a temporary zookeeper connection. - Called internally from HBaseAdmin and HTable - getCurrentNrHRS(): public function to get the number of region servers and create a pool of threads => now done with a temporary zookeeper connection Master is used for: - getMaster public getter, as for ZooKeeper => we have to deprecate this but keep it. - isMasterRunning(): public function, used internally by HMerge & HBaseAdmin - getHTableDescriptor*: public functions offering access to the master. => we could make them using a temporary master connection as well. Main points are: - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, a non-connected client will always be noticeably slower, because it needs a tcp connection, and establishing a tcp connection is slow. - having a link between ZK and all the clients seems to make sense for some use cases. However, it won't scale if a TCP connection is required for every client - if we move the table descriptor part away from the client, we need to find a new place for it. - we will have the same issue in HBaseAdmin (for both ZK & Master); maybe we can put a timeout on the connection. That would make the whole system less deterministic however.
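The "created only when necessary, closed if not used for 5 minutes" behaviour mentioned in the comment can be sketched roughly as follows. This is an illustrative guess at the general pattern, not the code in the attached patches: the class name, the explicit time parameters (used here so the example is deterministic) and the periodic closeIfIdle call are all assumptions.

```java
import java.util.function.Supplier;

/** Hypothetical helper, not an HBase class: opens lazily, closes after an idle period. */
final class KeepAliveConnection<T extends AutoCloseable> {
    private final Supplier<T> opener;   // how to open the underlying connection
    private final long keepAliveMillis; // idle time after which it is closed
    private T connection;               // null while closed
    private int users;                  // callers currently holding the connection
    private long lastReleaseMillis;     // time of the most recent release
    private int openCount;              // how often the opener ran (for illustration)

    KeepAliveConnection(Supplier<T> opener, long keepAliveMillis) {
        this.opener = opener;
        this.keepAliveMillis = keepAliveMillis;
    }

    /** Opens the connection on first use and reuses it afterwards. */
    synchronized T acquire() {
        if (connection == null) {
            connection = opener.get();
            openCount++;
        }
        users++;
        return connection;
    }

    /** Marks one caller as done; the connection stays open for later reuse. */
    synchronized void release(long nowMillis) {
        if (users > 0) {
            users--;
        }
        lastReleaseMillis = nowMillis;
    }

    /** Meant to be called periodically; closes only when unused and idle long enough. */
    synchronized boolean closeIfIdle(long nowMillis) {
        if (connection == null || users > 0
                || nowMillis - lastReleaseMillis < keepAliveMillis) {
            return false;
        }
        try {
            connection.close();
        } catch (Exception e) {
            throw new IllegalStateException("close failed", e);
        }
        connection = null;
        return true;
    }

    synchronized boolean isOpen() { return connection != null; }

    synchronized int openCount() { return openCount; }
}
```

The keep-alive window is what avoids paying the connection setup cost on every call, which is consistent with the remark that the unit tests run much slower without it.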
[jira] [Commented] (HBASE-5195) [Coprocessors] preGet hook does not allow overriding or wrapping filter on incoming Get
[ https://issues.apache.org/jira/browse/HBASE-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210565#comment-13210565 ] stack commented on HBASE-5195: -- This looks like a pretty important fix. Should it be more than major priority? Should it go into 0.92.1? [Coprocessors] preGet hook does not allow overriding or wrapping filter on incoming Get --- Key: HBASE-5195 URL: https://issues.apache.org/jira/browse/HBASE-5195 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.92.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.94.0, 0.92.1 Attachments: HBASE-5195.patch Without the ability to wrap the internal Scan on the Get, we can't override (or protect, in the case of access control) Gets as we can Scans. The result is inconsistent behavior.
[jira] [Commented] (HBASE-5209) HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup
[ https://issues.apache.org/jira/browse/HBASE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210567#comment-13210567 ]

stack commented on HBASE-5209:

Patch looks excellent. One issue is the upping of the HMasterInterface version. It's the 'right' thing to do, but it means I can't apply the patch to 0.92.1, and it breaks a 0.92 talking to a 0.94, which is currently possible. Can you try adding isActiveMaster to the end of the interface and NOT updating the version? See if you can connect to a 0.92.1 server from a 0.92.0 client, and see if you can do basic HMasterInterface operations such as isLoadBalancer running.

HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup

Key: HBASE-5209
URL: https://issues.apache.org/jira/browse/HBASE-5209
Project: HBase
Issue Type: Improvement
Components: master
Affects Versions: 0.94.0, 0.90.5, 0.92.0
Reporter: Aditya Acharya
Assignee: David S. Wang
Fix For: 0.94.0, 0.90.7, 0.92.1
Attachments: HBASE-5209-v0.diff, HBASE-5209-v1.diff

I have a multi-master HBase setup, and I'm trying to programmatically determine which of the masters is currently active. But the API does not allow me to do this. There is a getMaster() method in the HConnection class, but it returns an HMasterInterface, whose methods do not allow me to find out which master won the last race. The API should have a getActiveMasterHostname() or something to that effect.
[jira] [Commented] (HBASE-5195) [Coprocessors] preGet hook does not allow overriding or wrapping filter on incoming Get
[ https://issues.apache.org/jira/browse/HBASE-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210569#comment-13210569 ]

Andrew Purtell commented on HBASE-5195:

+1 for including in 0.92.1

[Coprocessors] preGet hook does not allow overriding or wrapping filter on incoming Get

Key: HBASE-5195
URL: https://issues.apache.org/jira/browse/HBASE-5195
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.0, 0.92.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Fix For: 0.94.0, 0.92.1
Attachments: HBASE-5195.patch

Without the ability to wrap the internal Scan on the Get, we can't override (or protect, in the case of access control) Gets as we can Scans. The result is inconsistent behavior.
[jira] [Commented] (HBASE-5265) Fix 'revoke' shell command
[ https://issues.apache.org/jira/browse/HBASE-5265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210570#comment-13210570 ]

stack commented on HBASE-5265:

Is this important for 0.92.1 lads?

Fix 'revoke' shell command

Key: HBASE-5265
URL: https://issues.apache.org/jira/browse/HBASE-5265
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Andrew Purtell
Assignee: Eugene Koontz
Fix For: 0.94.0, 0.92.1

The 'revoke' shell command needs to be reworked for the AccessControlProtocol implementation that was finalized for 0.92. The permissions being removed must exactly match what was previously granted. No wildcard matching is done server side.

Allow two forms of the command in the shell for convenience:

Revocation of a specific grant:
{code}
revoke user, table, column family [ , column_qualifier ]
{code}

Have the shell automatically do so for all permissions on a table for a given user:
{code}
revoke user, table
{code}
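To make the two revoke forms concrete, here is a minimal Java sketch of the semantics as described in the issue. Everything in it is hypothetical: RevokeSketch, Grant, revokeExact and revokeAll are illustration-only names, the in-memory list stands in for whatever the server actually stores, and none of it is the AccessControlProtocol API. It assumes only what the description states: a revoke succeeds only on an exact match, and the "revoke user, table" form is a shell-side convenience that revokes each existing grant individually.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical model of the revoke semantics; not the AccessControlProtocol API. */
final class RevokeSketch {
    private RevokeSketch() {}

    /** One previously granted permission; family and qualifier may be null. */
    static final class Grant {
        final String user;
        final String table;
        final String family;
        final String qualifier;

        Grant(String user, String table, String family, String qualifier) {
            this.user = Objects.requireNonNull(user);
            this.table = Objects.requireNonNull(table);
            this.family = family;
            this.qualifier = qualifier;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Grant)) {
                return false;
            }
            Grant g = (Grant) o;
            return user.equals(g.user) && table.equals(g.table)
                && Objects.equals(family, g.family)
                && Objects.equals(qualifier, g.qualifier);
        }

        @Override
        public int hashCode() {
            return Objects.hash(user, table, family, qualifier);
        }
    }

    /** "Server side": removes a grant only on an exact match, no wildcards. */
    static boolean revokeExact(List<Grant> grants, Grant target) {
        return grants.remove(target);
    }

    /** "Shell side": expands "revoke user, table" into one exact revoke per grant. */
    static int revokeAll(List<Grant> grants, String user, String table) {
        int revoked = 0;
        for (Grant g : new ArrayList<>(grants)) { // iterate over a snapshot
            if (g.user.equals(user) && g.table.equals(table) && revokeExact(grants, g)) {
                revoked++;
            }
        }
        return revoked;
    }

    public static void main(String[] args) {
        List<Grant> grants = new ArrayList<>();
        grants.add(new Grant("bob", "t1", "cf", null));
        grants.add(new Grant("bob", "t1", "cf", "q"));
        // A table-level revoke matches nothing, because no wildcarding is done.
        System.out.println(revokeExact(grants, new Grant("bob", "t1", null, null)));
        // The convenience form revokes each of bob's grants on t1 individually.
        System.out.println(revokeAll(grants, "bob", "t1"));
    }
}
```

The point of the split is that the exact-match rule stays simple on the server, while the shell does the enumeration for the user.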
[jira] [Created] (HBASE-5428) Allow for custom filters to be registered within the Thrift interface
Allow for custom filters to be registered within the Thrift interface

Key: HBASE-5428
URL: https://issues.apache.org/jira/browse/HBASE-5428
Project: HBase
Issue Type: Improvement
Components: thrift
Affects Versions: 0.92.0
Reporter: Robert Roland

Custom filters work within the Java client API, but are not accessible within the Thrift API. Attempting to use one will generate a "Filter Name x not supported" error.

Attached patch allows a user to specify a list of custom filters that are registered at Thrift server startup time within the HBase configuration files:

<property>
  <name>hbase.thrift.filters</name>
  <value>MyFilter:com.foo.Filter,OtherFilter:com.foo.OtherFilter</value>
</property>

Patch created off SVN r1245727
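As a rough illustration of the value format above (comma-separated FilterName:class.name pairs), here is a minimal Java sketch of how such a string could be split into a name-to-class-name map. The ThriftFilterSpec class and its parse method are hypothetical names made up for this example; they are not taken from the attached patch, and the rejection of malformed entries is an assumption about reasonable behavior, not something the issue specifies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; not taken from the attached patch. */
final class ThriftFilterSpec {
    private ThriftFilterSpec() {}

    /** Parses "Name:class,Name:class" into an ordered name-to-class-name map. */
    static Map<String, String> parse(String value) {
        Map<String, String> filters = new LinkedHashMap<>();
        if (value == null || value.trim().isEmpty()) {
            return filters; // property unset: no custom filters
        }
        for (String entry : value.split(",")) {
            // Split on the first ':' only; the class name is everything after it.
            String[] parts = entry.trim().split(":", 2);
            if (parts.length != 2 || parts[0].trim().isEmpty() || parts[1].trim().isEmpty()) {
                throw new IllegalArgumentException(
                    "Expected FilterName:class.name, got: " + entry);
            }
            filters.put(parts[0].trim(), parts[1].trim());
        }
        return filters;
    }

    public static void main(String[] args) {
        // Uses the example value from the issue description.
        System.out.println(parse("MyFilter:com.foo.Filter,OtherFilter:com.foo.OtherFilter"));
    }
}
```

A server would presumably do this once at startup and then register each pair with its filter parser; that registration step is left out here because it depends on the patch itself.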
[jira] [Updated] (HBASE-5428) Allow for custom filters to be registered within the Thrift interface
[ https://issues.apache.org/jira/browse/HBASE-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Roland updated HBASE-5428:

Attachment: ThriftCustomFilters.patch

Allow for custom filters to be registered within the Thrift interface

Key: HBASE-5428
URL: https://issues.apache.org/jira/browse/HBASE-5428
Project: HBase
Issue Type: Improvement
Components: thrift
Affects Versions: 0.92.0
Reporter: Robert Roland
Labels: patch
Fix For: 0.94.0
Attachments: ThriftCustomFilters.patch

Custom filters work within the Java client API, but are not accessible within the Thrift API. Attempting to use one will generate a "Filter Name x not supported" error.

Attached patch allows a user to specify a list of custom filters that are registered at Thrift server startup time within the HBase configuration files:

<property>
  <name>hbase.thrift.filters</name>
  <value>MyFilter:com.foo.Filter,OtherFilter:com.foo.OtherFilter</value>
</property>

Patch created off SVN r1245727