[jira] [Commented] (HBASE-6092) Authorize flush, split, compact operations in AccessController
[ https://issues.apache.org/jira/browse/HBASE-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293379#comment-13293379 ]

Laxman commented on HBASE-6092:
-------------------------------

Thanks for the review, Andy.

bq. Please update the comments in the tests you added

Copy-paste error. My mistake. Sorry.

Also, one more observation: we should throw back IOException from RegionCoprocessorHost's pre/post split/flush/compact methods using:
{code}
org.apache.hadoop.hbase.coprocessor.CoprocessorHost.handleCoprocessorThrowable(CoprocessorEnvironment, Throwable)
{code}

bq. -1 core tests.

Verified these test cases locally; the failures are not relevant to this patch.

bq. -1 findbugs.

Verified the findbugs report. This patch does not introduce any new findbugs warnings.

Authorize flush, split, compact operations in AccessController
--------------------------------------------------------------
Key: HBASE-6092
URL: https://issues.apache.org/jira/browse/HBASE-6092
Project: HBase
Issue Type: Sub-task
Components: security
Affects Versions: 0.94.0, 0.96.0, 0.94.1
Reporter: Laxman
Assignee: Laxman
Labels: acl, security
Fix For: 0.96.0, 0.94.1
Attachments: HBASE-6092.patch

Currently, flush, split and compaction are not checked for authorization in AccessController. With the current implementation, any unauthorized client can trigger these operations on a table.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
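The gist of the patch is a permission check before each admin-type region operation. The following is a minimal, self-contained sketch of that pattern, not the actual AccessController code: the ACL table, user names, and method names here are all illustrative, and a real implementation would consult HBase's AccessControlLists instead of an in-memory map.

```java
import java.io.IOException;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the check this issue adds: before a
// flush/split/compact request runs, the caller's permissions are consulted
// and an IOException is thrown back to the client for unauthorized users.
public class AdminOpCheck {
    // Toy ACL table: user -> granted actions. Illustrative only; the real
    // AccessController resolves permissions from the ACL table in HBase.
    private static final Map<String, Set<String>> ACL = Map.of(
        "admin", Set.of("flush", "split", "compact"),
        "reader", Set.of()
    );

    public static void requireAdmin(String user, String action) throws IOException {
        Set<String> granted = ACL.getOrDefault(user, Set.of());
        if (!granted.contains(action)) {
            // surfaced to the caller instead of silently executing the op
            throw new IOException("Insufficient permissions for user '" + user
                + "' to perform action '" + action + "'");
        }
    }

    public static boolean isAllowed(String user, String action) {
        try {
            requireAdmin(user, action);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("admin", "flush"));    // true
        System.out.println(isAllowed("reader", "compact")); // false
    }
}
```

The point of routing the failure through IOException, as the comment above suggests, is that the coprocessor host already knows how to propagate such throwables back to the client.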
[jira] [Updated] (HBASE-6060) Regions in OPENING state from failed regionservers take a long time to recover
[ https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

rajeshbabu updated HBASE-6060:
------------------------------
Attachment: HBASE-6060-trunk_4.patch

Regions in OPENING state from failed regionservers take a long time to recover
------------------------------------------------------------------------------
Key: HBASE-6060
URL: https://issues.apache.org/jira/browse/HBASE-6060
Project: HBase
Issue Type: Bug
Components: master, regionserver
Reporter: Enis Soztutar
Assignee: rajeshbabu
Fix For: 0.96.0, 0.94.1, 0.92.3
Attachments: 6060-94-v3.patch, 6060-94-v4.patch, 6060-94-v4_1.patch, 6060-94-v4_1.patch, 6060-trunk.patch, 6060-trunk.patch, 6060-trunk_2.patch, 6060-trunk_3.patch, 6060_alternative_suggestion.txt, 6060_suggestion2_based_off_v3.patch, 6060_suggestion_based_off_v3.patch, 6060_suggestion_toassign_rs_wentdown_beforerequest.patch, HBASE-6060-92.patch, HBASE-6060-94.patch, HBASE-6060-trunk_4.patch

We have seen a pattern in tests: regions are stuck in the OPENING state for a very long time when the region server that is opening the region fails. My understanding of the process:

- The master calls the RS to open the region. If the RS is offline, a new plan is generated (a new RS is chosen). RegionState is set to PENDING_OPEN (only in master memory; zk still shows OFFLINE). See HRegionServer.openRegion(), HMaster.assign().
- The RegionServer starts opening the region and changes the state in the znode. But that znode is not ephemeral (see ZkAssign).
- The RS transitions the zk node from OFFLINE to OPENING. See OpenRegionHandler.process().
- The RS then opens the region and changes the znode from OPENING to OPENED.
- When the RS is killed between the OPENING and OPENED states, zk shows OPENING, and the master just waits for the RS to change the region state; since the RS is down, that won't happen.
- There is an AssignmentManager.TimeoutMonitor, which guards against exactly these kinds of conditions. It periodically checks (every 10 sec by default) the regions in transition to see whether they timed out (hbase.master.assignment.timeoutmonitor.timeout). The default timeout is 30 min, which explains what you and I are seeing.
- ServerShutdownHandler in the Master does not reassign regions in the OPENING state, although it handles other states.

Lowering that threshold in the configuration is one option, but I still think we can do better. Will investigate more.
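The workaround mentioned in the description (lowering the threshold from the configuration) would look roughly like this in hbase-site.xml. The property name is taken from the description above; the value shown is purely illustrative, and lowering it too far can cause spurious reassignments.

```xml
<!-- hbase-site.xml: shrink the regions-in-transition timeout from the
     30-minute default discussed above (1800000 ms) to 1 minute.
     Value is an example only; tune with care. -->
<property>
  <name>hbase.master.assignment.timeoutmonitor.timeout</name>
  <value>60000</value>
</property>
```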
[jira] [Updated] (HBASE-6092) Authorize flush, split, compact operations in AccessController
[ https://issues.apache.org/jira/browse/HBASE-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Laxman updated HBASE-6092:
--------------------------
Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-6092) Authorize flush, split, compact operations in AccessController
[ https://issues.apache.org/jira/browse/HBASE-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Laxman updated HBASE-6092:
--------------------------
Attachment: HBASE-6092.1.patch
[jira] [Updated] (HBASE-6092) Authorize flush, split, compact operations in AccessController
[ https://issues.apache.org/jira/browse/HBASE-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Laxman updated HBASE-6092:
--------------------------
Status: Patch Available (was: Open)

New patch attached for review after addressing the review comments and my observation above.
[jira] [Commented] (HBASE-6060) Regions in OPENING state from failed regionservers take a long time to recover
[ https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293382#comment-13293382 ]

ramkrishna.s.vasudevan commented on HBASE-6060:
-----------------------------------------------

In the latest patch there is one more problem. I uploaded it just to sync up on whether this is what you are trying, Stack. Now, after calling openRegion, suppose the OpenRegionHandler is spawned and just after that the RS goes down, while PENDING_OPEN is still not yet updated by the time SSH starts processing. Then, as per the updated latest patch, we will not be able to call assign. No test cases were run with this patch; it needs modifications.
[jira] [Commented] (HBASE-6060) Regions in OPENING state from failed regionservers take a long time to recover
[ https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293390#comment-13293390 ]

stack commented on HBASE-6060:
------------------------------

bq. Now after calling openregion if the OpenRegionHandler is spawned and just after that the RS goes down. And still the PENDING_OPEN is not yet updated and by the time SSH starts processing. Then as per the updated latest patch we will not be able to call assign.

Can you explain more, Ram? The open RPC has been sent (successfully), the RS goes down, but we have not set it to PENDING_OPEN, and at this time the SSH runs? Is that what you are suggesting? So, it's the window between the open RPC's return and the setting of PENDING_OPEN: in this window, if the SSH runs to completion, we'll have an unassigned region, one that will have to be picked up by the timeout monitor? Is that what you are suggesting? Thanks, Ram. (Sorry that this stuff is taking so long...)
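The window being discussed can be sketched as a toy state model. This is not HBase code; the enum and the reassignment predicate are simplified stand-ins for the master's in-memory RegionState and the ServerShutdownHandler policy described in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of the race window: if the master marks a region
// PENDING_OPEN only *after* the open RPC returns, a ServerShutdownHandler
// (SSH) run that lands inside that window sees the region in neither
// PENDING_OPEN nor OPENING and therefore does not reassign it; the region
// is left for the (slow) timeout monitor to pick up.
public class AssignmentWindow {
    enum State { OFFLINE, PENDING_OPEN, OPENING, OPENED }

    // SSH-like policy from the discussion: only regions already marked
    // in-transition toward the dead server get reassigned immediately.
    static boolean ssHandlerWouldReassign(State masterView) {
        return masterView == State.PENDING_OPEN || masterView == State.OPENING;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();

        // Window: open RPC already sent and returned, state not yet updated.
        State masterView = State.OFFLINE;
        log.add("SSH during window -> reassign=" + ssHandlerWouldReassign(masterView));

        // If the master instead sets PENDING_OPEN before (or while) sending
        // the RPC, the same SSH run picks the region up.
        masterView = State.PENDING_OPEN;
        log.add("SSH after PENDING_OPEN -> reassign=" + ssHandlerWouldReassign(masterView));

        log.forEach(System.out::println);
    }
}
```

This is only meant to make the window concrete; the actual fix in the patches negotiates this ordering inside HMaster.assign() and SSH.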
[jira] [Commented] (HBASE-6195) Increment data will be lost when the memstore is flushed
[ https://issues.apache.org/jira/browse/HBASE-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293434#comment-13293434 ]

Hadoop QA commented on HBASE-6195:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12531767/HBASE-6195-trunk-V4.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2144//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2144//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2144//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2144//console

This message is automatically generated.
Increment data will be lost when the memstore is flushed
--------------------------------------------------------
Key: HBASE-6195
URL: https://issues.apache.org/jira/browse/HBASE-6195
Project: HBase
Issue Type: Bug
Components: regionserver
Reporter: Xing Shi
Assignee: ShiXing
Attachments: HBASE-6195-trunk-V2.patch, HBASE-6195-trunk-V3.patch, HBASE-6195-trunk-V4.patch, HBASE-6195-trunk.patch

There are two problems in increment() now:

First: the timestamp (the variable {{now}}) in HRegion's increment() is generated before the rowLock is acquired, so when multiple threads increment the same row, a thread may generate its timestamp earlier yet acquire the lock later. Because increment stores just one version, the result is still right up to that point. But when the region is flushing, each increment reads the KV from whichever of the snapshot and memstore has the larger timestamp and writes it back to the memstore. If the snapshot's timestamp is larger than the memstore's, the increment will get the old data and then do the increment, which is wrong.

Second: there is also a risk in increment because it writes to the memstore first and then to the HLog; if the HLog write fails, the client may still read the incremented value.
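The first problem can be made concrete with a deterministic toy model. This is not HRegion code: the single-version cell and the two interleavings below are simplified stand-ins for what the description says happens when the timestamp is captured before versus under the row lock.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model of a single-version cell: the write with the highest timestamp
// wins, and older timestamps are silently shadowed.
public class IncrementTsRace {
    static long cellValue = 0;
    static long cellTs = -1;

    // single-version semantics: a newer timestamp replaces, older is ignored
    static void put(long value, long ts) {
        if (ts >= cellTs) { cellValue = value; cellTs = ts; }
    }

    static void reset() { cellValue = 0; cellTs = -1; }

    public static void main(String[] args) {
        // Buggy ordering: thread A grabs ts=1, thread B grabs ts=2, but B
        // acquires the row lock first and writes 0+1 at ts=2; A then writes
        // 0+1 at ts=1 (it read the stale 0) and is shadowed -> lost update.
        reset();
        put(1, 2);   // B: read 0, write 1 @ ts=2
        put(1, 1);   // A: read stale 0, write 1 @ ts=1 (ignored)
        System.out.println("buggy result = " + cellValue);  // 1, not 2

        // Fixed ordering: the timestamp is taken under the lock, so each
        // increment both reads and stamps after the previous one finished.
        reset();
        AtomicLong clock = new AtomicLong();
        long v1 = cellValue + 1; put(v1, clock.incrementAndGet());
        long v2 = cellValue + 1; put(v2, clock.incrementAndGet());
        System.out.println("fixed result = " + cellValue);  // 2
    }
}
```

The snapshot-vs-memstore flush wrinkle in the description is a second-order consequence of the same inversion: once timestamps can be out of order with the lock order, "pick the larger timestamp" can pick stale data.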
[jira] [Updated] (HBASE-5539) asynchbase PerformanceEvaluation
[ https://issues.apache.org/jira/browse/HBASE-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benoit Sigoure updated HBASE-5539:
----------------------------------
Attachment: 0001-asynchbase-PerformanceEvaluation.patch

Updated patch with new {{pom.xml}}.

asynchbase PerformanceEvaluation
--------------------------------
Key: HBASE-5539
URL: https://issues.apache.org/jira/browse/HBASE-5539
Project: HBase
Issue Type: New Feature
Components: performance
Reporter: Benoit Sigoure
Assignee: Benoit Sigoure
Priority: Minor
Labels: benchmark
Attachments: 0001-asynchbase-PerformanceEvaluation.patch, 0001-asynchbase-PerformanceEvaluation.patch

I plugged [asynchbase|https://github.com/stumbleupon/asynchbase] into {{PerformanceEvaluation}}. This enables testing asynchbase from {{PerformanceEvaluation}} and comparing its performance to {{HTable}}. Also, asynchbase doesn't come with any benchmark, so it was good that I was able to plug it into {{PerformanceEvaluation}} relatively easily. I am in the process of collecting results on a dev cluster running 0.92.1 and will publish them once they're ready.
[jira] [Updated] (HBASE-5539) asynchbase PerformanceEvaluation
[ https://issues.apache.org/jira/browse/HBASE-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benoit Sigoure updated HBASE-5539:
----------------------------------
Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-6060) Regions in OPENING state from failed regionservers take a long time to recover
[ https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293449#comment-13293449 ]

ramkrishna.s.vasudevan commented on HBASE-6060:
-----------------------------------------------

@Stack Absolutely correct. :) Maybe you have some other idea for that?
[jira] [Updated] (HBASE-6184) HRegionInfo was null or empty in Meta
[ https://issues.apache.org/jira/browse/HBASE-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jiafeng.zhang updated HBASE-6184:
---------------------------------
Attachment: HBASE-6184.patch

HRegionInfo was null or empty in Meta
-------------------------------------
Key: HBASE-6184
URL: https://issues.apache.org/jira/browse/HBASE-6184
Project: HBase
Issue Type: Bug
Components: client, io
Affects Versions: 0.94.0
Reporter: jiafeng.zhang
Fix For: 0.94.0
Attachments: HBASE-6184.patch

While inserting data (hadoop-0.23.2 + hbase-0.94.0):

{code}
2012-06-07 13:09:38,573 WARN [org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation] Encountered problems when prefetch META table:
java.io.IOException: HRegionInfo was null or empty in Meta for hbase_one_col, row=hbase_one_col,09115303780247449149,99
	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:160)
	at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:48)
	at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:126)
	at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:123)
	at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:359)
	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:123)
	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:99)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:894)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:948)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:836)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1482)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1367)
	at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:945)
	at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:801)
	at org.apache.hadoop.hbase.client.HTable.put(HTable.java:776)
	at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.put(HTablePool.java:397)
	at com.dinglicom.hbase.HbaseImport.insertData(HbaseImport.java:177)
	at com.dinglicom.hbase.HbaseImport.run(HbaseImport.java:210)
	at java.lang.Thread.run(Thread.java:662)
{code}
[jira] [Commented] (HBASE-6196) MR testcases do not run with hadoop 2.0
[ https://issues.apache.org/jira/browse/HBASE-6196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293456#comment-13293456 ]

ramkrishna.s.vasudevan commented on HBASE-6196:
-----------------------------------------------

@Andy Thanks for pointing it out. We did not take the latest patch by mistake and thought this change was missing. Sorry about that. But I still think we can use this JIRA to solve
{code}
org.apache.hadoop.hbase.mapreduce.TestImportExport.testSimpleCase
org.apache.hadoop.hbase.mapreduce.TestImportExport.testWithDeletes
{code}
which have been failing in HBase-Trunk on hadoop 2.0.0 from build #45. We are also facing some problems with these two test cases.

MR testcases do not run with hadoop 2.0
---------------------------------------
Key: HBASE-6196
URL: https://issues.apache.org/jira/browse/HBASE-6196
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.0, 0.96.0
Reporter: ramkrishna.s.vasudevan
Fix For: 0.96.0

The MR-related test cases are failing on hadoop-2.0: the resource manager scheduler is not getting spawned. The following fix solves the problem. If you feel it is ok, I can submit it as a patch and commit.

{code}
String rmSchedulerAddress = mrClusterJobConf.get("yarn.resourcemanager.scheduler.address");
if (rmSchedulerAddress != null) {
  conf.set("yarn.resourcemanager.scheduler.address", rmSchedulerAddress);
}
{code}
[jira] [Updated] (HBASE-6195) Increment data will be lost when the memstore is flushed
[ https://issues.apache.org/jira/browse/HBASE-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xing Shi updated HBASE-6195:
----------------------------
Attachment: HBASE-6195-trunk-V5.patch

Added a unit test for increment.
[jira] [Commented] (HBASE-5564) Bulkload is discarding duplicate records
[ https://issues.apache.org/jira/browse/HBASE-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293469#comment-13293469 ]

ramkrishna.s.vasudevan commented on HBASE-5564:
-----------------------------------------------

Ok, I will make that change and reupload the patch. Thanks, Ted.

Bulkload is discarding duplicate records
----------------------------------------
Key: HBASE-5564
URL: https://issues.apache.org/jira/browse/HBASE-5564
Project: HBase
Issue Type: Bug
Components: mapreduce
Affects Versions: 0.96.0
Environment: HBase 0.92
Reporter: Laxman
Assignee: Laxman
Labels: bulkloader
Fix For: 0.96.0
Attachments: 5564.lint, 5564v5.txt, HBASE-5564.patch, HBASE-5564_trunk.1.patch, HBASE-5564_trunk.1.patch, HBASE-5564_trunk.2.patch, HBASE-5564_trunk.3.patch, HBASE-5564_trunk.4_final.patch, HBASE-5564_trunk.patch

Duplicate records are getting discarded when they exist in the same input file, and more specifically when they exist in the same split. Duplicate records are preserved only when they come from different splits.

Version under test: HBase 0.92
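The shape of this bug can be illustrated without any HBase dependencies. The sketch below is hypothetical, not the actual bulkloader code: it simply shows how collecting identical records through a set-like structure silently collapses duplicates arriving from the same split, while a list preserves every occurrence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy illustration of duplicate handling: identical records funneled
// through a sorted set collapse to one, while a list keeps them all.
public class DuplicateDrop {
    public static int viaSet(List<String> records) {
        return new TreeSet<>(records).size();   // duplicates collapse
    }

    public static int viaList(List<String> records) {
        return new ArrayList<>(records).size(); // duplicates preserved
    }

    public static void main(String[] args) {
        // two identical rows emitted by the same input split
        List<String> split = List.of("row1/cf:q/v", "row1/cf:q/v", "row2/cf:q/v");
        System.out.println("set keeps " + viaSet(split));   // 2 (one record lost)
        System.out.println("list keeps " + viaList(split)); // 3
    }
}
```

Records from different splits avoid this path in the report above, which is why the loss only shows up for same-split duplicates.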
[jira] [Commented] (HBASE-5699) Run with 1 WAL in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293471#comment-13293471 ]

Lars Hofhansl commented on HBASE-5699:
--------------------------------------

I think we should wait for test results with HBASE-6116 before we invest more time in this. My gut feeling tells me that this is something better handled at the HDFS level.

Run with 1 WAL in HRegionServer
-------------------------------
Key: HBASE-5699
URL: https://issues.apache.org/jira/browse/HBASE-5699
Project: HBase
Issue Type: Improvement
Reporter: binlijin
Assignee: Li Pi
Attachments: PerfHbase.txt
[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96
[ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293482#comment-13293482 ]

Lars Hofhansl commented on HBASE-6055:
--------------------------------------

Let's try to avoid going overboard here. In principle, snapshot and backup/restore are different and independent. A snapshot generates a consistent snapshot of the data that can subsequently be copied conveniently somewhere else, thus creating a backup. Ideally we would not even prescribe the backup/restore semantics here, but just provide the missing building blocks. Just my $0.02.

Another thought: in principle, an HFile resulting from a major compaction could be considered a baseline copy, and additional HFiles would be incremental changes on top of that baseline. It might be worth considering whether we can make use of this ability of HBase to overlay changes from many sources into a single view of the data (it would probably be tricky, as regions are flushed in sync, etc.; just waving hands here).

Snapshots in HBase 0.96
-----------------------
Key: HBASE-6055
URL: https://issues.apache.org/jira/browse/HBASE-6055
Project: HBase
Issue Type: New Feature
Components: client, master, regionserver, zookeeper
Reporter: Jesse Yates
Assignee: Jesse Yates
Fix For: 0.96.0
Attachments: Snapshots in HBase.docx

Continuation of HBASE-50 for the current trunk. Since the implementation has drastically changed, opening as a new ticket.
[jira] [Commented] (HBASE-6184) HRegionInfo was null or empty in Meta
[ https://issues.apache.org/jira/browse/HBASE-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293487#comment-13293487 ] Anoop Sam John commented on HBASE-6184: --- {code} byte[] searchRow = HRegionInfo.createRegionName(tableName, row, HConstants.NINES, - false); + true); {code} Will this change affect the lookup in the META table? When searchRow is created with newFormat=true, the encoded name is also appended at the end [tableName,row,regionid.encodedname.]. But searchRow is used to do metaTable.getRowOrBefore(), and since HConstants.NINES is appended after the row, that alone should locate the correct row in the META table. I mean, appending the encoded name might not be needed for this lookup. In your issue you are getting a result, but in that result the HRegionInfo seems to come back as null? Does the change above really fix your issue? Are you facing some other issues? HRegionInfo was null or empty in Meta -- Key: HBASE-6184 URL: https://issues.apache.org/jira/browse/HBASE-6184 Project: HBase Issue Type: Bug Components: client, io Affects Versions: 0.94.0 Reporter: jiafeng.zhang Fix For: 0.94.0 Attachments: HBASE-6184.patch insert data hadoop-0.23.2 + hbase-0.94.0 2012-06-07 13:09:38,573 WARN [org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation] Encountered problems when prefetch META table: java.io.IOException: HRegionInfo was null or empty in Meta for hbase_one_col, row=hbase_one_col,09115303780247449149,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:160) at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:48) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:126) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:123) at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:359) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:123) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:99) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:894) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:948) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:836) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1482) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1367) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:945) at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:801) at org.apache.hadoop.hbase.client.HTable.put(HTable.java:776) at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.put(HTablePool.java:397) at com.dinglicom.hbase.HbaseImport.insertData(HbaseImport.java:177) at com.dinglicom.hbase.HbaseImport.run(HbaseImport.java:210) at java.lang.Thread.run(Thread.java:662)
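Anoop's point above — that the getRowOrBefore() lookup should work with or without the encoded-name suffix, because HConstants.NINES already sorts after any real region id — can be illustrated with a toy, plain-Java model of the META scan. Everything here (the TreeMap standing in for the META table, the region-name layout, the "0123abcd" encoded-name placeholder) is illustrative, not HBase's actual implementation:

```java
import java.util.TreeMap;

// Toy model of the META lookup discussed above; not HBase code.
public class MetaLookupSketch {
    static final String NINES = "99999999999999";

    // Old-format region name: table,startKey,regionId
    static String regionName(String table, String startKey, long id) {
        return table + "," + startKey + "," + id;
    }

    // getRowOrBefore is emulated with TreeMap.floorEntry: the closest
    // key less than or equal to the search row.
    static String rowOrBefore(TreeMap<String, String> meta, String searchRow) {
        return meta.floorEntry(searchRow).getValue();
    }

    public static void main(String[] args) {
        TreeMap<String, String> meta = new TreeMap<>();
        meta.put(regionName("t1", "", 1L), "region-A");  // startKey "" .. "m"
        meta.put(regionName("t1", "m", 2L), "region-B"); // startKey "m" ..

        // Search row as built by createRegionName(table, row, NINES, ...):
        String oldStyle = "t1,k," + NINES;
        // The new format additionally appends ".<encodedName>." -- the
        // encoded name here is a made-up stand-in.
        String newStyle = oldStyle + ".0123abcd.";

        // NINES sorts after any real regionId, so both search rows land on
        // the same META entry: the suffix is not needed for this lookup.
        System.out.println(rowOrBefore(meta, oldStyle)); // region-A
        System.out.println(rowOrBefore(meta, newStyle)); // region-A
    }
}
```

Under this model, whether the change to newFormat=true fixes the reporter's null-HRegionInfo symptom is a separate question — the lookup lands on the same row either way.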
[jira] [Commented] (HBASE-6184) HRegionInfo was null or empty in Meta
[ https://issues.apache.org/jira/browse/HBASE-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293493#comment-13293493 ] ramkrishna.s.vasudevan commented on HBASE-6184: --- Could you check what the scenario was before this happened? Maybe that will give us a clue.
[jira] [Commented] (HBASE-5699) Run with 1 WAL in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293517#comment-13293517 ] Zhihong Ted Yu commented on HBASE-5699: --- As I mentioned in HBASE-6055 @ 04/Jun/12 17:47, one of the benefits of this feature is for each HLog file to receive edits for a single table.
[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover
[ https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293551#comment-13293551 ] Hadoop QA commented on HBASE-6060: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12531775/HBASE-6060-trunk_4.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster org.apache.hadoop.hbase.util.TestFSUtils org.apache.hadoop.hbase.catalog.TestMetaReaderEditor Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2145//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2145//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2145//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2145//console This message is automatically generated. 
Regions's in OPENING state from failed regionservers takes a long time to recover - Key: HBASE-6060 URL: https://issues.apache.org/jira/browse/HBASE-6060 Project: HBase Issue Type: Bug Components: master, regionserver Reporter: Enis Soztutar Assignee: rajeshbabu Fix For: 0.96.0, 0.94.1, 0.92.3 Attachments: 6060-94-v3.patch, 6060-94-v4.patch, 6060-94-v4_1.patch, 6060-94-v4_1.patch, 6060-trunk.patch, 6060-trunk.patch, 6060-trunk_2.patch, 6060-trunk_3.patch, 6060_alternative_suggestion.txt, 6060_suggestion2_based_off_v3.patch, 6060_suggestion_based_off_v3.patch, 6060_suggestion_toassign_rs_wentdown_beforerequest.patch, HBASE-6060-92.patch, HBASE-6060-94.patch, HBASE-6060-trunk_4.patch We have seen a pattern in tests where regions are stuck in the OPENING state for a very long time when the region server that is opening the region fails. My understanding of the process:
- The master calls the RS to open the region. If the RS is offline, a new plan is generated (a new RS is chosen). RegionState is set to PENDING_OPEN (only in master memory; zk still shows OFFLINE). See HRegionServer.openRegion(), HMaster.assign().
- The RegionServer starts opening the region and changes the state in the znode. But that znode is not ephemeral (see ZkAssign).
- The RS transitions the zk node from OFFLINE to OPENING. See OpenRegionHandler.process().
- The RS then opens the region and changes the znode from OPENING to OPENED.
- When the RS is killed between the OPENING and OPENED states, zk shows OPENING, and the master just waits for the RS to change the region state; but since the RS is down, that won't happen.
- There is an AssignmentManager.TimeoutMonitor, which guards against exactly these kinds of conditions. It periodically checks (every 10 sec by default) the regions in transition to see whether they have timed out (hbase.master.assignment.timeoutmonitor.timeout). The default timeout is 30 min, which explains what you and I are seeing.
- ServerShutdownHandler in the Master does not reassign regions in the OPENING state, although it handles other states.
Lowering that timeout in the configuration is one option, but I still think we can do better. Will investigate more.
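The TimeoutMonitor behavior described above boils down to a simple check over regions in transition. The sketch below is a minimal, self-contained model of that check (class and field names are illustrative, not HBase's AssignmentManager code) and shows why a region stuck in OPENING waits out the full 30-minute default before being reassigned:

```java
// Toy model of the stuck-in-OPENING scenario; not HBase code.
public class OpeningTimeoutSketch {
    enum State { OFFLINE, PENDING_OPEN, OPENING, OPENED }

    static class RegionInTransition {
        State state;
        long sinceMillis; // when the region entered this state
        RegionInTransition(State s, long t) { state = s; sinceMillis = t; }
    }

    // The TimeoutMonitor check, boiled down: a region sitting in a
    // transitional state longer than the configured timeout should be
    // reassigned. With the 30-minute default, a region whose opening RS
    // died stays stuck for up to 30 minutes.
    static boolean shouldReassign(RegionInTransition rit, long nowMillis, long timeoutMillis) {
        return rit.state != State.OPENED && nowMillis - rit.sinceMillis > timeoutMillis;
    }

    public static void main(String[] args) {
        long thirtyMin = 30 * 60 * 1000L;
        RegionInTransition stuck = new RegionInTransition(State.OPENING, 0L);
        // 10 minutes after the RS died: still not reassigned.
        System.out.println(shouldReassign(stuck, 10 * 60 * 1000L, thirtyMin)); // false
        // Past the 30-minute default: finally eligible for reassignment.
        System.out.println(shouldReassign(stuck, 31 * 60 * 1000L, thirtyMin)); // true
    }
}
```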
[jira] [Commented] (HBASE-6092) Authorize flush, split, compact operations in AccessController
[ https://issues.apache.org/jira/browse/HBASE-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293567#comment-13293567 ] Hadoop QA commented on HBASE-6092: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12531776/HBASE-6092.1.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.replication.TestReplication Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2146//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2146//console This message is automatically generated. Authorize flush, split, compact operations in AccessController -- Key: HBASE-6092 URL: https://issues.apache.org/jira/browse/HBASE-6092 Project: HBase Issue Type: Sub-task Components: security Affects Versions: 0.94.0, 0.96.0, 0.94.1 Reporter: Laxman Assignee: Laxman Labels: acl, security Fix For: 0.96.0, 0.94.1 Attachments: HBASE-6092.1.patch, HBASE-6092.patch Currently, flush, split and compaction are not checked for authorization in AccessController. 
With the current implementation, any unauthorized client can trigger these operations on a table.
[jira] [Commented] (HBASE-6092) Authorize flush, split, compact operations in AccessController
[ https://issues.apache.org/jira/browse/HBASE-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293570#comment-13293570 ] Laxman commented on HBASE-6092: --- bq. -1 findbugs This patch does not introduce any new Findbugs warnings. bq. -1 core tests Verified these testcases; they pass locally. The failures are not relevant to the patch. Please review the patch.
[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log
[ https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293602#comment-13293602 ] Hadoop QA commented on HBASE-6134: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12531772/HBASE-6134v4.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks org.apache.hadoop.hbase.replication.TestReplication org.apache.hadoop.hbase.master.TestSplitLogManager Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2147//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2147//console This message is automatically generated. 
Improvement for split-worker to speed up distributed-split-log -- Key: HBASE-6134 URL: https://issues.apache.org/jira/browse/HBASE-6134 Project: HBase Issue Type: Improvement Components: wal Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.96.0 Attachments: HBASE-6134.patch, HBASE-6134v2.patch, HBASE-6134v3-92.patch, HBASE-6134v3.patch, HBASE-6134v4.patch First, we compared local-master-splitting against distributed-log-splitting. Environment: 34 hlog files, 5 regionservers (after killing one, only 4 RSs do the splitting work), 400 regions in one hlog file. local-master-split: 60s+; distributed-log-splitting: 165s+. In fact, in our production environment, distributed-log-splitting also took 60s with 30 regionservers for 34 hlog files (the regionservers may be under high load). We found that a split-worker took about 20s to split one log file (30ms~50ms per writer.close(); 10ms to create each writer). I think we could improve this by parallelizing the creation and closing of writers in threads. In the patch, the distributed-log-splitting logic is changed to match local-master-splitting, and the close is parallelized in threads.
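The proposed improvement — parallelizing writer close instead of closing serially — can be sketched with a standard ExecutorService. This is an illustration of the technique under the numbers quoted above, not the actual HBASE-6134 patch; with N writers at ~30-50ms per close, the wall time drops from roughly N x 40ms to about one close latency:

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of closing split-log writers in parallel instead of serially.
public class ParallelCloseSketch {
    // Submit every close to a pool, then wait for all of them so any
    // IOException from a close still surfaces (wrapped in ExecutionException).
    static void closeAll(List<? extends Closeable> writers, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Closeable w : writers) {
                pending.add(pool.submit(() -> { w.close(); return null; }));
            }
            for (Future<?> f : pending) {
                f.get(); // rethrows a failed close
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger closed = new AtomicInteger();
        List<Closeable> writers = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            // Fake writers whose close() takes ~40ms, like the report above.
            writers.add(() -> {
                try { Thread.sleep(40); } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                closed.incrementAndGet();
            });
        }
        long t0 = System.nanoTime();
        closeAll(writers, writers.size());
        long ms = (System.nanoTime() - t0) / 1_000_000;
        System.out.println(closed.get() + " writers closed in ~" + ms + "ms");
    }
}
```

Serially, 8 such closes would take roughly 320ms; pooled, the elapsed time is close to a single 40ms close.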
[jira] [Commented] (HBASE-5539) asynchbase PerformanceEvaluation
[ https://issues.apache.org/jira/browse/HBASE-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293621#comment-13293621 ] Hadoop QA commented on HBASE-5539: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12531788/0001-asynchbase-PerformanceEvaluation.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2149//console This message is automatically generated. asynchbase PerformanceEvaluation Key: HBASE-5539 URL: https://issues.apache.org/jira/browse/HBASE-5539 Project: HBase Issue Type: New Feature Components: performance Reporter: Benoit Sigoure Assignee: Benoit Sigoure Priority: Minor Labels: benchmark Attachments: 0001-asynchbase-PerformanceEvaluation.patch, 0001-asynchbase-PerformanceEvaluation.patch I plugged [asynchbase|https://github.com/stumbleupon/asynchbase] into {{PerformanceEvaluation}}. This enables testing asynchbase from {{PerformanceEvaluation}} and comparing its performance to {{HTable}}. Also, asynchbase doesn't come with any benchmark, so it was good that I was able to plug it into {{PerformanceEvaluation}} relatively easily. I am in the process of collecting results on a dev cluster running 0.92.1 and will publish them once they're ready.
[jira] [Updated] (HBASE-6188) Remove the concept of table owner
[ https://issues.apache.org/jira/browse/HBASE-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laxman updated HBASE-6188: -- Attachment: HBASE-6188.patch Remove the concept of table owner - Key: HBASE-6188 URL: https://issues.apache.org/jira/browse/HBASE-6188 Project: HBase Issue Type: Sub-task Components: security Reporter: Andrew Purtell Assignee: Laxman Labels: security Attachments: HBASE-6188.patch The table owner concept was a design simplification in the initial drop. First, the design changes under review mean that only a user with GLOBAL CREATE permission can create a table, and that user will probably be an administrator. Second, granting implicit permissions may lead to oversights, and it adds unnecessary conditionals to our code. So instead, the administrator with GLOBAL CREATE permission should make the appropriate grants at table creation time.
[jira] [Updated] (HBASE-6188) Remove the concept of table owner
[ https://issues.apache.org/jira/browse/HBASE-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laxman updated HBASE-6188: -- Fix Version/s: 0.94.1 0.96.0 Affects Version/s: 0.94.1 0.96.0 0.94.0 Status: Patch Available (was: Open) Patch attached as per the approach discussed. This patch includes HBASE-6092, as that is not yet committed. Please review.
[jira] [Commented] (HBASE-6195) Increment data will be lost when the memstore is flushed
[ https://issues.apache.org/jira/browse/HBASE-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293629#comment-13293629 ] Hadoop QA commented on HBASE-6195: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12531794/HBASE-6195-trunk-V5.patch against trunk revision . -1 @author. The patch appears to contain 1 @author tags which the Hadoop community has agreed to not allow in code contributions. +1 tests included. The patch appears to include 3 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2148//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2148//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2148//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2148//console This message is automatically generated. 
Increment data will be lost when the memstore is flushed Key: HBASE-6195 URL: https://issues.apache.org/jira/browse/HBASE-6195 Project: HBase Issue Type: Bug Components: regionserver Reporter: Xing Shi Assignee: ShiXing Attachments: HBASE-6195-trunk-V2.patch, HBASE-6195-trunk-V3.patch, HBASE-6195-trunk-V4.patch, HBASE-6195-trunk-V5.patch, HBASE-6195-trunk.patch There are two problems in increment() now. First: the timestamp (the variable 'now') in HRegion's increment() is generated before the rowLock is acquired, so when multiple threads increment the same row, a thread that generated its timestamp earlier may acquire the lock later. Because increment stores just one version, the result has still been correct so far. But when the region is flushing, the increment reads the KV with the larger timestamp from the snapshot and the memstore, and writes it back to the memstore. If the snapshot's timestamp is larger than the memstore's, the increment reads the old data and then increments it, which is wrong. Second: there is another risk in increment. Because it writes to the memstore first and then to the HLog, if the HLog write fails the client can still read the incremented value.
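The first problem above is an ordering hazard: the clock is read before the row lock is taken, so a later lock holder can carry an earlier timestamp. The sketch below is a simplified single-cell model of that hazard and of the obvious reordering (read the clock only under the lock); it is illustrative only and is not the actual HBASE-6195 patch or HRegion code:

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified model of one incremented cell that keeps only the latest
// (timestamp, value) pair, like a memstore holding a single version.
public class IncrementSketch {
    private final Object rowLock = new Object();
    private long timestamp = 0L;
    private long value = 0L;
    private final AtomicLong clock = new AtomicLong(); // stand-in for the wall clock

    // Shape of the bug described above: 'now' is read BEFORE the lock, so
    // a thread that read the clock earlier can take the lock later and
    // move the cell's timestamp backwards.
    long incrementBuggy(long delta) {
        long now = clock.incrementAndGet(); // before the lock
        synchronized (rowLock) {
            value += delta;
            timestamp = now; // may be stale
            return value;
        }
    }

    // Reordered shape: read the clock only while holding the lock, so
    // per-row timestamps are monotonic and a flush snapshot can never
    // carry a newer timestamp than the live memstore entry.
    long incrementFixed(long delta) {
        synchronized (rowLock) {
            long now = clock.incrementAndGet();
            value += delta;
            timestamp = now;
            return value;
        }
    }

    long timestamp() { return timestamp; }

    public static void main(String[] args) {
        IncrementSketch cell = new IncrementSketch();
        System.out.println(cell.incrementFixed(5)); // 5
        System.out.println(cell.incrementFixed(3)); // 8
    }
}
```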
[jira] [Updated] (HBASE-6195) Increment data will be lost when the memstore is flushed
[ https://issues.apache.org/jira/browse/HBASE-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xing Shi updated HBASE-6195: Attachment: HBASE-6195-trunk-V6.patch Removed the @author tag.
[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96
[ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293657#comment-13293657 ] Jonathan Hsieh commented on HBASE-6055: --- Hey Lars, Sorry if it seems like I'm going overboard -- I'm trying to tease out consistent common definitions, and get an explicit high-level understanding of how the feature is supposed to be used from a user/admin point of view. I'm also trying to understand what is in scope and what is not (e.g.: making the snapshot act like a read-only table could be in scope; restoring/replacing the original table should be; restoring to another name could be deferred until hardlinks; the import/export work could be done using existing means).
[jira] [Commented] (HBASE-6012) Handling RegionOpeningState for bulk assign
[ https://issues.apache.org/jira/browse/HBASE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293668#comment-13293668 ] Zhihong Ted Yu commented on HBASE-6012: --- Integrated to trunk. Thanks for the patch, Chunhui. Thanks for the review, Stack and Ram. Handling RegionOpeningState for bulk assign --- Key: HBASE-6012 URL: https://issues.apache.org/jira/browse/HBASE-6012 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.96.0 Attachments: HBASE-6012.patch, HBASE-6012v2.patch, HBASE-6012v3.patch, HBASE-6012v4.patch, HBASE-6012v5.patch, HBASE-6012v6.patch, HBASE-6012v7.patch, HBASE-6012v8.patch Since HBASE-5914, we use bulk assign for SSH. But in the bulk assign case, if we hit the ALREADY_OPENED case, there is no one to clear the znode created by bulk assign. Another thing: when an RS is opening a list of regions and one region is already in transition, it throws RegionAlreadyInTransitionException and stops opening the other regions.
[jira] [Commented] (HBASE-6188) Remove the concept of table owner
[ https://issues.apache.org/jira/browse/HBASE-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293667#comment-13293667 ] Hadoop QA commented on HBASE-6188: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12531829/HBASE-6188.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.replication.TestReplication org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2150//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2150//console This message is automatically generated. Remove the concept of table owner - Key: HBASE-6188 URL: https://issues.apache.org/jira/browse/HBASE-6188 Project: HBase Issue Type: Sub-task Components: security Affects Versions: 0.94.0, 0.96.0, 0.94.1 Reporter: Andrew Purtell Assignee: Laxman Labels: security Fix For: 0.96.0, 0.94.1 Attachments: HBASE-6188.patch The table owner concept was a design simplification in the initial drop. 
First, the design changes under review mean only a user with GLOBAL CREATE permission can create a table, and that user will probably be an administrator. Then, granting implicit permissions may lead to oversights, and it adds unnecessary conditionals to our code. So instead, the administrator with GLOBAL CREATE permission should make the appropriate grants at table create time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6195) Increment data will be lost when the memstore is flushed
[ https://issues.apache.org/jira/browse/HBASE-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293677#comment-13293677 ] Hadoop QA commented on HBASE-6195: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12531834/HBASE-6195-trunk-V6.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.master.TestSplitLogManager org.apache.hadoop.hbase.coprocessor.TestRowProcessorEndpoint org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2151//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2151//console This message is automatically generated. 
Increment data will be lost when the memstore is flushed Key: HBASE-6195 URL: https://issues.apache.org/jira/browse/HBASE-6195 Project: HBase Issue Type: Bug Components: regionserver Reporter: Xing Shi Assignee: ShiXing Attachments: HBASE-6195-trunk-V2.patch, HBASE-6195-trunk-V3.patch, HBASE-6195-trunk-V4.patch, HBASE-6195-trunk-V5.patch, HBASE-6195-trunk-V6.patch, HBASE-6195-trunk.patch There are two problems in increment() now: First: the timestamp (the variable now) in HRegion's increment() is generated before the row lock is acquired, so when multiple threads increment the same row, a thread that generated its timestamp earlier may acquire the lock later. Because increment stores only one version, the result is still correct up to that point. When the region is flushing, each increment reads the KV with the larger timestamp from either the snapshot or the memstore and writes the result back to the memstore. If the snapshot's timestamp is larger than the memstore's, the increment reads the old data and then increments on top of it, which is wrong. Secondly: there is another risk in increment. Because it writes to the memstore first and then to the HLog, if the HLog write fails the client can still read the incremented value.
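The first problem above is a timestamp-ordering race, and the fix is to choose the timestamp only after the row lock is held so that concurrent increments serialize cleanly. A minimal standalone sketch of that ordering (a hypothetical simplified model of a single row, not HBase's actual HRegion code; the 2000 expectation mirrors the testParallelIncrementWithMemStoreFlush assertion quoted later in this thread):

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical simplified model of one row, not HBase's HRegion: the row
// lock is taken BEFORE the timestamp is read, so a thread that acquires
// the lock later can never write with an older timestamp.
public class IncrementRaceSketch {
    private final ReentrantLock rowLock = new ReentrantLock();
    private long value = 0;
    private long lastTimestamp = 0;

    public void increment(long amount) {
        rowLock.lock(); // acquire the row lock first
        try {
            long now = System.currentTimeMillis(); // timestamp chosen under the lock
            lastTimestamp = Math.max(lastTimestamp, now);
            value += amount;
        } finally {
            rowLock.unlock();
        }
    }

    public long value() { return value; }

    public static void main(String[] args) throws InterruptedException {
        IncrementRaceSketch row = new IncrementRaceSketch();
        Runnable worker = () -> { for (int i = 0; i < 1000; i++) row.increment(1); };
        Thread a = new Thread(worker);
        Thread b = new Thread(worker);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(row.value()); // 2000: no increments lost
    }
}
```

Because every read-modify-write happens entirely under the lock, two threads of 1000 increments always sum to 2000, whereas generating the timestamp before locking allows the flush-time comparison described above to resurrect stale data.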
[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover
[ https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293706#comment-13293706 ] stack commented on HBASE-6060: -- @Ram So, the rpc returns successfully. The very next step is to set RegionState to PENDING_OPEN. Somehow, in between these two points, a timeout of the RS znode will expire, a verification of ROOT and META will take place, a full scan of META (a new RPC) will happen, and all before we get to setting the PENDING_OPEN state? Seems unlikely. If it ever happens, I suppose this would be one for the timeout monitor to clean up? Or do you have a scenario where this case is more likely? Regions's in OPENING state from failed regionservers takes a long time to recover - Key: HBASE-6060 URL: https://issues.apache.org/jira/browse/HBASE-6060 Project: HBase Issue Type: Bug Components: master, regionserver Reporter: Enis Soztutar Assignee: rajeshbabu Fix For: 0.96.0, 0.94.1, 0.92.3 Attachments: 6060-94-v3.patch, 6060-94-v4.patch, 6060-94-v4_1.patch, 6060-94-v4_1.patch, 6060-trunk.patch, 6060-trunk.patch, 6060-trunk_2.patch, 6060-trunk_3.patch, 6060_alternative_suggestion.txt, 6060_suggestion2_based_off_v3.patch, 6060_suggestion_based_off_v3.patch, 6060_suggestion_toassign_rs_wentdown_beforerequest.patch, HBASE-6060-92.patch, HBASE-6060-94.patch, HBASE-6060-trunk_4.patch we have seen a pattern in tests: the regions are stuck in the OPENING state for a very long time when the region server that is opening the region fails. My understanding of the process: - master calls rs to open the region. If rs is offline, a new plan is generated (a new rs is chosen). RegionState is set to PENDING_OPEN (only in master memory, zk still shows OFFLINE). See HRegionServer.openRegion(), HMaster.assign() - RegionServer starts opening a region and changes the state in the znode. But that znode is not ephemeral. (see ZkAssign) - Rs transitions the zk node from OFFLINE to OPENING.
See OpenRegionHandler.process() - rs then opens the region, and changes the znode from OPENING to OPENED - when rs is killed between the OPENING and OPENED states, zk shows the OPENING state, and the master just waits for rs to change the region state, but since rs is down, that won't happen. - There is an AssignmentManager.TimeoutMonitor, which exactly guards against these kinds of conditions. It periodically checks (every 10 sec by default) the regions in transition to see whether they timed out (hbase.master.assignment.timeoutmonitor.timeout). The default timeout is 30 min, which explains what you and I are seeing. - ServerShutdownHandler in Master does not reassign regions in the OPENING state, although it handles other states. Lowering that threshold in the configuration is one option, but I still think we can do better. Will investigate more.
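The 30-minute default mentioned above can be lowered in hbase-site.xml; a sketch (the 3-minute value shown is an illustrative choice, not a recommendation made in this thread):

```xml
<!-- hbase-site.xml: shrink the assignment timeout so regions stuck in
     OPENING are retried sooner; the trade-off is more spurious timeouts
     under normal slow opens. Values are in milliseconds. -->
<property>
  <name>hbase.master.assignment.timeoutmonitor.timeout</name>
  <!-- 3 minutes instead of the 30-minute (1800000 ms) default -->
  <value>180000</value>
</property>
```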
[jira] [Commented] (HBASE-6195) Increment data will be lost when the memstore is flushed
[ https://issues.apache.org/jira/browse/HBASE-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293708#comment-13293708 ] Zhihong Ted Yu commented on HBASE-6195: --- The new test fails without fix in patch: {code} Failed tests: testParallelIncrementWithMemStoreFlush(org.apache.hadoop.hbase.regionserver.TestHRegion): expected:2000 but was:968 {code} Will integrate this afternoon if there is no objection.
[jira] [Updated] (HBASE-6195) Increment data will be lost when the memstore is flushed
[ https://issues.apache.org/jira/browse/HBASE-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6195: -- Attachment: 6195-trunk-V7.patch Modified the test slightly. Made Incrementer class private, removed unused variable.
[jira] [Updated] (HBASE-4791) Allow Secure Zookeeper JAAS configuration to be programmatically set (rather than only by reading JAAS configuration file)
[ https://issues.apache.org/jira/browse/HBASE-4791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laxman updated HBASE-4791: -- Component/s: zookeeper security Allow Secure Zookeeper JAAS configuration to be programmatically set (rather than only by reading JAAS configuration file) -- Key: HBASE-4791 URL: https://issues.apache.org/jira/browse/HBASE-4791 Project: HBase Issue Type: Improvement Components: security, zookeeper Reporter: Eugene Koontz Assignee: Eugene Koontz Labels: security, zookeeper Attachments: HBASE-4791-v0.patch In the currently proposed fix for HBASE-2418, there must be a JAAS file specified in System.setProperty(java.security.auth.login.config). However, it might be preferable to construct a JAAS configuration programmatically, as is done with secure Hadoop (see https://github.com/apache/hadoop-common/blob/a48eceb62c9b5c1a5d71ee2945d9eea2ed62527b/src/java/org/apache/hadoop/security/UserGroupInformation.java#L175). This would have the benefit of avoiding the use of a system property and allowing an HBase-local configuration setting instead.
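The programmatic approach referenced above (Hadoop's UserGroupInformation builds its JAAS entries in code) can be sketched by subclassing javax.security.auth.login.Configuration. The class name, entry options, and paths below are illustrative assumptions, not the actual HBASE-4791 patch:

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

// Hypothetical sketch: return a Kerberos keytab login entry from code
// instead of parsing a JAAS file named by a system property.
public class ProgrammaticJaasConfig extends Configuration {
    private static final String KRB5_LOGIN_MODULE =
        "com.sun.security.auth.module.Krb5LoginModule";

    private final String principal;
    private final String keytab;

    public ProgrammaticJaasConfig(String principal, String keytab) {
        this.principal = principal;
        this.keytab = keytab;
    }

    @Override
    public AppConfigurationEntry[] getAppConfigurationEntry(String appName) {
        Map<String, String> options = new HashMap<>();
        options.put("useKeyTab", "true");
        options.put("keyTab", keytab);
        options.put("principal", principal);
        options.put("storeKey", "true");
        return new AppConfigurationEntry[] {
            new AppConfigurationEntry(
                KRB5_LOGIN_MODULE,
                AppConfigurationEntry.LoginModuleControlFlag.REQUIRED,
                options)
        };
    }
}
```

Installed process-wide with Configuration.setConfiguration(new ProgrammaticJaasConfig(...)), this avoids setting the java.security.auth.login.config system property at all.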
[jira] [Commented] (HBASE-6195) Increment data will be lost when the memstore is flushed
[ https://issues.apache.org/jira/browse/HBASE-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293759#comment-13293759 ] Hadoop QA commented on HBASE-6195: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12531839/6195-trunk-V7.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2152//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2152//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2152//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2152//console This message is automatically generated. 
[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover
[ https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293767#comment-13293767 ] ramkrishna.s.vasudevan commented on HBASE-6060: --- This is the open point that we see in moving the PENDING_OPEN after the rpc. As far as we brainstormed, we did not find any other gaps in this. But our initial patch that shares state across AM and SSH solves all problems. :) Do you have any new patch, Stack? Maybe we need to work on testcases if the latest patch is fine with you. Thanks a lot for spending your time on this defect with your feedback and nice new ideas.
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293775#comment-13293775 ] nkeywal commented on HBASE-5924: @ted: Ok. These issues were already in my initial patch. Could you confirm that you have finished the review? I would like to deliver 'the' final patch. Thank you. In the client code, don't wait for all the requests to be executed before resubmitting a request in error. -- Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5924.v11.patch, 5924.v14.patch, 5924.v5.patch, 5924.v9.patch The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry It means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before resubmitting the failed request. If we have a scenario with all the requests taking 5 seconds, we have a final execution time of: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing the results immediately. This would lead us, for the scenario mentioned above, to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution times differ.
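The 11s-versus-6s arithmetic above can be sketched as a toy timing model. This is a hypothetical simplification for illustration only, not the actual HConnectionManager#processBatchCallback code: one request in the batch fails immediately while the rest take the slow-path time to complete.

```java
// Toy timing model: one request fails at t=0, the rest take slowMs.
public class RetryTimingModel {
    static long waitForAll(long slowMs, long backoffMs, long retryMs) {
        // Current behaviour: collect ALL results, then back off, then retry.
        return slowMs + backoffMs + retryMs;
    }

    static long retryEagerly(long slowMs, long backoffMs, long retryMs) {
        // Proposed behaviour: the failure is observed immediately, so the
        // retry (after its backoff) overlaps with the slow requests.
        return Math.max(slowMs, backoffMs + retryMs);
    }

    public static void main(String[] args) {
        System.out.println(waitForAll(5000, 1000, 5000));   // 11000 ms, as in the description
        System.out.println(retryEagerly(5000, 1000, 5000)); // 6000 ms
    }
}
```

The nearly-50% figure falls out of the overlap: the retry hides behind the still-running initial requests instead of queuing after them.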
[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
[ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293778#comment-13293778 ] Zhihong Ted Yu commented on HBASE-5924: --- I don't have further comments. Thanks
[jira] [Updated] (HBASE-5564) Bulkload is discarding duplicate records
[ https://issues.apache.org/jira/browse/HBASE-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5564: -- Status: Open (was: Patch Available) Bulkload is discarding duplicate records Key: HBASE-5564 URL: https://issues.apache.org/jira/browse/HBASE-5564 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.96.0 Environment: HBase 0.92 Reporter: Laxman Assignee: Laxman Labels: bulkloader Fix For: 0.96.0 Attachments: 5564.lint, 5564v5.txt, HBASE-5564.patch, HBASE-5564_trunk.1.patch, HBASE-5564_trunk.1.patch, HBASE-5564_trunk.2.patch, HBASE-5564_trunk.3.patch, HBASE-5564_trunk.4_final.patch, HBASE-5564_trunk.patch Duplicate records are getting discarded when the duplicates exist in the same input file, and more specifically when they exist in the same split. Duplicate records are retained if the records come from different splits. Version under test: HBase 0.92
[jira] [Updated] (HBASE-5564) Bulkload is discarding duplicate records
[ https://issues.apache.org/jira/browse/HBASE-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5564: -- Attachment: HBASE-5564_1.patch Updated patch addressing Ted's comments. This is what I am planning to commit if there is no objection.
[jira] [Updated] (HBASE-5564) Bulkload is discarding duplicate records
[ https://issues.apache.org/jira/browse/HBASE-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5564: -- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-6012) Handling RegionOpeningState for bulk assign
[ https://issues.apache.org/jira/browse/HBASE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293793#comment-13293793 ] Hudson commented on HBASE-6012: --- Integrated in HBase-TRUNK #3014 (See [https://builds.apache.org/job/HBase-TRUNK/3014/]) HBASE-6012 Handling RegionOpeningState for bulk assign (Chunhui) (Revision 1349377) Result = FAILURE tedyu : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/ResponseConverter.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java Handling RegionOpeningState for bulk assign --- Key: HBASE-6012 URL: https://issues.apache.org/jira/browse/HBASE-6012 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.96.0 Attachments: HBASE-6012.patch, HBASE-6012v2.patch, HBASE-6012v3.patch, HBASE-6012v4.patch, HBASE-6012v5.patch, HBASE-6012v6.patch, HBASE-6012v7.patch, HBASE-6012v8.patch Since HBASE-5914, we use bulk assign for SSH. But in the bulk assign case, if we get an ALREADY_OPENED case, there is no one to clear the znode created by bulk assign. Another thing: when an RS is opening a list of regions, if one region is already in transition, it will throw RegionAlreadyInTransitionException and stop opening the other regions.
[jira] [Commented] (HBASE-5970) Improve the AssignmentManager#updateTimer and speed up handling opened event
[ https://issues.apache.org/jira/browse/HBASE-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293796#comment-13293796 ] ramkrishna.s.vasudevan commented on HBASE-5970: --- @Chunhui I have a question here. We tried this patch on 0.94 with 2 regions and 4 RS. The scenario we tried was to disable and enable a table that had 2 regions. We did not see much improvement. Do you see any specific scenario where we can get improvement? Thanks. Improve the AssignmentManager#updateTimer and speed up handling opened event Key: HBASE-5970 URL: https://issues.apache.org/jira/browse/HBASE-5970 Project: HBase Issue Type: Improvement Components: master Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.96.0 Attachments: 5970v3.patch, HBASE-5970.patch, HBASE-5970v2.patch, HBASE-5970v3.patch, HBASE-5970v4.patch, HBASE-5970v4.patch We found handling the opened event very slow in environments with lots of regions. The problem is the slow AssignmentManager#updateTimer. We ran a test bulk assigning 10w (i.e. 100k) regions; the whole process of bulk assigning took 1 hour. 2012-05-06 20:31:49,201 INFO org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning 10 region(s) round-robin across 5 server(s) 2012-05-06 21:26:32,103 INFO org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning done I think we could improve the AssignmentManager#updateTimer: make a dedicated thread do this work. After the improvement, it took only 4.5 mins 2012-05-07 11:03:36,581 INFO org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning 10 region(s) across 5 server(s), retainAssignment=true 2012-05-07 11:07:57,073 INFO org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning done
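The HBASE-5970 improvement described above, moving the updateTimer bookkeeping off the event-handling path onto a dedicated thread, can be sketched as follows. This is a hypothetical simplification of the idea, not the actual AssignmentManager patch; class and method names are illustrative:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: the thread handling an OPENED event only enqueues the region
// name and returns; a single daemon worker performs the slow per-region
// timer bookkeeping asynchronously.
public class TimerUpdateOffload {
    private final BlockingQueue<String> pendingRegions = new LinkedBlockingQueue<>();
    private final AtomicInteger processed = new AtomicInteger();

    public TimerUpdateOffload() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    updateTimer(pendingRegions.take()); // slow work happens here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "timer-updater");
        worker.setDaemon(true);
        worker.start();
    }

    /** Called from the event handler: O(1), enqueue and return immediately. */
    public void regionOpened(String regionName) {
        pendingRegions.add(regionName);
    }

    public int processedCount() { return processed.get(); }

    private void updateTimer(String regionName) {
        // stand-in for the expensive per-region timer update
        processed.incrementAndGet();
    }
}
```

Handling the OPENED event now costs one queue insert instead of the full timer scan, which is consistent with the hour-to-4.5-minutes bulk-assign numbers quoted in the description.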
[jira] [Commented] (HBASE-4791) Allow Secure Zookeeper JAAS configuration to be programmatically set (rather than only by reading JAAS configuration file)
[ https://issues.apache.org/jira/browse/HBASE-4791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293797#comment-13293797 ] Laxman commented on HBASE-4791: --- IMO, this requires a fix in ZooKeeper, as it expects the JAAS configuration to be provided as a system property in ZooKeeperSaslClient. Changing that may not be so easy for the following reason: * The ZooKeeper client doesn't expect any configuration; it just needs a quorum string. So introducing a configuration may introduce a compatibility issue. I filed a similar hard-coding related issue, ZOOKEEPER-1467.
[jira] [Commented] (HBASE-6188) Remove the concept of table owner
[ https://issues.apache.org/jira/browse/HBASE-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293803 ] Laxman commented on HBASE-6188: --- The test failures and findbugs warnings are not relevant to the current patch. Please review the patch. Remove the concept of table owner - Key: HBASE-6188 URL: https://issues.apache.org/jira/browse/HBASE-6188 Project: HBase Issue Type: Sub-task Components: security Affects Versions: 0.94.0, 0.96.0, 0.94.1 Reporter: Andrew Purtell Assignee: Laxman Labels: security Fix For: 0.96.0, 0.94.1 Attachments: HBASE-6188.patch The table owner concept was a design simplification in the initial drop. First, the design changes under review mean that only a user with GLOBAL CREATE permission can create a table, and that user will probably be an administrator. Second, granting implicit permissions may lead to oversights and adds unnecessary conditionals to our code. So instead, the administrator with GLOBAL CREATE permission should make the appropriate grants at table create time.
[jira] [Commented] (HBASE-5564) Bulkload is discarding duplicate records
[ https://issues.apache.org/jira/browse/HBASE-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293815#comment-13293815 ] Hadoop QA commented on HBASE-5564: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12531854/HBASE-5564_1.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.coprocessor.TestClassLoading Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2153//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2153//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2153//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2153//console This message is automatically generated. 
Bulkload is discarding duplicate records Key: HBASE-5564 URL: https://issues.apache.org/jira/browse/HBASE-5564 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.96.0 Environment: HBase 0.92 Reporter: Laxman Assignee: Laxman Labels: bulkloader Fix For: 0.96.0 Attachments: 5564.lint, 5564v5.txt, HBASE-5564.patch, HBASE-5564_1.patch, HBASE-5564_trunk.1.patch, HBASE-5564_trunk.1.patch, HBASE-5564_trunk.2.patch, HBASE-5564_trunk.3.patch, HBASE-5564_trunk.4_final.patch, HBASE-5564_trunk.patch Duplicate records are getting discarded when they exist in the same input file, and more specifically when they exist in the same split. Duplicate records are preserved only if the records come from different splits. Version under test: HBase 0.92
[jira] [Commented] (HBASE-6195) Increment data will be lost when the memstore is flushed
[ https://issues.apache.org/jira/browse/HBASE-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293828 ] Zhihong Ted Yu commented on HBASE-6195: --- Integrated to trunk. Thanks for the patch, Xing. Increment data will be lost when the memstore is flushed Key: HBASE-6195 URL: https://issues.apache.org/jira/browse/HBASE-6195 Project: HBase Issue Type: Bug Components: regionserver Reporter: Xing Shi Assignee: ShiXing Attachments: 6195-trunk-V7.patch, HBASE-6195-trunk-V2.patch, HBASE-6195-trunk-V3.patch, HBASE-6195-trunk-V4.patch, HBASE-6195-trunk-V5.patch, HBASE-6195-trunk-V6.patch, HBASE-6195-trunk.patch There are two problems in increment() now. First: the timestamp (the variable {{now}}) in HRegion's increment() is generated before the rowLock is acquired, so when multiple threads increment the same row, a thread that generated its timestamp earlier may acquire the lock later. Because increment stores just one version, the result is still correct up to that point. But when the region is flushing, each increment reads the kv from whichever of the snapshot and the memstore has the larger timestamp and writes it back to the memstore. If the snapshot's timestamp is larger than the memstore's, the increment reads the old data and then performs the increment on it, which is wrong. Second: there is also a risk in increment because it writes to the memstore first and then to the HLog; if the HLog write fails, the client can still read the incremented value.
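The first problem above is an ordering race: the timestamp is read before the row lock is taken. A minimal sketch of the fix direction, taking the timestamp only after acquiring the lock and forcing it to be monotonic, follows; the names are illustrative and this is not HBase's actual code.

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the ordering fix implied by HBASE-6195: take the row lock
// *before* reading the clock, so a later-arriving incrementer can never
// write an older timestamp than the value already in the store.
public class IncrementOrdering {
  private final ReentrantLock rowLock = new ReentrantLock();
  private long value = 0;
  private long lastTs = 0;

  public long increment(long delta) {
    rowLock.lock();
    try {
      long now = System.currentTimeMillis(); // timestamp taken under the lock
      if (now <= lastTs) {
        now = lastTs + 1; // keep timestamps strictly increasing per row
      }
      lastTs = now;
      value += delta;
      return value;
    } finally {
      rowLock.unlock();
    }
  }

  public static void main(String[] args) {
    IncrementOrdering counter = new IncrementOrdering();
    counter.increment(1);
    counter.increment(2);
    System.out.println(counter.increment(3)); // 6
  }
}
```

Because the timestamp is generated inside the critical section, the "earlier timestamp, later lock" interleaving that the description identifies can no longer occur, and the flush-time comparison of snapshot vs. memstore timestamps stays consistent.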
[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover
[ https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293849 ] rajeshbabu commented on HBASE-6060: --- Patch with some test cases and corrections. TestAssignmentManager passes. Regions's in OPENING state from failed regionservers takes a long time to recover - Key: HBASE-6060 URL: https://issues.apache.org/jira/browse/HBASE-6060 Project: HBase Issue Type: Bug Components: master, regionserver Reporter: Enis Soztutar Assignee: rajeshbabu Fix For: 0.96.0, 0.94.1, 0.92.3 Attachments: 6060-94-v3.patch, 6060-94-v4.patch, 6060-94-v4_1.patch, 6060-94-v4_1.patch, 6060-trunk.patch, 6060-trunk.patch, 6060-trunk_2.patch, 6060-trunk_3.patch, 6060_alternative_suggestion.txt, 6060_suggestion2_based_off_v3.patch, 6060_suggestion_based_off_v3.patch, 6060_suggestion_toassign_rs_wentdown_beforerequest.patch, HBASE-6060-92.patch, HBASE-6060-94.patch, HBASE-6060-trunk_4.patch, HBASE-6060_trunk_5.patch We have seen a pattern in tests where regions are stuck in the OPENING state for a very long time when the region server that is opening the region fails. My understanding of the process: - The master calls the rs to open the region. If the rs is offline, a new plan is generated (a new rs is chosen). RegionState is set to PENDING_OPEN (only in master memory; zk still shows OFFLINE). See HRegionServer.openRegion(), HMaster.assign(). - The RegionServer starts opening the region and changes the state in the znode. But that znode is not ephemeral (see ZkAssign). - The rs transitions the zk node from OFFLINE to OPENING. See OpenRegionHandler.process(). - The rs then opens the region and changes the znode from OPENING to OPENED. - When the rs is killed between the OPENING and OPENED states, zk shows the OPENING state and the master just waits for the rs to change the region state; but since the rs is down, that won't happen. - There is an AssignmentManager.TimeoutMonitor, which is meant to guard against exactly this kind of condition. It periodically checks (every 10 sec by default) the regions in transition to see whether they have timed out (hbase.master.assignment.timeoutmonitor.timeout). The default timeout is 30 min, which explains what you and I are seeing. - ServerShutdownHandler in the Master does not reassign regions in the OPENING state, although it handles other states. Lowering that threshold in the configuration is one option, but I still think we can do better. Will investigate more.
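The TimeoutMonitor behavior described above can be sketched as a periodic scan over the regions-in-transition map. This is an illustrative simplification, not HBase's actual implementation; the 30-minute constant mirrors the default hbase.master.assignment.timeoutmonitor.timeout mentioned in the description.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the TimeoutMonitor idea: scan regions in transition and flag
// any that have sat in a non-terminal state longer than the timeout, so
// the master can re-assign them instead of waiting on a dead RS forever.
public class TimeoutMonitorSketch {
  enum State { OFFLINE, PENDING_OPEN, OPENING, OPENED }

  static class RegionState {
    final State state;
    final long stamp; // ms when the region entered this state
    RegionState(State state, long stamp) { this.state = state; this.stamp = stamp; }
  }

  // Mirrors the default hbase.master.assignment.timeoutmonitor.timeout (30 min).
  static final long TIMEOUT_MS = 30L * 60 * 1000;

  // True if the region is stuck and should be re-assigned.
  static boolean isTimedOut(RegionState rs, long now) {
    return rs.state != State.OPENED && (now - rs.stamp) > TIMEOUT_MS;
  }

  public static void main(String[] args) {
    Map<String, RegionState> regionsInTransition = new ConcurrentHashMap<>();
    long now = System.currentTimeMillis();
    regionsInTransition.put("r1",
        new RegionState(State.OPENING, now - 31 * 60 * 1000L)); // stuck: RS died
    regionsInTransition.put("r2",
        new RegionState(State.OPENING, now - 1000L));           // freshly opening
    for (Map.Entry<String, RegionState> e : regionsInTransition.entrySet()) {
      if (isTimedOut(e.getValue(), now)) {
        System.out.println("reassigning " + e.getKey()); // prints: reassigning r1
      }
    }
  }
}
```

The issue's complaint is visible in the constant: with a 30-minute timeout and a 10-second scan period, a region orphaned in OPENING waits up to half an hour before this scan rescues it.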
[jira] [Updated] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover
[ https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rajeshbabu updated HBASE-6060: -- Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover
[ https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rajeshbabu updated HBASE-6060: -- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-5699) Run with 1 WAL in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293853 ] Todd Lipcon commented on HBASE-5699: bq. I think we should wait for test result with HBASE-6116 before we invest more time in this. HBASE-6116 seems like it would improve latency but hurt throughput -- on a typical gbit link, the parallel writes would limit us to 50M/sec for 3 replicas, whereas pipelined writes could give us 100M+. The other main advantage of this JIRA is that the speed of the WAL is currently limited to the minimum speed of the 3 disks chosen in the pipeline. Given that disks can be heavily loaded, the probability of getting even a full disk's worth of throughput is low -- the likelihood is that at least one of those disks is also being written to or read from by at least one other client. So typically any single HDFS stream is limited to 35-40MB/sec in my experience. Given that gbit is much faster than this, we can get better throughput by adding parallel WALs, so as to stripe across disks and dynamically push writes to less-loaded disks. Run with 1 WAL in HRegionServer - Key: HBASE-5699 URL: https://issues.apache.org/jira/browse/HBASE-5699 Project: HBase Issue Type: Improvement Reporter: binlijin Assignee: Li Pi Attachments: PerfHbase.txt
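The bandwidth arithmetic behind Todd's argument can be made explicit. This is a back-of-envelope sketch under stated assumptions (~125 MB/s usable on a gbit link, 3 replicas); the figures come out roughly in line with the 50M/sec vs. 100M+ numbers quoted above.

```java
// Back-of-envelope model of fan-out (parallel) vs. pipelined replication
// on a writer with a single gbit NIC. Assumptions: ~125 MB/s usable link,
// 3 WAL replicas; disk contention is ignored.
public class WalBandwidth {
  // Fan-out writes: the writer ships every replica itself, so payload
  // throughput is the outbound link divided by the replica count.
  static double parallelThroughput(double linkMBps, int replicas) {
    return linkMBps / replicas;
  }

  // Pipelined writes: the writer ships one copy and downstream datanodes
  // forward it, so the writer can use (almost) the whole link.
  static double pipelinedThroughput(double linkMBps) {
    return linkMBps;
  }

  public static void main(String[] args) {
    double link = 125.0; // ~1 gbit/s in MB/s
    System.out.printf("parallel:  %.0f MB/s%n", parallelThroughput(link, 3));
    System.out.printf("pipelined: %.0f MB/s%n", pipelinedThroughput(link));
  }
}
```

The model also shows why Todd's last point holds: with single pipelined streams capped at 35-40 MB/s by disk contention, striping across several parallel WALs is what lets the writer approach the full link rate.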
[jira] [Commented] (HBASE-6195) Increment data will be lost when the memstore is flushed
[ https://issues.apache.org/jira/browse/HBASE-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293882 ] Hudson commented on HBASE-6195: --- Integrated in HBase-TRUNK #3016 (See [https://builds.apache.org/job/HBase-TRUNK/3016/]) HBASE-6195 Increment data will be lost when the memstore is flushed (Xing Shi) (Revision 1349471) Result = FAILURE tedyu : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover
[ https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293883#comment-13293883 ] Hadoop QA commented on HBASE-6060: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12531867/HBASE-6060_trunk_5.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestAtomicOperation org.apache.hadoop.hbase.master.TestDistributedLogSplitting org.apache.hadoop.hbase.master.TestSplitLogManager Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2154//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2154//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2154//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2154//console This message is automatically generated. 
[jira] [Created] (HBASE-6202) Medium tests fail with jdk1.7
Jimmy Xiang created HBASE-6202: -- Summary: Medium tests fail with jdk1.7 Key: HBASE-6202 URL: https://issues.apache.org/jira/browse/HBASE-6202 Project: HBase Issue Type: Sub-task Components: test Affects Versions: 0.96.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Failed tests: testOrphanTaskAcquisition(org.apache.hadoop.hbase.master.TestSplitLogManager) testCreateAndUpdate(org.apache.hadoop.hbase.util.TestFSTableDescriptors): statuses.length=5 testManageSingleton(org.apache.hadoop.hbase.util.TestEnvironmentEdgeManager) Tests in error: testRegionTransitionOperations(org.apache.hadoop.hbase.coprocessor.TestMasterObserver): org.apache.hadoop.hbase.UnknownRegionException: ef07dae0851f4fbe7d550413530b8774 testTableOperations(org.apache.hadoop.hbase.coprocessor.TestMasterObserver): org.apache.hadoop.hbase.TableExistsException: observed_table testClassLoadingFromHDFS(org.apache.hadoop.hbase.coprocessor.TestClassLoading): org.apache.hadoop.hbase.TableNotEnabledException: TestClassLoading -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6201) HBase integration/system tests
[ https://issues.apache.org/jira/browse/HBASE-6201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293954 ] Andrew Purtell commented on HBASE-6201: --- Bigtop provides a framework for integration tests that is, essentially, 'mvn verify'. If we are discussing an integration test framework that must interact with a cluster, perhaps on demand, then there is potentially a framework already under development for that. HBase integration/system tests -- Key: HBASE-6201 URL: https://issues.apache.org/jira/browse/HBASE-6201 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Integration and general system tests have been discussed previously, and the conclusion is that we need to unify how we do release candidate testing (HBASE-6091). In this issue, I would like to discuss and agree on a general plan, and open subtickets for execution so that we can carry out most of the tests in HBASE-6091 automatically. Initially, here is what I have in mind: 1. Create hbase-it (or hbase-tests) containing a forward port of HBASE-4454 (without any tests). This will allow integration tests to be run with {code} mvn verify {code} 2. Add the ability to run all integration/system tests on a given cluster. Something like: {code} mvn verify -Dconf=/etc/hbase/conf/ {code} should run the test suite on the given cluster. (Right now we can launch some of the tests (TestAcidGuarantees) from the command line.) Most of the system tests will be client side and will interface with the cluster through public APIs. We need a tool on top of MiniHBaseCluster, or to improve HBaseTestingUtility, so that tests can interface with the mini cluster or an actual cluster uniformly. 3. Port candidate unit tests to the integration tests module. Some of the candidates are: - TestAcidGuarantees / TestAtomicOperation - TestRegionBalancing (HBASE-6053) - TestFullLogReconstruction - TestMasterFailover - TestImportExport - TestMultiVersions / TestKeepDeletes - TestFromClientSide - TestShell and src/test/ruby - TestRollingRestart - Test**OnCluster - Balancer tests These tests should continue to be run as unit tests w/o any change in semantics. However, given an actual cluster, they should use that instead of spinning up a mini cluster. 4. Add more tests, especially long-running ingestion tests (goraci, BigTop's TestLoadAndVerify, LoadTestTool) and chaos-monkey-style fault tests. All suggestions welcome.
[jira] [Comment Edited] (HBASE-6201) HBase integration/system tests
[ https://issues.apache.org/jira/browse/HBASE-6201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293954 ] Andrew Purtell edited comment on HBASE-6201 at 6/12/12 10:04 PM: - Bigtop provides a framework for integration tests that is, essentially, 'mvn verify'. If we are discussing an integration test framework that must interact with a cluster, perhaps on demand, then there is potentially one already under development for that. was (Author: apurtell): Bigtop provides a framework for integration tests that is, essentially, 'mvn verify'. If we are discussing an integration test framework that must interact with a cluster, perhaps on demand, then there is potentially a framework already under development for that.
[jira] [Updated] (HBASE-6203) Create hbase-it
[ https://issues.apache.org/jira/browse/HBASE-6203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-6203: - Attachment: HBASE-6203_v1.patch Attaching a patch. Create hbase-it --- Key: HBASE-6203 URL: https://issues.apache.org/jira/browse/HBASE-6203 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Attachments: HBASE-6203_v1.patch Create hbase-it, as per parent issue, and re-introduce HBASE-4454 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-6203) Create hbase-it
Enis Soztutar created HBASE-6203: Summary: Create hbase-it Key: HBASE-6203 URL: https://issues.apache.org/jira/browse/HBASE-6203 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Create hbase-it, as per parent issue, and re-introduce HBASE-4454 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6203) Create hbase-it
[ https://issues.apache.org/jira/browse/HBASE-6203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293972 ] Enis Soztutar commented on HBASE-6203: -- Some notes: {code} mvn verify {code} runs the tests under hbase-it named IntegrationTestXXX. Note that {{mvn test}} does not run these tests. You can run just the integration tests by cd'ing into the hbase-it module, or with {code} mvn verify -Dskip-server-tests -Dskip-common-tests {code} You can also skip the integration tests with {{-Dskip-integration-tests}}. Failsafe also honors {{-DskipTests}}. Create hbase-it --- Key: HBASE-6203 URL: https://issues.apache.org/jira/browse/HBASE-6203 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Attachments: HBASE-6203_v1.patch Create hbase-it, as per parent issue, and re-introduce HBASE-4454
[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover
[ https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293973#comment-13293973 ] stack commented on HBASE-6060: -- @Ram Thinking on it, the way to plug the hole you and Rajesh have identified is by having the RS update the znode to OPENING from OFFLINE before returning from the open RPC call. The way I see it, we've found problems in our RegionState. As it is now, we cannot ask RegionState to reliably find the state of a region. It's so bad, you fellas went and made a solution figuring where a region is at using another system altogether, the RegionPlan state, which strikes me as a little odd; because the system we should be relying on is broken -- RegionState -- you fellas look at the ghost traces of region moves, the impression left in RegionPlan. Is this a fair characterization? Don't you think we should fix RegionState while we're in here? I've been working on a patch but was going to do it piecemeal, fix HBASE-6199 first? Regions's in OPENING state from failed regionservers takes a long time to recover - Key: HBASE-6060 URL: https://issues.apache.org/jira/browse/HBASE-6060 Project: HBase Issue Type: Bug Components: master, regionserver Reporter: Enis Soztutar Assignee: rajeshbabu Fix For: 0.96.0, 0.94.1, 0.92.3 Attachments: 6060-94-v3.patch, 6060-94-v4.patch, 6060-94-v4_1.patch, 6060-94-v4_1.patch, 6060-trunk.patch, 6060-trunk.patch, 6060-trunk_2.patch, 6060-trunk_3.patch, 6060_alternative_suggestion.txt, 6060_suggestion2_based_off_v3.patch, 6060_suggestion_based_off_v3.patch, 6060_suggestion_toassign_rs_wentdown_beforerequest.patch, HBASE-6060-92.patch, HBASE-6060-94.patch, HBASE-6060-trunk_4.patch, HBASE-6060_trunk_5.patch We have seen a pattern in tests where regions are stuck in the OPENING state for a very long time when the region server that is opening the region fails. My understanding of the process:
- The master calls the RS to open the region. If the RS is offline, a new plan is generated (a new RS is chosen). RegionState is set to PENDING_OPEN (only in master memory; zk still shows OFFLINE). See HRegionServer.openRegion(), HMaster.assign()
- The RegionServer starts opening a region and changes the state in the znode. But that znode is not ephemeral. (see ZkAssign)
- The RS transitions the zk node from OFFLINE to OPENING. See OpenRegionHandler.process()
- The RS then opens the region and changes the znode from OPENING to OPENED
- When the RS is killed between the OPENING and OPENED states, zk shows the OPENING state, and the master just waits for the RS to change the region state, but since the RS is down, that won't happen.
- There is an AssignmentManager.TimeoutMonitor, which guards exactly against these kinds of conditions. It periodically checks (every 10 sec by default) the regions in transition to see whether they timed out (hbase.master.assignment.timeoutmonitor.timeout). The default timeout is 30 min, which explains what you and I are seeing.
- ServerShutdownHandler in the master does not reassign regions in the OPENING state, although it handles other states.
Lowering that threshold in the configuration is one option, but I still think we can do better. Will investigate more. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
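The stuck-OPENING window described above can be sketched as a simple predicate: a region whose znode says OPENING but whose server is dead should be reassigned immediately instead of waiting out the 30-minute default timeout. The names below are illustrative stand-ins, not HBase's actual TimeoutMonitor code:

```java
import java.util.concurrent.TimeUnit;

// Illustrative sketch (hypothetical names, not HBase's actual classes) of the
// failure window described above: a region stuck in OPENING on a dead server
// should be reassigned rather than waiting out the default timeout.
public class OpeningStateSketch {

    enum ZkRegionState { OFFLINE, OPENING, OPENED }

    // TimeoutMonitor-style predicate: a region in OPENING on a dead server can
    // be reassigned immediately; a live server may still finish the open.
    static boolean shouldReassign(ZkRegionState state, boolean serverAlive,
                                  long inStateMillis, long timeoutMillis) {
        if (state != ZkRegionState.OPENING) return false;
        return !serverAlive || inStateMillis > timeoutMillis;
    }

    public static void main(String[] args) {
        // 30 min: the hbase.master.assignment.timeoutmonitor.timeout default cited above.
        long timeout = TimeUnit.MINUTES.toMillis(30);
        // Dead server: reassign right away, no need to wait for the timeout.
        if (!shouldReassign(ZkRegionState.OPENING, false, 1_000, timeout))
            throw new AssertionError();
        // Live server, still within the timeout: keep waiting.
        if (shouldReassign(ZkRegionState.OPENING, true, 1_000, timeout))
            throw new AssertionError();
    }
}
```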
[jira] [Commented] (HBASE-6185) region autoSplit when not reach 'hbase.hregion.max.filesize'
[ https://issues.apache.org/jira/browse/HBASE-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293980#comment-13293980 ] Jean-Daniel Cryans commented on HBASE-6185: --- Please change this jira's title to something more relevant to your actual issue. region autoSplit when not reach 'hbase.hregion.max.filesize' Key: HBASE-6185 URL: https://issues.apache.org/jira/browse/HBASE-6185 Project: HBase Issue Type: Bug Components: documentation Affects Versions: 0.94.0 Reporter: nneverwei Attachments: HBASE-6185.patch When using hbase 0.94.0 we met a strange problem. We configured 'hbase.hregion.max.filesize' to 100Gb (the recommended value for effectively turning auto-split off).
{code:xml}
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>107374182400</value>
</property>
{code}
Then we kept putting data into the table. But when the data size was still far less than 100Gb (about 500~600 uncompressed), the table auto-split into 2 regions... I changed the log4j config to DEBUG, and saw the logs below:
{code}
2012-06-07 10:30:52,161 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~128.0m/134221272, currentsize=1.5m/1617744 for region FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8. in 3201ms, sequenceid=176387980, compaction requested=false
2012-06-07 10:30:52,161 DEBUG org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because info size=138657416, sizeToCheck=134217728, regionsWithCommonTable=1
2012-06-07 10:30:52,161 DEBUG org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because info size=138657416, sizeToCheck=134217728, regionsWithCommonTable=1
2012-06-07 10:30:52,240 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Split requested for FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8.. compaction_queue=(0:0), split_queue=0
2012-06-07 10:30:52,265 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8.
2012-06-07 10:30:52,265 DEBUG org.apache.hadoop.hbase.regionserver.SplitTransaction: regionserver:60020-0x137c4929efe0001 Creating ephemeral node for 7b229abcd0785408251a579e9bdf49c8 in SPLITTING state
2012-06-07 10:30:52,368 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x137c4929efe0001 Attempting to transition node 7b229abcd0785408251a579e9bdf49c8 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING
2012-06-07 10:30:52,382 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x137c4929efe0001 Successfully transitioned node 7b229abcd0785408251a579e9bdf49c8 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING
2012-06-07 10:30:52,410 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8.: disabling compactions flushes
2012-06-07 10:30:52,410 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8. is closing
2012-06-07 10:30:52,411 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8. is closing
{code}
{color:red}IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because info size=138657416, sizeToCheck=134217728{color} I did not configure a splitPolicy for hbase, so *IncreasingToUpperBoundRegionSplitPolicy is the default splitPolicy of 0.94.0*. After adding
{code:xml}
<property>
  <name>hbase.regionserver.region.split.policy</name>
  <value>org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy</value>
</property>
{code}
auto-split did not happen again and everything works well. But the javadoc on ConstantSizeRegionSplitPolicy still says 'This is the default split policy', and even http://hbase.apache.org/book/regions.arch.html 9.7.4.1. Custom Split Policies says 'default split policy: ConstantSizeRegionSplitPolicy.'. These may mislead us into thinking that if we set hbase.hregion.max.filesize to 100Gb, auto-split is all but shut off. You may want to change those docs. What's more, in many scenarios we actually need to control splits manually (as you know, while splitting the table is offline, and reads and writes will fail) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
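The arithmetic behind the highlighted log line can be sketched as follows. This is a hedged back-of-the-envelope model, not the actual IncreasingToUpperBoundRegionSplitPolicy code; the exact growth function of the region count varies between versions, but for a single region the effective threshold collapses to the memstore flush size either way, which is why the 100Gb cap never came into play:

```java
// Back-of-the-envelope model of the behavior in the logs above. In
// IncreasingToUpperBoundRegionSplitPolicy the effective split threshold is
// roughly min(maxFileSize, flushSize * f(regionCount)) -- the exponent in
// f() is version-dependent (assumed quadratic here), but f(1) == 1 in any
// case, so a one-region table splits at the 128MB flush size, matching
// sizeToCheck=134217728 in the log, regardless of hbase.hregion.max.filesize.
public class SplitThresholdSketch {

    static long effectiveThreshold(long maxFileSize, long flushSize, int regions) {
        long grown = flushSize * (long) regions * (long) regions;  // assumed growth function
        return Math.min(maxFileSize, grown);
    }

    public static void main(String[] args) {
        long maxFileSize = 107_374_182_400L;  // 100Gb, from the configuration above
        long flushSize = 134_217_728L;        // 128MB default memstore flush size
        // With one region, the 100Gb cap is irrelevant: threshold is 128MB.
        if (effectiveThreshold(maxFileSize, flushSize, 1) != 134_217_728L)
            throw new AssertionError();
    }
}
```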
[jira] [Commented] (HBASE-6201) HBase integration/system tests
[ https://issues.apache.org/jira/browse/HBASE-6201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293981#comment-13293981 ] Enis Soztutar commented on HBASE-6201: -- bq. Bigtop provides a framework for integration tests that is, essentially, 'mvn verify'. Thanks for bringing this up. I know that Bigtop provides a test framework for integration tests. From my perspective, HBase and Bigtop share responsibility on the testing side; we can work to define best practices for this, and I would love to hear Bigtop's perspective as well. I completely agree that HBase code should not bother with deployments, cluster management services, smoke testing, or integration with other components (hive, pig, etc). That kind of functionality can belong in BigTop or similar projects. However, some core testing functionality is better managed by the HBase project. Let's consider the TestMasterFailover test. Right now it is a unit test, testing the internal state transitions when the master fails. However, we can extend this test to run from the client side, and see whether the transition is transparent when we kill the active master on an actual cluster. That kind of testing should be managed by HBase itself, because, although they would run from the client side, these kinds of tests are HBase-specific and better managed by HBase devs. Also, I do not expect BigTop to host a large number of test cases for all of the stack (right now 8 projects). Having said that, in this issue we can come up with a way to interface with BigTop (and other projects, custom jenkins jobs, etc) so that these tests can use the underlying deployment, server management, etc. services, and BigTop and others can just execute the HBase internal integration tests on the cluster. A simple way to do this is for HBase to offer {{mvn verify}} to be consumed by BigTop, and those tests will use HBase's own scripts (and SSH, etc) for cluster/server management. Since BigTop configures the cluster to be usable by those, it should be ok. HBase integration/system tests -- Key: HBASE-6201 URL: https://issues.apache.org/jira/browse/HBASE-6201 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Integration and general system tests have been discussed previously, and the conclusion is that we need to unify how we do release candidate testing (HBASE-6091). In this issue, I would like to discuss and agree on a general plan, and open subtickets for execution so that we can carry out most of the tests in HBASE-6091 automatically. Initially, here is what I have in mind:
1. Create hbase-it (or hbase-tests) containing a forward port of HBASE-4454 (without any tests). This will allow integration tests to be run with {code} mvn verify {code}
2. Add the ability to run all integration/system tests on a given cluster. Something like: {code} mvn verify -Dconf=/etc/hbase/conf/ {code} should run the test suite on the given cluster. (Right now we can launch some of the tests (TestAcidGuarantees) from the command line). Most of the system tests will be client side, and will interface with the cluster through public APIs. We need a tool on top of MiniHBaseCluster, or to improve HBaseTestingUtility, so that tests can interface with the mini cluster or an actual cluster uniformly.
3. Port candidate unit tests to the integration tests module. Some of the candidates are:
- TestAcidGuarantees / TestAtomicOperation
- TestRegionBalancing (HBASE-6053)
- TestFullLogReconstruction
- TestMasterFailover
- TestImportExport
- TestMultiVersions / TestKeepDeletes
- TestFromClientSide
- TestShell and src/test/ruby
- TestRollingRestart
- Test**OnCluster
- Balancer tests
These tests should continue to be run as unit tests w/o any change in semantics. However, given an actual cluster, they should use that instead of spinning up a mini cluster.
4. Add more tests, especially long-running ingestion tests (goraci, BigTop's TestLoadAndVerify, LoadTestTool), and chaos-monkey-style fault tests.
All suggestions welcome. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6053) Enhance TestRegionRebalancing test to be a system test
[ https://issues.apache.org/jira/browse/HBASE-6053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated HBASE-6053: --- Attachment: 6053-1.patch Attached is a revised patch that attempts to abstract some of the MiniHBaseCluster APIs to a common place, implemented by both RealHBaseCluster and MiniHBaseCluster. Far from complete, I think, but it serves as a start. Haven't yet updated the system test (TestRegionRebalancing) to use the new APIs. Review feedback welcome. Enhance TestRegionRebalancing test to be a system test -- Key: HBASE-6053 URL: https://issues.apache.org/jira/browse/HBASE-6053 Project: HBase Issue Type: Bug Components: test Reporter: Devaraj Das Assignee: Devaraj Das Priority: Minor Attachments: 6053-1.patch, regionRebalancingSystemTest.txt TestRegionRebalancing can be converted to be a system test -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
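The abstraction the patch describes, a common cluster interface implemented by both a mini cluster and a real cluster so tests run unchanged against either, could be sketched roughly as below. All names here are illustrative, not the actual API from the attached 6053-1.patch:

```java
// Hedged sketch of the patch's idea: tests are written against a cluster
// interface and don't care whether a MiniHBaseCluster or a real cluster
// (driven over SSH/scripts) backs it. Names are illustrative.
public class ClusterAdapterSketch {

    interface ClusterAdapter {
        int regionServerCount();
        void startRegionServer();
        void stopRegionServer();
    }

    // Stand-in implementation; a real one would wrap MiniHBaseCluster or
    // drive an actual cluster.
    static class FakeCluster implements ClusterAdapter {
        private int count;
        FakeCluster(int count) { this.count = count; }
        public int regionServerCount() { return count; }
        public void startRegionServer() { count++; }
        public void stopRegionServer() { count--; }
    }

    // A rolling-restart-style test written only against the interface.
    static void exerciseRollingRestart(ClusterAdapter cluster) {
        int before = cluster.regionServerCount();
        cluster.stopRegionServer();
        cluster.startRegionServer();
        if (cluster.regionServerCount() != before) throw new AssertionError();
    }

    public static void main(String[] args) {
        exerciseRollingRestart(new FakeCluster(3));
    }
}
```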
[jira] [Commented] (HBASE-6201) HBase integration/system tests
[ https://issues.apache.org/jira/browse/HBASE-6201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293995#comment-13293995 ] Andrew Purtell commented on HBASE-6201: --- Makes sense, thanks Enis. HBase integration/system tests -- Key: HBASE-6201 URL: https://issues.apache.org/jira/browse/HBASE-6201 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Integration and general system tests have been discussed previously, and the conclusion is that we need to unify how we do release candidate testing (HBASE-6091). In this issue, I would like to discuss and agree on a general plan, and open subtickets for execution so that we can carry out most of the tests in HBASE-6091 automatically. Initially, here is what I have in mind:
1. Create hbase-it (or hbase-tests) containing a forward port of HBASE-4454 (without any tests). This will allow integration tests to be run with {code} mvn verify {code}
2. Add the ability to run all integration/system tests on a given cluster. Something like: {code} mvn verify -Dconf=/etc/hbase/conf/ {code} should run the test suite on the given cluster. (Right now we can launch some of the tests (TestAcidGuarantees) from the command line). Most of the system tests will be client side, and will interface with the cluster through public APIs. We need a tool on top of MiniHBaseCluster, or to improve HBaseTestingUtility, so that tests can interface with the mini cluster or an actual cluster uniformly.
3. Port candidate unit tests to the integration tests module. Some of the candidates are:
- TestAcidGuarantees / TestAtomicOperation
- TestRegionBalancing (HBASE-6053)
- TestFullLogReconstruction
- TestMasterFailover
- TestImportExport
- TestMultiVersions / TestKeepDeletes
- TestFromClientSide
- TestShell and src/test/ruby
- TestRollingRestart
- Test**OnCluster
- Balancer tests
These tests should continue to be run as unit tests w/o any change in semantics. However, given an actual cluster, they should use that instead of spinning up a mini cluster.
4. Add more tests, especially long-running ingestion tests (goraci, BigTop's TestLoadAndVerify, LoadTestTool), and chaos-monkey-style fault tests.
All suggestions welcome. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6092) Authorize flush, split, compact operations in AccessController
[ https://issues.apache.org/jira/browse/HBASE-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293997#comment-13293997 ] Andrew Purtell commented on HBASE-6092: --- +1 on HBASE-6092.1.patch Authorize flush, split, compact operations in AccessController -- Key: HBASE-6092 URL: https://issues.apache.org/jira/browse/HBASE-6092 Project: HBase Issue Type: Sub-task Components: security Affects Versions: 0.94.0, 0.96.0, 0.94.1 Reporter: Laxman Assignee: Laxman Labels: acl, security Fix For: 0.96.0, 0.94.1 Attachments: HBASE-6092.1.patch, HBASE-6092.patch Currently, flush, split and compaction are not checked for authorization in AccessController. With the current implementation any unauthorized client can trigger these operations on a table. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6053) Enhance TestRegionRebalancing test to be a system test
[ https://issues.apache.org/jira/browse/HBASE-6053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-6053: - Issue Type: Sub-task (was: Bug) Parent: HBASE-6201 Enhance TestRegionRebalancing test to be a system test -- Key: HBASE-6053 URL: https://issues.apache.org/jira/browse/HBASE-6053 Project: HBase Issue Type: Sub-task Components: test Reporter: Devaraj Das Assignee: Devaraj Das Priority: Minor Attachments: 6053-1.patch, regionRebalancingSystemTest.txt TestRegionRebalancing can be converted to be a system test -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6185) region autoSplit when not reach 'hbase.hregion.max.filesize'
[ https://issues.apache.org/jira/browse/HBASE-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294004#comment-13294004 ] Zhihong Ted Yu commented on HBASE-6185: --- Please also wrap the long line in the patch. Currently we maintain 100 characters per line. region autoSplit when not reach 'hbase.hregion.max.filesize' Key: HBASE-6185 URL: https://issues.apache.org/jira/browse/HBASE-6185 Project: HBase Issue Type: Bug Components: documentation Affects Versions: 0.94.0 Reporter: nneverwei Attachments: HBASE-6185.patch When using hbase 0.94.0 we met a strange problem. We configured 'hbase.hregion.max.filesize' to 100Gb (the recommended value for effectively turning auto-split off).
{code:xml}
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>107374182400</value>
</property>
{code}
Then we kept putting data into the table. But when the data size was still far less than 100Gb (about 500~600 uncompressed), the table auto-split into 2 regions... I changed the log4j config to DEBUG, and saw the logs below:
{code}
2012-06-07 10:30:52,161 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~128.0m/134221272, currentsize=1.5m/1617744 for region FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8. in 3201ms, sequenceid=176387980, compaction requested=false
2012-06-07 10:30:52,161 DEBUG org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because info size=138657416, sizeToCheck=134217728, regionsWithCommonTable=1
2012-06-07 10:30:52,161 DEBUG org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because info size=138657416, sizeToCheck=134217728, regionsWithCommonTable=1
2012-06-07 10:30:52,240 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Split requested for FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8.. compaction_queue=(0:0), split_queue=0
2012-06-07 10:30:52,265 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8.
2012-06-07 10:30:52,265 DEBUG org.apache.hadoop.hbase.regionserver.SplitTransaction: regionserver:60020-0x137c4929efe0001 Creating ephemeral node for 7b229abcd0785408251a579e9bdf49c8 in SPLITTING state
2012-06-07 10:30:52,368 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x137c4929efe0001 Attempting to transition node 7b229abcd0785408251a579e9bdf49c8 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING
2012-06-07 10:30:52,382 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x137c4929efe0001 Successfully transitioned node 7b229abcd0785408251a579e9bdf49c8 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING
2012-06-07 10:30:52,410 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8.: disabling compactions flushes
2012-06-07 10:30:52,410 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8. is closing
2012-06-07 10:30:52,411 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8. is closing
{code}
{color:red}IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because info size=138657416, sizeToCheck=134217728{color} I did not configure a splitPolicy for hbase, so *IncreasingToUpperBoundRegionSplitPolicy is the default splitPolicy of 0.94.0*. After adding
{code:xml}
<property>
  <name>hbase.regionserver.region.split.policy</name>
  <value>org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy</value>
</property>
{code}
auto-split did not happen again and everything works well. But the javadoc on ConstantSizeRegionSplitPolicy still says 'This is the default split policy', and even http://hbase.apache.org/book/regions.arch.html 9.7.4.1. Custom Split Policies says 'default split policy: ConstantSizeRegionSplitPolicy.'. These may mislead us into thinking that if we set hbase.hregion.max.filesize to 100Gb, auto-split is all but shut off. You may want to change those docs. What's more, in many scenarios we actually need to control splits manually (as you know, while splitting the table is offline, and reads and writes will fail) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log
[ https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6134: -- Attachment: 6134v4.patch TestSplitLogManager passes locally. Reattaching patch v4. Improvement for split-worker to speed up distributed-split-log -- Key: HBASE-6134 URL: https://issues.apache.org/jira/browse/HBASE-6134 Project: HBase Issue Type: Improvement Components: wal Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.96.0 Attachments: 6134v4.patch, HBASE-6134.patch, HBASE-6134v2.patch, HBASE-6134v3-92.patch, HBASE-6134v3.patch, HBASE-6134v4.patch First, we did a test comparing local-master-splitting and distributed-log-splitting. Environment: 34 hlog files, 5 regionservers (after killing one, only 4 RSs do the splitting work), 400 regions in one hlog file. local-master-split: 60s+; distributed-log-splitting: 165s+. In fact, in our production environment, distributed-log-splitting also took 60s with 30 regionservers for 34 hlog files (the regionservers may be under high load). We found the split-worker took about 20s to split one log file (30ms~50ms per writer.close(); 10ms per writer creation). I think we could make this improvement: parallelize the creation and closing of writers in threads. In the patch, the logic for distributed-log-splitting is changed to be the same as local-master-splitting, and the closes are parallelized in threads. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
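The parallel-close idea can be sketched as below: instead of closing each recovered-edits writer sequentially (30ms~50ms apiece), submit the closes to a thread pool and wait for all of them. This is an illustrative sketch, not the actual HLogSplitter change in the patch:

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the optimization described above: close all log-split output
// writers in parallel rather than one at a time. Names are illustrative.
public class ParallelCloseSketch {

    static void closeAll(List<? extends Closeable> writers, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (Closeable w : writers) {
                // Callable so a failed close() surfaces via Future.get().
                futures.add(pool.submit(() -> { w.close(); return null; }));
            }
            for (Future<?> f : futures) f.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger closed = new AtomicInteger();
        List<Closeable> writers = new ArrayList<>();
        // Dummy writers that just count their close() calls.
        for (int i = 0; i < 8; i++) writers.add(closed::incrementAndGet);
        closeAll(writers, 4);
        if (closed.get() != 8) throw new AssertionError();
    }
}
```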
[jira] [Updated] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log
[ https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6134: -- Hadoop Flags: Reviewed Improvement for split-worker to speed up distributed-split-log -- Key: HBASE-6134 URL: https://issues.apache.org/jira/browse/HBASE-6134 Project: HBase Issue Type: Improvement Components: wal Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.96.0 Attachments: 6134v4.patch, HBASE-6134.patch, HBASE-6134v2.patch, HBASE-6134v3-92.patch, HBASE-6134v3.patch, HBASE-6134v4.patch First, we did a test comparing local-master-splitting and distributed-log-splitting. Environment: 34 hlog files, 5 regionservers (after killing one, only 4 RSs do the splitting work), 400 regions in one hlog file. local-master-split: 60s+; distributed-log-splitting: 165s+. In fact, in our production environment, distributed-log-splitting also took 60s with 30 regionservers for 34 hlog files (the regionservers may be under high load). We found the split-worker took about 20s to split one log file (30ms~50ms per writer.close(); 10ms per writer creation). I think we could make this improvement: parallelize the creation and closing of writers in threads. In the patch, the logic for distributed-log-splitting is changed to be the same as local-master-splitting, and the closes are parallelized in threads. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6053) Enhance TestRegionRebalancing test to be a system test
[ https://issues.apache.org/jira/browse/HBASE-6053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294010#comment-13294010 ] Enis Soztutar commented on HBASE-6053: -- TestRegionRebalancing assumes that there is 1 RS available, and adds other RSs afterwards. What happens when we run this on a 10/100-node cluster? We can have more RSs than initial regions. Should we also generalize the testing condition? Or will the test shut down every RS except for 1, and restart them afterwards? We can remove RandomKiller; it is not used for now. Enhance TestRegionRebalancing test to be a system test -- Key: HBASE-6053 URL: https://issues.apache.org/jira/browse/HBASE-6053 Project: HBase Issue Type: Sub-task Components: test Reporter: Devaraj Das Assignee: Devaraj Das Priority: Minor Attachments: 6053-1.patch, regionRebalancingSystemTest.txt TestRegionRebalancing can be converted to be a system test -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6012) Handling RegionOpeningState for bulk assign
[ https://issues.apache.org/jira/browse/HBASE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294012#comment-13294012 ] Hudson commented on HBASE-6012: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #51 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/51/]) HBASE-6012 Handling RegionOpeningState for bulk assign (Chunhui) (Revision 1349377) Result = FAILURE tedyu : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/ResponseConverter.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java Handling RegionOpeningState for bulk assign --- Key: HBASE-6012 URL: https://issues.apache.org/jira/browse/HBASE-6012 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.96.0 Attachments: HBASE-6012.patch, HBASE-6012v2.patch, HBASE-6012v3.patch, HBASE-6012v4.patch, HBASE-6012v5.patch, HBASE-6012v6.patch, HBASE-6012v7.patch, HBASE-6012v8.patch Since HBASE-5914, we use bulk assign for SSH. But in the bulk-assign case, if we get an ALREADY_OPENED case there is no one to clear the znode created by bulk assign. Another thing: when an RS is opening a list of regions, if one region is already in transition, it will throw RegionAlreadyInTransitionException and stop opening the other regions. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
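The second problem in HBASE-6012 above, one in-transition region aborting the whole bulk open, comes down to per-region error handling: record the failure and keep going. A hedged sketch (illustrative names, not the actual HRegionServer code):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of per-region error handling for a bulk open: a region already in
// transition is recorded and skipped instead of aborting the whole batch.
public class BulkOpenSketch {

    enum OpeningState { OPENED, ALREADY_OPENED, FAILED_OPENING }

    static class RegionAlreadyInTransitionException extends Exception {}

    // Stand-in for the real per-region open; here it only fails for regions
    // marked as in-transition.
    static OpeningState openRegion(String region, Set<String> inTransition)
            throws RegionAlreadyInTransitionException {
        if (inTransition.contains(region)) throw new RegionAlreadyInTransitionException();
        return OpeningState.OPENED;
    }

    static Map<String, OpeningState> bulkOpen(List<String> regions, Set<String> inTransition) {
        Map<String, OpeningState> results = new LinkedHashMap<>();
        for (String region : regions) {
            try {
                results.put(region, openRegion(region, inTransition));
            } catch (RegionAlreadyInTransitionException e) {
                // Record and continue with the remaining regions instead of bailing out.
                results.put(region, OpeningState.FAILED_OPENING);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, OpeningState> r = bulkOpen(
                Arrays.asList("r1", "r2", "r3"), Collections.singleton("r2"));
        // r3 still opens even though r2 was in transition.
        if (r.get("r3") != OpeningState.OPENED) throw new AssertionError();
    }
}
```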
[jira] [Commented] (HBASE-6195) Increment data will be lost when the memstore is flushed
[ https://issues.apache.org/jira/browse/HBASE-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294011#comment-13294011 ] Hudson commented on HBASE-6195: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #51 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/51/]) HBASE-6195 Increment data will be lost when the memstore is flushed (Xing Shi) (Revision 1349471) Result = FAILURE tedyu : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java Increment data will be lost when the memstore is flushed Key: HBASE-6195 URL: https://issues.apache.org/jira/browse/HBASE-6195 Project: HBase Issue Type: Bug Components: regionserver Reporter: Xing Shi Assignee: ShiXing Attachments: 6195-trunk-V7.patch, HBASE-6195-trunk-V2.patch, HBASE-6195-trunk-V3.patch, HBASE-6195-trunk-V4.patch, HBASE-6195-trunk-V5.patch, HBASE-6195-trunk-V6.patch, HBASE-6195-trunk.patch There are two problems in increment() now. First: the timestamp (the variable now) in HRegion's increment() is generated before getting the rowLock, so when multiple threads increment the same row, a timestamp generated earlier may get the lock later. Because increment stores just one version, the result is still right so far. But when the region is flushing, these increments read the kv with the larger timestamp from the snapshot and memstore, and write it back to the memstore. If the snapshot's timestamp is larger than the memstore's, the increment gets the old data and then does the increment, which is wrong. Second: there is another risk in increment. Because it writes the memstore first and then the HLog, if the HLog write fails the client will still read the incremented value. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
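The first race described above can be sketched as follows. This is a minimal illustration, not HRegion code; the class and method names are invented. The buggy shape captures the timestamp before taking the row lock, so apply order and timestamp order can disagree; the safe shape chooses the timestamp while holding the lock.

```java
import java.util.concurrent.locks.ReentrantLock;

// Minimal sketch (not HBase code) of the timestamp-vs-lock ordering issue
// in increment(): a thread that reads its timestamp early but acquires the
// row lock late applies a newer value tagged with an older timestamp.
public class IncrementTimestampSketch {
  private final ReentrantLock rowLock = new ReentrantLock();
  private long value = 0;
  private long lastTs = 0;   // single stored version, as increment() keeps

  // Buggy shape: 'now' is captured before lock acquisition, so it may be
  // older than the timestamp of the increment that locked first.
  public void incrementTsBeforeLock(long delta) {
    long now = System.currentTimeMillis();
    rowLock.lock();
    try {
      apply(delta, now);
    } finally {
      rowLock.unlock();
    }
  }

  // Safe shape: the timestamp is chosen under the lock, so timestamp order
  // always matches apply order.
  public void incrementTsUnderLock(long delta) {
    rowLock.lock();
    try {
      apply(delta, System.currentTimeMillis());
    } finally {
      rowLock.unlock();
    }
  }

  private void apply(long delta, long ts) {
    value += delta;
    lastTs = Math.max(lastTs, ts);
  }

  public long value() {
    return value;
  }
}
```

The fix direction in the patch differs in detail; the sketch only shows why choosing the timestamp outside the lock is unsafe.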
[jira] [Commented] (HBASE-6188) Remove the concept of table owner
[ https://issues.apache.org/jira/browse/HBASE-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294014#comment-13294014 ] Andrew Purtell commented on HBASE-6188: --- I'm going to commit HBASE-6092 in a few minutes. Remove the concept of table owner - Key: HBASE-6188 URL: https://issues.apache.org/jira/browse/HBASE-6188 Project: HBase Issue Type: Sub-task Components: security Affects Versions: 0.94.0, 0.96.0, 0.94.1 Reporter: Andrew Purtell Assignee: Laxman Labels: security Fix For: 0.96.0, 0.94.1 Attachments: HBASE-6188.patch The table owner concept was a design simplification in the initial drop. First, the design changes under review mean that only a user with GLOBAL CREATE permission can create a table, and that user will probably be an administrator. Second, granting implicit permissions may lead to oversights, and it adds unnecessary conditionals to our code. So instead the administrator with GLOBAL CREATE permission should make the appropriate grants at table create time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6092) Authorize flush, split, compact operations in AccessController
[ https://issues.apache.org/jira/browse/HBASE-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-6092: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk and 0.94 branch. TestAccessController passes locally. Thanks for the patch Laxman! Authorize flush, split, compact operations in AccessController -- Key: HBASE-6092 URL: https://issues.apache.org/jira/browse/HBASE-6092 Project: HBase Issue Type: Sub-task Components: security Affects Versions: 0.94.0, 0.96.0, 0.94.1 Reporter: Laxman Assignee: Laxman Labels: acl, security Fix For: 0.96.0, 0.94.1 Attachments: HBASE-6092.1.patch, HBASE-6092.patch Currently, flush, split and compaction are not checked for authorization in AccessController. With the current implementation any unauthorized client can trigger these operations on a table. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Comment Edited] (HBASE-6188) Remove the concept of table owner
[ https://issues.apache.org/jira/browse/HBASE-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294014#comment-13294014 ] Andrew Purtell edited comment on HBASE-6188 at 6/13/12 12:21 AM: - HBASE-6092 has been committed. was (Author: apurtell): I'm going to commit HBASE-6092 in a few minutes. Remove the concept of table owner - Key: HBASE-6188 URL: https://issues.apache.org/jira/browse/HBASE-6188 Project: HBase Issue Type: Sub-task Components: security Affects Versions: 0.94.0, 0.96.0, 0.94.1 Reporter: Andrew Purtell Assignee: Laxman Labels: security Fix For: 0.96.0, 0.94.1 Attachments: HBASE-6188.patch The table owner concept was a design simplification in the initial drop. First, the design changes under review mean that only a user with GLOBAL CREATE permission can create a table, and that user will probably be an administrator. Second, granting implicit permissions may lead to oversights, and it adds unnecessary conditionals to our code. So instead the administrator with GLOBAL CREATE permission should make the appropriate grants at table create time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6184) HRegionInfo was null or empty in Meta
[ https://issues.apache.org/jira/browse/HBASE-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294051#comment-13294051 ] jiafeng.zhang commented on HBASE-6184: -- {code}
public HRegionInfo(final byte[] tableName, final byte[] startKey, final byte[] endKey,
    final boolean split, final long regionid) throws IllegalArgumentException {
  super();
  if (tableName == null) {
    throw new IllegalArgumentException("tableName cannot be null");
  }
  this.tableName = tableName.clone();
  this.offLine = false;
  this.regionId = regionid;
  this.regionName = createRegionName(this.tableName, startKey, regionId, true);
  this.regionNameStr = Bytes.toStringBinary(this.regionName);
  this.split = split;
  this.endKey = endKey == null? HConstants.EMPTY_END_ROW: endKey.clone();
  this.startKey = startKey == null? HConstants.EMPTY_START_ROW: startKey.clone();
  this.tableName = tableName.clone();
  setHashCode();
}
{code} HRegionInfo was null or empty in Meta -- Key: HBASE-6184 URL: https://issues.apache.org/jira/browse/HBASE-6184 Project: HBase Issue Type: Bug Components: client, io Affects Versions: 0.94.0 Reporter: jiafeng.zhang Fix For: 0.94.0 Attachments: HBASE-6184.patch insert data hadoop-0.23.2 + hbase-0.94.0 2012-06-07 13:09:38,573 WARN [org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation] Encountered problems when prefetch META table: java.io.IOException: HRegionInfo was null or empty in Meta for hbase_one_col, row=hbase_one_col,09115303780247449149,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:160) at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:48) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:126) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:123) at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:359) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:123) at 
org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:99) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:894) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:948) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:836) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1482) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1367) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:945) at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:801) at org.apache.hadoop.hbase.client.HTable.put(HTable.java:776) at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.put(HTablePool.java:397) at com.dinglicom.hbase.HbaseImport.insertData(HbaseImport.java:177) at com.dinglicom.hbase.HbaseImport.run(HbaseImport.java:210) at java.lang.Thread.run(Thread.java:662) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
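The `tableName.clone()` calls in the constructor quoted above are defensive copies. A tiny illustration of why they matter (a hypothetical stand-in class, not HBase code): without the clone, a caller mutating its byte[] after construction would silently corrupt the stored table name.

```java
import java.util.Arrays;

// Hypothetical stand-in for the clone pattern in the constructor above:
// copy the incoming byte[] so later mutation by the caller cannot
// corrupt the stored table name.
public class DefensiveCopySketch {
  private final byte[] tableName;

  public DefensiveCopySketch(byte[] tableName) {
    if (tableName == null) {
      throw new IllegalArgumentException("tableName cannot be null");
    }
    this.tableName = tableName.clone(); // defensive copy on the way in
  }

  public byte[] getTableName() {
    return tableName.clone(); // defensive copy on the way out
  }
}
```

The same reasoning applies to the `startKey.clone()` and `endKey.clone()` calls in the pasted constructor.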
[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log
[ https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294053#comment-13294053 ] Zhihong Ted Yu commented on HBASE-6134: --- TestServerCustomProtocol passes locally. Will integrate later if there is no objection. Improvement for split-worker to speed up distributed-split-log -- Key: HBASE-6134 URL: https://issues.apache.org/jira/browse/HBASE-6134 Project: HBase Issue Type: Improvement Components: wal Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.96.0 Attachments: 6134v4.patch, HBASE-6134.patch, HBASE-6134v2.patch, HBASE-6134v3-92.patch, HBASE-6134v3.patch, HBASE-6134v4.patch First, we compared local-master-splitting and distributed-log-splitting. Environment: 34 hlog files, 5 regionservers (after killing one, only 4 RSs do the splitting work), 400 regions in one hlog file. local-master-split: 60s+; distributed-log-splitting: 165s+. In fact, in our production environment, distributed-log-splitting also took 60s with 30 regionservers for 34 hlog files (the regionservers may be under high load). We found that a split-worker took about 20s to split one log file (30ms~50ms per writer.close(); 10ms to create each writer). I think we could improve this by parallelizing the creation and closing of the writers in threads. In the patch, the distributed-log-splitting logic is changed to match local-master-splitting, and the closes are parallelized in threads. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
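The parallel-close idea can be sketched as follows. This is an assumed shape, not the actual HBASE-6134 patch: submit every writer close to a fixed thread pool, so N slow closes cost roughly the longest single close instead of their sum.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch (assumed shape, not the HBASE-6134 code): close all per-region
// writers from a thread pool instead of serially. With 30ms~50ms per
// close and hundreds of writers, the serial sum dominates split time.
public class ParallelCloseSketch {
  public static void closeAll(List<? extends Closeable> writers, int threads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Void>> futures = new ArrayList<>();
      for (Closeable w : writers) {
        futures.add(pool.submit(() -> {
          w.close();   // each close runs concurrently with the others
          return null;
        }));
      }
      for (Future<Void> f : futures) {
        f.get();       // surface any IOException thrown by a close
      }
    } finally {
      pool.shutdown();
    }
  }
}
```

The same pattern applies to creating the writers, the other serial cost the comment calls out.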
[jira] [Commented] (HBASE-6184) HRegionInfo was null or empty in Meta
[ https://issues.apache.org/jira/browse/HBASE-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294055#comment-13294055 ] jiafeng.zhang commented on HBASE-6184: -- With this change, my program has no problem. HRegionInfo was null or empty in Meta -- Key: HBASE-6184 URL: https://issues.apache.org/jira/browse/HBASE-6184 Project: HBase Issue Type: Bug Components: client, io Affects Versions: 0.94.0 Reporter: jiafeng.zhang Fix For: 0.94.0 Attachments: HBASE-6184.patch insert data hadoop-0.23.2 + hbase-0.94.0 2012-06-07 13:09:38,573 WARN [org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation] Encountered problems when prefetch META table: java.io.IOException: HRegionInfo was null or empty in Meta for hbase_one_col, row=hbase_one_col,09115303780247449149,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:160) at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:48) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:126) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:123) at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:359) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:123) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:99) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:894) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:948) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:836) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1482) at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1367) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:945) at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:801) at org.apache.hadoop.hbase.client.HTable.put(HTable.java:776) at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.put(HTablePool.java:397) at com.dinglicom.hbase.HbaseImport.insertData(HbaseImport.java:177) at com.dinglicom.hbase.HbaseImport.run(HbaseImport.java:210) at java.lang.Thread.run(Thread.java:662) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6092) Authorize flush, split, compact operations in AccessController
[ https://issues.apache.org/jira/browse/HBASE-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294060#comment-13294060 ] Hudson commented on HBASE-6092: --- Integrated in HBase-TRUNK #3019 (See [https://builds.apache.org/job/HBase-TRUNK/3019/]) HBASE-6092. Authorize flush, split, compact operations in AccessController (Laxman) (Revision 1349596) Result = SUCCESS apurtell : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/RegionObserver.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController.java Authorize flush, split, compact operations in AccessController -- Key: HBASE-6092 URL: https://issues.apache.org/jira/browse/HBASE-6092 Project: HBase Issue Type: Sub-task Components: security Affects Versions: 0.94.0, 0.96.0, 0.94.1 Reporter: Laxman Assignee: Laxman Labels: acl, security Fix For: 0.96.0, 0.94.1 Attachments: HBASE-6092.1.patch, HBASE-6092.patch Currently, flush, split and compaction are not checked for authorization in AccessController. With the current implementation any unauthorized client can trigger these operations on a table. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5970) Improve the AssignmentManager#updateTimer and speed up handling opened event
[ https://issues.apache.org/jira/browse/HBASE-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294063#comment-13294063 ] chunhui shen commented on HBASE-5970: - @ram How much time did it take you to enable the 2 regions? If there are not many regions, the master handles them fast enough; could you test with 100,000 regions? In our testing, the RS opens regions fast, but the master handles opened events slowly, and we found 100,000 regions much, much slower than 50,000 regions. Another reason: recently we found that the RS opens regions very slowly after 0.92 if one table has many regions. You can see the following code for the details {code}
public RegionOpeningState openRegion(HRegionInfo region, int versionOfOfflineNode){
  ...
  HTableDescriptor htd = this.tableDescriptors.get(region.getTableName());
  ...
  public HTableDescriptor get(final String tablename){
    ...
    long modtime = getTableInfoModtime(this.fs, this.rootdir, tablename);
    ...
  }
}
{code} The call chain is getTableInfoModtime -> getTableInfoPath -> getTableInfoPath -> FSUtils.listStatus(). If one table has many regions, FSUtils.listStatus() takes much time, and opening regions in parallel on the RS degrades to nearly serial. So maybe we should improve the above code. Improve the AssignmentManager#updateTimer and speed up handling opened event Key: HBASE-5970 URL: https://issues.apache.org/jira/browse/HBASE-5970 Project: HBase Issue Type: Improvement Components: master Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.96.0 Attachments: 5970v3.patch, HBASE-5970.patch, HBASE-5970v2.patch, HBASE-5970v3.patch, HBASE-5970v4.patch, HBASE-5970v4.patch We found handling opened events very slow in an environment with lots of regions. The problem is the slow AssignmentManager#updateTimer. We did a test bulk assigning 10w (i.e. 100k) regions; the whole process of bulk assigning took 1 hour. 
2012-05-06 20:31:49,201 INFO org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning 10 region(s) round-robin across 5 server(s) 2012-05-06 21:26:32,103 INFO org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning done I think we could improve the AssignmentManager#updateTimer: make a separate thread do this work. After the improvement, it took only 4.5 mins 2012-05-07 11:03:36,581 INFO org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning 10 region(s) across 5 server(s), retainAssignment=true 2012-05-07 11:07:57,073 INFO org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning done -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
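The "make a thread do this work" idea can be sketched as follows. This is an assumed shape, not the actual AssignmentManager code: the OPENED-event handler only enqueues the region name, and a dedicated thread drains the queue and performs the expensive timer update off the handler path.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch (assumed shape, not HBase code) of moving a slow per-event
// update onto a dedicated worker thread: handlers stay fast because
// they only enqueue; the worker pays the cost asynchronously.
public class AsyncTimerUpdaterSketch implements Runnable {
  private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
  private final CountDownLatch done;
  private final AtomicInteger updates = new AtomicInteger();

  public AsyncTimerUpdaterSketch(int expectedRegions) {
    this.done = new CountDownLatch(expectedRegions);
  }

  // Called from the (fast) opened-event handler; never blocks on the timer.
  public void regionOpened(String regionName) {
    pending.add(regionName);
  }

  @Override
  public void run() {
    try {
      while (done.getCount() > 0) {
        String region = pending.take();
        updateTimer(region);        // the expensive part, off the handler path
        done.countDown();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  private void updateTimer(String region) {
    updates.incrementAndGet();      // placeholder for the slow update
  }

  public void await() throws InterruptedException {
    done.await();
  }

  public int updatesApplied() {
    return updates.get();
  }
}
```

The trade-off is that the timer state lags the true event stream slightly, which is acceptable here because the timer only guards against stuck regions.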
[jira] [Commented] (HBASE-6092) Authorize flush, split, compact operations in AccessController
[ https://issues.apache.org/jira/browse/HBASE-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294067#comment-13294067 ] Hudson commented on HBASE-6092: --- Integrated in HBase-0.94 #255 (See [https://builds.apache.org/job/HBase-0.94/255/]) HBASE-6092. Authorize flush, split, compact operations in AccessController (Laxman) (Revision 1349597) Result = SUCCESS apurtell : Files : * /hbase/branches/0.94/security/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java * /hbase/branches/0.94/security/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/coprocessor/RegionObserver.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java Authorize flush, split, compact operations in AccessController -- Key: HBASE-6092 URL: https://issues.apache.org/jira/browse/HBASE-6092 Project: HBase Issue Type: Sub-task Components: security Affects Versions: 0.94.0, 0.96.0, 0.94.1 Reporter: Laxman Assignee: Laxman Labels: acl, security Fix For: 0.96.0, 0.94.1 Attachments: HBASE-6092.1.patch, HBASE-6092.patch Currently, flush, split and compaction are not checked for authorization in AccessController. With the current implementation any unauthorized client can trigger these operations on a table. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5970) Improve the AssignmentManager#updateTimer and speed up handling opened event
[ https://issues.apache.org/jira/browse/HBASE-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294072#comment-13294072 ] Zhihong Ted Yu commented on HBASE-5970: --- @Chunhui: You can open a new issue for improving the above code. Improve the AssignmentManager#updateTimer and speed up handling opened event Key: HBASE-5970 URL: https://issues.apache.org/jira/browse/HBASE-5970 Project: HBase Issue Type: Improvement Components: master Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.96.0 Attachments: 5970v3.patch, HBASE-5970.patch, HBASE-5970v2.patch, HBASE-5970v3.patch, HBASE-5970v4.patch, HBASE-5970v4.patch We found handling opened events very slow in an environment with lots of regions. The problem is the slow AssignmentManager#updateTimer. We did a test bulk assigning 10w (i.e. 100k) regions; the whole process of bulk assigning took 1 hour. 2012-05-06 20:31:49,201 INFO org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning 10 region(s) round-robin across 5 server(s) 2012-05-06 21:26:32,103 INFO org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning done I think we could improve the AssignmentManager#updateTimer: make a separate thread do this work. After the improvement, it took only 4.5 mins 2012-05-07 11:03:36,581 INFO org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning 10 region(s) across 5 server(s), retainAssignment=true 2012-05-07 11:07:57,073 INFO org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning done -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6185) region autoSplit when not reach 'hbase.hregion.max.filesize'
[ https://issues.apache.org/jira/browse/HBASE-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nneverwei updated HBASE-6185: - Attachment: (was: HBASE-6185.patch) region autoSplit when not reach 'hbase.hregion.max.filesize' Key: HBASE-6185 URL: https://issues.apache.org/jira/browse/HBASE-6185 Project: HBase Issue Type: Bug Components: documentation Affects Versions: 0.94.0 Reporter: nneverwei When using hbase 0.94.0 we met a strange problem. We configured 'hbase.hregion.max.filesize' to 100Gb (the recommended value for effectively turning auto-split off). {code:xml}
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>107374182400</value>
</property>
{code} Then we kept putting data into a table. But when the data size was still far less than 100Gb (about 500~600 of uncompressed data), the table auto-split into 2 regions... I changed the log4j config to DEBUG, and saw the logs below: {code}
2012-06-07 10:30:52,161 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~128.0m/134221272, currentsize=1.5m/1617744 for region FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8. in 3201ms, sequenceid=176387980, compaction requested=false
2012-06-07 10:30:52,161 DEBUG org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because info size=138657416, sizeToCheck=134217728, regionsWithCommonTable=1
2012-06-07 10:30:52,161 DEBUG org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because info size=138657416, sizeToCheck=134217728, regionsWithCommonTable=1
2012-06-07 10:30:52,240 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Split requested for FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8.. compaction_queue=(0:0), split_queue=0
2012-06-07 10:30:52,265 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8.
2012-06-07 10:30:52,265 DEBUG org.apache.hadoop.hbase.regionserver.SplitTransaction: regionserver:60020-0x137c4929efe0001 Creating ephemeral node for 7b229abcd0785408251a579e9bdf49c8 in SPLITTING state
2012-06-07 10:30:52,368 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x137c4929efe0001 Attempting to transition node 7b229abcd0785408251a579e9bdf49c8 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING
2012-06-07 10:30:52,382 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x137c4929efe0001 Successfully transitioned node 7b229abcd0785408251a579e9bdf49c8 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING
2012-06-07 10:30:52,410 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8.: disabling compactions & flushes
2012-06-07 10:30:52,410 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8. is closing
2012-06-07 10:30:52,411 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8. is closing
{code} {color:red}IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because info size=138657416, sizeToCheck=134217728{color} I did not configure a splitPolicy for hbase, so it means *IncreasingToUpperBoundRegionSplitPolicy is the default splitPolicy of 0.94.0*. After adding {code:xml}
<property>
  <name>hbase.regionserver.region.split.policy</name>
  <value>org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy</value>
</property>
{code} autosplit did not happen again and everything goes well. But the javadoc on ConstantSizeRegionSplitPolicy still says 'This is the default split policy', and even http://hbase.apache.org/book/regions.arch.html (9.7.4.1. Custom Split Policies) speaks of the 'default split policy: ConstantSizeRegionSplitPolicy'. 
Those docs may mislead us into thinking that setting hbase.hregion.max.filesize to 100Gb all but shuts auto-split off. You may want to change those docs. What's more, in many scenarios we actually need to control splits manually (as you know, while a region is splitting it is offline, and reads and writes to it will fail). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
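The sizeToCheck=134217728 in the log above is explained by the threshold formula of IncreasingToUpperBoundRegionSplitPolicy. The following is my paraphrase of the 0.94 policy (verify against the source): the split threshold is the cube of the number of this table's regions on the regionserver times the memstore flush size, capped at hbase.hregion.max.filesize, so a table's first region splits around the flush size (~128 MB) regardless of a 100 GB max file size.

```java
// Sketch of the 0.94 IncreasingToUpperBoundRegionSplitPolicy threshold
// (paraphrased, not the actual class): min(maxFileSize, count^3 * flushSize).
// With count=1 and a 128 MB flush size this yields exactly the
// sizeToCheck=134217728 seen in the reporter's DEBUG log.
public class SplitThresholdSketch {
  public static long sizeToCheck(int regionsWithCommonTable, long flushSize, long maxFileSize) {
    long cube = (long) regionsWithCommonTable * regionsWithCommonTable * regionsWithCommonTable;
    return Math.min(maxFileSize, cube * flushSize);
  }
}
```

As the region count grows, the cube term quickly exceeds the cap and the configured max file size takes over, which is why the surprise only shows up on young tables.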
[jira] [Updated] (HBASE-6185) region autoSplit when not reach 'hbase.hregion.max.filesize'
[ https://issues.apache.org/jira/browse/HBASE-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nneverwei updated HBASE-6185: - Fix Version/s: 0.94.1 Status: Patch Available (was: Open) long line wrapped region autoSplit when not reach 'hbase.hregion.max.filesize' Key: HBASE-6185 URL: https://issues.apache.org/jira/browse/HBASE-6185 Project: HBase Issue Type: Bug Components: documentation Affects Versions: 0.94.0 Reporter: nneverwei Fix For: 0.94.1 Attachments: HBASE-6185.patch When using hbase 0.94.0 we met a strange problem. We configured 'hbase.hregion.max.filesize' to 100Gb (the recommended value for effectively turning auto-split off). {code:xml}
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>107374182400</value>
</property>
{code} Then we kept putting data into a table. But when the data size was still far less than 100Gb (about 500~600 of uncompressed data), the table auto-split into 2 regions... I changed the log4j config to DEBUG, and saw the logs below: {code}
2012-06-07 10:30:52,161 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~128.0m/134221272, currentsize=1.5m/1617744 for region FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8. in 3201ms, sequenceid=176387980, compaction requested=false
2012-06-07 10:30:52,161 DEBUG org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because info size=138657416, sizeToCheck=134217728, regionsWithCommonTable=1
2012-06-07 10:30:52,161 DEBUG org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because info size=138657416, sizeToCheck=134217728, regionsWithCommonTable=1
2012-06-07 10:30:52,240 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Split requested for FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8.. compaction_queue=(0:0), split_queue=0
2012-06-07 10:30:52,265 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8.
2012-06-07 10:30:52,265 DEBUG org.apache.hadoop.hbase.regionserver.SplitTransaction: regionserver:60020-0x137c4929efe0001 Creating ephemeral node for 7b229abcd0785408251a579e9bdf49c8 in SPLITTING state
2012-06-07 10:30:52,368 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x137c4929efe0001 Attempting to transition node 7b229abcd0785408251a579e9bdf49c8 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING
2012-06-07 10:30:52,382 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x137c4929efe0001 Successfully transitioned node 7b229abcd0785408251a579e9bdf49c8 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING
2012-06-07 10:30:52,410 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8.: disabling compactions & flushes
2012-06-07 10:30:52,410 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8. is closing
2012-06-07 10:30:52,411 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8. is closing
{code} {color:red}IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because info size=138657416, sizeToCheck=134217728{color} I did not configure a splitPolicy for hbase, so it means *IncreasingToUpperBoundRegionSplitPolicy is the default splitPolicy of 0.94.0* After adding {code:xml}
<property>
  <name>hbase.regionserver.region.split.policy</name>
  <value>org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy</value>
</property>
{code} autosplit did not happen again and everything goes well. 
But the javadoc on ConstantSizeRegionSplitPolicy still says 'This is the default split policy', and even http://hbase.apache.org/book/regions.arch.html (9.7.4.1. Custom Split Policies) speaks of the 'default split policy: ConstantSizeRegionSplitPolicy'. Those docs may mislead us into thinking that setting hbase.hregion.max.filesize to 100Gb all but shuts auto-split off. You may want to change those docs. What's more, in many scenarios we actually need to control splits manually (as you know, while a region is splitting it is offline, and reads and writes to it will fail). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6053) Enhance TestRegionRebalancing test to be a system test
[ https://issues.apache.org/jira/browse/HBASE-6053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294075#comment-13294075 ] Devaraj Das commented on HBASE-6053: @Enis, yes the RealHBaseCluster class upon initialization kills all the known servers. Ok, I'll remove the RandomKiller, and have it in a follow up. Enhance TestRegionRebalancing test to be a system test -- Key: HBASE-6053 URL: https://issues.apache.org/jira/browse/HBASE-6053 Project: HBase Issue Type: Sub-task Components: test Reporter: Devaraj Das Assignee: Devaraj Das Priority: Minor Attachments: 6053-1.patch, regionRebalancingSystemTest.txt TestRegionRebalancing can be converted to be a system test -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6185) region autoSplit when not reach 'hbase.hregion.max.filesize'
[ https://issues.apache.org/jira/browse/HBASE-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nneverwei updated HBASE-6185: - Attachment: HBASE-6185.patch

region autoSplit when not reach 'hbase.hregion.max.filesize'

Key: HBASE-6185 URL: https://issues.apache.org/jira/browse/HBASE-6185 Project: HBase Issue Type: Bug Components: documentation Affects Versions: 0.94.0 Reporter: nneverwei Fix For: 0.94.1 Attachments: HBASE-6185.patch

When using HBase 0.94.0 we hit a strange problem. We configured 'hbase.hregion.max.filesize' to 100 GB (the value commonly recommended for effectively turning auto-split off):
{code:xml}
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>107374182400</value>
</property>
{code}
Then we kept putting data into a table. But while the data size was still far below 100 GB (about 500~600 of uncompressed data), the table auto-split into 2 regions... I changed the log4j config to DEBUG and saw the logs below:
{code}
2012-06-07 10:30:52,161 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~128.0m/134221272, currentsize=1.5m/1617744 for region FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8. in 3201ms, sequenceid=176387980, compaction requested=false
2012-06-07 10:30:52,161 DEBUG org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because info size=138657416, sizeToCheck=134217728, regionsWithCommonTable=1
2012-06-07 10:30:52,161 DEBUG org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because info size=138657416, sizeToCheck=134217728, regionsWithCommonTable=1
2012-06-07 10:30:52,240 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Split requested for FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8.. compaction_queue=(0:0), split_queue=0
2012-06-07 10:30:52,265 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8.
2012-06-07 10:30:52,265 DEBUG org.apache.hadoop.hbase.regionserver.SplitTransaction: regionserver:60020-0x137c4929efe0001 Creating ephemeral node for 7b229abcd0785408251a579e9bdf49c8 in SPLITTING state
2012-06-07 10:30:52,368 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x137c4929efe0001 Attempting to transition node 7b229abcd0785408251a579e9bdf49c8 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING
2012-06-07 10:30:52,382 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x137c4929efe0001 Successfully transitioned node 7b229abcd0785408251a579e9bdf49c8 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING
2012-06-07 10:30:52,410 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8.: disabling compactions & flushes
2012-06-07 10:30:52,410 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8. is closing
2012-06-07 10:30:52,411 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8. is closing
{code}
{color:red}IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because info size=138657416, sizeToCheck=134217728{color}
I did not configure a splitPolicy for HBase, so this means *IncreasingToUpperBoundRegionSplitPolicy is the default split policy of 0.94.0*. After adding
{code:xml}
<property>
  <name>hbase.regionserver.region.split.policy</name>
  <value>org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy</value>
</property>
{code}
auto-split did not happen again and everything went well. But the javadoc on ConstantSizeRegionSplitPolicy still says 'This is the default split policy', and even http://hbase.apache.org/book/regions.arch.html (9.7.4.1. Custom Split Policies) says 'default split policy: ConstantSizeRegionSplitPolicy.' These may mislead users into thinking that setting hbase.hregion.max.filesize to 100 GB all but shuts auto-split off. Please update those docs. What's more, in many scenarios we actually need to control splits manually (as you know, while a region is splitting it is offline, and reads and writes will fail).
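The DEBUG line above explains the surprise: IncreasingToUpperBoundRegionSplitPolicy grows its split threshold with the number of regions the server holds for the table, so with a single region the threshold is roughly the memstore flush size rather than hbase.hregion.max.filesize. A minimal sketch of that check, assuming the 0.94-style formula min(maxFileSize, flushSize * regionCount^2); the exact formula may differ across versions, and the class and constant names here are illustrative, not HBase's real code:

```java
// Sketch of the size check behind "ShouldSplit because ... sizeToCheck=134217728".
// Assumption: threshold = min(maxFileSize, flushSize * regionCount^2), as in 0.94.
public class SplitThresholdSketch {
    // hbase.hregion.memstore.flush.size default (128 MB)
    static final long FLUSH_SIZE = 128L * 1024 * 1024;
    // hbase.hregion.max.filesize as set by the reporter (100 GB)
    static final long MAX_FILE_SIZE = 107374182400L;

    // Size a store must exceed before the region is considered for a split.
    static long sizeToCheck(int regionsWithCommonTable) {
        long grown = FLUSH_SIZE * (long) regionsWithCommonTable * regionsWithCommonTable;
        return Math.min(MAX_FILE_SIZE, grown);
    }

    public static void main(String[] args) {
        // With 1 region the threshold is just the flush size, 134217728 bytes --
        // exactly the sizeToCheck in the DEBUG log -- so the region splits at
        // ~132 MB even though max.filesize is 100 GB.
        System.out.println(sizeToCheck(1));   // 134217728
        System.out.println(sizeToCheck(100)); // capped at 107374182400
    }
}
```

With one region the 100 GB cap never comes into play; only once flushSize * n^2 exceeds maxFileSize does the configured limit take over, which is why switching back to ConstantSizeRegionSplitPolicy restores the expected behavior.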
[jira] [Commented] (HBASE-6185) region autoSplit when not reach 'hbase.hregion.max.filesize'
[ https://issues.apache.org/jira/browse/HBASE-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294079#comment-13294079 ] Zhihong Ted Yu commented on HBASE-6185: ---
{code}
+ * This is the default split policy. From 0.94.0 the default split policy change
{code}
The above should read 'This was the default split policy. From 0.94.0 on the default split policy has changed'
[jira] [Updated] (HBASE-6185) Update javadoc for ConstantSizeRegionSplitPolicy class
[ https://issues.apache.org/jira/browse/HBASE-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6185: -- Hadoop Flags: Reviewed Summary: Update javadoc for ConstantSizeRegionSplitPolicy class (was: region autoSplit when not reach 'hbase.hregion.max.filesize')
[jira] [Updated] (HBASE-6195) Increment data will be lost when the memstore is flushed
[ https://issues.apache.org/jira/browse/HBASE-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6195: -- Attachment: 6195.addendum

Increment data will be lost when the memstore is flushed

Key: HBASE-6195 URL: https://issues.apache.org/jira/browse/HBASE-6195 Project: HBase Issue Type: Bug Components: regionserver Reporter: Xing Shi Assignee: ShiXing Attachments: 6195-trunk-V7.patch, 6195.addendum, HBASE-6195-trunk-V2.patch, HBASE-6195-trunk-V3.patch, HBASE-6195-trunk-V4.patch, HBASE-6195-trunk-V5.patch, HBASE-6195-trunk-V6.patch, HBASE-6195-trunk.patch

There are two problems in increment() now. First: the timestamp (the variable 'now') in HRegion's increment() is generated before the rowLock is acquired, so when multiple threads increment the same row, a thread whose timestamp was generated earlier may acquire the lock later. Because increment stores only one version, the result is still correct up to this point. But when the region is flushing, increment reads the KV from whichever of the snapshot and the memstore has the larger timestamp and writes the result back to the memstore. If the snapshot's timestamp is larger than the memstore's, the increment reads the old data and then increments it, which is wrong. Second: there is another risk in increment. Because it writes to the memstore first and then to the HLog, if the HLog write fails the client can still read the incremented value.
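The first problem can be seen with a toy model of the read path described above: a cell is just a (timestamp, value) pair and a read returns the copy with the larger timestamp. This illustrates only the shape of the ordering hazard; none of this is HBase's actual code.

```java
// Toy model of the stale-timestamp hazard: a thread that took its timestamp
// before acquiring the row lock can write an increment that a newer-stamped
// snapshot copy then shadows, losing the update.
public class IncrementRaceSketch {
    // Return the value of whichever copy carries the larger timestamp.
    static long pick(long tsA, long vA, long tsB, long vB) {
        return tsA >= tsB ? vA : vB;
    }

    public static void main(String[] args) {
        // Thread 1 takes now=100, thread 2 takes now=101 -- but thread 2
        // wins the row lock first and increments 0 -> 1 at ts=101.
        long memTs = 101, memVal = 1;

        // The memstore flushes: that cell moves into the snapshot.
        long snapTs = memTs, snapVal = memVal;

        // Thread 1 now runs with its stale ts=100: it reads the newest copy
        // (the snapshot's v=1), increments, and writes (ts=100, v=2).
        memTs = 100;
        memVal = snapVal + 1;

        // A later read prefers the larger timestamp, so the snapshot's
        // (101, 1) shadows the memstore's (100, 2): the increment is lost.
        System.out.println(pick(snapTs, snapVal, memTs, memVal)); // prints 1, not 2
    }
}
```

Taking the timestamp only after the row lock is held makes timestamps monotone in lock-acquisition order, which removes this particular interleaving.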
[jira] [Updated] (HBASE-6185) Update javadoc for ConstantSizeRegionSplitPolicy class
[ https://issues.apache.org/jira/browse/HBASE-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nneverwei updated HBASE-6185: - Attachment: (was: HBASE-6185.patch)
[jira] [Updated] (HBASE-6185) Update javadoc for ConstantSizeRegionSplitPolicy class
[ https://issues.apache.org/jira/browse/HBASE-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nneverwei updated HBASE-6185: - Attachment: HBASE-6185.patch Thanks Ted.
[jira] [Updated] (HBASE-6195) Increment data will be lost when the memstore is flushed
[ https://issues.apache.org/jira/browse/HBASE-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6195: -- Fix Version/s: 0.96.0 Addendum integrated to trunk. Thanks for the reminder, Xing.
[jira] [Commented] (HBASE-6185) Update javadoc for ConstantSizeRegionSplitPolicy class
[ https://issues.apache.org/jira/browse/HBASE-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294090#comment-13294090 ] Hadoop QA commented on HBASE-6185: --
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12531905/HBASE-6185.patch against trunk revision .
+1 @author. The patch does not contain any @author tags.
+0 tests included. The patch appears to be a documentation patch that doesn't require tests.
-1 patch. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2156//console
This message is automatically generated.
[jira] [Commented] (HBASE-6185) Update javadoc for ConstantSizeRegionSplitPolicy class
[ https://issues.apache.org/jira/browse/HBASE-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294093#comment-13294093 ] Zhihong Ted Yu commented on HBASE-6185: --- Please base the patch on trunk where the path starts with 'hbase-server/' There is no need to remove previous attachment. You can give subsequent patches version numbers. Update javadoc for ConstantSizeRegionSplitPolicy class -- Key: HBASE-6185 URL: https://issues.apache.org/jira/browse/HBASE-6185 Project: HBase Issue Type: Bug Components: documentation Affects Versions: 0.94.0 Reporter: nneverwei Fix For: 0.94.1 Attachments: HBASE-6185.patch When using hbase0.94.0 we met a strange problem. We config the 'hbase.hregion.max.filesize' to 100Gb (The recommed value to act as auto-split turn off). {code:xml} property namehbase.hregion.max.filesize/name value107374182400/value /property {code} Then we keep putting datas into a table. But when the data size far more less than 100Gb(about 500~600 uncompressed datas), the table auto splte to 2 regions... I change the log4j config to DEBUG, and saw logs below: {code} 2012-06-07 10:30:52,161 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~128.0m/134221272, currentsize=1.5m/1617744 for region FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8. in 3201ms, sequenceid=176387980, compaction requested=false 2012-06-07 10:30:52,161 DEBUG org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because info size=138657416, sizeToCheck=134217728, regionsWithCommonTable=1 2012-06-07 10:30:52,161 DEBUG org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because info size=138657416, sizeToCheck=134217728, regionsWithCommonTable=1 2012-06-07 10:30:52,240 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Split requested for FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8.. 
compaction_queue=(0:0), split_queue=0 2012-06-07 10:30:52,265 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8. 2012-06-07 10:30:52,265 DEBUG org.apache.hadoop.hbase.regionserver.SplitTransaction: regionserver:60020-0x137c4929efe0001 Creating ephemeral node for 7b229abcd0785408251a579e9bdf49c8 in SPLITTING state 2012-06-07 10:30:52,368 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x137c4929efe0001 Attempting to transition node 7b229abcd0785408251a579e9bdf49c8 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING 2012-06-07 10:30:52,382 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x137c4929efe0001 Successfully transitioned node 7b229abcd0785408251a579e9bdf49c8 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING 2012-06-07 10:30:52,410 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8.: disabling compactions flushes 2012-06-07 10:30:52,410 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8. is closing 2012-06-07 10:30:52,411 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8. is closing {code} {color:red}IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because info size=138657416, sizeToCheck=134217728{color} I did not config splitPolicy for hbase, so it means *IncreasingToUpperBoundRegionSplitPolicy is the default splitPolicy of 0.94.0* After add {code:xml} property namehbase.regionserver.region.split.policy/name valueorg.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy/value /property {code} autosplit did not happen again and everything goes well. 
But the javadoc on ConstantSizeRegionSplitPolicy still says 'This is the default split policy', and even http://hbase.apache.org/book/regions.arch.html 9.7.4.1. Custom Split Policies says 'default split policy: ConstantSizeRegionSplitPolicy.'. Those may mislead us into thinking that setting hbase.hregion.max.filesize to 100 GB all but shuts auto-split off. Please update those docs. What's more, in many scenarios we actually need to control splits manually (as you know, while splitting the table is offline, so reads and writes will fail). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators:
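For context, the growing threshold that produced the logged sizeToCheck=134217728 can be sketched as a toy model. This is NOT the actual HBase source: the class name, method shape, and the exact growth exponent are assumptions for illustration; the verifiable part is that the threshold starts near the memstore flush size (128 MB by default in 0.94) and is capped by hbase.hregion.max.filesize, which is why a 100 GB cap alone does not prevent the first split.

```java
// Illustrative sketch (not HBase source) of the IncreasingToUpperBound idea:
// the split threshold grows with the number of regions of the table hosted
// on this server, capped by hbase.hregion.max.filesize. The cubic growth
// here is an assumption; the exact function in 0.94 may differ.
public class SplitThresholdSketch {
    public static long sizeToCheck(long flushSize, long maxFileSize, long regionsWithCommonTable) {
        long n = regionsWithCommonTable;
        return Math.min(maxFileSize, flushSize * n * n * n);
    }

    public static void main(String[] args) {
        long flushSize = 134217728L;      // 128 MB, the 0.94 default memstore flush size
        long maxFileSize = 107374182400L; // the 100 GB from the report
        // With a single region the threshold collapses to the flush size,
        // matching the logged sizeToCheck=134217728 above.
        System.out.println(sizeToCheck(flushSize, maxFileSize, 1));
    }
}
```

Under this model the 100 GB cap only matters once enough regions exist; with one region the table splits as soon as it exceeds roughly one flush size, exactly as the reporter observed.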
[jira] [Created] (HBASE-6204) Improvement for opening region on the regionserver
chunhui shen created HBASE-6204: --- Summary: Improvement for opening region on the regionserver Key: HBASE-6204 URL: https://issues.apache.org/jira/browse/HBASE-6204 Project: HBase Issue Type: Improvement Components: regionserver Reporter: chunhui shen Assignee: chunhui shen
If one table has many regions, like 100k regions, the regionserver opens regions very slowly, and opening regions in parallel on the RS degrades to nearly serial. The following code shows the detail:
{code}
public RegionOpeningState openRegion(HRegionInfo region, int versionOfOfflineNode) {
  ...
  HTableDescriptor htd = this.tableDescriptors.get(region.getTableName());
  ...
}

public HTableDescriptor get(final String tablename) {
  ...
  long modtime = getTableInfoModtime(this.fs, this.rootdir, tablename);
  ...
}
{code}
getTableInfoModtime -> getTableInfoPath -> getTableInfoPath -> FSUtils.listStatus(). If one table has many regions, FSUtils.listStatus() takes a long time. How can we improve the above code? I think an easy way is to make a dir (called .tableinfos) in the table dir and move the .tableinfo.* files to that dir. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
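The win from the proposed .tableinfos dir can be sketched with a toy cost model (class and method names here are invented for illustration). The premise from the issue: the table directory holds one subdirectory per region plus the .tableinfo.* files, so finding the newest descriptor file lists O(regions) entries; a dedicated subdir would keep that listing tiny.

```java
// Toy cost model (invented names, illustrative only): how many directory
// entries a listing must scan to find the newest .tableinfo.* file.
// Today the table dir mixes region subdirs with descriptor files; with a
// dedicated .tableinfos subdir only the descriptor files are listed.
public class ListCostSketch {
    public static int entriesScanned(int regionDirs, int tableinfoFiles, boolean dedicatedDir) {
        return dedicatedDir ? tableinfoFiles : regionDirs + tableinfoFiles;
    }

    public static void main(String[] args) {
        System.out.println(entriesScanned(100_000, 3, false)); // current layout: scans every region dir
        System.out.println(entriesScanned(100_000, 3, true));  // with .tableinfos: scans descriptors only
    }
}
```

Since openRegion performs this listing per region, the per-call cost multiplies across the 100k opens, which is consistent with the near-serial behaviour described above.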
[jira] [Updated] (HBASE-6204) Improvement for opening region on the regionserver
[ https://issues.apache.org/jira/browse/HBASE-6204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chunhui shen updated HBASE-6204: Description:
If one table has many regions, like 100k regions, the regionserver opens regions very slowly, and opening regions in parallel on the RS degrades to nearly serial. The following code shows the detail:
{code}
public RegionOpeningState openRegion(HRegionInfo region, int versionOfOfflineNode) {
  ...
  HTableDescriptor htd = this.tableDescriptors.get(region.getTableName());
  ...
}

public HTableDescriptor get(final String tablename) {
  ...
  long modtime = getTableInfoModtime(this.fs, this.rootdir, tablename);
  ...
}
{code}
getTableInfoModtime -> getTableInfoPath -> getTableInfoPath -> FSUtils.listStatus(). If one table has many regions, FSUtils.listStatus() takes a long time. How can we improve the above code? I think an easy way is to make a dir (called .tableinfos) in the table dir and move the .tableinfo.* files to that dir.
was:
If one table has much regions, like 100k regions. We would find regionserver open region very slowly. Opening region in parallel on the rs will be closed to serially. The following code is the detail:
{code}
public RegionOpeningState openRegion(HRegionInfo region, int versionOfOfflineNode) {
  ...
  HTableDescriptor htd = this.tableDescriptors.get(region.getTableName());
  ...
}

public HTableDescriptor get(final String tablename) {
  ...
  long modtime = getTableInfoModtime(this.fs, this.rootdir, tablename);
  ...
}
{code}
getTableInfoModtime -> getTableInfoPath -> getTableInfoPath -> FSUtils.listStatus(). if one table has much regions, FSUtils.listStatus() will take much time. How to improve the above code? I think an easy way is that make a dir (called .tableinfos) in the table dir and move the files .tableinfo.* to that dir.
Improvement for opening region on the regionserver -- Key: HBASE-6204 URL: https://issues.apache.org/jira/browse/HBASE-6204 Project: HBase Issue Type: Improvement Components: regionserver Reporter: chunhui shen Assignee: chunhui shen
If one table has many regions, like 100k regions, the regionserver opens regions very slowly, and opening regions in parallel on the RS degrades to nearly serial. The following code shows the detail:
{code}
public RegionOpeningState openRegion(HRegionInfo region, int versionOfOfflineNode) {
  ...
  HTableDescriptor htd = this.tableDescriptors.get(region.getTableName());
  ...
}

public HTableDescriptor get(final String tablename) {
  ...
  long modtime = getTableInfoModtime(this.fs, this.rootdir, tablename);
  ...
}
{code}
getTableInfoModtime -> getTableInfoPath -> getTableInfoPath -> FSUtils.listStatus(). If one table has many regions, FSUtils.listStatus() takes a long time. How can we improve the above code? I think an easy way is to make a dir (called .tableinfos) in the table dir and move the .tableinfo.* files to that dir.
[jira] [Updated] (HBASE-6185) Update javadoc for ConstantSizeRegionSplitPolicy class
[ https://issues.apache.org/jira/browse/HBASE-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nneverwei updated HBASE-6185: - Attachment: HBASE-6185.v2.patch
Update javadoc for ConstantSizeRegionSplitPolicy class -- Key: HBASE-6185 URL: https://issues.apache.org/jira/browse/HBASE-6185 Project: HBase Issue Type: Bug Components: documentation Affects Versions: 0.94.0 Reporter: nneverwei Fix For: 0.94.1 Attachments: HBASE-6185.patch, HBASE-6185.v2.patch
While using HBase 0.94.0 we hit a strange problem. We configured 'hbase.hregion.max.filesize' to 100 GB (the recommended value for effectively turning auto-split off).
{code:xml}
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>107374182400</value>
</property>
{code}
Then we kept putting data into a table, but when the data size was still far less than 100 GB (about 500~600, uncompressed), the table auto-split into 2 regions... I changed the log4j config to DEBUG and saw the logs below:
{code}
2012-06-07 10:30:52,161 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~128.0m/134221272, currentsize=1.5m/1617744 for region FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8. in 3201ms, sequenceid=176387980, compaction requested=false
2012-06-07 10:30:52,161 DEBUG org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because info size=138657416, sizeToCheck=134217728, regionsWithCommonTable=1
2012-06-07 10:30:52,161 DEBUG org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because info size=138657416, sizeToCheck=134217728, regionsWithCommonTable=1
2012-06-07 10:30:52,240 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Split requested for FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8..
compaction_queue=(0:0), split_queue=0
2012-06-07 10:30:52,265 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8.
2012-06-07 10:30:52,265 DEBUG org.apache.hadoop.hbase.regionserver.SplitTransaction: regionserver:60020-0x137c4929efe0001 Creating ephemeral node for 7b229abcd0785408251a579e9bdf49c8 in SPLITTING state
2012-06-07 10:30:52,368 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x137c4929efe0001 Attempting to transition node 7b229abcd0785408251a579e9bdf49c8 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING
2012-06-07 10:30:52,382 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x137c4929efe0001 Successfully transitioned node 7b229abcd0785408251a579e9bdf49c8 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING
2012-06-07 10:30:52,410 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8.: disabling compactions & flushes
2012-06-07 10:30:52,410 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8. is closing
2012-06-07 10:30:52,411 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; FileStructIndex,,1339032525500.7b229abcd0785408251a579e9bdf49c8. is closing
{code}
{color:red}IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because info size=138657416, sizeToCheck=134217728{color}
I did not configure a splitPolicy for HBase, so this means *IncreasingToUpperBoundRegionSplitPolicy is the default splitPolicy of 0.94.0*. After adding
{code:xml}
<property>
  <name>hbase.regionserver.region.split.policy</name>
  <value>org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy</value>
</property>
{code}
auto-split did not happen again and everything went well.
But the javadoc on ConstantSizeRegionSplitPolicy still says 'This is the default split policy', and even http://hbase.apache.org/book/regions.arch.html 9.7.4.1. Custom Split Policies says 'default split policy: ConstantSizeRegionSplitPolicy.'. Those may mislead us into thinking that setting hbase.hregion.max.filesize to 100 GB all but shuts auto-split off. Please update those docs. What's more, in many scenarios we actually need to control splits manually (as you know, while splitting the table is offline, so reads and writes will fail).