[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418738#comment-13418738 ]

Zhihong Ted Yu commented on HBASE-6389:
---

Thanks for your explanation.
Have you seen the test failure that I described above @ 19/Jul/12 03:34 ?

Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

Key: HBASE-6389
URL: https://issues.apache.org/jira/browse/HBASE-6389
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.94.0, 0.96.0
Reporter: Aditya Kishore
Assignee: Aditya Kishore
Priority: Critical
Fix For: 0.96.0, 0.94.2
Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, HBASE-6389_trunk.patch

Continuing from HBASE-6375.

It seems I was mistaken in my assumption that changing the value of hbase.master.wait.on.regionservers.mintostart to a sufficient number (from the default of 1) can help prevent assignment of all regions to one (or a small number of) region server(s).
While this was the case in 0.90.x and 0.92.x, the behavior changed from 0.94.0 onwards to address HBASE-4993.
From 0.94.0 onwards, Master will proceed immediately after the timeout has lapsed, even if hbase.master.wait.on.regionservers.mintostart has not been reached.
Reading the current conditions of waitForRegionServers() clarifies it:
{code:title=ServerManager.java (trunk rev:1360470)}
  /**
   * Wait for the region servers to report in.
   * We will wait until one of this condition is met:
   *  - the master is stopped
   *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
   *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
   *    region servers is reached
   *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
   *    there have been no new region server in for
   *    'hbase.master.wait.on.regionservers.interval' time
   *
   * @throws InterruptedException
   */
  public void waitForRegionServers(MonitoredTask status)
  throws InterruptedException {
    ..
    while (
      !this.master.isStopped() &&
        slept < timeout &&
        count < maxToStart &&
        (lastCountChange+interval > now || count < minToStart)
      ){
{code}
So with the current conditions, the wait will end as soon as the timeout is reached, even if fewer RSes than required have checked in with the Master, and the Master will proceed with the region assignment among these RSes alone.
As mentioned in -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, and I concur, this could have a disastrous effect in a large cluster, especially now that MSLAB is turned on.
To enforce the required quorum as specified by hbase.master.wait.on.regionservers.mintostart irrespective of the timeout, these conditions need to be modified as follows:
{code:title=ServerManager.java}
..
  /**
   * Wait for the region servers to report in.
   * We will wait until one of this condition is met:
   *  - the master is stopped
   *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
   *    region servers is reached
   *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
   *    there have been no new region server in for
   *    'hbase.master.wait.on.regionservers.interval' time AND
   *    the 'hbase.master.wait.on.regionservers.timeout' is reached
   *
   * @throws InterruptedException
   */
  public void waitForRegionServers(MonitoredTask status)
..
..
    int minToStart = this.master.getConfiguration().
        getInt("hbase.master.wait.on.regionservers.mintostart", 1);
    int maxToStart = this.master.getConfiguration().
        getInt("hbase.master.wait.on.regionservers.maxtostart", Integer.MAX_VALUE);
    if (maxToStart < minToStart) {
      maxToStart = minToStart;
    }
..
..
    while (
      !this.master.isStopped() &&
        count < maxToStart &&
        (lastCountChange+interval > now || timeout > slept || count < minToStart)
      ){
..
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
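To make the difference between the two loop conditions in the description concrete, here is a minimal standalone sketch. It is not the ServerManager code: the class name WaitConditionSketch, the two method names, and the sample numbers are made up for illustration, and the comparison operators are assumed to be the ones the quoted snippets had before the archive stripped them. It only evaluates the two predicates for the case the issue is worried about, where the timeout has lapsed but fewer than mintostart region servers have checked in.

```java
public class WaitConditionSketch {

    /**
     * Loop predicate as quoted from trunk rev 1360470 (operators assumed).
     * Returns true while the master should keep waiting.
     */
    public static boolean keepWaitingCurrent(boolean stopped, long slept, long timeout,
            int count, int minToStart, int maxToStart,
            long lastCountChange, long interval, long now) {
        return !stopped
            && slept < timeout
            && count < maxToStart
            && (lastCountChange + interval > now || count < minToStart);
    }

    /**
     * Loop predicate as proposed in the description (operators assumed).
     * The timeout check moves inside the OR, so it can no longer end the
     * wait on its own while count is still below minToStart.
     */
    public static boolean keepWaitingProposed(boolean stopped, long slept, long timeout,
            int count, int minToStart, int maxToStart,
            long lastCountChange, long interval, long now) {
        return !stopped
            && count < maxToStart
            && (lastCountChange + interval > now || timeout > slept || count < minToStart);
    }

    public static void main(String[] args) {
        // Hypothetical numbers: timeout already lapsed, 1 of the required 3 RS checked in.
        long timeout = 4500, slept = 5000, interval = 1500;
        long lastCountChange = 2000, now = 10000;
        int count = 1, minToStart = 3, maxToStart = Integer.MAX_VALUE;

        // Current condition: slept < timeout is false, so the wait ends.
        System.out.println("current keeps waiting:  " + keepWaitingCurrent(
            false, slept, timeout, count, minToStart, maxToStart,
            lastCountChange, interval, now)); // prints false

        // Proposed condition: count < minToStart keeps the loop going.
        System.out.println("proposed keeps waiting: " + keepWaitingProposed(
            false, slept, timeout, count, minToStart, maxToStart,
            lastCountChange, interval, now)); // prints true
    }
}
```

With the proposed predicate the master keeps waiting until mintostart is reached, however long that takes, which may be relevant to the hanging tests reported in this thread if a test starts fewer region servers than mintostart.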
[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Ted Yu updated HBASE-6389:
--
Attachment: org.apache.hadoop.hbase.TestZooKeeper-output.txt

Here is the test output from yesterday.
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418752#comment-13418752 ]

Zhihong Ted Yu commented on HBASE-6389:
---

Looking at https://builds.apache.org/job/PreCommit-HBASE-Build/2406/console, there was still a hanging test, although I wasn't able to find which test hung.
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418866#comment-13418866 ]

Zhihong Ted Yu commented on HBASE-6389:
---

I ran the test suite with the latest patch on trunk and got:
{code}
Running org.apache.hadoop.hbase.client.TestHCM
Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 37.265 sec FAILURE!
--
Running org.apache.hadoop.hbase.client.TestAdmin
Tests run: 40, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 322.872 sec FAILURE!
--
Running org.apache.hadoop.hbase.catalog.TestMetaReaderEditor
Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 134.193 sec FAILURE!
--
Running org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine
Tests run: 20, Failures: 5, Errors: 2, Skipped: 0, Time elapsed: 669.588 sec FAILURE!
{code}
There was one hanging test:
{code}
at org.apache.hadoop.hbase.replication.TestReplication.setUp(TestReplication.java:183)
{code}
BTW what do R sub i, C and F sub i represent in the formula above ?
[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Ted Yu updated HBASE-6389:
--
Attachment: testReplication.jstack

jstack for the hanging TestReplication
[jira] [Comment Edited] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418866#comment-13418866 ]

Zhihong Ted Yu edited comment on HBASE-6389 at 7/20/12 1:37 AM:

I ran the test suite with the latest patch on trunk and got:
{code}
Running org.apache.hadoop.hbase.client.TestHCM
Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 37.265 sec FAILURE!
--
Running org.apache.hadoop.hbase.client.TestAdmin
Tests run: 40, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 322.872 sec FAILURE!
--
Running org.apache.hadoop.hbase.catalog.TestMetaReaderEditor
Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 134.193 sec FAILURE!
--
Running org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine
Tests run: 20, Failures: 5, Errors: 2, Skipped: 0, Time elapsed: 669.588 sec FAILURE!
{code}
There was one hanging test:
{code}
at org.apache.hadoop.hbase.replication.TestReplication.setUp(TestReplication.java:183)
{code}
BTW what do R~i~, C and F~i~ represent in the formula above ?
[jira] [Comment Edited] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418866#comment-13418866 ]

Zhihong Ted Yu edited comment on HBASE-6389 at 7/20/12 1:41 AM:

I ran the test suite with the latest patch on trunk and got:
{code}
Running org.apache.hadoop.hbase.client.TestHCM
Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 37.265 sec FAILURE!
--
Running org.apache.hadoop.hbase.client.TestAdmin
Tests run: 40, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 322.872 sec FAILURE!
--
Running org.apache.hadoop.hbase.catalog.TestMetaReaderEditor
Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 134.193 sec FAILURE!
--
Running org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine
Tests run: 20, Failures: 5, Errors: 2, Skipped: 0, Time elapsed: 669.588 sec FAILURE!
{code}
There was one hanging test:
{code}
at org.apache.hadoop.hbase.replication.TestReplication.setUp(TestReplication.java:183)
{code}
BTW what do *R*~i~, C and *F*~i~ represent in the formula above ?
[jira] [Comment Edited] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418866#comment-13418866 ]

Zhihong Ted Yu edited comment on HBASE-6389 at 7/20/12 2:53 AM:

I ran the test suite with the latest patch on trunk and got:
{code}
Failed tests:
  testRunThriftServer[12](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine): expected:<1> but was:<0>
  testRunThriftServer[14](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine): expected:<1> but was:<0>
  testRunThriftServer[15](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine): expected:<1> but was:<0>
  testRunThriftServer[16](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine): expected:<1> but was:<0>
  testRunThriftServer[17](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine): expected:<1> but was:<0>

Tests in error:
  testRegionCaching(org.apache.hadoop.hbase.client.TestHCM): org.apache.hadoop.hbase.UnknownRegionException: bd992463917ba68fe5389c5bf9e94a3a
  testCloseRegionThatFetchesTheHRIFromMeta(org.apache.hadoop.hbase.client.TestAdmin): -1
  testTableExists(org.apache.hadoop.hbase.catalog.TestMetaReaderEditor): org.apache.hadoop.hbase.TableNotEnabledException: testTableExists
  testRunThriftServer[11](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine): test timed out after 6 milliseconds
  testRunThriftServer[13](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine): test timed out after 6 milliseconds
{code}
There was one hanging test:
{code}
at org.apache.hadoop.hbase.replication.TestReplication.setUp(TestReplication.java:183)
{code}
BTW what do *R*~i~, C and *F*~i~ represent in the formula above ?
[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Ted Yu updated HBASE-6389:
--
Status: Open (was: Patch Available)

Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

Key: HBASE-6389
URL: https://issues.apache.org/jira/browse/HBASE-6389
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.94.0, 0.96.0
Reporter: Aditya Kishore
Assignee: Aditya Kishore
Priority: Critical
Fix For: 0.96.0, 0.94.2
Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt, testReplication.jstack

Continuing from HBASE-6375. It seems I was mistaken in my assumption that changing the value of hbase.master.wait.on.regionservers.mintostart to a sufficient number (from default of 1) can help prevent assignment of all regions to one (or a small number of) region server(s). While this was the case in 0.90.x and 0.92.x, the behavior has changed in 0.94.0 onwards to address HBASE-4993. From 0.94.0 onwards, Master will proceed immediately after the timeout has lapsed, even if hbase.master.wait.on.regionservers.mintostart has not been reached. Reading the current conditions of waitForRegionServers() clarifies it:

{code:title=ServerManager.java (trunk rev:1360470)}
  /**
   * Wait for the region servers to report in.
   * We will wait until one of this condition is met:
   *  - the master is stopped
   *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
   *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
   *    region servers is reached
   *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
   *    there have been no new region server in for
   *    'hbase.master.wait.on.regionservers.interval' time
   *
   * @throws InterruptedException
   */
  public void waitForRegionServers(MonitoredTask status)
  throws InterruptedException {
    ..
    while (
      !this.master.isStopped() &&
        slept < timeout &&
        count < maxToStart &&
        (lastCountChange+interval > now || count < minToStart)
      ){
{code}

So with the current conditions, the wait will end as soon as the timeout is reached, even if fewer RSes than required have checked in with the Master, and the master will proceed with the region assignment among these RSes alone. As mentioned in -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, and I concur, this could have a disastrous effect in a large cluster, especially now that MSLAB is turned on. To enforce the required quorum as specified by hbase.master.wait.on.regionservers.mintostart irrespective of timeout, these conditions need to be modified as follows:

{code:title=ServerManager.java}
  ..
  /**
   * Wait for the region servers to report in.
   * We will wait until one of this condition is met:
   *  - the master is stopped
   *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
   *    region servers is reached
   *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
   *    there have been no new region server in for
   *    'hbase.master.wait.on.regionservers.interval' time AND
   *    the 'hbase.master.wait.on.regionservers.timeout' is reached
   *
   * @throws InterruptedException
   */
  public void waitForRegionServers(MonitoredTask status)
  ..
  ..
    int minToStart = this.master.getConfiguration().
      getInt("hbase.master.wait.on.regionservers.mintostart", 1);
    int maxToStart = this.master.getConfiguration().
      getInt("hbase.master.wait.on.regionservers.maxtostart", Integer.MAX_VALUE);
    if (maxToStart < minToStart) {
      maxToStart = minToStart;
    }
  ..
  ..
    while (
      !this.master.isStopped() &&
        count < maxToStart &&
        (lastCountChange+interval > now || timeout > slept || count < minToStart)
      ){
  ..
{code}

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
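The difference between the two loop conditions in the HBASE-6389 description can be checked in isolation. The following is only a sketch, assuming the operators in the quoted snippets read as `&&`, `<` and `>`; the class `RegionServerWaitSketch` and its method names are hypothetical and are not part of HBase.

```java
/** Hypothetical helper, not HBase code: mirrors the two loop conditions from the description. */
final class RegionServerWaitSketch {

    /** Current (0.94.0) condition: reaching the timeout ends the wait unconditionally. */
    static boolean keepWaitingCurrent(boolean stopped, long slept, long timeout,
                                      int count, int minToStart, int maxToStart,
                                      long lastCountChange, long interval, long now) {
        return !stopped
            && slept < timeout
            && count < maxToStart
            && (lastCountChange + interval > now || count < minToStart);
    }

    /** Proposed condition: the timeout only matters once minToStart servers have checked in. */
    static boolean keepWaitingProposed(boolean stopped, long slept, long timeout,
                                       int count, int minToStart, int maxToStart,
                                       long lastCountChange, long interval, long now) {
        return !stopped
            && count < maxToStart
            && (lastCountChange + interval > now || timeout > slept || count < minToStart);
    }

    public static void main(String[] args) {
        // Timeout (4500 ms) already exceeded, only 1 of the required 3 servers has checked in,
        // and no new server has arrived within the last interval (1500 ms).
        boolean current = keepWaitingCurrent(false, 5000, 4500, 1, 3, Integer.MAX_VALUE, 0, 1500, 5000);
        boolean proposed = keepWaitingProposed(false, 5000, 4500, 1, 3, Integer.MAX_VALUE, 0, 1500, 5000);
        System.out.println("current keeps waiting:  " + current);
        System.out.println("proposed keeps waiting: " + proposed);
    }
}
```

With these sample values the current condition stops waiting while the proposed one keeps waiting, which is the behaviour change the issue asks for.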
[jira] [Commented] (HBASE-3725) HBase increments from old value after delete and write to disk
[ https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418904#comment-13418904 ]

Zhihong Ted Yu commented on HBASE-3725:
---
Looking at existing code:
{code}
  private List<KeyValue> getLastIncrement(final Get get) throws IOException {
    InternalScan iscan = new InternalScan(get);
{code}
iscan was assigned at the beginning. Looks like the assignment in else block is redundant. TestHRegion#testIncrementWithFlushAndDelete passed without that assignment.

HBase increments from old value after delete and write to disk
--
Key: HBASE-3725
URL: https://issues.apache.org/jira/browse/HBASE-3725
Project: HBase
Issue Type: Bug
Components: io, regionserver
Affects Versions: 0.90.1
Reporter: Nathaniel Cook
Assignee: Jonathan Gray
Attachments: HBASE-3725-0.92-V1.patch, HBASE-3725-0.92-V2.patch, HBASE-3725-0.92-V3.patch, HBASE-3725-0.92-V4.patch, HBASE-3725-0.92-V5.patch, HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, HBASE-3725.patch

Deleted row values are sometimes used for starting points on new increments.

To reproduce:
Create a row r. Set column x to some default value.
Force hbase to write that value to the file system (such as restarting the cluster).
Delete the row.
Call table.incrementColumnValue with some_value
Get the row.
The returned value in the column was incremented from the old value before the row was deleted instead of being initialized to some_value.

Code to reproduce:
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseTestIncrement {
  static String tableName = "testIncrement";
  static byte[] infoCF = Bytes.toBytes("info");
  static byte[] rowKey = Bytes.toBytes("test-rowKey");
  static byte[] newInc = Bytes.toBytes("new");
  static byte[] oldInc = Bytes.toBytes("old");

  /**
   * This code reproduces a bug with increment column values in hbase
   * Usage: First run part one by passing '1' as the first arg
   *        Then restart the hbase cluster so it writes everything to disk
   *        Run part two by passing '2' as the first arg
   *
   * This will result in the old deleted data being found and used for the increment calls
   *
   * @param args
   * @throws IOException
   */
  public static void main(String[] args) throws IOException {
    if ("1".equals(args[0]))
      partOne();
    if ("2".equals(args[0]))
      partTwo();
    if ("both".equals(args[0])) {
      partOne();
      partTwo();
    }
  }

  /**
   * Creates a table and increments a column value 10 times by 10 each time.
   * Results in a value of 100 for the column
   *
   * @throws IOException
   */
  static void partOne() throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor tableDesc = new HTableDescriptor(tableName);
    tableDesc.addFamily(new HColumnDescriptor(infoCF));
    // Start from a clean table on every run
    if (admin.tableExists(tableName)) {
      admin.disableTable(tableName);
      admin.deleteTable(tableName);
    }
    admin.createTable(tableDesc);

    HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
    HTableInterface table = pool.getTable(Bytes.toBytes(tableName));

    // Increment uninitialized column
    for (int j = 0; j < 10; j++) {
      table.incrementColumnValue(rowKey, infoCF, oldInc, (long)10);
      Increment inc = new Increment(rowKey);
      inc.addColumn(infoCF, newInc, (long)10);
      table.increment(inc);
    }

    Get get = new Get(rowKey);
    Result r = table.get(get);
    System.out.println("initial values: new " + Bytes.toLong(r.getValue(infoCF, newInc)) + " old " +
[jira] [Resolved] (HBASE-6345) Utilize fault injection in testing using AspectJ
[ https://issues.apache.org/jira/browse/HBASE-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Ted Yu resolved HBASE-6345.
---
Resolution: Won't Fix

There was not enough incentive to pursue fault injection using AspectJ.

Utilize fault injection in testing using AspectJ

Key: HBASE-6345
URL: https://issues.apache.org/jira/browse/HBASE-6345
Project: HBase
Issue Type: Bug
Reporter: Zhihong Ted Yu

HDFS uses fault injection to test pipeline failure in addition to mock, spy. HBase uses mock, spy. But there are cases where mock, spy aren't convenient. Some example from DFSClientAspects.aj :
{code}
  pointcut pipelineInitNonAppend(DataStreamer datastreamer):
    callCreateBlockOutputStream(datastreamer)
    && cflow(execution(* nextBlockOutputStream(..)))
    && within(DataStreamer);

  after(DataStreamer datastreamer) returning : pipelineInitNonAppend(datastreamer) {
    LOG.info("FI: after pipelineInitNonAppend: hasError="
        + datastreamer.hasError + " errorIndex=" + datastreamer.errorIndex);
    if (datastreamer.hasError) {
      DataTransferTest dtTest = DataTransferTestUtil.getDataTransferTest();
      if (dtTest != null)
        dtTest.fiPipelineInitErrorNonAppend.run(datastreamer.errorIndex);
    }
  }
{code}
[jira] [Assigned] (HBASE-4255) Expose CatalogJanitor controls
[ https://issues.apache.org/jira/browse/HBASE-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Ted Yu reassigned HBASE-4255:
-
Assignee: Devaraj Das

Expose CatalogJanitor controls
--
Key: HBASE-4255
URL: https://issues.apache.org/jira/browse/HBASE-4255
Project: HBase
Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: Devaraj Das
Fix For: 0.96.0
Attachments: 4255-4.2.patch

When doing surgery or other operational tasks, it's nice to be able to have the .META. table quickly cleaned of split parents. The CatalogJanitor already has controls baked in (currently used in unit tests), I think we should expose this the same way we do with the balancer, that is:
- start
- stop
- request a run

A client would need to go through HBaseAdmin, and shell commands need to be created.
[jira] [Updated] (HBASE-4255) Expose CatalogJanitor controls
[ https://issues.apache.org/jira/browse/HBASE-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-4255: -- Attachment: 4255-4.2.patch Patch from review board.
[jira] [Updated] (HBASE-4255) Expose CatalogJanitor controls
[ https://issues.apache.org/jira/browse/HBASE-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-4255: -- Hadoop Flags: Reviewed Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-4255) Expose CatalogJanitor controls
[ https://issues.apache.org/jira/browse/HBASE-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417057#comment-13417057 ] Zhihong Ted Yu commented on HBASE-4255: --- @J-D: Please take a look at Devaraj's patch.
[jira] [Commented] (HBASE-4470) ServerNotRunningException coming out of assignRootAndMeta kills the Master
[ https://issues.apache.org/jira/browse/HBASE-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417157#comment-13417157 ]

Zhihong Ted Yu commented on HBASE-4470:
---
Indentation seems to be off in testVerifyMetaRegionLocationWithException():
{code}
+ Mockito.when(implementation.get((byte [])Mockito.any(), (Get)Mockito.any())).
{code}

ServerNotRunningException coming out of assignRootAndMeta kills the Master
--
Key: HBASE-4470
URL: https://issues.apache.org/jira/browse/HBASE-4470
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: Gregory Chanan
Priority: Critical
Fix For: 0.90.7
Attachments: HBASE-4470-90.patch

I'm surprised we still have issues like that and I didn't get a hit while googling so forgive me if there's already a jira about it. When the master starts it verifies the locations of root and meta before assigning them, if the server is started but not running you'll get this:
{quote}
2011-09-23 04:47:44,859 WARN org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: RemoteException connecting to RS
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.ipc.ServerNotRunningException: Server is not running yet
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1038)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:771)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
at $Proxy6.getProtocolVersion(Unknown Source)
at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:969)
at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:388)
at org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:287)
at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:484)
at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:441)
at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:388)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:282)
{quote}
I hit that 3-4 times this week while debugging something else. The worst is that when you restart the master it sees that as a failover, but none of the regions are assigned so it takes an eternity to get back fully online.
[jira] [Commented] (HBASE-6400) Add getMasterAdmin() and getMasterMonitor() to HConnection
[ https://issues.apache.org/jira/browse/HBASE-6400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417230#comment-13417230 ]

Zhihong Ted Yu commented on HBASE-6400:
---
Integrated to trunk. Thanks for the patch, Enis. Thanks for the review, Stack.

Add getMasterAdmin() and getMasterMonitor() to HConnection
--
Key: HBASE-6400
URL: https://issues.apache.org/jira/browse/HBASE-6400
Project: HBase
Issue Type: Improvement
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Fix For: 0.96.0
Attachments: 6400-v2.patch, HBASE-6400_v1.patch

HConnection used to have getMaster() which returns HMasterInterface, but after HBASE-6039 it has been removed. I think we need to expose HConnection.getMasterAdmin() and getMasterMonitor() a la HConnection.getAdmin(), and getClient(). HConnectionImplementation has getKeepAliveMasterAdmin() but, I see no reason to leak keep alive classes to upper layers.
[jira] [Commented] (HBASE-6419) PersistentMetricsTimeVaryingRate gets used for non-time-based metrics (part2 of HBASE-6220)
[ https://issues.apache.org/jira/browse/HBASE-6419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417250#comment-13417250 ]

Zhihong Ted Yu commented on HBASE-6419:
---
I ran the two tests above and they passed with patch. Integrated to trunk. Thanks for the patch, Paul. Thanks for the review, Stack.

PersistentMetricsTimeVaryingRate gets used for non-time-based metrics (part2 of HBASE-6220)
---
Key: HBASE-6419
URL: https://issues.apache.org/jira/browse/HBASE-6419
Project: HBase
Issue Type: Improvement
Reporter: stack
Assignee: Paul Cavallaro
Attachments: ServerMetrics_HBASE_6220_Flush_Metrics.patch
[jira] [Commented] (HBASE-6405) Create Hadoop compatibilty modules and Metrics2 implementation of replication metrics
[ https://issues.apache.org/jira/browse/HBASE-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417279#comment-13417279 ]

Zhihong Ted Yu commented on HBASE-6405:
---
Addendum 2 looks good.

Create Hadoop compatibilty modules and Metrics2 implementation of replication metrics
-
Key: HBASE-6405
URL: https://issues.apache.org/jira/browse/HBASE-6405
Project: HBase
Issue Type: Sub-task
Reporter: Zhihong Ted Yu
Assignee: Elliott Clark
Fix For: 0.96.0
Attachments: 6405.txt, HBASE-6405-ADD.patch, hbase-6405-addendum-2.patch
[jira] [Commented] (HBASE-6421) [pom] add jettison and fix netty specification
[ https://issues.apache.org/jira/browse/HBASE-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417324#comment-13417324 ]

Zhihong Ted Yu commented on HBASE-6421:
---
Patch didn't compile against hadoop 2.0:
{code}
== == Checking against hadoop 2.0 build == ==
{code}

[pom] add jettison and fix netty specification
--
Key: HBASE-6421
URL: https://issues.apache.org/jira/browse/HBASE-6421
Project: HBase
Issue Type: Bug
Reporter: Jesse Yates
Assignee: Jesse Yates
Attachments: hbase-6421-v0.patch

Currently, jettison isn't required for testing hbase-server, but TestSchemaConfigured requires it, causing the compile phase (at least on my MBP) to fail. Further, in cleaning up the poms, netty should be declared in the parent hbase/pom.xml and then inherited in the subclass.
[jira] [Commented] (HBASE-5547) Don't delete HFiles when in backup mode
[ https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417366#comment-13417366 ]

Zhihong Ted Yu commented on HBASE-5547:
---
Here is the link about InterruptedException handling: http://www.ibm.com/developerworks/java/library/j-jtp05236/index.html
Take a look at Listing 3 under 'Don't swallow interrupts'

Don't delete HFiles when in backup mode
-
Key: HBASE-5547
URL: https://issues.apache.org/jira/browse/HBASE-5547
Project: HBase
Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Jesse Yates
Fix For: 0.94.2
Attachments: 5547-v12.txt, hbase-5447-v8.patch, hbase-5447-v8.patch, hbase-5547-v9.patch, java_HBASE-5547_v13.patch, java_HBASE-5547_v4.patch, java_HBASE-5547_v5.patch, java_HBASE-5547_v6.patch, java_HBASE-5547_v7.patch

This came up in a discussion I had with Stack. It would be nice if HBase could be notified that a backup is in progress (via a znode for example) and in that case either:
1. rename HFiles to be deleted to file.bck
2. rename the HFiles into a special directory
3. rename them to a general trash directory (which would not need to be tied to backup mode).

That way it should be able to get a consistent backup based on HFiles (HDFS snapshots or hard links would be better options here, but we do not have those). #1 makes cleanup a bit harder.
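The "don't swallow interrupts" advice referenced in the HBASE-5547 comment amounts to re-asserting the thread's interrupt status when an InterruptedException is caught and cannot be rethrown. A minimal sketch of that pattern follows; the class `InterruptSketch` and the method `sleepQuietly` are hypothetical names used only for illustration and do not come from the patch under review.

```java
/** Hypothetical illustration of restoring interrupt status; not taken from the HBASE-5547 patch. */
final class InterruptSketch {

    /**
     * Sleeps for the given time. Returns false if the sleep was cut short by an
     * interrupt, in which case the interrupt flag is set again for the caller.
     */
    static boolean sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
            return true;
        } catch (InterruptedException e) {
            // Thread.sleep clears the flag when it throws; restore it so code
            // higher up the stack can still observe the interruption.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        Thread.currentThread().interrupt();      // simulate a pending interrupt
        boolean completed = sleepQuietly(1000);  // returns immediately
        System.out.println("completed=" + completed
            + " interrupted=" + Thread.currentThread().isInterrupted());
        // prints "completed=false interrupted=true"
    }
}
```

Swallowing the exception instead (an empty catch block) would leave the flag cleared, so a caller polling `isInterrupted()` would never learn that a shutdown was requested.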
[jira] [Commented] (HBASE-6396) Fix NoSuchMethodError running against hadoop 2.0
[ https://issues.apache.org/jira/browse/HBASE-6396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417379#comment-13417379 ]

Zhihong Ted Yu commented on HBASE-6396:
---
Clarification: the goal of the patch was not to make a build compiled against hadoop 1 work against hadoop 2. The goal is to make a build against the 0.23 profile work with hadoop 2.

Fix NoSuchMethodError running against hadoop 2.0

Key: HBASE-6396
URL: https://issues.apache.org/jira/browse/HBASE-6396
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Zhihong Ted Yu
Assignee: Zhihong Ted Yu
Labels: hadoop-2.0
Fix For: 0.96.0
Attachments: 6396-v2.txt

HADOOP-8350 changed the signature of NetUtils.getInputStream(). This leads to NoSuchMethodError in HBaseClient$Connection.setupIOstreams(). See https://issues.apache.org/jira/browse/HADOOP-8350?focusedCommentId=13414276&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13414276
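A signature change shows up as NoSuchMethodError at run time, rather than as a compile error, because the JVM links each call site by the method's name plus its full descriptor, and the descriptor includes the return type as well as the parameter types. The sketch below only illustrates that point: it assumes, purely for the example, that the return type is what changed, and `FilterInputStream` is a stand-in rather than the actual Hadoop type. `DescriptorSketch` is a hypothetical class name.

```java
import java.io.FilterInputStream;
import java.io.InputStream;
import java.lang.invoke.MethodType;
import java.net.Socket;

/** Hypothetical illustration: two methods with the same name and parameters but different descriptors. */
final class DescriptorSketch {

    /** Builds the JVM method descriptor string for a return type and parameter list. */
    static String descriptor(Class<?> returnType, Class<?>... params) {
        return MethodType.methodType(returnType, params).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // Descriptor a caller compiled against the "old" signature would look for.
        String compiledAgainst = descriptor(InputStream.class, Socket.class);
        // Descriptor offered by a library whose method now returns a different type.
        String availableAtRuntime = descriptor(FilterInputStream.class, Socket.class);

        System.out.println(compiledAgainst);     // (Ljava/net/Socket;)Ljava/io/InputStream;
        System.out.println(availableAtRuntime);  // (Ljava/net/Socket;)Ljava/io/FilterInputStream;
        System.out.println("linkable: " + compiledAgainst.equals(availableAtRuntime));
    }
}
```

Because the two descriptor strings differ, a class file compiled against the first cannot link against a library offering only the second, even though the source would recompile unchanged. This is why the build has to be produced against the matching Hadoop profile.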
[jira] [Updated] (HBASE-6220) PersistentMetricsTimeVaryingRate gets used for non-time-based metrics
[ https://issues.apache.org/jira/browse/HBASE-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Ted Yu updated HBASE-6220:
--
Resolution: Fixed
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

PersistentMetricsTimeVaryingRate gets used for non-time-based metrics
-
Key: HBASE-6220
URL: https://issues.apache.org/jira/browse/HBASE-6220
Project: HBase
Issue Type: Bug
Components: metrics
Affects Versions: 0.96.0
Reporter: David S. Wang
Assignee: Paul Cavallaro
Priority: Minor
Labels: noob
Attachments: ServerMetrics_HBASE_6220.patch, ServerMetrics_HBASE_6220_Flush_Metrics.patch

PersistentMetricsTimeVaryingRate gets used for metrics that are not time-based, leading to confusing names such as avg_time for compaction size, etc. You have to read the code in order to understand that this is actually referring to bytes, not seconds.
[jira] [Commented] (HBASE-4255) Expose CatalogJanitor controls
[ https://issues.apache.org/jira/browse/HBASE-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417400#comment-13417400 ]

Zhihong Ted Yu commented on HBASE-4255:
---
No test failure from https://builds.apache.org/job/PreCommit-HBASE-Build/2400//testReport/:
{code}
Results :

Tests run: 1021, Failures: 0, Errors: 0, Skipped: 9
{code}

Expose CatalogJanitor controls
--
Key: HBASE-4255
URL: https://issues.apache.org/jira/browse/HBASE-4255
Project: HBase
Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: Devaraj Das
Fix For: 0.96.0
Attachments: 4255-4.2.patch, 4255-5.1.patch

When doing surgery or other operational tasks, it's nice to be able to have the .META. table quickly cleaned of split parents. The CatalogJanitor already has controls baked in (currently used in unit tests), I think we should expose this the same way we do with the balancer, that is:
- start
- stop
- request a run

A client would need to go through HBaseAdmin, and shell commands need to be created.
[jira] [Commented] (HBASE-6406) TestReplicationPeer.testResetZooKeeperSession and TestZooKeeper.testClientSessionExpired fail frequently
[ https://issues.apache.org/jira/browse/HBASE-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13417665#comment-13417665 ] Zhihong Ted Yu commented on HBASE-6406: --- For trunk, TestZooKeeper hung with the following output: {code} 2012-07-18 13:24:34,764 INFO [Master:0;sdev25.arch.ebay.com,59816,1342643039714] master.HMaster(455): HMaster main thread exiting 2012-07-18 13:24:34,764 INFO [RegionServer:2;sdev25.arch.ebay.com,60707,1342643074759] zookeeper.RecoverableZooKeeper(102): The identifier of this process is 15496@sdev25 2012-07-18 13:24:34,772 DEBUG [RegionServer:2;sdev25.arch.ebay.com,60707,1342643074759-EventThread] zookeeper.ZooKeeperWatcher(262): regionserver:60707 Received ZooKeeper Event, type=None, state=SyncConnected, path=null 2012-07-18 13:24:34,773 DEBUG [RegionServer:2;sdev25.arch.ebay.com,60707,1342643074759] zookeeper.ZKUtil(238): regionserver:60707 /hbase/master does not exist. Watcher is set. 2012-07-18 13:24:34,774 DEBUG [RegionServer:2;sdev25.arch.ebay.com,60707,1342643074759-EventThread] zookeeper.ZooKeeperWatcher(339): regionserver:60707-0x1389bc2dddb000c connected 2012-07-18 13:24:35,062 INFO [sdev25.arch.ebay.com,59816,1342643039714.splitLogManagerTimeoutMonitor] hbase.Chore(82): sdev25.arch.ebay.com,59816,1342643039714.splitLogManagerTimeoutMonitor exiting 2012-07-18 13:24:35,080 DEBUG [RegionServer:0;sdev25.arch.ebay.com,48349,1342643039994] regionserver.HRegionServer(1817): No master found; retry 2012-07-18 13:24:36,081 DEBUG [RegionServer:0;sdev25.arch.ebay.com,48349,1342643039994] regionserver.HRegionServer(1817): No master found; retry {code} TestReplicationPeer.testResetZooKeeperSession and TestZooKeeper.testClientSessionExpired fail frequently Key: HBASE-6406 URL: https://issues.apache.org/jira/browse/HBASE-6406 Project: HBase Issue Type: Bug Affects Versions: 0.94.1 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.2 Attachments: testReplication.jstack, testZooKeeper.jstack 
Looking back through the 0.94 test runs these two tests accounted for 11 of 34 failed tests. They should be fixed or (temporarily) disabled. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Comment Edited] (HBASE-6406) TestReplicationPeer.testResetZooKeeperSession and TestZooKeeper.testClientSessionExpired fail frequently
[ https://issues.apache.org/jira/browse/HBASE-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13417665#comment-13417665 ] Zhihong Ted Yu edited comment on HBASE-6406 at 7/18/12 8:50 PM: For trunk, TestZooKeeper hung with the following output: {code} 2012-07-18 13:24:34,764 INFO [Master:0;X.ebay.com,59816,1342643039714] master.HMaster(455): HMaster main thread exiting 2012-07-18 13:24:34,764 INFO [RegionServer:2;X.ebay.com,60707,1342643074759] zookeeper.RecoverableZooKeeper(102): The identifier of this process is 15496@sdev25 2012-07-18 13:24:34,772 DEBUG [RegionServer:2;X.ebay.com,60707,1342643074759-EventThread] zookeeper.ZooKeeperWatcher(262): regionserver:60707 Received ZooKeeper Event, type=None, state=SyncConnected, path=null 2012-07-18 13:24:34,773 DEBUG [RegionServer:2;X.ebay.com,60707,1342643074759] zookeeper.ZKUtil(238): regionserver:60707 /hbase/master does not exist. Watcher is set. 2012-07-18 13:24:34,774 DEBUG [RegionServer:2;X.ebay.com,60707,1342643074759-EventThread] zookeeper.ZooKeeperWatcher(339): regionserver:60707-0x1389bc2dddb000c connected 2012-07-18 13:24:35,062 INFO [X.ebay.com,59816,1342643039714.splitLogManagerTimeoutMonitor] hbase.Chore(82): X.ebay.com,59816,1342643039714.splitLogManagerTimeoutMonitor exiting 2012-07-18 13:24:35,080 DEBUG [RegionServer:0;X.ebay.com,48349,1342643039994] regionserver.HRegionServer(1817): No master found; retry 2012-07-18 13:24:36,081 DEBUG [RegionServer:0;X.ebay.com,48349,1342643039994] regionserver.HRegionServer(1817): No master found; retry {code} TestReplicationPeer.testResetZooKeeperSession and TestZooKeeper.testClientSessionExpired fail frequently Key: HBASE-6406 URL: https://issues.apache.org/jira/browse/HBASE-6406 Project: HBase Issue Type: Bug Affects Versions: 0.94.1 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.2 Attachments: testReplication.jstack, testZooKeeper.jstack Looking back through the 0.94 test runs these two tests accounted for 11 of 34 failed tests. They should be fixed or (temporarily) disabled.
[jira] [Commented] (HBASE-6421) [pom] add jettison and fix netty specification
[ https://issues.apache.org/jira/browse/HBASE-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13417670#comment-13417670 ] Zhihong Ted Yu commented on HBASE-6421: --- I got the following: {code} [ERROR] The build could not read 1 project - [Help 1] org.apache.maven.project.ProjectBuildingException: Some problems were encountered while processing the POMs: [ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-core:jar is missing. @ line 64, column 17 [ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-test:jar is missing. @ line 91, column 17 at org.apache.maven.project.DefaultProjectBuilder.build(DefaultProjectBuilder.java:339) at org.apache.maven.DefaultMaven.collectProjects(DefaultMaven.java:632) at org.apache.maven.DefaultMaven.getProjectsForMavenReactor(DefaultMaven.java:581) at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:233) {code} Here is the command: mvn clean test help:active-profiles -X -DskipTests -Dhadoop.profile=2.0 [pom] add jettison and fix netty specification -- Key: HBASE-6421 URL: https://issues.apache.org/jira/browse/HBASE-6421 Project: HBase Issue Type: Bug Reporter: Jesse Yates Assignee: Jesse Yates Attachments: hbase-6421-v0.patch Currently, jettison isn't required for testing hbase-server, but TestSchemaConfigured requires it, causing the compile phase (at least on my MBP) to fail. Further, in cleaning up the poms, netty should be declared in the parent hbase/pom.xml and then inherited in the subclass.
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13417697#comment-13417697 ] Zhihong Ted Yu commented on HBASE-6389: --- I tried to see why TestZooKeeper hung strangely: {code} 2012-07-18 14:05:59,533 DEBUG [pool-57-thread-1] zookeeper.ZKUtil(1142): master:52861-0x1389be8bd6e-0x1389be8bd6e000a-0x1389be8bd6e000b Retrieved 39 byte(s) of data from znode /hbase/root-region-server and set watcher; X.ebay.com,44052,1342645522433 2012-07-18 14:05:59,533 WARN [pool-52-thread-1] zookeeper.RecoverableZooKeeper(218): Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/root-region-server 2012-07-18 14:05:59,533 INFO [pool-52-thread-1] util.RetryCounter(55): Sleeping 2000ms before retry #1... 2012-07-18 14:05:59,536 INFO [main] ipc.HBaseRpcMetrics(66): Initializing RPC Metrics with hostName=MiniHBaseCluster$MiniHBaseClusterRegionServer, port=44030 2012-07-18 14:05:59,537 INFO [Master:0;X.ebay.com,52861,1342645522110] master.HMaster(455): HMaster main thread exiting {code} Basically the test hung in setup(). I then traced where TestZooKeeper stopped showing up in test result and this was the first URL giving me 404 error: https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/3126/testReport/org.apache.hadoop.hbase/TestZooKeeper/ That was when this patch went in. Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments Key: HBASE-6389 URL: https://issues.apache.org/jira/browse/HBASE-6389 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.94.0, 0.96.0 Reporter: Aditya Kishore Assignee: Aditya Kishore Priority: Critical Fix For: 0.96.0, 0.94.1 Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch Continuing from HBASE-6375. 
It seems I was mistaken in my assumption that changing the value of hbase.master.wait.on.regionservers.mintostart to a sufficient number (from default of 1) can help prevent assignment of all regions to one (or a small number of) region server(s). While this was the case in 0.90.x and 0.92.x, the behavior has changed in 0.94.0 onwards to address HBASE-4993. From 0.94.0 onwards, Master will proceed immediately after the timeout has lapsed, even if hbase.master.wait.on.regionservers.mintostart has not reached. Reading the current conditions of waitForRegionServers() clarifies it {code:title=ServerManager.java (trunk rev:1360470)} 581 /** 582 * Wait for the region servers to report in. 583 * We will wait until one of this condition is met: 584 * - the master is stopped 585 * - the 'hbase.master.wait.on.regionservers.timeout' is reached 586 * - the 'hbase.master.wait.on.regionservers.maxtostart' number of 587 *region servers is reached 588 * - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND 589 * there have been no new region server in for 590 * 'hbase.master.wait.on.regionservers.interval' time 591 * 592 * @throws InterruptedException 593 */ 594 public void waitForRegionServers(MonitoredTask status) 595 throws InterruptedException { 612 while ( 613 !this.master.isStopped() && 614 slept < timeout && 615 count < maxToStart && 616 (lastCountChange+interval > now || count < minToStart) 617 ){ {code} So with the current conditions, the wait will end as soon as timeout is reached even lesser number of RS have checked-in with the Master and the master will proceed with the region assignment among these RSes alone. As mentioned in -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, and I concur, this could have disastrous effect in large cluster especially now that MSLAB is turned on. 
To enforce the required quorum as specified by hbase.master.wait.on.regionservers.mintostart irrespective of timeout, these conditions need to be modified as following {code:title=ServerManager.java} .. /** * Wait for the region servers to report in. * We will wait until one of this condition is met: * - the master is stopped * - the 'hbase.master.wait.on.regionservers.maxtostart' number of *region servers is reached * - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND * there have been no new region server in for * 'hbase.master.wait.on.regionservers.interval'
[jira] [Reopened] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu reopened HBASE-6389: --- After reverting the patch, test passed smoothly: {code} Running org.apache.hadoop.hbase.TestZooKeeper Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 48.678 sec Results : Tests run: 11, Failures: 0, Errors: 0, Skipped: 0 [INFO] [INFO] --- maven-surefire-plugin:2.10:test (secondPartTestsExecution) @ hbase-server --- [INFO] Tests are skipped. [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 58.563s {code} Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments Key: HBASE-6389 URL: https://issues.apache.org/jira/browse/HBASE-6389 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.94.0, 0.96.0 Reporter: Aditya Kishore Assignee: Aditya Kishore Priority: Critical Fix For: 0.96.0, 0.94.1 Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch Continuing from HBASE-6375. It seems I was mistaken in my assumption that changing the value of hbase.master.wait.on.regionservers.mintostart to a sufficient number (from default of 1) can help prevent assignment of all regions to one (or a small number of) region server(s). While this was the case in 0.90.x and 0.92.x, the behavior has changed in 0.94.0 onwards to address HBASE-4993. From 0.94.0 onwards, Master will proceed immediately after the timeout has lapsed, even if hbase.master.wait.on.regionservers.mintostart has not reached. Reading the current conditions of waitForRegionServers() clarifies it {code:title=ServerManager.java (trunk rev:1360470)} 581 /** 582 * Wait for the region servers to report in. 
583 * We will wait until one of this condition is met: 584 * - the master is stopped 585 * - the 'hbase.master.wait.on.regionservers.timeout' is reached 586 * - the 'hbase.master.wait.on.regionservers.maxtostart' number of 587 *region servers is reached 588 * - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND 589 * there have been no new region server in for 590 * 'hbase.master.wait.on.regionservers.interval' time 591 * 592 * @throws InterruptedException 593 */ 594 public void waitForRegionServers(MonitoredTask status) 595 throws InterruptedException { 612 while ( 613 !this.master.isStopped() && 614 slept < timeout && 615 count < maxToStart && 616 (lastCountChange+interval > now || count < minToStart) 617 ){ {code} So with the current conditions, the wait will end as soon as timeout is reached even lesser number of RS have checked-in with the Master and the master will proceed with the region assignment among these RSes alone. As mentioned in -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, and I concur, this could have disastrous effect in large cluster especially now that MSLAB is turned on. To enforce the required quorum as specified by hbase.master.wait.on.regionservers.mintostart irrespective of timeout, these conditions need to be modified as following {code:title=ServerManager.java} .. /** * Wait for the region servers to report in. * We will wait until one of this condition is met: * - the master is stopped * - the 'hbase.master.wait.on.regionservers.maxtostart' number of *region servers is reached * - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND * there have been no new region server in for * 'hbase.master.wait.on.regionservers.interval' time AND * the 'hbase.master.wait.on.regionservers.timeout' is reached * * @throws InterruptedException */ public void waitForRegionServers(MonitoredTask status) .. .. int minToStart = this.master.getConfiguration(). 
getInt("hbase.master.wait.on.regionservers.mintostart", 1); int maxToStart = this.master.getConfiguration(). getInt("hbase.master.wait.on.regionservers.maxtostart", Integer.MAX_VALUE); if (maxToStart < minToStart) { maxToStart = minToStart; } .. .. while ( !this.master.isStopped() && count < maxToStart && (lastCountChange+interval > now || timeout > slept || count < minToStart) ){ .. {code}
[jira] [Commented] (HBASE-4050) Update HBase metrics framework to metrics2 framework
[ https://issues.apache.org/jira/browse/HBASE-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417816#comment-13417816 ] Zhihong Ted Yu commented on HBASE-4050: --- Elliott's first patch used ResourceFinder, which allowed passing initialization parameters to the ctor. Maybe revive that approach? Update HBase metrics framework to metrics2 framework Key: HBASE-4050 URL: https://issues.apache.org/jira/browse/HBASE-4050 Project: HBase Issue Type: New Feature Components: metrics Affects Versions: 0.90.4 Environment: Java 6 Reporter: Eric Yang Assignee: Elliott Clark Priority: Critical Fix For: 0.96.0 Attachments: 4050-metrics-v2.patch, 4050-metrics-v3.patch, HBASE-4050-0.patch, HBASE-4050-1.patch, HBASE-4050-2.patch, HBASE-4050-3.patch, HBASE-4050-5.patch, HBASE-4050-6.patch, HBASE-4050-7.patch, HBASE-4050-8.patch, HBASE-4050.patch The Metrics framework has been marked deprecated in Hadoop 0.20.203+ and 0.22+, and it might be removed in a future Hadoop release. Hence, HBase needs to revise its dependency on MetricsContext to use the Metrics2 framework.
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13417825#comment-13417825 ] Zhihong Ted Yu commented on HBASE-6389: --- Patch for 0.94 wasn't attached here. @Lars: Can you revert the patches ? Thanks Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments Key: HBASE-6389 URL: https://issues.apache.org/jira/browse/HBASE-6389 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.94.0, 0.96.0 Reporter: Aditya Kishore Assignee: Aditya Kishore Priority: Critical Fix For: 0.96.0, 0.94.1 Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch Continuing from HBASE-6375. It seems I was mistaken in my assumption that changing the value of hbase.master.wait.on.regionservers.mintostart to a sufficient number (from default of 1) can help prevent assignment of all regions to one (or a small number of) region server(s). While this was the case in 0.90.x and 0.92.x, the behavior has changed in 0.94.0 onwards to address HBASE-4993. From 0.94.0 onwards, Master will proceed immediately after the timeout has lapsed, even if hbase.master.wait.on.regionservers.mintostart has not reached. Reading the current conditions of waitForRegionServers() clarifies it {code:title=ServerManager.java (trunk rev:1360470)} 581 /** 582 * Wait for the region servers to report in. 
583 * We will wait until one of this condition is met: 584 * - the master is stopped 585 * - the 'hbase.master.wait.on.regionservers.timeout' is reached 586 * - the 'hbase.master.wait.on.regionservers.maxtostart' number of 587 *region servers is reached 588 * - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND 589 * there have been no new region server in for 590 * 'hbase.master.wait.on.regionservers.interval' time 591 * 592 * @throws InterruptedException 593 */ 594 public void waitForRegionServers(MonitoredTask status) 595 throws InterruptedException { 612 while ( 613 !this.master.isStopped() && 614 slept < timeout && 615 count < maxToStart && 616 (lastCountChange+interval > now || count < minToStart) 617 ){ {code} So with the current conditions, the wait will end as soon as timeout is reached even lesser number of RS have checked-in with the Master and the master will proceed with the region assignment among these RSes alone. As mentioned in -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, and I concur, this could have disastrous effect in large cluster especially now that MSLAB is turned on. To enforce the required quorum as specified by hbase.master.wait.on.regionservers.mintostart irrespective of timeout, these conditions need to be modified as following {code:title=ServerManager.java} .. /** * Wait for the region servers to report in. * We will wait until one of this condition is met: * - the master is stopped * - the 'hbase.master.wait.on.regionservers.maxtostart' number of *region servers is reached * - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND * there have been no new region server in for * 'hbase.master.wait.on.regionservers.interval' time AND * the 'hbase.master.wait.on.regionservers.timeout' is reached * * @throws InterruptedException */ public void waitForRegionServers(MonitoredTask status) .. .. int minToStart = this.master.getConfiguration(). 
getInt("hbase.master.wait.on.regionservers.mintostart", 1); int maxToStart = this.master.getConfiguration(). getInt("hbase.master.wait.on.regionservers.maxtostart", Integer.MAX_VALUE); if (maxToStart < minToStart) { maxToStart = minToStart; } .. .. while ( !this.master.isStopped() && count < maxToStart && (lastCountChange+interval > now || timeout > slept || count < minToStart) ){ .. {code}
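The difference between the two loop conditions quoted in the HBASE-6389 description can be sketched in isolation. This is only an illustration under assumptions: the class name WaitConditionSketch, the method names, and the sample numbers are hypothetical and are not part of ServerManager; only the boolean expressions follow the conditions described in the issue.

```java
// Hypothetical, standalone sketch: not the ServerManager API.
public final class WaitConditionSketch {
    private WaitConditionSketch() {}

    /** Condition as described for trunk rev 1360470: the timeout alone ends the wait. */
    public static boolean keepWaitingOld(boolean stopped, long slept, long timeout, int count,
            int minToStart, int maxToStart, long lastCountChange, long interval, long now) {
        return !stopped
            && slept < timeout
            && count < maxToStart
            && (lastCountChange + interval > now || count < minToStart);
    }

    /** Proposed condition: the timeout only matters once minToStart has been reached. */
    public static boolean keepWaitingProposed(boolean stopped, long slept, long timeout, int count,
            int minToStart, int maxToStart, long lastCountChange, long interval, long now) {
        return !stopped
            && count < maxToStart
            && (lastCountChange + interval > now || timeout > slept || count < minToStart);
    }

    public static void main(String[] args) {
        // Illustrative values (assumed): the timeout has lapsed, but only 1 of the
        // 3 required region servers has checked in and the count has been stable.
        boolean oldResult = keepWaitingOld(false, 5000L, 4500L, 1, 3,
            Integer.MAX_VALUE, 0L, 1500L, 5000L);
        boolean newResult = keepWaitingProposed(false, 5000L, 4500L, 1, 3,
            Integer.MAX_VALUE, 0L, 1500L, 5000L);
        System.out.println("old condition keeps waiting: " + oldResult);
        System.out.println("proposed condition keeps waiting: " + newResult);
    }
}
```

With these sample values the old condition stops waiting because the timeout has lapsed, while the proposed one keeps waiting until mintostart is satisfied. This is the behaviour change the description asks for, and it also means the loop can wait indefinitely when fewer than mintostart servers ever report in.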
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13417869#comment-13417869 ] Zhihong Ted Yu commented on HBASE-6389: --- Reverted trunk patch. Have not touched 0.94 branch yet. Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments Key: HBASE-6389 URL: https://issues.apache.org/jira/browse/HBASE-6389 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.94.0, 0.96.0 Reporter: Aditya Kishore Assignee: Aditya Kishore Priority: Critical Fix For: 0.96.0, 0.94.1 Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch Continuing from HBASE-6375. It seems I was mistaken in my assumption that changing the value of hbase.master.wait.on.regionservers.mintostart to a sufficient number (from default of 1) can help prevent assignment of all regions to one (or a small number of) region server(s). While this was the case in 0.90.x and 0.92.x, the behavior has changed in 0.94.0 onwards to address HBASE-4993. From 0.94.0 onwards, Master will proceed immediately after the timeout has lapsed, even if hbase.master.wait.on.regionservers.mintostart has not reached. Reading the current conditions of waitForRegionServers() clarifies it {code:title=ServerManager.java (trunk rev:1360470)} 581 /** 582 * Wait for the region servers to report in. 
583 * We will wait until one of this condition is met: 584 * - the master is stopped 585 * - the 'hbase.master.wait.on.regionservers.timeout' is reached 586 * - the 'hbase.master.wait.on.regionservers.maxtostart' number of 587 *region servers is reached 588 * - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND 589 * there have been no new region server in for 590 * 'hbase.master.wait.on.regionservers.interval' time 591 * 592 * @throws InterruptedException 593 */ 594 public void waitForRegionServers(MonitoredTask status) 595 throws InterruptedException { 612 while ( 613 !this.master.isStopped() && 614 slept < timeout && 615 count < maxToStart && 616 (lastCountChange+interval > now || count < minToStart) 617 ){ {code} So with the current conditions, the wait will end as soon as timeout is reached even lesser number of RS have checked-in with the Master and the master will proceed with the region assignment among these RSes alone. As mentioned in -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, and I concur, this could have disastrous effect in large cluster especially now that MSLAB is turned on. To enforce the required quorum as specified by hbase.master.wait.on.regionservers.mintostart irrespective of timeout, these conditions need to be modified as following {code:title=ServerManager.java} .. /** * Wait for the region servers to report in. * We will wait until one of this condition is met: * - the master is stopped * - the 'hbase.master.wait.on.regionservers.maxtostart' number of *region servers is reached * - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND * there have been no new region server in for * 'hbase.master.wait.on.regionservers.interval' time AND * the 'hbase.master.wait.on.regionservers.timeout' is reached * * @throws InterruptedException */ public void waitForRegionServers(MonitoredTask status) .. .. int minToStart = this.master.getConfiguration(). 
getInt("hbase.master.wait.on.regionservers.mintostart", 1); int maxToStart = this.master.getConfiguration(). getInt("hbase.master.wait.on.regionservers.maxtostart", Integer.MAX_VALUE); if (maxToStart < minToStart) { maxToStart = minToStart; } .. .. while ( !this.master.isStopped() && count < maxToStart && (lastCountChange+interval > now || timeout > slept || count < minToStart) ){ .. {code}
[jira] [Commented] (HBASE-6405) Create Hadoop compatibilty modules and Metrics2 implementation of replication metrics
[ https://issues.apache.org/jira/browse/HBASE-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417958#comment-13417958 ] Zhihong Ted Yu commented on HBASE-6405: --- Addendum v2 integrated to trunk. Thanks for the patch, Jesse. Thanks for the review, Elliott. Create Hadoop compatibilty modules and Metrics2 implementation of replication metrics - Key: HBASE-6405 URL: https://issues.apache.org/jira/browse/HBASE-6405 Project: HBase Issue Type: Sub-task Reporter: Zhihong Ted Yu Assignee: Elliott Clark Fix For: 0.96.0 Attachments: 6405.txt, HBASE-6405-ADD.patch, hbase-6405-addendum-2-v2.patch, hbase-6405-addendum-2.patch
[jira] [Updated] (HBASE-6426) Add Hadoop 2.0.x profile to 0.92+
[ https://issues.apache.org/jira/browse/HBASE-6426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6426: -- Description: 0.96 already has a Hadoop-2.0 build profile. Let's add this to 0.92 and 0.94 as well. (was: 0.96 already has a Hadoop-2.0 build profile. Let's add this to 0.92 and 0.96 as well.) Add Hadoop 2.0.x profile to 0.92+ - Key: HBASE-6426 URL: https://issues.apache.org/jira/browse/HBASE-6426 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.92.2, 0.94.1 0.96 already has a Hadoop-2.0 build profile. Let's add this to 0.92 and 0.94 as well.
[jira] [Commented] (HBASE-6392) UnknownRegionException blocks hbck from sideline big overlap regions
[ https://issues.apache.org/jira/browse/HBASE-6392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417968#comment-13417968 ] Zhihong Ted Yu commented on HBASE-6392: --- @Jimmy: Can you attach a patch for 0.92 to this JIRA? UnknownRegionException blocks hbck from sideline big overlap regions Key: HBASE-6392 URL: https://issues.apache.org/jira/browse/HBASE-6392 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1 Attachments: 6392-trunk.patch, 6392-trunk_v2.patch Before sidelining a big overlap region, hbck first tries to close it and take it offline. However, this sometimes throws NotServingRegion or UnknownRegionException. It could be because the region is not open/assigned at all, or because of some other issue. We should figure out why and fix it. By the way, it would be better to print in the log the command line for bulk loading sidelined regions back, if there are any.
[jira] [Commented] (HBASE-6327) HLog can be null when create table
[ https://issues.apache.org/jira/browse/HBASE-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417993#comment-13417993 ] Zhihong Ted Yu commented on HBASE-6327: --- I was expecting other committer(s) to take a look. HLog can be null when create table -- Key: HBASE-6327 URL: https://issues.apache.org/jira/browse/HBASE-6327 Project: HBase Issue Type: Bug Reporter: ShiXing Assignee: ShiXing Fix For: 0.96.0 Attachments: 6327.txt, HBASE-6327-trunk-V1.patch, createTableFailedMaster.log As discussed in HBASE-4010, the HLog can be null. We have seen createTable fail because of an hlog that is not even needed. During createHRegion, the HLog.LogSyncer is running sync(), which in the underlying layer calls DFSClient.DFSOutputStream.sync(). Then hlog.closeAndDelete() is called: first HLog.close() interrupts the LogSyncer, which also interrupts DFSClient.DFSOutputStream.sync(). The DFSClient.DFSOutputStream stores the exception and throws it when DFSClient.close() is called. HLog.close() calls writer.close()/DFSClient.close() after interrupting the LogSyncer, and nothing catches the exception from close(), so the Master throws the exception to the client. There is no need to throw this exception; moreover, the hlog is not used. Our cluster runs 0.90 and the logs are attached; after the hlog writer is closed, there is no further log output for createTable(). In trunk, 0.92 and 0.94 only one hlog is used, and if the exception happens the client will get a createTable failure, although we expect that all the regions of the table can still be assigned. I will provide a patch for this later.
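The failure mode described in HBASE-6327, where a stored exception surfaces from close() and aborts an operation that no longer needs the log, can be illustrated with a generic sketch. This is an assumption-laden illustration rather than the attached patch: QuietCloseSketch and closeQuietly are hypothetical names, and java.io.Closeable stands in for the HLog writer.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical sketch: Closeable stands in for the HLog writer; not HBase code.
public final class QuietCloseSketch {
    private QuietCloseSketch() {}

    /**
     * Closes the resource and returns any IOException instead of propagating it,
     * so a caller that no longer needs the resource can log it and continue.
     * Returns null when close() succeeds.
     */
    public static IOException closeQuietly(Closeable resource) {
        try {
            resource.close();
            return null;
        } catch (IOException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        // Simulates a stream that stored an earlier sync() failure and rethrows it on close().
        Closeable failing = () -> {
            throw new IOException("stored failure from interrupted sync()");
        };
        IOException suppressed = closeQuietly(failing);
        System.out.println("table creation continues; close() reported: "
            + suppressed.getMessage());
    }
}
```

Whether swallowing the exception is acceptable depends on whether the log contents are still needed; the issue description argues that in the createTable path they are not.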
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418025#comment-13418025 ]

Zhihong Ted Yu commented on HBASE-6389:
---
[~areborn]: Can you remove the sentence in Release Notes? Thanks

Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

Key: HBASE-6389
URL: https://issues.apache.org/jira/browse/HBASE-6389
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.94.0, 0.96.0
Reporter: Aditya Kishore
Assignee: Aditya Kishore
Priority: Critical
Fix For: 0.96.0, 0.94.2
Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, HBASE-6389_trunk.patch

Continuing from HBASE-6375. It seems I was mistaken in my assumption that changing the value of hbase.master.wait.on.regionservers.mintostart to a sufficient number (from the default of 1) can help prevent assignment of all regions to one (or a small number of) region server(s). While this was the case in 0.90.x and 0.92.x, the behavior changed from 0.94.0 onwards to address HBASE-4993. From 0.94.0 onwards, Master will proceed immediately after the timeout has lapsed, even if hbase.master.wait.on.regionservers.mintostart has not been reached. Reading the current conditions of waitForRegionServers() clarifies it:
{code:title=ServerManager.java (trunk rev:1360470)}
  /**
   * Wait for the region servers to report in.
   * We will wait until one of this condition is met:
   *  - the master is stopped
   *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
   *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
   *    region servers is reached
   *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
   *    there have been no new region server in for
   *    'hbase.master.wait.on.regionservers.interval' time
   *
   * @throws InterruptedException
   */
  public void waitForRegionServers(MonitoredTask status)
  throws InterruptedException {
..
    while (
      !this.master.isStopped() &&
        slept < timeout &&
        count < maxToStart &&
        (lastCountChange+interval > now || count < minToStart)
      ){
{code}
So with the current conditions, the wait will end as soon as the timeout is reached, even if fewer RSes than required have checked in with the Master, and the master will proceed with the region assignment among these RSes alone. As mentioned in -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, and I concur, this could have a disastrous effect in a large cluster, especially now that MSLAB is turned on. To enforce the required quorum as specified by hbase.master.wait.on.regionservers.mintostart irrespective of the timeout, these conditions need to be modified as follows:
{code:title=ServerManager.java}
..
  /**
   * Wait for the region servers to report in.
   * We will wait until one of this condition is met:
   *  - the master is stopped
   *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
   *    region servers is reached
   *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
   *    there have been no new region server in for
   *    'hbase.master.wait.on.regionservers.interval' time AND
   *    the 'hbase.master.wait.on.regionservers.timeout' is reached
   *
   * @throws InterruptedException
   */
  public void waitForRegionServers(MonitoredTask status)
..
..
    int minToStart = this.master.getConfiguration().
      getInt("hbase.master.wait.on.regionservers.mintostart", 1);
    int maxToStart = this.master.getConfiguration().
      getInt("hbase.master.wait.on.regionservers.maxtostart", Integer.MAX_VALUE);
    if (maxToStart < minToStart) {
      maxToStart = minToStart;
    }
..
..
    while (
      !this.master.isStopped() &&
        count < maxToStart &&
        (lastCountChange+interval > now || timeout > slept || count < minToStart)
      ){
..
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
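The difference between the two loop conditions in the HBASE-6389 description can be checked in isolation. The following is a minimal sketch, not the ServerManager code: the class and method names are hypothetical, the variable names mirror the quoted snippet, the comparison operators are taken to be the ones the Javadoc implies, and the numeric values in main are illustrative only.

```java
/** Sketch of the two wait conditions discussed in HBASE-6389 (hypothetical helper class). */
final class WaitConditionSketch {

  /** Condition quoted from trunk: the loop stops as soon as slept reaches timeout. */
  static boolean keepWaitingCurrent(boolean stopped, long slept, long timeout, int count,
      int minToStart, int maxToStart, long lastCountChange, long interval, long now) {
    return !stopped
        && slept < timeout
        && count < maxToStart
        && (lastCountChange + interval > now || count < minToStart);
  }

  /** Proposed condition: the timeout alone cannot end the wait while count is below minToStart. */
  static boolean keepWaitingProposed(boolean stopped, long slept, long timeout, int count,
      int minToStart, int maxToStart, long lastCountChange, long interval, long now) {
    return !stopped
        && count < maxToStart
        && (lastCountChange + interval > now || timeout > slept || count < minToStart);
  }

  public static void main(String[] args) {
    // Illustrative scenario: 2 of 5 required servers checked in, the timeout has
    // already lapsed, and no new server has reported for longer than the interval.
    boolean current = keepWaitingCurrent(false, 5000, 4500, 2, 5, Integer.MAX_VALUE, 0, 1500, 5000);
    boolean proposed = keepWaitingProposed(false, 5000, 4500, 2, 5, Integer.MAX_VALUE, 0, 1500, 5000);
    System.out.println("current condition keeps waiting:  " + current);
    System.out.println("proposed condition keeps waiting: " + proposed);
  }
}
```

In this scenario the current condition lets the master proceed with two region servers, while the proposed one keeps waiting. Once count reaches minToStart, both conditions behave the same after the timeout and interval have passed.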
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418030#comment-13418030 ]

Zhihong Ted Yu commented on HBASE-6389:
---
It has been 12 minutes since I started running TestZooKeeper based on the latest patch. Here is the tail of jstack:
{code}
main prio=5 tid=102801000 nid=0x100601000 in Object.wait() [1005fe000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on 78ebfee68 (a org.apache.hadoop.hbase.util.JVMClusterUtil$RegionServerThread)
	at java.lang.Thread.join(Thread.java:1210)
	- locked 78ebfee68 (a org.apache.hadoop.hbase.util.JVMClusterUtil$RegionServerThread)
	at java.lang.Thread.join(Thread.java:1263)
	at org.apache.hadoop.hbase.LocalHBaseCluster.waitOnRegionServer(LocalHBaseCluster.java:262)
	at org.apache.hadoop.hbase.MiniHBaseCluster.waitOnRegionServer(MiniHBaseCluster.java:285)
	at org.apache.hadoop.hbase.TestZooKeeper.testRegionServerSessionExpired(TestZooKeeper.java:201)
{code}
Please take a look at: https://issues.apache.org/jira/browse/HBASE-6406?focusedCommentId=13417665&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13417665
See if your finding can explain that symptom.

Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

Key: HBASE-6389
URL: https://issues.apache.org/jira/browse/HBASE-6389
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.94.0, 0.96.0
Reporter: Aditya Kishore
Assignee: Aditya Kishore
Priority: Critical
Fix For: 0.96.0, 0.94.2
Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, HBASE-6389_trunk.patch
[jira] [Comment Edited] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418025#comment-13418025 ]

Zhihong Ted Yu edited comment on HBASE-6389 at 7/19/12 4:16 AM:
@Aditya: Can you remove the sentence in Release Notes? Thanks

was (Author: zhi...@ebaysf.com):
[~areborn]: Can you remove the sentence in Release Notes? Thanks

Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

Key: HBASE-6389
URL: https://issues.apache.org/jira/browse/HBASE-6389
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.94.0, 0.96.0
Reporter: Aditya Kishore
Assignee: Aditya Kishore
Priority: Critical
Fix For: 0.96.0, 0.94.2
Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, HBASE-6389_trunk.patch
[jira] [Commented] (HBASE-6406) TestReplicationPeer.testResetZooKeeperSession and TestZooKeeper.testClientSessionExpired fail frequently
[ https://issues.apache.org/jira/browse/HBASE-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418047#comment-13418047 ]

Zhihong Ted Yu commented on HBASE-6406:
---
TestReplicationPeer.java should be removed from trunk as well, right?

TestReplicationPeer.testResetZooKeeperSession and TestZooKeeper.testClientSessionExpired fail frequently

Key: HBASE-6406
URL: https://issues.apache.org/jira/browse/HBASE-6406
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.1
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Fix For: 0.94.2
Attachments: testReplication.jstack, testZooKeeper.jstack

Looking back through the 0.94 test runs, these two tests accounted for 11 of 34 failed tests. They should be fixed or (temporarily) disabled.
[jira] [Commented] (HBASE-6421) [pom] add jettison and fix netty specification
[ https://issues.apache.org/jira/browse/HBASE-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418048#comment-13418048 ]

Zhihong Ted Yu commented on HBASE-6421:
---
Integrated to trunk. Thanks for the patch, Jesse.

[pom] add jettison and fix netty specification
--
Key: HBASE-6421
URL: https://issues.apache.org/jira/browse/HBASE-6421
Project: HBase
Issue Type: Bug
Reporter: Jesse Yates
Assignee: Jesse Yates
Attachments: hbase-6421-v0.patch, hbase-6421-v1.patch

Currently, jettison isn't required for testing hbase-server, but TestSchemaConfigured requires it, causing the compile phase (at least on my MBP) to fail. Further, in cleaning up the poms, netty should be declared in the parent hbase/pom.xml and then inherited in the subclass.
[jira] [Commented] (HBASE-5447) Support for custom filters with PB-based RPC
[ https://issues.apache.org/jira/browse/HBASE-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418059#comment-13418059 ]

Zhihong Ted Yu commented on HBASE-5447:
---
bq. just derive from FilterBasePB (where they are required to implement the readFields and write methods, as today)
I guess there was a typo above: FilterBaseWritable should have been used.

Support for custom filters with PB-based RPC

Key: HBASE-5447
URL: https://issues.apache.org/jira/browse/HBASE-5447
Project: HBase
Issue Type: Sub-task
Components: ipc, master, migration, regionserver
Reporter: Todd Lipcon
Assignee: Todd Lipcon
[jira] [Commented] (HBASE-6399) MetricsContext be different between RegionServerMetrics and RegionServerDynamicMetrics
[ https://issues.apache.org/jira/browse/HBASE-6399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416171#comment-13416171 ]

Zhihong Ted Yu commented on HBASE-6399:
---
{code}
+# dynamic.fileName=/tmp/metrics_jvm.log
{code}
The filename should be metrics_dynamic.log or similar, right? There is no rrd file pileup with the patch applied, I assume.

MetricsContext be different between RegionServerMetrics and RegionServerDynamicMetrics
--
Key: HBASE-6399
URL: https://issues.apache.org/jira/browse/HBASE-6399
Project: HBase
Issue Type: Bug
Components: metrics
Affects Versions: 0.94.0
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
Fix For: 0.96.0, 0.94.2
Attachments: HBASE-6399.patch

In hadoop-metrics.properties, GangliaContext is an optional metrics context, and I think we will generally use Ganglia to monitor an HBase cluster. However, I found a serious problem: RegionServerDynamicMetrics generates lots of rrd files whenever regions are moved or tables are created/deleted. Especially if tables are created every day, as in some applications, more and more rrd files accumulate on the Gmetad server. This will corrupt the Gmetad server. IMO, the MetricsContext should be different for RegionServerMetrics and RegionServerDynamicMetrics.
[jira] [Commented] (HBASE-6401) HBase may lose edits after a crash if used with HDFS 1.0.3 or older
[ https://issues.apache.org/jira/browse/HBASE-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416223#comment-13416223 ]

Zhihong Ted Yu commented on HBASE-6401:
---
The test is for hadoop. Is there a HADOOP- JIRA for this bug?

HBase may lose edits after a crash if used with HDFS 1.0.3 or older
---
Key: HBASE-6401
URL: https://issues.apache.org/jira/browse/HBASE-6401
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.96.0
Environment: all
Reporter: nkeywal
Priority: Critical
Attachments: TestReadAppendWithDeadDN.java

This comes from an hdfs bug, fixed in some hdfs versions. I haven't found the hdfs jira for this.

Context: HBase Write Ahead Log features. This is using hdfs append. If the node crashes, the file that was written is read by other processes to replay the action.
- So we have in hdfs one (dead) process writing with another process reading.
- But, despite the call to syncFs, we don't always see the data when we have a dead node. It seems to be because the call in DFSClient#updateBlockInfo ignores the ipc errors and sets the length to 0.
- So we may miss all the writes to the last block if we try to connect to the dead DN.

hdfs 1.0.3, branch-1 or branch-1-win: we have the issue
http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java?revision=1359853&view=markup
hdfs branch-2 or trunk: we should not have the issue (but not tested)
http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java?view=markup

The attached test will fail ~50% of the time.
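The failure mode described in HBASE-6401 (an IPC error is swallowed and the last block is treated as having length 0) can be illustrated with a toy model. This is only a sketch under stated assumptions: Replica, visibleLength, and both lookup methods are hypothetical names, none of this is DFSClient code, and the "try the remaining replicas" variant is one possible alternative rather than the fix that went into any HDFS branch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

/** Toy model of reading the length of a block that is still being written (hypothetical). */
final class LastBlockLengthSketch {

  /** Hypothetical stand-in for a datanode that can report the visible length of a block. */
  interface Replica {
    long visibleLength() throws IOException;
  }

  /** Failure mode from the report: an error from the first replica is swallowed, 0 is used. */
  static long lengthIgnoringErrors(List<Replica> replicas) {
    try {
      return replicas.get(0).visibleLength();
    } catch (IOException e) {
      return 0L; // the reader now believes the last block is empty
    }
  }

  /** Assumed alternative: ask the remaining replicas before giving up. */
  static long lengthTryingAllReplicas(List<Replica> replicas) {
    IOException last = new IOException("no replicas to ask");
    for (Replica replica : replicas) {
      try {
        return replica.visibleLength();
      } catch (IOException e) {
        last = e; // remember the failure and move on to the next replica
      }
    }
    throw new UncheckedIOException(last);
  }

  /** One dead replica (listed first) and one live replica holding 4096 bytes. */
  static List<Replica> demoReplicas() {
    Replica dead = () -> { throw new IOException("connection refused"); };
    Replica alive = () -> 4096L;
    return Arrays.asList(dead, alive);
  }

  public static void main(String[] args) {
    System.out.println("ignoring errors:     " + lengthIgnoringErrors(demoReplicas()));
    System.out.println("trying all replicas: " + lengthTryingAllReplicas(demoReplicas()));
  }
}
```

With the dead replica listed first, the error-swallowing lookup reports an empty block even though a live replica holds data, which matches the "we may miss all the writes to the last block" symptom.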
[jira] [Commented] (HBASE-6400) Add getMasterAdmin() and getMasterMonitor() to HConnection
[ https://issues.apache.org/jira/browse/HBASE-6400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416358#comment-13416358 ]

Zhihong Ted Yu commented on HBASE-6400:
---
{code}
+public MasterAdminProtocol getMasterAdmin() throws MasterNotRunningException {
{code}
@Override seems to be missing.

Add getMasterAdmin() and getMasterMonitor() to HConnection
--
Key: HBASE-6400
URL: https://issues.apache.org/jira/browse/HBASE-6400
Project: HBase
Issue Type: Improvement
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Attachments: HBASE-6400_v1.patch

HConnection used to have getMasterInterface(), but after HBASE-6039 it has been removed. I think we need to expose HConnection.getMasterAdmin() and getMasterMonitor() a la HConnection.getAdmin(), and getClient(). HConnectionImplementation has getKeepAliveMasterAdmin() but, I see no reason to leak keep alive classes to upper layers.
[jira] [Commented] (HBASE-6403) RegionCoprocessorHost provides empty config when loading a coprocessor
[ https://issues.apache.org/jira/browse/HBASE-6403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416421#comment-13416421 ]

Zhihong Ted Yu commented on HBASE-6403:
---
Nice finding, Eric. Looks like HBaseConfiguration.create() should be aware of the type of Configuration object passed to it. Meaning, CompoundConfiguration should be pulled into the hbase-common module.

RegionCoprocessorHost provides empty config when loading a coprocessor
--
Key: HBASE-6403
URL: https://issues.apache.org/jira/browse/HBASE-6403
Project: HBase
Issue Type: Bug
Components: regionserver
Reporter: Eric Newton
Priority: Minor

I started playing with Giraffa. I am running it against Hadoop 2.0.0-alpha, and current HBase trunk. On line 159 of RegionCoprocessorHost, the server's configuration is copied... or at least an attempt is made to copy it. However, the server's configuration object, a CompoundConfiguration, does not store the data in the same way as the base Configuration object, and so nothing is copied. This leaves the coprocessor without access to configuration values, like the fs.defaultFS, which Giraffa is looking for.
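The copy problem reported in HBASE-6403 follows a common pattern: a copy constructor reads the source object's backing store directly, while a subclass answers lookups from somewhere else. The sketch below reproduces that pattern with toy classes. BaseConfig, CompoundConfig, view, and copyViaView are hypothetical names chosen for illustration and are not the Hadoop or HBase classes; the fs.defaultFS value is made up.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy reproduction of "copying a compound configuration yields an empty copy" (hypothetical). */
final class ConfigCopySketch {

  /** Stand-in for a base configuration that keeps its values in its own map. */
  static class BaseConfig {
    protected final Map<String, String> props = new HashMap<>();

    BaseConfig() {}

    /** Copy constructor that only reads the other object's backing map. */
    BaseConfig(BaseConfig other) {
      props.putAll(other.props);
    }

    void set(String key, String value) { props.put(key, value); }

    String get(String key) { return props.get(key); }

    /** Every entry visible through this object; subclasses may override. */
    Map<String, String> view() { return new LinkedHashMap<>(props); }
  }

  /** Stand-in for a compound configuration: lookups go to a delegate, own map stays empty. */
  static class CompoundConfig extends BaseConfig {
    private final BaseConfig delegate;

    CompoundConfig(BaseConfig delegate) { this.delegate = delegate; }

    @Override String get(String key) { return delegate.get(key); }

    @Override Map<String, String> view() { return delegate.view(); }
  }

  /** Copy that goes through the visible entries instead of the backing map. */
  static BaseConfig copyViaView(BaseConfig source) {
    BaseConfig copy = new BaseConfig();
    source.view().forEach(copy::set);
    return copy;
  }

  /** A compound configuration whose delegate defines fs.defaultFS (made-up value). */
  static CompoundConfig demoCompound() {
    BaseConfig server = new BaseConfig();
    server.set("fs.defaultFS", "hdfs://example:8020");
    return new CompoundConfig(server);
  }

  public static void main(String[] args) {
    CompoundConfig compound = demoCompound();
    System.out.println("copy constructor: " + new BaseConfig(compound).get("fs.defaultFS"));
    System.out.println("copy via view:    " + copyViaView(compound).get("fs.defaultFS"));
  }
}
```

The copy constructor sees only the empty backing map of the compound object, so the copy has no fs.defaultFS. Copying through the visible entries keeps the value, which is the kind of type awareness the comment above asks for.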
[jira] [Updated] (HBASE-6400) Add getMasterAdmin() and getMasterMonitor() to HConnection
[ https://issues.apache.org/jira/browse/HBASE-6400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Ted Yu updated HBASE-6400:
--
Description:
HConnection used to have getMaster() which returns HMasterInterface, but after HBASE-6039 it has been removed. I think we need to expose HConnection.getMasterAdmin() and getMasterMonitor() a la HConnection.getAdmin(), and getClient(). HConnectionImplementation has getKeepAliveMasterAdmin() but, I see no reason to leak keep alive classes to upper layers.

was:
HConnection used to have getMasterInterface(), but after HBASE-6039 it has been removed. I think we need to expose HConnection.getMasterAdmin() and getMasterMonitor() a la HConnection.getAdmin(), and getClient(). HConnectionImplementation has getKeepAliveMasterAdmin() but, I see no reason to leak keep alive classes to upper layers.

Fix Version/s: 0.96.0

Add getMasterAdmin() and getMasterMonitor() to HConnection
--
Key: HBASE-6400
URL: https://issues.apache.org/jira/browse/HBASE-6400
Project: HBase
Issue Type: Improvement
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Fix For: 0.96.0
Attachments: HBASE-6400_v1.patch

HConnection used to have getMaster() which returns HMasterInterface, but after HBASE-6039 it has been removed. I think we need to expose HConnection.getMasterAdmin() and getMasterMonitor() a la HConnection.getAdmin(), and getClient(). HConnectionImplementation has getKeepAliveMasterAdmin() but, I see no reason to leak keep alive classes to upper layers.
[jira] [Updated] (HBASE-6400) Add getMasterAdmin() and getMasterMonitor() to HConnection
[ https://issues.apache.org/jira/browse/HBASE-6400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Ted Yu updated HBASE-6400:
--
Attachment: 6400-v2.patch

Patch v2 adds @Override.

Add getMasterAdmin() and getMasterMonitor() to HConnection
--
Key: HBASE-6400
URL: https://issues.apache.org/jira/browse/HBASE-6400
Project: HBase
Issue Type: Improvement
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Fix For: 0.96.0
Attachments: 6400-v2.patch, HBASE-6400_v1.patch

HConnection used to have getMaster() which returns HMasterInterface, but after HBASE-6039 it has been removed. I think we need to expose HConnection.getMasterAdmin() and getMasterMonitor() a la HConnection.getAdmin(), and getClient(). HConnectionImplementation has getKeepAliveMasterAdmin() but, I see no reason to leak keep alive classes to upper layers.
[jira] [Updated] (HBASE-6400) Add getMasterAdmin() and getMasterMonitor() to HConnection
[ https://issues.apache.org/jira/browse/HBASE-6400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Ted Yu updated HBASE-6400:
--
Hadoop Flags: Reviewed
Status: Patch Available (was: Open)

Add getMasterAdmin() and getMasterMonitor() to HConnection
--
Key: HBASE-6400
URL: https://issues.apache.org/jira/browse/HBASE-6400
Project: HBase
Issue Type: Improvement
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Fix For: 0.96.0
Attachments: 6400-v2.patch, HBASE-6400_v1.patch

HConnection used to have getMaster() which returns HMasterInterface, but after HBASE-6039 it has been removed. I think we need to expose HConnection.getMasterAdmin() and getMasterMonitor() a la HConnection.getAdmin(), and getClient(). HConnectionImplementation has getKeepAliveMasterAdmin() but, I see no reason to leak keep alive classes to upper layers.
[jira] [Commented] (HBASE-4050) Update HBase metrics framework to metrics2 framework
[ https://issues.apache.org/jira/browse/HBASE-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416543#comment-13416543 ]

Zhihong Ted Yu commented on HBASE-4050:
---
I will integrate the patch later this afternoon if there are no further review comments.

Update HBase metrics framework to metrics2 framework

Key: HBASE-4050
URL: https://issues.apache.org/jira/browse/HBASE-4050
Project: HBase
Issue Type: New Feature
Components: metrics
Affects Versions: 0.90.4
Environment: Java 6
Reporter: Eric Yang
Assignee: Elliott Clark
Priority: Critical
Fix For: 0.96.0
Attachments: 4050-metrics-v2.patch, 4050-metrics-v3.patch, HBASE-4050-0.patch, HBASE-4050-1.patch, HBASE-4050-2.patch, HBASE-4050-3.patch, HBASE-4050-5.patch, HBASE-4050-6.patch, HBASE-4050-7.patch, HBASE-4050-8.patch, HBASE-4050.patch

Metrics Framework has been marked deprecated in Hadoop 0.20.203+ and 0.22+, and it might get removed in a future Hadoop release. Hence, HBase needs to revise the dependency of MetricsContext to use the Metrics2 framework.
[jira] [Commented] (HBASE-4050) Update HBase metrics framework to metrics2 framework
[ https://issues.apache.org/jira/browse/HBASE-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416599#comment-13416599 ]

Zhihong Ted Yu commented on HBASE-4050:
---
How long would it take to align them with the work from Elliott?

Update HBase metrics framework to metrics2 framework

Key: HBASE-4050
URL: https://issues.apache.org/jira/browse/HBASE-4050
Project: HBase
Issue Type: New Feature
Components: metrics
Affects Versions: 0.90.4
Environment: Java 6
Reporter: Eric Yang
Assignee: Elliott Clark
Priority: Critical
Fix For: 0.96.0
Attachments: 4050-metrics-v2.patch, 4050-metrics-v3.patch, HBASE-4050-0.patch, HBASE-4050-1.patch, HBASE-4050-2.patch, HBASE-4050-3.patch, HBASE-4050-5.patch, HBASE-4050-6.patch, HBASE-4050-7.patch, HBASE-4050-8.patch, HBASE-4050.patch

Metrics Framework has been marked deprecated in Hadoop 0.20.203+ and 0.22+, and it might get removed in a future Hadoop release. Hence, HBase needs to revise the dependency of MetricsContext to use the Metrics2 framework.
[jira] [Commented] (HBASE-4050) Update HBase metrics framework to metrics2 framework
[ https://issues.apache.org/jira/browse/HBASE-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416608#comment-13416608 ]

Zhihong Ted Yu commented on HBASE-4050:
---
The above is a nice list. The title of this JIRA sounds general. Does it make sense to create the above sub-tasks (including Elliott's latest patch) under this JIRA?

Update HBase metrics framework to metrics2 framework

Key: HBASE-4050
URL: https://issues.apache.org/jira/browse/HBASE-4050
Project: HBase
Issue Type: New Feature
Components: metrics
Affects Versions: 0.90.4
Environment: Java 6
Reporter: Eric Yang
Assignee: Elliott Clark
Priority: Critical
Fix For: 0.96.0
Attachments: 4050-metrics-v2.patch, 4050-metrics-v3.patch, HBASE-4050-0.patch, HBASE-4050-1.patch, HBASE-4050-2.patch, HBASE-4050-3.patch, HBASE-4050-5.patch, HBASE-4050-6.patch, HBASE-4050-7.patch, HBASE-4050-8.patch, HBASE-4050.patch

Metrics Framework has been marked deprecated in Hadoop 0.20.203+ and 0.22+, and it might get removed in a future Hadoop release. Hence, HBase needs to revise the dependency of MetricsContext to use the Metrics2 framework.
[jira] [Created] (HBASE-6405) Create Hadoop compatibilty modules and Metrics2 implementation of replication metrics
Zhihong Ted Yu created HBASE-6405:
-
Summary: Create Hadoop compatibilty modules and Metrics2 implementation of replication metrics
Key: HBASE-6405
URL: https://issues.apache.org/jira/browse/HBASE-6405
Project: HBase
Issue Type: Sub-task
Reporter: Zhihong Ted Yu
Assignee: Elliott Clark
Fix For: 0.96.0
[jira] [Updated] (HBASE-4050) Update HBase metrics framework to metrics2 framework
[ https://issues.apache.org/jira/browse/HBASE-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Ted Yu updated HBASE-4050:
--
Status: Open (was: Patch Available)

Update HBase metrics framework to metrics2 framework

Key: HBASE-4050
URL: https://issues.apache.org/jira/browse/HBASE-4050
Project: HBase
Issue Type: New Feature
Components: metrics
Affects Versions: 0.90.4
Environment: Java 6
Reporter: Eric Yang
Assignee: Elliott Clark
Priority: Critical
Fix For: 0.96.0
Attachments: 4050-metrics-v2.patch, 4050-metrics-v3.patch, HBASE-4050-0.patch, HBASE-4050-1.patch, HBASE-4050-2.patch, HBASE-4050-3.patch, HBASE-4050-5.patch, HBASE-4050-6.patch, HBASE-4050-7.patch, HBASE-4050-8.patch, HBASE-4050.patch

Metrics Framework has been marked deprecated in Hadoop 0.20.203+ and 0.22+, and it might get removed in a future Hadoop release. Hence, HBase needs to revise the dependency of MetricsContext to use the Metrics2 framework.
[jira] [Updated] (HBASE-6405) Create Hadoop compatibilty modules and Metrics2 implementation of replication metrics
[ https://issues.apache.org/jira/browse/HBASE-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Ted Yu updated HBASE-6405:
--
Attachment: 6405.txt

Same patch as 4050-8.patch from Elliott.

Create Hadoop compatibilty modules and Metrics2 implementation of replication metrics
-
Key: HBASE-6405
URL: https://issues.apache.org/jira/browse/HBASE-6405
Project: HBase
Issue Type: Sub-task
Reporter: Zhihong Ted Yu
Assignee: Elliott Clark
Fix For: 0.96.0
Attachments: 6405.txt
[jira] [Commented] (HBASE-6405) Create Hadoop compatibilty modules and Metrics2 implementation of replication metrics
[ https://issues.apache.org/jira/browse/HBASE-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416651#comment-13416651 ]

Zhihong Ted Yu commented on HBASE-6405:
---
Integrated to trunk. Thanks for the patch, Elliott. Thanks for the review, Stack.

Create Hadoop compatibilty modules and Metrics2 implementation of replication metrics
-
Key: HBASE-6405
URL: https://issues.apache.org/jira/browse/HBASE-6405
Project: HBase
Issue Type: Sub-task
Reporter: Zhihong Ted Yu
Assignee: Elliott Clark
Fix For: 0.96.0
Attachments: 6405.txt
[jira] [Commented] (HBASE-6405) Create Hadoop compatibilty modules and Metrics2 implementation of replication metrics
[ https://issues.apache.org/jira/browse/HBASE-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416679#comment-13416679 ]

Zhihong Ted Yu commented on HBASE-6405:
---
Addendum checked in. Thanks for the quick turn-around, Elliott.

Create Hadoop compatibilty modules and Metrics2 implementation of replication metrics
-
Key: HBASE-6405
URL: https://issues.apache.org/jira/browse/HBASE-6405
Project: HBase
Issue Type: Sub-task
Reporter: Zhihong Ted Yu
Assignee: Elliott Clark
Fix For: 0.96.0
Attachments: 6405.txt, HBASE-6405-ADD.patch
[jira] [Commented] (HBASE-6261) Better approximate high-percentile percentile latency metrics
[ https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416706#comment-13416706 ] Zhihong Ted Yu commented on HBASE-6261: --- bq. copy the code over and then refactor it away +1 on above. Better approximate high-percentile percentile latency metrics - Key: HBASE-6261 URL: https://issues.apache.org/jira/browse/HBASE-6261 Project: HBase Issue Type: New Feature Reporter: Andrew Wang Assignee: Andrew Wang Labels: metrics Attachments: Latencyestimation.pdf The existing reservoir-sampling based latency metrics in HBase are not well-suited for providing accurate estimates of high-percentile (e.g. 90th, 95th, or 99th) latency. This is a well-studied problem in the literature (see [1] and [2]); the question is determining which methods best suit our needs and then implementing them. Ideally, we should be able to estimate these high percentiles with minimal memory and CPU usage as well as minimal error (e.g. 1% error on 90th, or .1% on 99th). It's also desirable to provide this over different time-based sliding windows, e.g. last 1 min, 5 mins, 15 mins, and 1 hour. I'll note that this would also be useful in HDFS, or really anywhere latency metrics are kept. [1] http://www.cs.rutgers.edu/~muthu/bquant.pdf [2] http://infolab.stanford.edu/~manku/papers/04pods-sliding.pdf
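For readers unfamiliar with the approach the description criticizes, here is a minimal sketch of reservoir sampling (Algorithm R) followed by a nearest-rank percentile lookup. It is not the HBase metrics code; the class `ReservoirPercentile` and its methods are hypothetical names chosen for illustration, and the fixed seed is only there to make runs repeatable.

```java
import java.util.Arrays;
import java.util.Random;

public class ReservoirPercentile {
    private final long[] reservoir;
    private final Random random;
    private long seen = 0;

    public ReservoirPercentile(int capacity, long seed) {
        this.reservoir = new long[capacity];
        this.random = new Random(seed);
    }

    // Algorithm R: after n updates, each value seen so far is in the
    // reservoir with probability capacity / n.
    public void update(long value) {
        if (seen < reservoir.length) {
            reservoir[(int) seen] = value;
        } else {
            // Pick a slot uniformly in [0, seen]; replace only if it falls inside the reservoir.
            long slot = (long) (random.nextDouble() * (seen + 1));
            if (slot < reservoir.length) {
                reservoir[(int) slot] = value;
            }
        }
        seen++;
    }

    // Nearest-rank percentile over the current sample; p is in (0, 100].
    public long percentile(double p) {
        int n = (int) Math.min(seen, reservoir.length);
        if (n == 0) {
            throw new IllegalStateException("no samples recorded");
        }
        long[] sorted = Arrays.copyOf(reservoir, n);
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * n);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

With a reservoir of 1000 samples, only about 10 of them are expected to lie above the true 99th percentile, so the tail estimate rests on very few points. That is one way to see why the description asks for a method designed for high percentiles.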
[jira] [Commented] (HBASE-4050) Update HBase metrics framework to metrics2 framework
[ https://issues.apache.org/jira/browse/HBASE-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416709#comment-13416709 ] Zhihong Ted Yu commented on HBASE-4050: --- bq. filing JIRA to allow metrics removals +1 on above. Update HBase metrics framework to metrics2 framework Key: HBASE-4050 URL: https://issues.apache.org/jira/browse/HBASE-4050 Project: HBase Issue Type: New Feature Components: metrics Affects Versions: 0.90.4 Environment: Java 6 Reporter: Eric Yang Assignee: Elliott Clark Priority: Critical Fix For: 0.96.0 Attachments: 4050-metrics-v2.patch, 4050-metrics-v3.patch, HBASE-4050-0.patch, HBASE-4050-1.patch, HBASE-4050-2.patch, HBASE-4050-3.patch, HBASE-4050-5.patch, HBASE-4050-6.patch, HBASE-4050-7.patch, HBASE-4050-8.patch, HBASE-4050.patch Metrics Framework has been marked deprecated in Hadoop 0.20.203+ and 0.22+, and it might get removed in future Hadoop release. Hence, HBase needs to revise the dependency of MetricsContext to use Metrics2 framework.
[jira] [Updated] (HBASE-6406) TestReplicationPeer.testResetZooKeeperSession and TestZooKeeper.testClientSessionExpired fail frequently
[ https://issues.apache.org/jira/browse/HBASE-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6406: -- Attachment: testZooKeeper.jstack testReplication.jstack I ran the trunk test suite and two surefire JVMs were hanging. Here are their jstacks. TestReplicationPeer.testResetZooKeeperSession and TestZooKeeper.testClientSessionExpired fail frequently Key: HBASE-6406 URL: https://issues.apache.org/jira/browse/HBASE-6406 Project: HBase Issue Type: Bug Affects Versions: 0.94.1 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.2 Attachments: testReplication.jstack, testZooKeeper.jstack Looking back through the 0.94 test runs, these two tests accounted for 11 of 34 failed tests. They should be fixed or (temporarily) disabled.
[jira] [Commented] (HBASE-6399) MetricsContext be different between RegionServerMetrics and RegionServerDynamicMetrics
[ https://issues.apache.org/jira/browse/HBASE-6399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416839#comment-13416839 ] Zhihong Ted Yu commented on HBASE-6399: --- Patch v2 looks good to me. MetricsContext be different between RegionServerMetrics and RegionServerDynamicMetrics -- Key: HBASE-6399 URL: https://issues.apache.org/jira/browse/HBASE-6399 Project: HBase Issue Type: Bug Components: metrics Affects Versions: 0.94.0 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6399.patch, HBASE-6399v2.patch In hadoop-metrics.properties, GangliaContext is an optional metrics context, and I think we will generally use Ganglia to monitor HBase clusters. However, I found a serious problem: RegionServerDynamicMetrics generates lots of rrd files because regions are moved and tables are created or deleted. In particular, if some applications create a table every day, more and more rrd files accumulate on the Gmetad server, and this will corrupt the Gmetad server. IMO, MetricsContext should be different between RegionServerMetrics and RegionServerDynamicMetrics
[jira] [Updated] (HBASE-6399) MetricsContext should be different between RegionServerMetrics and RegionServerDynamicMetrics
[ https://issues.apache.org/jira/browse/HBASE-6399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6399: -- Hadoop Flags: Reviewed Summary: MetricsContext should be different between RegionServerMetrics and RegionServerDynamicMetrics (was: MetricsContext be different between RegionServerMetrics and RegionServerDynamicMetrics) MetricsContext should be different between RegionServerMetrics and RegionServerDynamicMetrics - Key: HBASE-6399 URL: https://issues.apache.org/jira/browse/HBASE-6399 Project: HBase Issue Type: Bug Components: metrics Affects Versions: 0.94.0 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6399.patch, HBASE-6399v2.patch
[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96
[ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415410#comment-13415410 ] Zhihong Ted Yu commented on HBASE-6055: --- Thanks for the hint, Jon. I thought of that approach. I recently looked up related classes in the patch using vi directly. It would be nice if we can reduce the number of classes: controller, monitor, manager, sentinel, etc. It is hard to follow :-) I have gone through about 2.5 pages of diff. I can see there is more work to be done for Global snapshot. Snapshots in HBase 0.96 --- Key: HBASE-6055 URL: https://issues.apache.org/jira/browse/HBASE-6055 Project: HBase Issue Type: New Feature Components: client, master, regionserver, zookeeper Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.96.0 Attachments: Snapshots in HBase.docx Continuation of HBASE-50 for the current trunk. Since the implementation has drastically changed, opening as a new ticket.
[jira] [Commented] (HBASE-6272) In-memory region state is inconsistent
[ https://issues.apache.org/jira/browse/HBASE-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415421#comment-13415421 ] Zhihong Ted Yu commented on HBASE-6272: --- Understood, Jimmy. Sounds like a good plan. In-memory region state is inconsistent -- Key: HBASE-6272 URL: https://issues.apache.org/jira/browse/HBASE-6272 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang AssignmentManager stores region state related information in several places: regionsInTransition, regions (region info to server name map), and servers (server name to region info set map). However, access to these places is not coordinated properly, which leads to inconsistent in-memory region state information. Sometimes a region can even be offline and not in transition.
[jira] [Updated] (HBASE-6336) Split point should not be equal to start row or end row
[ https://issues.apache.org/jira/browse/HBASE-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6336: -- Summary: Split point should not be equal to start row or end row (was: Split point should not be equal with start row or end row) Split point should not be equal to start row or end row --- Key: HBASE-6336 URL: https://issues.apache.org/jira/browse/HBASE-6336 Project: HBase Issue Type: Bug Components: regionserver Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.96.0 Attachments: HBASE-6336.patch Should we allow a split point equal to the region's start row or end row?
{code}
// if the midkey is the same as the first and last keys, then we cannot
// (ever) split this region.
if (this.comparator.compareRows(mk, firstKey) == 0 &&
    this.comparator.compareRows(mk, lastKey) == 0) {
  if (LOG.isDebugEnabled()) {
    LOG.debug("cannot split because midkey is the same as first or " +
      "last row");
  }
{code}
Here, I think it is a mistake.
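For illustration, a minimal sketch of the check that the new summary describes: a mid key is rejected when it equals either boundary row, not only when it equals both. This is not the committed patch. `SplitPointCheck`, `compareRows` and `isValidSplitPoint` are hypothetical names, and the unsigned byte-wise comparison is an assumed stand-in for the store's row comparator.

```java
import java.nio.charset.StandardCharsets;

public class SplitPointCheck {
    // Assumed stand-in for the row comparator: unsigned lexicographic byte order.
    public static int compareRows(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // A mid key is usable as a split point only if it differs from BOTH
    // the first key and the last key of the store.
    public static boolean isValidSplitPoint(byte[] midKey, byte[] firstKey, byte[] lastKey) {
        return compareRows(midKey, firstKey) != 0
            && compareRows(midKey, lastKey) != 0;
    }

    public static void main(String[] args) {
        byte[] first = "a".getBytes(StandardCharsets.UTF_8);
        byte[] mid = "m".getBytes(StandardCharsets.UTF_8);
        byte[] last = "z".getBytes(StandardCharsets.UTF_8);
        System.out.println(isValidSplitPoint(mid, first, last));   // mid key strictly inside the range
        System.out.println(isValidSplitPoint(first, first, last)); // mid key equal to the start row
    }
}
```

Rejecting a mid key that matches only one boundary matters because splitting there would leave one daughter with an empty key range.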
[jira] [Commented] (HBASE-6336) Split point should not be equal to start row or end row
[ https://issues.apache.org/jira/browse/HBASE-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415874#comment-13415874 ] Zhihong Ted Yu commented on HBASE-6336: --- Integrated to trunk. Thanks for the patch, Chunhui. Thanks for the review, Stack and Ram. Split point should not be equal to start row or end row --- Key: HBASE-6336 URL: https://issues.apache.org/jira/browse/HBASE-6336 Project: HBase Issue Type: Bug Components: regionserver Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.96.0 Attachments: HBASE-6336.patch
[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415879#comment-13415879 ] Zhihong Ted Yu commented on HBASE-5416: --- I ran TestJoinedScanners on Linux and observed the following in test output:
{code}
2012-07-16 17:31:52,339 INFO [main] regionserver.TestJoinedScanners(152): Slow scanner finished in 96.393137286 seconds, got 1000 rows
...
2012-07-16 17:32:05,026 INFO [main] regionserver.TestJoinedScanners(172): Joined scanner finished in 12.687607287 seconds, got 1000 rows
{code}
Improve performance of scans with some kind of filters. --- Key: HBASE-5416 URL: https://issues.apache.org/jira/browse/HBASE-5416 Project: HBase Issue Type: Improvement Components: filters, performance, regionserver Affects Versions: 0.90.4 Reporter: Max Lapan Assignee: Max Lapan Fix For: 0.96.0 Attachments: 5416-Filtered_scans_v6.patch, 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, Filtered_scans_v5.1.patch, Filtered_scans_v5.patch, Filtered_scans_v7.patch When a scan is performed, the whole row is loaded into the result list, and after that the filter (if one exists) is applied to decide whether the row is needed. But when a scan covers several CFs and the filter checks only data from a subset of these CFs, data from the CFs not checked by the filter is not needed at the filter stage; it is needed only once we have decided to include the current row. In such a case we can significantly reduce the amount of IO performed by a scan by loading only the values actually checked by the filter. For example, we have two CFs: flags and snap. Flags is quite small (a bunch of megabytes) and is used to filter large entries from snap. Snap is very large (10s of GB) and is quite costly to scan. If we need only rows with some flag specified, we use SingleColumnValueFilter to limit the result to a small subset of the region.
But the current implementation loads both CFs to perform the scan, when only a small subset is needed. The attached patch adds one routine to the Filter interface to allow a filter to specify which CF is needed for its operation. In HRegion, we separate all scanners into two groups: those needed for the filter and the rest (joined). When a new row is considered, only the needed data is loaded and the filter is applied; only if the filter accepts the row is the rest of the data loaded. On our data, this speeds up such scans 30-50 times. It also gives us a way to better normalize the data into separate columns by optimizing the scans performed.
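The two-phase idea in the description can be sketched as follows. This is only an illustration using in-memory maps in place of column families, not the patch itself; `JoinedScanSketch`, `scan` and `largeFamilyReads` are hypothetical names, and the `flags` / `snap` maps stand in for the two CFs from the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class JoinedScanSketch {
    // Counts reads of the large family so the saving is visible to the caller.
    public static int largeFamilyReads = 0;

    // Returns "rowKey=snapValue" for every row whose flags value passes the filter.
    public static List<String> scan(Map<String, String> flags,
                                    Map<String, String> snap,
                                    Predicate<String> filter) {
        List<String> results = new ArrayList<>();
        for (Map.Entry<String, String> row : flags.entrySet()) {
            // Phase 1: read only the family the filter needs.
            if (!filter.test(row.getValue())) {
                continue; // rejected without touching the large family
            }
            // Phase 2: the row is accepted, so fetch the remaining ("joined") family.
            largeFamilyReads++;
            results.add(row.getKey() + "=" + snap.get(row.getKey()));
        }
        return results;
    }
}
```

If the filter accepts only a small fraction of rows, the large family is read only for that fraction, which is the source of the IO saving the description claims.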
[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96
[ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414689#comment-13414689 ] Zhihong Ted Yu commented on HBASE-6055: --- Flipping through 5 pages on review board is slow, so I am putting down some notes here. For HStore.java: The license header doesn't look like the standard format. Please add audience and stability annotations to this new interface.
{code}
+ FileStatus[] getStoreFiles() throws IOException;
+
+ List<StoreFile> getStorefiles();
{code}
Why do we need two methods which are spelled almost the same, yet return different types? When refactoring, we should make the code cleaner. There are many methods which don't have javadoc. Please add javadoc for them.
{code}
+ public HStore getDelgate() {
{code}
Please correct the spelling of the above method. Snapshots in HBase 0.96 --- Key: HBASE-6055 URL: https://issues.apache.org/jira/browse/HBASE-6055 Project: HBase Issue Type: New Feature Components: client, master, regionserver, zookeeper Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.96.0 Attachments: Snapshots in HBase.docx
[jira] [Commented] (HBASE-6272) In-memory region state is inconsistent
[ https://issues.apache.org/jira/browse/HBASE-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414690#comment-13414690 ] Zhihong Ted Yu commented on HBASE-6272: --- @Jimmy: Can you clarify whether the test on trunk was performed in a live cluster? In-memory region state is inconsistent -- Key: HBASE-6272 URL: https://issues.apache.org/jira/browse/HBASE-6272 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang
[jira] [Commented] (HBASE-6272) In-memory region state is inconsistent
[ https://issues.apache.org/jira/browse/HBASE-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414767#comment-13414767 ] Zhihong Ted Yu commented on HBASE-6272: --- Please consider the scenario described in HBASE-6060. Thanks. In-memory region state is inconsistent -- Key: HBASE-6272 URL: https://issues.apache.org/jira/browse/HBASE-6272 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang
[jira] [Created] (HBASE-6396) Fix NoSuchMethodError running against hadoop 2.0
Zhihong Ted Yu created HBASE-6396: - Summary: Fix NoSuchMethodError running against hadoop 2.0 Key: HBASE-6396 URL: https://issues.apache.org/jira/browse/HBASE-6396 Project: HBase Issue Type: Bug Reporter: Zhihong Ted Yu Assignee: Zhihong Ted Yu HADOOP-8350 changed the signature of NetUtils.getInputStream(). This leads to NoSuchMethodError in HBaseClient$Connection.setupIOstreams(). See https://issues.apache.org/jira/browse/HADOOP-8350?focusedCommentId=13414276&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13414276
[jira] [Updated] (HBASE-6396) Fix NoSuchMethodError running against hadoop 2.0
[ https://issues.apache.org/jira/browse/HBASE-6396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6396: -- Attachment: 6396.txt Patch allows TestPBOnWritableRpc to pass against hadoop 2.0 Fix NoSuchMethodError running against hadoop 2.0 Key: HBASE-6396 URL: https://issues.apache.org/jira/browse/HBASE-6396 Project: HBase Issue Type: Bug Reporter: Zhihong Ted Yu Assignee: Zhihong Ted Yu Fix For: 0.96.0 Attachments: 6396.txt
[jira] [Updated] (HBASE-6396) Fix NoSuchMethodError running against hadoop 2.0
[ https://issues.apache.org/jira/browse/HBASE-6396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6396: -- Fix Version/s: 0.96.0 Fix NoSuchMethodError running against hadoop 2.0 Key: HBASE-6396 URL: https://issues.apache.org/jira/browse/HBASE-6396 Project: HBase Issue Type: Bug Reporter: Zhihong Ted Yu Assignee: Zhihong Ted Yu Fix For: 0.96.0 Attachments: 6396.txt
[jira] [Updated] (HBASE-6396) Fix NoSuchMethodError running against hadoop 2.0
[ https://issues.apache.org/jira/browse/HBASE-6396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6396: -- Status: Patch Available (was: Open) Fix NoSuchMethodError running against hadoop 2.0 Key: HBASE-6396 URL: https://issues.apache.org/jira/browse/HBASE-6396 Project: HBase Issue Type: Bug Reporter: Zhihong Ted Yu Assignee: Zhihong Ted Yu Fix For: 0.96.0 Attachments: 6396.txt
[jira] [Commented] (HBASE-6396) Fix NoSuchMethodError running against hadoop 2.0
[ https://issues.apache.org/jira/browse/HBASE-6396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414436#comment-13414436 ] Zhihong Ted Yu commented on HBASE-6396: --- I first saw this error in the following thread: http://search-hadoop.com/m/MeBkIxlDj52/Hive+and+CDH4+GA&subj=Hive+and+CDH4+GA I posted a suggestion on HADOOP-8350 for how the impact on downstream projects can be minimized. But the cast is safe to have regardless of how HADOOP-8350 is implemented. Fix NoSuchMethodError running against hadoop 2.0 Key: HBASE-6396 URL: https://issues.apache.org/jira/browse/HBASE-6396 Project: HBase Issue Type: Bug Reporter: Zhihong Ted Yu Assignee: Zhihong Ted Yu Fix For: 0.96.0 Attachments: 6396.txt
[jira] [Updated] (HBASE-6396) Fix NoSuchMethodError running against hadoop 2.0
[ https://issues.apache.org/jira/browse/HBASE-6396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6396: -- Status: Open (was: Patch Available) Fix NoSuchMethodError running against hadoop 2.0 Key: HBASE-6396 URL: https://issues.apache.org/jira/browse/HBASE-6396 Project: HBase Issue Type: Bug Reporter: Zhihong Ted Yu Assignee: Zhihong Ted Yu Fix For: 0.96.0
[jira] [Updated] (HBASE-6396) Fix NoSuchMethodError running against hadoop 2.0
[ https://issues.apache.org/jira/browse/HBASE-6396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6396: -- Attachment: (was: 6396.txt) Fix NoSuchMethodError running against hadoop 2.0 Key: HBASE-6396 URL: https://issues.apache.org/jira/browse/HBASE-6396 Project: HBase Issue Type: Bug Reporter: Zhihong Ted Yu Assignee: Zhihong Ted Yu Fix For: 0.96.0
[jira] [Updated] (HBASE-6396) Fix NoSuchMethodError running against hadoop 2.0
[ https://issues.apache.org/jira/browse/HBASE-6396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6396: -- Status: Patch Available (was: Open) Fix NoSuchMethodError running against hadoop 2.0 Key: HBASE-6396 URL: https://issues.apache.org/jira/browse/HBASE-6396 Project: HBase Issue Type: Bug Reporter: Zhihong Ted Yu Assignee: Zhihong Ted Yu Fix For: 0.96.0 Attachments: 6396-v2.txt
[jira] [Updated] (HBASE-6396) Fix NoSuchMethodError running against hadoop 2.0
[ https://issues.apache.org/jira/browse/HBASE-6396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6396: -- Attachment: 6396-v2.txt I used the following commands to verify patch v2:
{code}
mvn clean compile
nohup mvn help:active-profiles -Dhadoop.profile=2.0 test > ../suite.txt
{code}
Fix NoSuchMethodError running against hadoop 2.0 Key: HBASE-6396 URL: https://issues.apache.org/jira/browse/HBASE-6396 Project: HBase Issue Type: Bug Reporter: Zhihong Ted Yu Assignee: Zhihong Ted Yu Fix For: 0.96.0 Attachments: 6396-v2.txt
[jira] [Commented] (HBASE-6394) verifyrep MR job map tasks throws NullPointerException
[ https://issues.apache.org/jira/browse/HBASE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414161#comment-13414161 ] Zhihong Ted Yu commented on HBASE-6394: ---
{code}
+replicatedScanner.close();
{code}
I was expecting 'replicatedScanner = null' following the above call. verifyrep MR job map tasks throws NullPointerException --- Key: HBASE-6394 URL: https://issues.apache.org/jira/browse/HBASE-6394 Project: HBase Issue Type: Bug Components: replication Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: 6394-trunk.patch
{noformat}
2012-07-02 16:23:34,871 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2012-07-02 16:23:34,876 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.NullPointerException
	at org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier.cleanup(VerifyReplication.java:140)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
	at org.apache.hadoop.mapred.Child.main(Child.java:264)
2012-07-02 16:23:34,882 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
{noformat}
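The review remark above can be sketched as a close-and-null pattern. This is a sketch under assumptions, not the attached patch: it assumes the NullPointerException comes from cleanup() closing a scanner that was never opened, which the stack trace suggests but the thread does not spell out. `VerifierCleanupSketch` and its nested `Scanner` interface are hypothetical stand-ins for the mapper and its scanner.

```java
public class VerifierCleanupSketch {
    // Hypothetical stand-in for a scanner whose close() throws no checked exception.
    public interface Scanner {
        void close();
    }

    // Stays null until the first record is mapped (assumed lazy initialization).
    private Scanner replicatedScanner;

    public void open(Scanner scanner) {
        this.replicatedScanner = scanner;
    }

    // Safe when no record was ever mapped, and safe to call more than once.
    public void cleanup() {
        if (replicatedScanner != null) {
            try {
                replicatedScanner.close();
            } finally {
                // Drop the reference so a second cleanup() cannot close it again.
                replicatedScanner = null;
            }
        }
    }

    public boolean isOpen() {
        return replicatedScanner != null;
    }
}
```

Nulling the field after close() makes the guard meaningful on repeated calls, which is what the 'replicatedScanner = null' suggestion is after.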
[jira] [Created] (HBASE-6395) TestFSSchedulerApp should be in scheduler.fair package
Zhihong Ted Yu created HBASE-6395: - Summary: TestFSSchedulerApp should be in scheduler.fair package Key: HBASE-6395 URL: https://issues.apache.org/jira/browse/HBASE-6395 Project: HBase Issue Type: Bug Reporter: Zhihong Ted Yu MAPREDUCE-3451 added Fair Scheduler to MRv2. TestFSSchedulerApp was added under src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair but its package was declared to be org.apache.hadoop.yarn.server.resourcemanager.scheduler
[jira] [Resolved] (HBASE-6395) TestFSSchedulerApp should be in scheduler.fair package
[ https://issues.apache.org/jira/browse/HBASE-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu resolved HBASE-6395. --- Resolution: Won't Fix This should have been a MAPREDUCE JIRA. TestFSSchedulerApp should be in scheduler.fair package -- Key: HBASE-6395 URL: https://issues.apache.org/jira/browse/HBASE-6395 Project: HBase Issue Type: Bug Reporter: Zhihong Ted Yu MAPREDUCE-3451 added Fair Scheduler to MRv2 TestFSSchedulerApp was added under src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair but its package was declared to be org.apache.hadoop.yarn.server.resourcemanager.scheduler -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6394) verifyrep MR job map tasks throws NullPointerException
[ https://issues.apache.org/jira/browse/HBASE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13414189#comment-13414189 ] Zhihong Ted Yu commented on HBASE-6394: --- +1 on patch v2. verifyrep MR job map tasks throws NullPointerException --- Key: HBASE-6394 URL: https://issues.apache.org/jira/browse/HBASE-6394 Project: HBase Issue Type: Bug Components: replication Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: 6394-trunk.patch, 6394-trunk_v2.patch {noformat} 2012-07-02 16:23:34,871 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1 2012-07-02 16:23:34,876 WARN org.apache.hadoop.mapred.Child: Error running child java.lang.NullPointerException at org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier.cleanup(VerifyReplication.java:140) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) at org.apache.hadoop.mapred.Child.main(Child.java:264) 2012-07-02 16:23:34,882 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6380) bulkload should update the store.storeSize
[ https://issues.apache.org/jira/browse/HBASE-6380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6380: -- Attachment: 6380-trunk.txt Patch rebased for trunk. bulkload should update the store.storeSize -- Key: HBASE-6380 URL: https://issues.apache.org/jira/browse/HBASE-6380 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.0, 0.96.0 Reporter: Jie Huang Priority: Critical Attachments: 6380-trunk.txt, hbase-6380_0_94_0.patch After bulkloading some HFiles into the Table, we found the force-split didn't work because of the MidKey == NULL. Only if we re-booted the HBase service, the force-split can work normally. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6380) bulkload should update the store.storeSize
[ https://issues.apache.org/jira/browse/HBASE-6380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6380: -- Attachment: (was: 6380-trunk.txt) bulkload should update the store.storeSize -- Key: HBASE-6380 URL: https://issues.apache.org/jira/browse/HBASE-6380 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.0, 0.96.0 Reporter: Jie Huang Priority: Critical Attachments: 6380-trunk.txt, hbase-6380_0_94_0.patch After bulkloading some HFiles into the Table, we found the force-split didn't work because of the MidKey == NULL. Only if we re-booted the HBase service, the force-split can work normally. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6380) bulkload should update the store.storeSize
[ https://issues.apache.org/jira/browse/HBASE-6380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6380: -- Attachment: 6380-trunk.txt bulkload should update the store.storeSize -- Key: HBASE-6380 URL: https://issues.apache.org/jira/browse/HBASE-6380 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.0, 0.96.0 Reporter: Jie Huang Priority: Critical Attachments: 6380-trunk.txt, hbase-6380_0_94_0.patch After bulkloading some HFiles into the Table, we found the force-split didn't work because of the MidKey == NULL. Only if we re-booted the HBase service, the force-split can work normally. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6380) bulkload should update the store.storeSize
[ https://issues.apache.org/jira/browse/HBASE-6380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6380: -- Status: Patch Available (was: Open) bulkload should update the store.storeSize -- Key: HBASE-6380 URL: https://issues.apache.org/jira/browse/HBASE-6380 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.0, 0.96.0 Reporter: Jie Huang Priority: Critical Attachments: 6380-trunk.txt, hbase-6380_0_94_0.patch After bulkloading some HFiles into the Table, we found the force-split didn't work because of the MidKey == NULL. Only if we re-booted the HBase service, the force-split can work normally. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6382) Upgrade Jersey to 1.8 to match Hadoop 2
[ https://issues.apache.org/jira/browse/HBASE-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413023#comment-13413023 ]

Zhihong Ted Yu commented on HBASE-6382:
---
Looks like hadoop 1.0 is using 1.8 as well:
{code}
ivy/libraries.properties:jersey-core.version=1.8
ivy/libraries.properties:jersey-json.version=1.8
ivy/libraries.properties:jersey-server.version=1.8
{code}
Suggest changing the title of this JIRA.

Upgrade Jersey to 1.8 to match Hadoop 2
---
Key: HBASE-6382
URL: https://issues.apache.org/jira/browse/HBASE-6382
Project: HBase
Issue Type: Improvement
Components: rest
Affects Versions: 0.90.6, 0.92.1, 0.94.0, 0.96.0
Reporter: David S. Wang
Assignee: David S. Wang

Upgrade Jersey dependency from 1.4 to 1.8 to match Hadoop 2 dependency.
[jira] [Commented] (HBASE-6383) Investigate using 2Q for block cache
[ https://issues.apache.org/jira/browse/HBASE-6383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413047#comment-13413047 ]

Zhihong Ted Yu commented on HBASE-6383:
---
Found this: http://code.google.com/p/custard-cache/source/browse/trunk/custard-cache-policies/src/main/java/com/custardsource/cache/policy/twoq/TwoQCacheManager.java?r=38
custard-cache is Apache License 2.0

Investigate using 2Q for block cache
---
Key: HBASE-6383
URL: https://issues.apache.org/jira/browse/HBASE-6383
Project: HBase
Issue Type: New Feature
Components: performance, regionserver
Affects Versions: 0.96.0
Reporter: Jesse Yates
Priority: Minor

Currently we use a basic version of LRU to handle block caching. LRU is known to be very susceptible to thrashing by scans (it is not scan resistant), and scans are a common operation in HBase. 2Q is an efficient caching algorithm that emulates the effectiveness of LRU/2 (eviction based not on the last access, but rather the access before the last), but is O(1), rather than O(lg(n)), in complexity. JD has long been talking about investigating 2Q as it may be far better for HBase than LRU and has been shown to be incredibly useful for traditional database caching on production systems. One would need to implement 2Q (though the pseudocode in the paper is quite explicit) and then test against the existing cache implementation. The link to the original paper is here: www.vldb.org/conf/1994/P439.PDF

A short overview of 2Q: 2Q uses two queues (hence the name) and a list of pointers to keep track of cached blocks. The first queue is for new, hot items (Ain). If an item is accessed that isn't in Ain, the coldest block is evicted from Ain and the new item replaces it. Anything accessed in Ain is already stored in memory and kept in Ain. When a block is evicted from Ain, it is moved to Aout _as a pointer_. If Aout is full, the oldest element is evicted and replaced with the new pointer.

The key to 2Q is that when you access something in Aout, it is reloaded into memory and stored in queue B. If B becomes full, then the coldest block is evicted. This essentially makes Aout a filter for long-term hot items, based on the size of Aout. The original authors found that while you can tune Aout, it generally performs very well at 50% of the number of pages that would fit into the buffer, but can be tuned as low as 5% at only a slight cost to responsiveness to changes.
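The overview above can be sketched roughly as follows. This is a simplified illustration only: it is neither HBase code nor the custard-cache implementation, and the paper's algorithm has more detail than this. The class name `TwoQCacheSketch`, the three capacity parameters, and the `loader` callback (assumed to return non-null values) are all hypothetical.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.function.Function;

// Simplified 2Q sketch: Ain (FIFO of resident blocks), Aout (FIFO of
// evicted keys only), B (LRU of blocks that were re-referenced via Aout).
class TwoQCacheSketch<K, V> {
  private final int ainCapacity;
  private final int aoutCapacity;
  private final int bCapacity;
  private final Function<K, V> loader; // hypothetical: reads a block on a miss

  private final LinkedHashMap<K, V> ain = new LinkedHashMap<>();              // insertion order
  private final LinkedHashSet<K> aout = new LinkedHashSet<>();                // keys only
  private final LinkedHashMap<K, V> b = new LinkedHashMap<>(16, 0.75f, true); // access order

  TwoQCacheSketch(int ainCapacity, int aoutCapacity, int bCapacity, Function<K, V> loader) {
    this.ainCapacity = ainCapacity;
    this.aoutCapacity = aoutCapacity;
    this.bCapacity = bCapacity;
    this.loader = loader;
  }

  V get(K key) {
    V value = b.get(key); // a hit in B also moves the entry to the hot end
    if (value != null) {
      return value;
    }
    value = ain.get(key); // a hit in Ain leaves the FIFO position unchanged
    if (value != null) {
      return value;
    }
    value = loader.apply(key);
    if (aout.remove(key)) {
      // Seen recently and asked for again: promote to the long-term queue B.
      b.put(key, value);
      if (b.size() > bCapacity) {
        removeEldest(b.keySet().iterator());
      }
    } else {
      // First sighting: admit to Ain, remembering evicted keys in Aout.
      ain.put(key, value);
      if (ain.size() > ainCapacity) {
        aout.add(removeEldest(ain.keySet().iterator()));
        if (aout.size() > aoutCapacity) {
          removeEldest(aout.iterator());
        }
      }
    }
    return value;
  }

  private static <T> T removeEldest(Iterator<T> it) {
    T eldest = it.next();
    it.remove();
    return eldest;
  }

  boolean inAin(K key) { return ain.containsKey(key); }
  boolean inAout(K key) { return aout.contains(key); }
  boolean inB(K key) { return b.containsKey(key); }
}
```

A long scan in this sketch only churns Ain and Aout; blocks reach B only when they are requested again while their key is still remembered in Aout, which is the filtering effect the description attributes to Aout.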
[jira] [Updated] (HBASE-5547) Don't delete HFiles when in backup mode
[ https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-5547: -- Attachment: 5547-v12.txt Don't delete HFiles when in backup mode - Key: HBASE-5547 URL: https://issues.apache.org/jira/browse/HBASE-5547 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl Assignee: Jesse Yates Fix For: 0.94.2 Attachments: 5547-v12.txt, hbase-5447-v8.patch, hbase-5447-v8.patch, hbase-5547-v9.patch, java_HBASE-5547_v4.patch, java_HBASE-5547_v5.patch, java_HBASE-5547_v6.patch, java_HBASE-5547_v7.patch This came up in a discussion I had with Stack. It would be nice if HBase could be notified that a backup is in progress (via a znode for example) and in that case either: 1. rename HFiles to be delete to file.bck 2. rename the HFiles into a special directory 3. rename them to a general trash directory (which would not need to be tied to backup mode). That way it should be able to get a consistent backup based on HFiles (HDFS snapshots or hard links would be better options here, but we do not have those). #1 makes cleanup a bit harder. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4050) Update HBase metrics framework to metrics2 framework
[ https://issues.apache.org/jira/browse/HBASE-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413197#comment-13413197 ] Zhihong Ted Yu commented on HBASE-4050: --- Putting patch on review board would help. https://reviews.apache.org/r/new/ gave me Error 500 ... Update HBase metrics framework to metrics2 framework Key: HBASE-4050 URL: https://issues.apache.org/jira/browse/HBASE-4050 Project: HBase Issue Type: New Feature Components: metrics Affects Versions: 0.90.4 Environment: Java 6 Reporter: Eric Yang Assignee: Alex Baranau Priority: Critical Fix For: 0.96.0 Attachments: 4050-metrics-v2.patch, 4050-metrics-v3.patch, HBASE-4050-0.patch, HBASE-4050-1.patch, HBASE-4050-2.patch, HBASE-4050-3.patch, HBASE-4050.patch Metrics Framework has been marked deprecated in Hadoop 0.20.203+ and 0.22+, and it might get removed in future Hadoop release. Hence, HBase needs to revise the dependency of MetricsContext to use Metrics2 framework. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4050) Update HBase metrics framework to metrics2 framework
[ https://issues.apache.org/jira/browse/HBASE-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413319#comment-13413319 ] Zhihong Ted Yu commented on HBASE-4050: --- Latest patch didn't compile against hadoop 2.0 See https://builds.apache.org/job/PreCommit-HBASE-Build/2372/console Update HBase metrics framework to metrics2 framework Key: HBASE-4050 URL: https://issues.apache.org/jira/browse/HBASE-4050 Project: HBase Issue Type: New Feature Components: metrics Affects Versions: 0.90.4 Environment: Java 6 Reporter: Eric Yang Assignee: Alex Baranau Priority: Critical Fix For: 0.96.0 Attachments: 4050-metrics-v2.patch, 4050-metrics-v3.patch, HBASE-4050-0.patch, HBASE-4050-1.patch, HBASE-4050-2.patch, HBASE-4050-3.patch, HBASE-4050.patch Metrics Framework has been marked deprecated in Hadoop 0.20.203+ and 0.22+, and it might get removed in future Hadoop release. Hence, HBase needs to revise the dependency of MetricsContext to use Metrics2 framework. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4050) Update HBase metrics framework to metrics2 framework
[ https://issues.apache.org/jira/browse/HBASE-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413356#comment-13413356 ]

Zhihong Ted Yu commented on HBASE-4050:
---
{code}
+public interface BaseMetricsSource {
{code}
I suggest adding the following for the above interface:
{code}
@InterfaceAudience.Public
@InterfaceStability.Evolving
{code}
{code}
+ * Subtract some amount to a gauge.
{code}
'to a' -> 'from a'
{code}
+rms = ServiceLoader.load(ReplicationMetricsSource.class).iterator().next();
{code}
Shall we traverse the iterator and warn the user if more than one implementation is found?
Will continue on the review board.

Update HBase metrics framework to metrics2 framework
---
Key: HBASE-4050
URL: https://issues.apache.org/jira/browse/HBASE-4050
Project: HBase
Issue Type: New Feature
Components: metrics
Affects Versions: 0.90.4
Environment: Java 6
Reporter: Eric Yang
Assignee: Alex Baranau
Priority: Critical
Fix For: 0.96.0
Attachments: 4050-metrics-v2.patch, 4050-metrics-v3.patch, HBASE-4050-0.patch, HBASE-4050-1.patch, HBASE-4050-2.patch, HBASE-4050-3.patch, HBASE-4050-5.patch, HBASE-4050.patch

Metrics Framework has been marked deprecated in Hadoop 0.20.203+ and 0.22+, and it might get removed in a future Hadoop release. Hence, HBase needs to revise the dependency of MetricsContext to use the Metrics2 framework.
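One way to do what the last remark asks for is sketched below. The helper class `SingleProviderLoader` and its methods `loadSingle` and `pickFirst` are hypothetical names, not part of the patch under review; only `java.util.ServiceLoader.load(Class)` and its `iterator()` are real JDK API. The warning goes to stderr here as a stand-in for whatever logger the project uses.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical helper: load exactly one provider, complaining about extras.
class SingleProviderLoader {

  static <T> T loadSingle(Class<T> service) {
    return pickFirst(service.getName(), ServiceLoader.load(service).iterator());
  }

  // Walks the whole iterator instead of calling next() once, so that
  // a missing provider and duplicate providers are both noticed.
  static <T> T pickFirst(String serviceName, Iterator<T> providers) {
    List<T> found = new ArrayList<>();
    while (providers.hasNext()) {
      found.add(providers.next());
    }
    if (found.isEmpty()) {
      throw new IllegalStateException("No implementation of " + serviceName + " found");
    }
    if (found.size() > 1) {
      System.err.println("WARN: " + found.size() + " implementations of " + serviceName
          + " found, using " + found.get(0).getClass().getName());
    }
    return found.get(0);
  }
}
```

Compared with a bare `iterator().next()`, this turns the no-provider case into a descriptive exception rather than a NoSuchElementException.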
[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Ted Yu updated HBASE-6389:
---
Hadoop Flags: Reviewed

Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
---
Key: HBASE-6389
URL: https://issues.apache.org/jira/browse/HBASE-6389
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.94.0, 0.96.0
Reporter: Aditya Kishore
Assignee: Aditya Kishore
Priority: Critical
Fix For: 0.96.0, 0.94.1
Attachments: HBASE-6389_trunk.patch

Continuing from HBASE-6375. It seems I was mistaken in my assumption that changing the value of hbase.master.wait.on.regionservers.mintostart to a sufficient number (from default of 1) can help prevent assignment of all regions to one (or a small number of) region server(s). While this was the case in 0.90.x and 0.92.x, the behavior has changed in 0.94.0 onwards to address HBASE-4993. From 0.94.0 onwards, Master will proceed immediately after the timeout has lapsed, even if hbase.master.wait.on.regionservers.mintostart has not been reached. Reading the current conditions of waitForRegionServers() clarifies it:
{code:title=ServerManager.java (trunk rev:1360470)}
  /**
   * Wait for the region servers to report in.
   * We will wait until one of this condition is met:
   *  - the master is stopped
   *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
   *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
   *    region servers is reached
   *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
   *    there have been no new region server in for
   *    'hbase.master.wait.on.regionservers.interval' time
   *
   * @throws InterruptedException
   */
  public void waitForRegionServers(MonitoredTask status)
  throws InterruptedException {
  ..
    while (
      !this.master.isStopped() &&
        slept < timeout &&
        count < maxToStart &&
        (lastCountChange+interval > now || count < minToStart)
      ){
{code}
So with the current conditions, the wait will end as soon as the timeout is reached, even if fewer RSes than required have checked in with the Master, and the master will proceed with the region assignment among these RSes alone. As mentioned in -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, and I concur, this could have a disastrous effect in a large cluster, especially now that MSLAB is turned on. To enforce the required quorum as specified by hbase.master.wait.on.regionservers.mintostart irrespective of timeout, these conditions need to be modified as follows:
{code:title=ServerManager.java}
..
  /**
   * Wait for the region servers to report in.
   * We will wait until one of this condition is met:
   *  - the master is stopped
   *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
   *    region servers is reached
   *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
   *    there have been no new region server in for
   *    'hbase.master.wait.on.regionservers.interval' time AND
   *    the 'hbase.master.wait.on.regionservers.timeout' is reached
   *
   * @throws InterruptedException
   */
  public void waitForRegionServers(MonitoredTask status)
..
..
    int minToStart = this.master.getConfiguration().
      getInt("hbase.master.wait.on.regionservers.mintostart", 1);
    int maxToStart = this.master.getConfiguration().
      getInt("hbase.master.wait.on.regionservers.maxtostart", Integer.MAX_VALUE);
    if (maxToStart < minToStart) {
      maxToStart = minToStart;
    }
..
..
    while (
      !this.master.isStopped() &&
        count < maxToStart &&
        (lastCountChange+interval > now || timeout > slept || count < minToStart)
      ){
..
{code}
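To make the difference between the two loop conditions in the description concrete, here is a small sketch that evaluates both as plain predicates. It assumes the operators in the flattened snippets read as `&&`, `<` and `>`, which is the only reading consistent with the accompanying Javadoc. The class `WaitConditionSketch` and its method names are hypothetical; the parameter names follow the variables in the quoted code.

```java
// Hypothetical predicates mirroring the two while-conditions in the description.
class WaitConditionSketch {

  // Condition before the change: once slept >= timeout the wait ends,
  // regardless of how many region servers have checked in.
  static boolean keepWaitingBefore(boolean stopped, long slept, long timeout,
      int count, int minToStart, int maxToStart,
      long lastCountChange, long interval, long now) {
    return !stopped
        && slept < timeout
        && count < maxToStart
        && (lastCountChange + interval > now || count < minToStart);
  }

  // Proposed condition: the timeout only matters once minToStart is met
  // and no new region server has checked in for 'interval'.
  static boolean keepWaitingAfter(boolean stopped, long slept, long timeout,
      int count, int minToStart, int maxToStart,
      long lastCountChange, long interval, long now) {
    return !stopped
        && count < maxToStart
        && (lastCountChange + interval > now || timeout > slept || count < minToStart);
  }
}
```

For example, with one region server checked in, minToStart set to 3 and the timeout already elapsed, the first predicate returns false (the master proceeds with a single server) while the second returns true (the master keeps waiting).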
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413441#comment-13413441 ]

Zhihong Ted Yu commented on HBASE-6389:
---
Several variables are no longer final but I only see this extra assignment:
{code}
+maxToStart = minToStart;
{code}
It would be nice to keep other variables final.

Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
---
Key: HBASE-6389
URL: https://issues.apache.org/jira/browse/HBASE-6389
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.94.0, 0.96.0
Reporter: Aditya Kishore
Assignee: Aditya Kishore
Priority: Critical
Fix For: 0.96.0, 0.94.1
Attachments: HBASE-6389_trunk.patch

Continuing from HBASE-6375. It seems I was mistaken in my assumption that changing the value of hbase.master.wait.on.regionservers.mintostart to a sufficient number (from default of 1) can help prevent assignment of all regions to one (or a small number of) region server(s). While this was the case in 0.90.x and 0.92.x, the behavior has changed in 0.94.0 onwards to address HBASE-4993. From 0.94.0 onwards, Master will proceed immediately after the timeout has lapsed, even if hbase.master.wait.on.regionservers.mintostart has not been reached. Reading the current conditions of waitForRegionServers() clarifies it:
{code:title=ServerManager.java (trunk rev:1360470)}
  /**
   * Wait for the region servers to report in.
   * We will wait until one of this condition is met:
   *  - the master is stopped
   *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
   *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
   *    region servers is reached
   *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
   *    there have been no new region server in for
   *    'hbase.master.wait.on.regionservers.interval' time
   *
   * @throws InterruptedException
   */
  public void waitForRegionServers(MonitoredTask status)
  throws InterruptedException {
  ..
    while (
      !this.master.isStopped() &&
        slept < timeout &&
        count < maxToStart &&
        (lastCountChange+interval > now || count < minToStart)
      ){
{code}
So with the current conditions, the wait will end as soon as the timeout is reached, even if fewer RSes than required have checked in with the Master, and the master will proceed with the region assignment among these RSes alone. As mentioned in -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, and I concur, this could have a disastrous effect in a large cluster, especially now that MSLAB is turned on. To enforce the required quorum as specified by hbase.master.wait.on.regionservers.mintostart irrespective of timeout, these conditions need to be modified as follows:
{code:title=ServerManager.java}
..
  /**
   * Wait for the region servers to report in.
   * We will wait until one of this condition is met:
   *  - the master is stopped
   *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
   *    region servers is reached
   *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
   *    there have been no new region server in for
   *    'hbase.master.wait.on.regionservers.interval' time AND
   *    the 'hbase.master.wait.on.regionservers.timeout' is reached
   *
   * @throws InterruptedException
   */
  public void waitForRegionServers(MonitoredTask status)
..
..
    int minToStart = this.master.getConfiguration().
      getInt("hbase.master.wait.on.regionservers.mintostart", 1);
    int maxToStart = this.master.getConfiguration().
      getInt("hbase.master.wait.on.regionservers.maxtostart", Integer.MAX_VALUE);
    if (maxToStart < minToStart) {
      maxToStart = minToStart;
    }
..
..
    while (
      !this.master.isStopped() &&
        count < maxToStart &&
        (lastCountChange+interval > now || timeout > slept || count < minToStart)
      ){
..
{code}
[jira] [Commented] (HBASE-6369) HTable is not closed in AggregationClient
[ https://issues.apache.org/jira/browse/HBASE-6369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13411271#comment-13411271 ] Zhihong Ted Yu commented on HBASE-6369: --- Hadoop QA actually passed: {code} [INFO] HBase . SUCCESS [1.927s] [INFO] HBase - Common SUCCESS [4.046s] [INFO] HBase - Server SUCCESS [38:32.534s] [INFO] HBase - Integration Tests . SUCCESS [1.405s] [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 38:40.251s [INFO] Finished at: Wed Jul 11 05:42:45 UTC 2012 {code} Will integrate tomorrow if there is no objection. HTable is not closed in AggregationClient - Key: HBASE-6369 URL: https://issues.apache.org/jira/browse/HBASE-6369 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.0 Reporter: binlijin Assignee: binlijin Fix For: 0.92.2, 0.96.0, 0.94.2 Attachments: HBASE-6369-0.92-2.patch, HBASE-6369-0.92.patch, HBASE-6369-0.94-2.patch, HBASE-6369-0.94.patch, HBASE-6369-trunk-2.patch, HBASE-6369-trunk.patch In AggregationClient, HTable instance is not closed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6284) Introduce HRegion#doMiniBatchMutation()
[ https://issues.apache.org/jira/browse/HBASE-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Ted Yu updated HBASE-6284:
---
Fix Version/s: (was: 0.94.2)
               0.94.1

Integrated to 0.94 branch as well. Thanks for the review, Lars.

Introduce HRegion#doMiniBatchMutation()
---
Key: HBASE-6284
URL: https://issues.apache.org/jira/browse/HBASE-6284
Project: HBase
Issue Type: Bug
Components: performance, regionserver
Reporter: Zhihong Ted Yu
Assignee: Anoop Sam John
Fix For: 0.96.0, 0.94.1
Attachments: 6284_Trunk-Addendum.patch, 6284_Trunk-V3.patch, HBASE-6284_94.patch, HBASE-6284_Trunk-V2.patch, HBASE-6284_Trunk-V3.patch, HBASE-6284_Trunk.patch

From Anoop under thread 'Can there be a doMiniBatchDelete in HRegion': The HTable#delete(List<Delete>) groups the Deletes for the same RS and makes only one n/w call. But within the RS, there will be N delete calls on the region, one by one. This will include N HLog writes and syncs. If these can also be grouped, we can get better performance for multi-row deletes. I have made the new miniBatchDelete() and made HTable#delete(List<Delete>) call this new batch delete. Just tested initially with a one-node cluster. In that itself I am getting a performance boost which is very promising. Only one CF and qualifier. 10K total rows deleted with a batch of 100 deletes. Only deletes happening on the table from one thread. With the new way the net time taken is reduced by more than 1/10. Will test in a 4 node cluster also. I think it will be worth doing this change.
[jira] [Commented] (HBASE-6369) HTable is not closed in AggregationClient
[ https://issues.apache.org/jira/browse/HBASE-6369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13411494#comment-13411494 ] Zhihong Ted Yu commented on HBASE-6369: --- Integrated to trunk. Thanks for the patch, binlijin. Thanks for the review, Stack. HTable is not closed in AggregationClient - Key: HBASE-6369 URL: https://issues.apache.org/jira/browse/HBASE-6369 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.0 Reporter: binlijin Assignee: binlijin Fix For: 0.92.2, 0.96.0, 0.94.2 Attachments: HBASE-6369-0.92-2.patch, HBASE-6369-0.92.patch, HBASE-6369-0.94-2.patch, HBASE-6369-0.94.patch, HBASE-6369-trunk-2.patch, HBASE-6369-trunk.patch In AggregationClient, HTable instance is not closed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5151) Rename hbase.skip.errors in HRegion as it is too general-sounding.
[ https://issues.apache.org/jira/browse/HBASE-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13411570#comment-13411570 ]

Zhihong Ted Yu commented on HBASE-5151:
---
In trunk build 3118:
{code}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:compile (default-compile) on project hbase-server: Compilation failure: Compilation failure:
[ERROR] /home/jenkins/jenkins-slave/workspace/HBase-TRUNK/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:[2854,63] ')' expected
[ERROR]
[ERROR] /home/jenkins/jenkins-slave/workspace/HBase-TRUNK/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:[2855,19] not a statement
[ERROR]
[ERROR] /home/jenkins/jenkins-slave/workspace/HBase-TRUNK/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:[2855,22] ';' expected
[ERROR]
[ERROR] /home/jenkins/jenkins-slave/workspace/HBase-TRUNK/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:[2855,24] not a statement
[ERROR]
[ERROR] /home/jenkins/jenkins-slave/workspace/HBase-TRUNK/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:[2855,25] ';' expected
{code}

Rename hbase.skip.errors in HRegion as it is too general-sounding.
---
Key: HBASE-5151
URL: https://issues.apache.org/jira/browse/HBASE-5151
Project: HBase
Issue Type: Sub-task
Components: documentation
Affects Versions: 0.94.0
Reporter: Harsh J
Assignee: Harsh J
Fix For: 0.96.0
Attachments: HBASE-5151.patch, HBASE-5151.patch

We should rename hbase.skip.errors, used in HRegion.java for skipping errors when replaying edits. It should probably be something more like hbase.hregion.edits.replay.skip.errors or so.
[jira] [Commented] (HBASE-5151) Rename hbase.skip.errors in HRegion as it is too general-sounding.
[ https://issues.apache.org/jira/browse/HBASE-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13411584#comment-13411584 ] Zhihong Ted Yu commented on HBASE-5151: --- @Harsh: In the future, please version patches with a number. Now we have three attachments with the same name. Rename hbase.skip.errors in HRegion as it is too general-sounding. Key: HBASE-5151 URL: https://issues.apache.org/jira/browse/HBASE-5151 Project: HBase Issue Type: Sub-task Components: documentation Affects Versions: 0.94.0 Reporter: Harsh J Assignee: Harsh J Fix For: 0.96.0 Attachments: HBASE-5151.amend.patch, HBASE-5151.patch, HBASE-5151.patch, HBASE-5151.patch We should rename hbase.skip.errors, used in HRegion.java for skipping errors when replaying edits. It should probably be something more like hbase.hregion.edits.replay.skip.errors or so. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira