[jira] [Updated] (HBASE-7757) Add web UI to REST server and Thrift server
[ https://issues.apache.org/jira/browse/HBASE-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7757: --- Attachment: thrift-0.94.png Add web UI to REST server and Thrift server --- Key: HBASE-7757 URL: https://issues.apache.org/jira/browse/HBASE-7757 Project: HBase Issue Type: Improvement Components: REST, Thrift, UI Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: 0.94_7757_v1.patch, rest-0.94.png, rest-0.96.png, thrift-0.94.png, thrift-0.96.png Add Hadoop HttpServer (web UI) to REST server and Thrift server. The Hadoop HttpServer supports metrics/jmx/conf/logLevel/stacks, which is useful to monitor REST/Thrift server. For REST server, use a separate listener/context to avoid path mapping conflicts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7757) Add web UI to REST server and Thrift server
[ https://issues.apache.org/jira/browse/HBASE-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7757: --- Attachment: trunk-7757_v1.patch Add web UI to REST server and Thrift server --- Key: HBASE-7757 URL: https://issues.apache.org/jira/browse/HBASE-7757 Project: HBase Issue Type: Improvement Components: REST, Thrift, UI Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: 0.94_7757_v1.patch, rest-0.94.png, rest-0.96.png, thrift-0.94.png, thrift-0.96.png, trunk-7757_v1.patch Add Hadoop HttpServer (web UI) to REST server and Thrift server. The Hadoop HttpServer supports metrics/jmx/conf/logLevel/stacks, which is useful to monitor REST/Thrift server. For REST server, use a separate listener/context to avoid path mapping conflicts.
[jira] [Updated] (HBASE-7757) Add web UI to REST server and Thrift server
[ https://issues.apache.org/jira/browse/HBASE-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7757: --- Status: Patch Available (was: Open) Add web UI to REST server and Thrift server --- Key: HBASE-7757 URL: https://issues.apache.org/jira/browse/HBASE-7757 Project: HBase Issue Type: Improvement Components: REST, Thrift, UI Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: 0.94_7757_v1.patch, rest-0.94.png, rest-0.96.png, thrift-0.94.png, thrift-0.96.png, trunk-7757_v1.patch Add Hadoop HttpServer (web UI) to REST server and Thrift server. The Hadoop HttpServer supports metrics/jmx/conf/logLevel/stacks, which is useful to monitor REST/Thrift server. For REST server, use a separate listener/context to avoid path mapping conflicts.
[jira] [Created] (HBASE-7776) Use ErrorReporter instead of Log/System.out in hbck
Jimmy Xiang created HBASE-7776: -- Summary: Use ErrorReporter instead of Log/System.out in hbck Key: HBASE-7776 URL: https://issues.apache.org/jira/browse/HBASE-7776 Project: HBase Issue Type: Bug Components: hbck Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Hbck has a pluggable ErrorReporter. However, there are still lots of places that log messages with LOG/System.out. We should use ErrorReporter instead, so that it can catch all information from hbck.
[jira] [Updated] (HBASE-7776) Use ErrorReporter/Log instead of System.out in hbck
[ https://issues.apache.org/jira/browse/HBASE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7776: --- Description: There are lots of places that log messages with System.out. We should use ErrorReporter or Log instead, which can be configured to catch what we want. (was: Hbck has a pluggable ErrorReporter. However, there are still lots of places that log messages with LOG/System.out. We should use ErrorReporter instead, so that it can catch all information from hbck.) Summary: Use ErrorReporter/Log instead of System.out in hbck (was: Use ErrorReporter instead of Log/System.out in hbck) Use ErrorReporter/Log instead of System.out in hbck --- Key: HBASE-7776 URL: https://issues.apache.org/jira/browse/HBASE-7776 Project: HBase Issue Type: Bug Components: hbck Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor There are lots of places that log messages with System.out. We should use ErrorReporter or Log instead, which can be configured to catch what we want.
[jira] [Commented] (HBASE-7776) Use ErrorReporter/Log instead of System.out in hbck
[ https://issues.apache.org/jira/browse/HBASE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572533#comment-13572533 ] Jimmy Xiang commented on HBASE-7776: I noticed the limitation of ErrorReporter. Probably it is intended to work this way. The ErrorReporter is pluggable. If someone plugs in his/her own reporter but it gets only limited information, that is probably not good. For the uses of System.out and System.err in hbck, we can either convert them to LOG or somehow enhance ErrorReporter. Use ErrorReporter/Log instead of System.out in hbck --- Key: HBASE-7776 URL: https://issues.apache.org/jira/browse/HBASE-7776 Project: HBase Issue Type: Bug Components: hbck Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor There are lots of places that log messages with System.out. We should use ErrorReporter or Log instead, which can be configured to catch what we want.
[jira] [Updated] (HBASE-7757) Add web UI to REST server and Thrift server
[ https://issues.apache.org/jira/browse/HBASE-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7757: --- Resolution: Fixed Fix Version/s: 0.94.5 0.96.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Integrated into trunk and 0.94 (with minor fix). Thanks all for reviewing it. Add web UI to REST server and Thrift server --- Key: HBASE-7757 URL: https://issues.apache.org/jira/browse/HBASE-7757 Project: HBase Issue Type: Improvement Components: REST, Thrift, UI Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 0.96.0, 0.94.5 Attachments: 0.94_7757_v1.patch, rest-0.94.png, rest-0.96.png, thrift-0.94.png, thrift-0.96.png, trunk-7757_v1.patch Add Hadoop HttpServer (web UI) to REST server and Thrift server. The Hadoop HttpServer supports metrics/jmx/conf/logLevel/stacks, which is useful to monitor REST/Thrift server. For REST server, use a separate listener/context to avoid path mapping conflicts.
[jira] [Commented] (HBASE-7776) Use ErrorReporter/Log instead of System.out in hbck
[ https://issues.apache.org/jira/browse/HBASE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572597#comment-13572597 ] Jimmy Xiang commented on HBASE-7776: I'd like to encapsulate all of these (System.out/System.err) in ErrorReporter so that we don't use them directly. Use ErrorReporter/Log instead of System.out in hbck --- Key: HBASE-7776 URL: https://issues.apache.org/jira/browse/HBASE-7776 Project: HBase Issue Type: Bug Components: hbck Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor There are lots of places that log messages with System.out. We should use ErrorReporter or Log instead, which can be configured to catch what we want.
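The idea in the comment above, routing every message through a pluggable reporter rather than writing to System.out/System.err directly, might look roughly like the following. This is a minimal sketch only: the `Reporter` interface, its two methods, and the `check` routine are hypothetical stand-ins invented for illustration, not the actual hbck ErrorReporter API or the contents of the attached patch.

```java
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;

final class HbckReporterSketch {
  /** Hypothetical, simplified reporter interface (not the real hbck API). */
  interface Reporter {
    void print(String message);        // informational output
    void reportError(String message);  // a detected inconsistency
  }

  /** Default reporter: the only place that touches a console stream. */
  static final class ConsoleReporter implements Reporter {
    private final PrintStream out;
    private final PrintStream err;

    ConsoleReporter(PrintStream out, PrintStream err) {
      this.out = out;
      this.err = err;
    }

    @Override public void print(String message) { out.println(message); }
    @Override public void reportError(String message) { err.println("ERROR: " + message); }
  }

  /** A plugged-in reporter that captures everything, e.g. for a test. */
  static final class CollectingReporter implements Reporter {
    final List<String> messages = new ArrayList<>();
    final List<String> errors = new ArrayList<>();

    @Override public void print(String message) { messages.add(message); }
    @Override public void reportError(String message) { errors.add(message); }
  }

  /** Stand-in for a checker routine: it writes only through the reporter. */
  static void check(Reporter reporter, int regionsInMeta, int regionsDeployed) {
    reporter.print("Number of regions in meta: " + regionsInMeta);
    if (regionsInMeta != regionsDeployed) {
      reporter.reportError("expected " + regionsInMeta
          + " deployed regions, found " + regionsDeployed);
    }
  }

  public static void main(String[] args) {
    // Default wiring: output still reaches the console, but via the reporter.
    check(new ConsoleReporter(System.out, System.err), 3, 3);

    // Plugged-in wiring: the caller sees everything the checker said.
    CollectingReporter collected = new CollectingReporter();
    check(collected, 3, 2);
    System.out.println(collected.messages.size() + " message(s), "
        + collected.errors.size() + " error(s) captured");
  }
}
```

With this shape, a custom reporter receives the informational lines as well as the errors, which is the limitation the earlier comment points at.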
[jira] [Updated] (HBASE-6782) HBase shell's 'status 'detailed'' should escape the printed keys
[ https://issues.apache.org/jira/browse/HBASE-6782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6782: --- Assignee: Viji HBase shell's 'status 'detailed'' should escape the printed keys Key: HBASE-6782 URL: https://issues.apache.org/jira/browse/HBASE-6782 Project: HBase Issue Type: Bug Components: shell Affects Versions: 0.90.1 Reporter: Viji Assignee: Viji Priority: Minor Attachments: HBASE-6782.patch Currently the HBase shell's status command prints unescaped keys on the terminal causing the terminal to print garbage characters. We should escape the printed keys.
[jira] [Commented] (HBASE-6782) HBase shell's 'status 'detailed'' should escape the printed keys
[ https://issues.apache.org/jira/browse/HBASE-6782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572714#comment-13572714 ] Jimmy Xiang commented on HBASE-6782: +1 HBase shell's 'status 'detailed'' should escape the printed keys Key: HBASE-6782 URL: https://issues.apache.org/jira/browse/HBASE-6782 Project: HBase Issue Type: Bug Components: shell Affects Versions: 0.90.1 Reporter: Viji Assignee: Viji Priority: Minor Attachments: HBASE-6782.patch Currently the HBase shell's status command prints unescaped keys on the terminal causing the terminal to print garbage characters. We should escape the printed keys.
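To illustrate what "escape the printed keys" can mean for binary row keys, here is a minimal sketch that renders printable ASCII unchanged and every other byte as `\xNN`. The attached patch is not reproduced in this thread, so this is not the patch itself; the class and method names are hypothetical, and the escaping convention is simply an assumption chosen for the example.

```java
final class KeyEscapeSketch {
  /**
   * Renders printable ASCII as-is and every other byte (and the backslash
   * itself, to keep the output unambiguous) as \xNN in upper-case hex.
   */
  static String escape(byte[] key) {
    StringBuilder sb = new StringBuilder();
    for (byte b : key) {
      int ch = b & 0xFF; // treat the byte as unsigned
      if (ch >= 0x20 && ch < 0x7F && ch != '\\') {
        sb.append((char) ch);
      } else {
        sb.append(String.format("\\x%02X", ch));
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // A key containing NUL, ESC and 0xFF, which would otherwise
    // reach the terminal as control or garbage characters.
    byte[] key = {'r', 'o', 'w', 0x00, 0x1B, (byte) 0xFF};
    System.out.println(escape(key)); // prints row\x00\x1B\xFF
  }
}
```

Escaping before printing keeps control bytes such as ESC from being interpreted by the terminal.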
[jira] [Commented] (HBASE-7698) race between RS shutdown thread and openregionhandler causes region to get stuck
[ https://issues.apache.org/jira/browse/HBASE-7698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572736#comment-13572736 ] Jimmy Xiang commented on HBASE-7698: The code doesn't look efficient to me (an existing issue of course). Not a big deal. I was wondering if we can move the call to tryTransitionFromOpeningToFailedOpen(regionInfo) into the finally block, so that we don't need the local variable transitionToFailedOpen, and we can cover all scenarios as long as openSuccessful is not true. @Ram, what do you think? race between RS shutdown thread and openregionhandler causes region to get stuck Key: HBASE-7698 URL: https://issues.apache.org/jira/browse/HBASE-7698 Project: HBase Issue Type: Bug Affects Versions: 0.94.4 Reporter: Sergey Shelukhin Assignee: ramkrishna.s.vasudevan Fix For: 0.96.0, 0.94.5 Attachments: HBASE-7698_0.94.patch, HBASE-7698.patch, HBASE-7698_trunk_final.patch, HBASE-7698_withtestcase_1.patch, HBASE-7698_withtestcase_1.patch, HBASE-7698_withtestcase.patch
2013-01-22 17:59:03,237 INFO [Shutdown of org.apache.hadoop.hbase.fs.HFileSystem@5984cf08] hbase.MiniHBaseCluster$SingleFileSystemShutdownThread(186): Hook closing fs=org.apache.hadoop.hbase.fs.HFileSystem@5984cf08
...
2013-01-22 17:59:03,411 DEBUG [RS_OPEN_REGION-10.11.2.92,50661,1358906192942-0] regionserver.HRegion(1001): Closing IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb.: disabling compactions & flushes
2013-01-22 17:59:03,411 DEBUG [RS_OPEN_REGION-10.11.2.92,50661,1358906192942-0] regionserver.HRegion(1023): Updates disabled for region IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb.
2013-01-22 17:59:03,415 ERROR [RS_OPEN_REGION-10.11.2.92,50661,1358906192942-0] executor.EventHandler(205): Caught throwable while processing event M_RS_OPEN_REGION
java.io.IOException: java.io.IOException: java.io.IOException: Filesystem closed
    at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1058)
    at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:974)
    at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:945)
    at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.cleanupFailedOpen(OpenRegionHandler.java:459)
    at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:143)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:202)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:680)
tryTransitionFromOpeningToFailedOpen or transitionToOpened below is never called and region can get stuck. As an added benefit, the meta is already written by that time.
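The suggestion in this comment, guarding the failure transition with a single openSuccessful flag inside a finally block, can be sketched as below. This is an illustration under stated assumptions, not the OpenRegionHandler code: the `Steps` interface and its method names are hypothetical stand-ins loosely modelled on the method names mentioned in the thread, and the nested try/finally around the cleanup step is this sketch's own choice for keeping a failing cleanup (as in the stack trace above) from skipping the transition.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class OpenRegionSketch {
  /** Hypothetical stand-ins for the steps of an open-region handler. */
  interface Steps {
    void openRegion() throws IOException;
    void transitionToOpened() throws IOException;
    void cleanupFailedOpen() throws IOException;
    void transitionToFailedOpen();
  }

  static void process(Steps steps) throws IOException {
    boolean openSuccessful = false;
    try {
      steps.openRegion();
      steps.transitionToOpened();
      openSuccessful = true;
    } finally {
      // Runs on every exit path, so no extra flag per failure case is needed.
      if (!openSuccessful) {
        try {
          steps.cleanupFailedOpen();
        } finally {
          // Still runs if cleanup itself throws (e.g. "Filesystem closed").
          steps.transitionToFailedOpen();
        }
      }
    }
  }

  /** Runs process() against a recording implementation and returns the call order. */
  static List<String> simulate(boolean failOpen, boolean failCleanup) {
    List<String> calls = new ArrayList<>();
    Steps steps = new Steps() {
      @Override public void openRegion() throws IOException {
        calls.add("openRegion");
        if (failOpen) throw new IOException("Filesystem closed");
      }
      @Override public void transitionToOpened() { calls.add("transitionToOpened"); }
      @Override public void cleanupFailedOpen() throws IOException {
        calls.add("cleanupFailedOpen");
        if (failCleanup) throw new IOException("cleanup failed");
      }
      @Override public void transitionToFailedOpen() { calls.add("transitionToFailedOpen"); }
    };
    try {
      process(steps);
    } catch (IOException e) {
      calls.add("caught: " + e.getMessage());
    }
    return calls;
  }

  public static void main(String[] args) {
    System.out.println(simulate(false, false)); // successful open
    System.out.println(simulate(true, true));   // open and cleanup both fail
  }
}
```

Note that when both the open and the cleanup fail, the exception thrown from the finally block replaces the original one; a real handler would probably want to log the first exception before it is lost.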
[jira] [Commented] (HBASE-7698) race between RS shutdown thread and openregionhandler causes region to get stuck
[ https://issues.apache.org/jira/browse/HBASE-7698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572757#comment-13572757 ] Jimmy Xiang commented on HBASE-7698: Maybe we should move cleanupFailedOpen to the finally block as well. Both are for cleanup after a failure, so it seems better to me to have them in the finally block, so that they always run if the region is not opened successfully. We can address this in a separate JIRA if you prefer. race between RS shutdown thread and openregionhandler causes region to get stuck Key: HBASE-7698 URL: https://issues.apache.org/jira/browse/HBASE-7698 Project: HBase Issue Type: Bug Affects Versions: 0.94.4 Reporter: Sergey Shelukhin Assignee: ramkrishna.s.vasudevan Fix For: 0.96.0, 0.94.5 Attachments: HBASE-7698_0.94.patch, HBASE-7698.patch, HBASE-7698_trunk_final.patch, HBASE-7698_withtestcase_1.patch, HBASE-7698_withtestcase_1.patch, HBASE-7698_withtestcase.patch
[jira] [Commented] (HBASE-7698) race between RS shutdown thread and openregionhandler causes region to get stuck
[ https://issues.apache.org/jira/browse/HBASE-7698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572812#comment-13572812 ] Jimmy Xiang commented on HBASE-7698: Yes, that's right. race between RS shutdown thread and openregionhandler causes region to get stuck Key: HBASE-7698 URL: https://issues.apache.org/jira/browse/HBASE-7698 Project: HBase Issue Type: Bug Affects Versions: 0.94.4 Reporter: Sergey Shelukhin Assignee: ramkrishna.s.vasudevan Fix For: 0.96.0, 0.94.5 Attachments: HBASE-7698_0.94.patch, HBASE-7698.patch, HBASE-7698_trunk_final.patch, HBASE-7698_withtestcase_1.patch, HBASE-7698_withtestcase_1.patch, HBASE-7698_withtestcase.patch
[jira] [Commented] (HBASE-7678) make storefile management pluggable, together with compaction
[ https://issues.apache.org/jira/browse/HBASE-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572909#comment-13572909 ] Jimmy Xiang commented on HBASE-7678: This is a big change. Do we have a design doc? It would be great if you could outline the purpose of the change, some design choices, the relationships among entities such as StoreEngine, StoreFileManager, Compactor, CompactionPolicy, HStore, etc., how this change will be used for stripe/level compaction, and so on. Good stuff. make storefile management pluggable, together with compaction - Key: HBASE-7678 URL: https://issues.apache.org/jira/browse/HBASE-7678 Project: HBase Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-7678--and-7603.patch, HBASE-7678-v0.patch, HBASE-7678-v1.patch
[jira] [Commented] (HBASE-7741) Don't use bulk assigner if assigning just several regions
[ https://issues.apache.org/jira/browse/HBASE-7741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572942#comment-13572942 ] Jimmy Xiang commented on HBASE-7741: It is read from the configuration and passed to the bulk assigner from the AM. It won't wait forever if waitTillAllAssigned is set to true; it will time out after a while if some regions are stuck in transition. Since nobody depends on the regions being assigned, I think it is better to let the bulk assigner run and not wait for it. This will speed up startup a little. If several region servers crash at the same time, this can free up the SSH thread a little sooner too. Don't use bulk assigner if assigning just several regions - Key: HBASE-7741 URL: https://issues.apache.org/jira/browse/HBASE-7741 Project: HBase Issue Type: Improvement Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: trunk-7741.patch, trunk-7741_v2.patch, trunk-7741_v3_1.patch, trunk-7741_v3.patch If assigning just one region, the bulk assigner may be slower.
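The decision described in this issue, skipping the bulk assigner when only a handful of regions need assignment, amounts to a threshold check. The sketch below shows only that decision; the class, the enum, and the cutoff value are hypothetical, since the thread does not state which configuration key or default the patch uses.

```java
final class AssignChoiceSketch {
  /** Hypothetical cutoff; the thread does not give the value used by the patch. */
  static final int BULK_THRESHOLD_REGIONS = 5;

  enum Strategy { SINGLE, BULK }

  /**
   * Picks per-region assignment below the threshold, where the setup cost of
   * the bulk path is assumed to outweigh its benefit, and bulk assignment otherwise.
   */
  static Strategy choose(int regionCount, int thresholdRegions) {
    return regionCount < thresholdRegions ? Strategy.SINGLE : Strategy.BULK;
  }

  public static void main(String[] args) {
    System.out.println(choose(1, BULK_THRESHOLD_REGIONS));   // SINGLE
    System.out.println(choose(500, BULK_THRESHOLD_REGIONS)); // BULK
  }
}
```

In a real assignment manager the threshold would presumably come from configuration, as the comment says the related setting does.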
[jira] [Updated] (HBASE-7776) Use ErrorReporter/Log instead of System.out in hbck
[ https://issues.apache.org/jira/browse/HBASE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7776: --- Attachment: trunk-7776_v1.patch Use ErrorReporter/Log instead of System.out in hbck --- Key: HBASE-7776 URL: https://issues.apache.org/jira/browse/HBASE-7776 Project: HBase Issue Type: Bug Components: hbck Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: trunk-7776_v1.patch There are lots of places that log messages with System.out. We should use ErrorReporter or Log instead, which can be configured to catch what we want.
[jira] [Updated] (HBASE-7776) Use ErrorReporter/Log instead of System.out in hbck
[ https://issues.apache.org/jira/browse/HBASE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7776: --- Status: Patch Available (was: Open) Use ErrorReporter/Log instead of System.out in hbck --- Key: HBASE-7776 URL: https://issues.apache.org/jira/browse/HBASE-7776 Project: HBase Issue Type: Bug Components: hbck Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: trunk-7776_v1.patch There are lots of places that log messages with System.out. We should use ErrorReporter or Log instead, which can be configured to catch what we want.
[jira] [Commented] (HBASE-7698) race between RS shutdown thread and openregionhandler causes region to get stuck
[ https://issues.apache.org/jira/browse/HBASE-7698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573083#comment-13573083 ] Jimmy Xiang commented on HBASE-7698: It is ok with me. race between RS shutdown thread and openregionhandler causes region to get stuck Key: HBASE-7698 URL: https://issues.apache.org/jira/browse/HBASE-7698 Project: HBase Issue Type: Bug Affects Versions: 0.94.4 Reporter: Sergey Shelukhin Assignee: ramkrishna.s.vasudevan Fix For: 0.96.0, 0.94.5 Attachments: HBASE-7698_0.94.patch, HBASE-7698.patch, HBASE-7698_trunk_final.patch, HBASE-7698_withtestcase_1.patch, HBASE-7698_withtestcase_1.patch, HBASE-7698_withtestcase.patch
[jira] [Commented] (HBASE-7698) race between RS shutdown thread and openregionhandler causes region to get stuck
[ https://issues.apache.org/jira/browse/HBASE-7698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573683#comment-13573683 ] Jimmy Xiang commented on HBASE-7698: @Ram, sure, +1 for commit. race between RS shutdown thread and openregionhandler causes region to get stuck Key: HBASE-7698 URL: https://issues.apache.org/jira/browse/HBASE-7698 Project: HBase Issue Type: Bug Affects Versions: 0.94.4 Reporter: Sergey Shelukhin Assignee: ramkrishna.s.vasudevan Fix For: 0.96.0, 0.94.5 Attachments: HBASE-7698_0.94.patch, HBASE-7698.patch, HBASE-7698_trunk_final.patch, HBASE-7698_withtestcase_1.patch, HBASE-7698_withtestcase_1.patch, HBASE-7698_withtestcase.patch
[jira] [Updated] (HBASE-7741) Don't use bulk assigner if assigning just several regions
[ https://issues.apache.org/jira/browse/HBASE-7741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7741: --- Resolution: Fixed Fix Version/s: 0.96.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Integrated into trunk. Thanks Stack and Ram for the review. Don't use bulk assigner if assigning just several regions - Key: HBASE-7741 URL: https://issues.apache.org/jira/browse/HBASE-7741 Project: HBase Issue Type: Improvement Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.0 Attachments: trunk-7741.patch, trunk-7741_v2.patch, trunk-7741_v3_1.patch, trunk-7741_v3.patch If assigning just one region, the bulk assigner may be slower.
[jira] [Updated] (HBASE-6782) HBase shell's 'status 'detailed'' should escape the printed keys
[ https://issues.apache.org/jira/browse/HBASE-6782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6782: --- Resolution: Fixed Fix Version/s: 0.96.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Integrated into trunk. Thanks Viji for the patch. HBase shell's 'status 'detailed'' should escape the printed keys Key: HBASE-6782 URL: https://issues.apache.org/jira/browse/HBASE-6782 Project: HBase Issue Type: Bug Components: shell Affects Versions: 0.90.1 Reporter: Viji Assignee: Viji Priority: Minor Fix For: 0.96.0 Attachments: HBASE-6782.patch Currently the HBase shell's status command prints unescaped keys on the terminal causing the terminal to print garbage characters. We should escape the printed keys.
[jira] [Updated] (HBASE-7776) Use ErrorReporter/Log instead of System.out in hbck
[ https://issues.apache.org/jira/browse/HBASE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7776: --- Status: Open (was: Patch Available) Use ErrorReporter/Log instead of System.out in hbck --- Key: HBASE-7776 URL: https://issues.apache.org/jira/browse/HBASE-7776 Project: HBase Issue Type: Bug Components: hbck Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: 0.94-7776_v1.1.patch, trunk-7776_v1.patch There are lots of places that log messages with System.out. We should use ErrorReporter or Log instead, which can be configured to catch what we want.
[jira] [Updated] (HBASE-7776) Use ErrorReporter/Log instead of System.out in hbck
[ https://issues.apache.org/jira/browse/HBASE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7776: --- Attachment: 0.94-7776_v1.1.patch Use ErrorReporter/Log instead of System.out in hbck --- Key: HBASE-7776 URL: https://issues.apache.org/jira/browse/HBASE-7776 Project: HBase Issue Type: Bug Components: hbck Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: 0.94-7776_v1.1.patch, trunk-7776_v1.patch There are lots of places that log messages with System.out. We should use ErrorReporter or Log instead, which can be configured to catch what we want.
[jira] [Updated] (HBASE-7776) Use ErrorReporter/Log instead of System.out in hbck
[ https://issues.apache.org/jira/browse/HBASE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7776: --- Attachment: trunk-7776_v1.1.patch Minor change. I verified that nothing goes to console using a mock error reporter, and with log4j going to a file. Use ErrorReporter/Log instead of System.out in hbck --- Key: HBASE-7776 URL: https://issues.apache.org/jira/browse/HBASE-7776 Project: HBase Issue Type: Bug Components: hbck Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: 0.94-7776_v1.1.patch, trunk-7776_v1.1.patch, trunk-7776_v1.patch There are lots of places that log messages with System.out. We should use ErrorReporter or Log instead, which can be configured to catch what we want.
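For readers who want to see the idea behind the mock-reporter check, here is a minimal sketch. It does not use the real hbck classes: the Reporter interface, MockReporter, and reportSummary are hypothetical names invented for illustration, and the real ErrorReporter in hbck may look quite different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for hbck's reporting hook; not the real HBaseFsck API.
interface Reporter {
    void print(String message);
}

// Mock reporter that records messages in memory so a test can inspect them.
final class MockReporter implements Reporter {
    final List<String> messages = new ArrayList<>();

    @Override
    public void print(String message) {
        messages.add(message);
    }
}

final class HbckReportingSketch {
    // Routes the summary through the reporter instead of calling System.out.println directly.
    static void reportSummary(int inconsistencies, Reporter reporter) {
        reporter.print("Number of inconsistencies: " + inconsistencies);
    }
}
```

With the output routed through an interface like this, a test can swap in the mock, redirect System.out to a buffer, and check that the buffer stays empty.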
[jira] [Updated] (HBASE-7776) Use ErrorReporter/Log instead of System.out in hbck
[ https://issues.apache.org/jira/browse/HBASE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7776: --- Fix Version/s: 0.94.5 0.96.0 Status: Patch Available (was: Open) Use ErrorReporter/Log instead of System.out in hbck --- Key: HBASE-7776 URL: https://issues.apache.org/jira/browse/HBASE-7776 Project: HBase Issue Type: Bug Components: hbck Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 0.96.0, 0.94.5 Attachments: 0.94-7776_v1.1.patch, trunk-7776_v1.1.patch, trunk-7776_v1.patch There are lots of places that log messages with System.out. We should use ErrorReporter or Log instead, which can be configured to catch what we want.
[jira] [Updated] (HBASE-7776) Use ErrorReporter/Log instead of System.out in hbck
[ https://issues.apache.org/jira/browse/HBASE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7776: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Integrated into trunk and 0.94. Thanks Jon for reviewing it. Use ErrorReporter/Log instead of System.out in hbck --- Key: HBASE-7776 URL: https://issues.apache.org/jira/browse/HBASE-7776 Project: HBase Issue Type: Bug Components: hbck Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 0.96.0, 0.94.5 Attachments: 0.94-7776_v1.1.patch, trunk-7776_v1.1.patch, trunk-7776_v1.patch There are lots of places that log messages with System.out. We should use ErrorReporter or Log instead, which can be configured to catch what we want.
[jira] [Commented] (HBASE-7784) move the code related to selection that is specific to default compaction policy, into default compaction policy (from HStore)
[ https://issues.apache.org/jira/browse/HBASE-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13574138#comment-13574138 ] Jimmy Xiang commented on HBASE-7784: Looks good to me. One question, why do we want to do that?
{code}
+    if (priority == Store.PRIORITY_USER) {
+      ++priority; // System compactions cannot have user priority, make less important.
+    }
{code}
If there are lots of files to compact, priority could be negative here, which is higher than PRIORITY_USER, right? move the code related to selection that is specific to default compaction policy, into default compaction policy (from HStore) -- Key: HBASE-7784 URL: https://issues.apache.org/jira/browse/HBASE-7784 Project: HBase Issue Type: Sub-task Components: Compaction Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-7784-v0.patch There are some TODO [level compaction] in HBASE-7603 patch, there may also be few other similar places.
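The question in the comment above can be made concrete with a small sketch. It rests on assumptions that are not confirmed by the message: that a smaller number means a more urgent compaction, that PRIORITY_USER is 1, and that a system compaction's priority is the blocking file count minus the current store file count. The class and method names are made up for illustration.

```java
final class CompactionPrioritySketch {
    // Assumed value; the message only names the constant Store.PRIORITY_USER.
    static final int PRIORITY_USER = 1;

    // Assumed formula: fewer remaining "slots" before blocking means a smaller, more urgent number.
    static int systemPriority(int blockingFileCount, int storeFileCount) {
        int priority = blockingFileCount - storeFileCount;
        if (priority == PRIORITY_USER) {
            ++priority; // mirror of the quoted patch: bump only the exact-match case
        }
        return priority;
    }
}
```

Under these assumptions, systemPriority(7, 6) is bumped from 1 to 2, while systemPriority(7, 10) stays at -3. The bump only handles the exact-match case, so a store with many files still ends up ahead of a user compaction, which is what the comment is pointing out.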
[jira] [Resolved] (HBASE-6976) Change assignment related logging to TRACE level
[ https://issues.apache.org/jira/browse/HBASE-6976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang resolved HBASE-6976. Resolution: Won't Fix Based on my testing on the latest trunk code, turning off the logging does not seem to help assignment performance very much. This may be because the code has changed a lot compared to 0.94. We can look into this again if we have clear proof that it will improve performance. Change assignment related logging to TRACE level Key: HBASE-6976 URL: https://issues.apache.org/jira/browse/HBASE-6976 Project: HBase Issue Type: Improvement Components: Region Assignment Reporter: Jimmy Xiang Priority: Minor Labels: noob As JD and Elliot mentioned, turning off ZKAssign logging can improve AM performance a lot. During the testing, I also noticed that after all regions are already opened, the master UI still shows lots of regions in transition. That is because AM hasn't finished the ZK event processing yet. Changing the logging level from debug to trace will improve AM performance. With HBASE-6611, I think AM is getting stable and reliable. I hope we don't need to see this logging any more. The logging is still available after turning the trace logging level on for the AM and ZKAssign classes.
[jira] [Updated] (HBASE-6611) Forcing region state offline cause double assignment
[ https://issues.apache.org/jira/browse/HBASE-6611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6611: --- Resolution: Fixed Status: Resolved (was: Patch Available) Integrated into trunk. TestHBaseFsck is fine locally. Forcing region state offline cause double assignment Key: HBASE-6611 URL: https://issues.apache.org/jira/browse/HBASE-6611 Project: HBase Issue Type: Bug Components: master Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.0 Attachments: trunk-6611_v2.patch, trunk-6611_v5.patch In assigning a region, the assignment manager forces the region state offline if it is not. This could cause double assignment. For example, if the region is already assigned and in the Open state, you should not just change its state to Offline and assign it again. I think this could be the root cause for all double assignments IF the region state is reliable. After this loophole is closed, TestHBaseFsck should come up with a different way to create some assignment inconsistencies, for example, calling the region server to open a region directly.
[jira] [Created] (HBASE-7022) Use multi to batch offline regions in zookeeper
Jimmy Xiang created HBASE-7022: -- Summary: Use multi to batch offline regions in zookeeper Key: HBASE-7022 URL: https://issues.apache.org/jira/browse/HBASE-7022 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Bulk assigner needs to set regions offline in zookeeper one by one. I was wondering if we can have some performance improvement if we batch these operations using ZooKeeper#multi.
[jira] [Updated] (HBASE-6977) Multithread processing ZK assignment events
[ https://issues.apache.org/jira/browse/HBASE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6977: --- Priority: Minor (was: Major) Multithread processing ZK assignment events --- Key: HBASE-6977 URL: https://issues.apache.org/jira/browse/HBASE-6977 Project: HBase Issue Type: Improvement Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Related to HBASE-6976 and HBASE-6611. ZK event processing is a bottleneck for assignments, since there is only one ZK event thread. If we can use multiple threads, it should be better. With multiple threads, the order of events could be messed up. However, if we always pass all events related to one region to the same worker thread, the order should be kept. We need to play with it and find out how much performance improvement we can get.
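The ordering idea in the description above (all events for one region go to the same worker thread) can be sketched as follows. This is only an illustration under stated assumptions, not the attached patch: RegionEventDispatcher and its methods are hypothetical names, and choosing the worker by hashing the region name is just one possible way to get a stable region-to-thread mapping.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical dispatcher: each worker is a single-thread executor, so tasks
// submitted to the same worker run one at a time in submission order.
final class RegionEventDispatcher {
    private final ExecutorService[] workers;

    RegionEventDispatcher(int workerCount) {
        workers = new ExecutorService[workerCount];
        for (int i = 0; i < workerCount; i++) {
            workers[i] = Executors.newSingleThreadExecutor();
        }
    }

    // Stable mapping from region name to worker index (floorMod avoids negative indexes).
    int workerIndexFor(String regionName) {
        return Math.floorMod(regionName.hashCode(), workers.length);
    }

    // Events for the same region always land on the same worker, preserving their order.
    void submit(String regionName, Runnable handler) {
        workers[workerIndexFor(regionName)].execute(handler);
    }

    // Stops accepting new events and waits briefly for queued ones to finish.
    void shutdownAndWait() {
        for (ExecutorService worker : workers) {
            worker.shutdown();
        }
        try {
            for (ExecutorService worker : workers) {
                worker.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Events for different regions may still interleave across workers, which is fine as long as only the per-region order matters. The trade-off is that one very busy region cannot be spread over several threads.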
[jira] [Updated] (HBASE-6977) Multithread processing ZK assignment events
[ https://issues.apache.org/jira/browse/HBASE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6977: --- Attachment: trunk-6977_v1.patch Multithread processing ZK assignment events --- Key: HBASE-6977 URL: https://issues.apache.org/jira/browse/HBASE-6977 Project: HBase Issue Type: Improvement Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: trunk-6977_v1.patch Related to HBASE-6976 and HBASE-6611. ZK event processing is a bottleneck for assignments, since there is only one ZK event thread. If we can use multiple threads, it should be better. With multiple threads, the order of events could be messed up. However, if we always pass all events related to one region to the same worker thread, the order should be kept. We need to play with it and find out how much performance improvement we can get.
[jira] [Updated] (HBASE-6977) Multithread processing ZK assignment events
[ https://issues.apache.org/jira/browse/HBASE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6977: --- Status: Patch Available (was: Open) Multithread processing ZK assignment events --- Key: HBASE-6977 URL: https://issues.apache.org/jira/browse/HBASE-6977 Project: HBase Issue Type: Improvement Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: trunk-6977_v1.patch Related to HBASE-6976 and HBASE-6611. ZK event processing is a bottleneck for assignments, since there is only one ZK event thread. If we can use multiple threads, it should be better. With multiple threads, the order of events could be messed up. However, if we always pass all events related to one region to the same worker thread, the order should be kept. We need to play with it and find out how much performance improvement we can get.
[jira] [Commented] (HBASE-6977) Multithread processing ZK assignment events
[ https://issues.apache.org/jira/browse/HBASE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480634#comment-13480634 ] Jimmy Xiang commented on HBASE-6977: The patch is posted on RB: https://reviews.apache.org/r/7682/ for easy review. In the meantime, I was thinking about HBASE-7022, using multi to batch those zk operations in bulk assigner. I think it will improve the performance a little. I need to play with it and find out. Multithread processing ZK assignment events --- Key: HBASE-6977 URL: https://issues.apache.org/jira/browse/HBASE-6977 Project: HBase Issue Type: Improvement Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: trunk-6977_v1.patch Related to HBASE-6976 and HBASE-6611. ZK event processing is a bottleneck for assignments, since there is only one ZK event thread. If we can use multiple threads, it should be better. With multiple threads, the order of events could be messed up. However, if we always pass all events related to one region to the same worker thread, the order should be kept. We need to play with it and find out how much performance improvement we can get.
[jira] [Updated] (HBASE-6977) Multithread processing ZK assignment events
[ https://issues.apache.org/jira/browse/HBASE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6977: --- Status: Open (was: Patch Available) Multithread processing ZK assignment events --- Key: HBASE-6977 URL: https://issues.apache.org/jira/browse/HBASE-6977 Project: HBase Issue Type: Improvement Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: trunk-6977_v1.patch Related to HBASE-6976 and HBASE-6611. ZK event processing is a bottleneck for assignments, since there is only one ZK event thread. If we can use multiple threads, it should be better. With multiple threads, the order of events could be messed up. However, if we always pass all events related to one region to the same worker thread, the order should be kept. We need to play with it and find out how much performance improvement we can get.
[jira] [Commented] (HBASE-6977) Multithread processing ZK assignment events
[ https://issues.apache.org/jira/browse/HBASE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481815#comment-13481815 ] Jimmy Xiang commented on HBASE-6977: Thanks Ted for the review. I posted the second patch to RB: https://reviews.apache.org/r/7682/ Multithread processing ZK assignment events --- Key: HBASE-6977 URL: https://issues.apache.org/jira/browse/HBASE-6977 Project: HBase Issue Type: Improvement Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: trunk-6977_v1.patch Related to HBASE-6976 and HBASE-6611. ZK event processing is a bottleneck for assignments, since there is only one ZK event thread. If we can use multiple threads, it should be better. With multiple threads, the order of events could be messed up. However, if we always pass all events related to one region to the same worker thread, the order should be kept. We need to play with it and find out how much performance improvement we can get.
[jira] [Resolved] (HBASE-6744) Per table balancing could cause regions unbalanced overall
[ https://issues.apache.org/jira/browse/HBASE-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang resolved HBASE-6744. Resolution: Won't Fix Per table balancing could cause regions unbalanced overall -- Key: HBASE-6744 URL: https://issues.apache.org/jira/browse/HBASE-6744 Project: HBase Issue Type: Improvement Reporter: Jimmy Xiang Per table balancing just balances regions based on tables. However, overall, regions could be seriously unbalanced. For example, suppose you shut down almost all region servers in a cluster, then create tons of new tables (no region pre-split), then start up all region servers. You will see that the regions won't move to other region servers, since they are balanced per table (only one region for a table at this moment). If we can make the balance algorithm sophisticated enough, we don't need the configuration hbase.master.loadbalance.bytable. We can do the regular and bytable balancing at the same time.
[jira] [Updated] (HBASE-6896) sync bulk and regular assigment handling if possible
[ https://issues.apache.org/jira/browse/HBASE-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6896: --- Priority: Minor (was: Major) Assignee: Jimmy Xiang Summary: sync bulk and regular assigment handling if possible (was: sync bulk and regular assigment handling for RegionAlreadyInTransitionException) sync bulk and regular assigment handling if possible Key: HBASE-6896 URL: https://issues.apache.org/jira/browse/HBASE-6896 Project: HBase Issue Type: Bug Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Currently, for bulk assignment, when a region is already in transition, it thinks the region is opened. However, in regular assignment, it throws RegionAlreadyInTransitionException. In regular assignment, in case of RegionAlreadyInTransitionException, it tries to call openRegion again and again without changing the region plan or the ZK offline node, until the region is out of transition. We may need to sync them up and make sure bulk assignment does the same in this case.
[jira] [Commented] (HBASE-6896) sync bulk and regular assigment handling if possible
[ https://issues.apache.org/jira/browse/HBASE-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481886#comment-13481886 ] Jimmy Xiang commented on HBASE-6896: Thought about this and I think the existing RegionAlreadyInTransitionException handling in bulk assigner is fine. Instead, there is probably no need to retry in the regular assignment. So I changed the title of the jira, and I will sync up the socket.timeout exception handling. sync bulk and regular assigment handling if possible Key: HBASE-6896 URL: https://issues.apache.org/jira/browse/HBASE-6896 Project: HBase Issue Type: Bug Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Currently, for bulk assignment, when a region is already in transition, it thinks the region is opened. However, in regular assignment, it throws RegionAlreadyInTransitionException. In regular assignment, in case of RegionAlreadyInTransitionException, it tries to call openRegion again and again without changing the region plan or the ZK offline node, until the region is out of transition. We may need to sync them up and make sure bulk assignment does the same in this case.
[jira] [Updated] (HBASE-6896) sync bulk and regular assigment handling if possible
[ https://issues.apache.org/jira/browse/HBASE-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6896: --- Attachment: trunk-6896.patch sync bulk and regular assigment handling if possible Key: HBASE-6896 URL: https://issues.apache.org/jira/browse/HBASE-6896 Project: HBase Issue Type: Bug Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: trunk-6896.patch Currently, for bulk assignment, when a region is already in transition, it thinks the region is opened. However, in regular assignment, it throws RegionAlreadyInTransitionException. In regular assignment, in case of RegionAlreadyInTransitionException, it tries to call openRegion again and again without changing the region plan or the ZK offline node, until the region is out of transition. We may need to sync them up and make sure bulk assignment does the same in this case.
[jira] [Updated] (HBASE-6896) sync bulk and regular assigment handling if possible
[ https://issues.apache.org/jira/browse/HBASE-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6896: --- Status: Patch Available (was: Open) sync bulk and regular assigment handling if possible Key: HBASE-6896 URL: https://issues.apache.org/jira/browse/HBASE-6896 Project: HBase Issue Type: Bug Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: trunk-6896.patch Currently, for bulk assignment, when a region is already in transition, it thinks the region is opened. However, in regular assignment, it throws RegionAlreadyInTransitionException. In regular assignment, in case of RegionAlreadyInTransitionException, it tries to call openRegion again and again without changing the region plan or the ZK offline node, until the region is out of transition. We may need to sync them up and make sure bulk assignment does the same in this case.
[jira] [Commented] (HBASE-7022) Use multi to batch offline regions in zookeeper
[ https://issues.apache.org/jira/browse/HBASE-7022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481969#comment-13481969 ] Jimmy Xiang commented on HBASE-7022: Currently, I think ZooKeeper#multi doesn’t help very much since it's synchronous, transactional, and doesn’t support something like upsert in SQL (i.e. create a node if it doesn’t exist, otherwise update its data). It supports batches smaller than 1MB. If we can have an asynchronous, non-transactional multi zookeeper client function, which also supports upsert, it will really help. Use multi to batch offline regions in zookeeper --- Key: HBASE-7022 URL: https://issues.apache.org/jira/browse/HBASE-7022 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Bulk assigner needs to set regions offline in zookeeper one by one. I was wondering if we can have some performance improvement if we batch these operations using ZooKeeper#multi.
[jira] [Updated] (HBASE-6977) Multithread processing ZK assignment events
[ https://issues.apache.org/jira/browse/HBASE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6977: --- Attachment: trunk-6977_v2-1.patch Multithread processing ZK assignment events --- Key: HBASE-6977 URL: https://issues.apache.org/jira/browse/HBASE-6977 Project: HBase Issue Type: Improvement Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: trunk-6977_v1.patch, trunk-6977_v2-1.patch Related to HBASE-6976 and HBASE-6611. ZK event processing is a bottleneck for assignments, since there is only one ZK event thread. If we can use multiple threads, it should be better. With multiple threads, the order of events could be messed up. However, if we always pass all events related to one region to the same worker thread, the order should be kept. We need to play with it and find out how much performance improvement we can get.
[jira] [Updated] (HBASE-6977) Multithread processing ZK assignment events
[ https://issues.apache.org/jira/browse/HBASE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6977: --- Hadoop Flags: Reviewed Status: Patch Available (was: Open) Multithread processing ZK assignment events --- Key: HBASE-6977 URL: https://issues.apache.org/jira/browse/HBASE-6977 Project: HBase Issue Type: Improvement Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: trunk-6977_v1.patch, trunk-6977_v2-1.patch Related to HBASE-6976 and HBASE-6611. ZK event processing is a bottleneck for assignments, since there is only one ZK event thread. If we can use multiple threads, it should be better. With multiple threads, the order of events could be messed up. However, if we always pass all events related to one region to the same worker thread, the order should be kept. We need to play with it and find out how much performance improvement we can get.
[jira] [Assigned] (HBASE-5119) Set the TimeoutMonitor's timeout back down
[ https://issues.apache.org/jira/browse/HBASE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang reassigned HBASE-5119: -- Assignee: (was: Jimmy Xiang) Based on my testing for HBASE-6611/HBASE-6977, in my testing cluster, I can easily open 10+ regions on 4 nodes in around 4 minutes. I think we are fine to set the timeout back to 5 minutes. Set the TimeoutMonitor's timeout back down -- Key: HBASE-5119 URL: https://issues.apache.org/jira/browse/HBASE-5119 Project: HBase Issue Type: Task Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Priority: Blocker Fix For: 0.96.0 The TimeoutMonitor used to be extremely racy and caused more troubles than it fixed, but most of this has been fixed I believe in the context of 0.92 so I think we should set it down back to a useful level. Currently it's 30 minutes, what should the new value be? I think 5 minutes should be good, will do some testing.
[jira] [Assigned] (HBASE-7022) Use multi to batch offline regions in zookeeper
[ https://issues.apache.org/jira/browse/HBASE-7022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang reassigned HBASE-7022: -- Assignee: (was: Jimmy Xiang) Use multi to batch offline regions in zookeeper --- Key: HBASE-7022 URL: https://issues.apache.org/jira/browse/HBASE-7022 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Bulk assigner needs to set regions offline in zookeeper one by one. I was wondering if we can have some performance improvement if we batch these operations using ZooKeeper#multi.
[jira] [Assigned] (HBASE-6880) Failure in assigning root causes system hang
[ https://issues.apache.org/jira/browse/HBASE-6880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang reassigned HBASE-6880: -- Assignee: Jimmy Xiang Failure in assigning root causes system hang Key: HBASE-6880 URL: https://issues.apache.org/jira/browse/HBASE-6880 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang In looking into a TestReplication failure, I found out sometimes assignRoot could fail, for example, RS is not serving traffic yet. In this case, the master will keep waiting for root to be available, which could never happen. Need to gracefully terminate master if root is not assigned properly.
[jira] [Updated] (HBASE-6896) sync bulk and regular assigment handling socket timeout exception
[ https://issues.apache.org/jira/browse/HBASE-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6896: --- Description: In regular assignment, in case of a socket network timeout, it tries to call openRegion again and again without changing the region plan or the ZK offline node, until the region is out of transition, as long as the region server is still up. We may need to sync them up and make sure bulk assignment does the same in this case. was: Currently, for bulk assignment, when a region is already in transition, it thinks the region is opened. However, in regular assignment, it throws RegionAlreadyInTransitionException. In regular assignment, in case of RegionAlreadyInTransitionException, it tries to call openRegion again and again without changing the region plan or the ZK offline node, until the region is out of transition. We may need to sync them up and make sure bulk assignment does the same in this case. Summary: sync bulk and regular assigment handling socket timeout exception (was: sync bulk and regular assigment handling if possible) sync bulk and regular assigment handling socket timeout exception - Key: HBASE-6896 URL: https://issues.apache.org/jira/browse/HBASE-6896 Project: HBase Issue Type: Bug Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: trunk-6896.patch In regular assignment, in case of a socket network timeout, it tries to call openRegion again and again without changing the region plan or the ZK offline node, until the region is out of transition, as long as the region server is still up. We may need to sync them up and make sure bulk assignment does the same in this case.
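The retry behaviour described above can be sketched roughly as below. This is not the code in trunk-6896.patch: RegionOpener, ServerLiveness, and openWithRetry are hypothetical names, and the maxAttempts bound is an assumption added so that the sketch terminates (the description only says the call is retried until the region is out of transition while the server is up).

```java
import java.net.SocketTimeoutException;

final class OpenRegionRetrySketch {
    // Hypothetical stand-in for the RPC that asks a region server to open a region.
    interface RegionOpener {
        void openRegion(String regionName) throws SocketTimeoutException;
    }

    // Hypothetical stand-in for "is the target region server still up?".
    interface ServerLiveness {
        boolean isServerOnline();
    }

    // Retries the same open call (same target, no new plan) on socket timeout while the
    // server is still up. Returns the attempt number that succeeded, or -1 if it gave up.
    static int openWithRetry(RegionOpener opener, ServerLiveness liveness,
                             String regionName, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                opener.openRegion(regionName);
                return attempt;
            } catch (SocketTimeoutException e) {
                if (!liveness.isServerOnline()) {
                    return -1; // server is gone; the caller would need a new plan
                }
                // server still up: fall through and retry the same call
            }
        }
        return -1;
    }
}
```

Keeping the retry in one helper like this is one way the bulk and regular paths could share the same handling; whether the actual patch does that is not stated in these messages.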
[jira] [Commented] (HBASE-6896) sync bulk and regular assigment handling socket timeout exception
[ https://issues.apache.org/jira/browse/HBASE-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482527#comment-13482527 ] Jimmy Xiang commented on HBASE-6896: Thanks a lot for the review. I updated the title and description of the bug, since we are not going to sync up the processing of the already-in-transition exception. I will change "isn't gone" to "didn't go". As for including the region names, this is bulk assign, so there may be a lot of regions. sync bulk and regular assigment handling socket timeout exception - Key: HBASE-6896 URL: https://issues.apache.org/jira/browse/HBASE-6896 Project: HBase Issue Type: Bug Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: trunk-6896.patch In regular assignment, in case of a socket network timeout, it tries to call openRegion again and again without changing the region plan or the ZK offline node, until the region is out of transition, as long as the region server is still up. We may need to sync them up and make sure bulk assignment does the same in this case.
[jira] [Resolved] (HBASE-6880) Failure in assigning root causes system hang
[ https://issues.apache.org/jira/browse/HBASE-6880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang resolved HBASE-6880. Resolution: Won't Fix I thought about this issue again. It is probably right to keep waiting and let the timeout monitor deal with it. With HBASE-5119, the timeout period will drop from 20 minutes to 5 minutes, which helps a little. The issue here is that we can't say the assignment of root has completely failed. At least it is put in transition, so the timeout monitor (TM) will fix it. The same assign call is used for user regions as well, so changing it to return true/false may be confusing. Failure in assigning root causes system hang Key: HBASE-6880 URL: https://issues.apache.org/jira/browse/HBASE-6880 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang In looking into a TestReplication failure, I found out that assignRoot could sometimes fail, for example when the RS is not serving traffic yet. In this case, the master will keep waiting for root to be available, which could never happen. Need to gracefully terminate master if root is not assigned properly.
[jira] [Commented] (HBASE-6896) sync bulk and regular assigment handling socket timeout exception
[ https://issues.apache.org/jira/browse/HBASE-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482742#comment-13482742 ] Jimmy Xiang commented on HBASE-6896: Thanks for the review. Yes, handling is the same in case of socket timeout. It is just condition checking, probably too simple to have a separate method. sync bulk and regular assigment handling socket timeout exception - Key: HBASE-6896 URL: https://issues.apache.org/jira/browse/HBASE-6896 Project: HBase Issue Type: Bug Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: trunk-6896.patch In regular assignment, in case of socket network timeout, it tries to call openRegion again and again without change the region plan, ZK offline node, till the region is out of transition, in case the region server is still up. We may need to sync them up and make sure bulk assignment does the same in this case.
[jira] [Updated] (HBASE-6896) sync bulk and regular assigment handling socket timeout exception
[ https://issues.apache.org/jira/browse/HBASE-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6896: --- Status: Open (was: Patch Available) sync bulk and regular assigment handling socket timeout exception - Key: HBASE-6896 URL: https://issues.apache.org/jira/browse/HBASE-6896 Project: HBase Issue Type: Bug Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: trunk-6896.patch In regular assignment, in case of socket network timeout, it tries to call openRegion again and again without change the region plan, ZK offline node, till the region is out of transition, in case the region server is still up. We may need to sync them up and make sure bulk assignment does the same in this case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6896) sync bulk and regular assigment handling socket timeout exception
[ https://issues.apache.org/jira/browse/HBASE-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6896: --- Attachment: trunk-6896_v2.patch sync bulk and regular assigment handling socket timeout exception - Key: HBASE-6896 URL: https://issues.apache.org/jira/browse/HBASE-6896 Project: HBase Issue Type: Bug Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: trunk-6896.patch, trunk-6896_v2.patch In regular assignment, in case of socket network timeout, it tries to call openRegion again and again without change the region plan, ZK offline node, till the region is out of transition, in case the region server is still up. We may need to sync them up and make sure bulk assignment does the same in this case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6896) sync bulk and regular assigment handling socket timeout exception
[ https://issues.apache.org/jira/browse/HBASE-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6896: --- Hadoop Flags: Reviewed Status: Patch Available (was: Open) sync bulk and regular assigment handling socket timeout exception - Key: HBASE-6896 URL: https://issues.apache.org/jira/browse/HBASE-6896 Project: HBase Issue Type: Bug Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: trunk-6896.patch, trunk-6896_v2.patch In regular assignment, in case of socket network timeout, it tries to call openRegion again and again without change the region plan, ZK offline node, till the region is out of transition, in case the region server is still up. We may need to sync them up and make sure bulk assignment does the same in this case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5162) Basic client pushback mechanism
[ https://issues.apache.org/jira/browse/HBASE-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482767#comment-13482767 ] Jimmy Xiang commented on HBASE-5162: Isn't the exception approach much cleaner and simpler? I think the exception approach is greedy, and the HBase client code needs minimal change. Based on the retry count, the client can adjust the delay time as it goes. With load monitoring, we assume the load trend remains the same, which may not actually be the case. The client side also has to track the regions whose memstore is under pressure, and every client needs to do the same tracking. Basic client pushback mechanism --- Key: HBASE-5162 URL: https://issues.apache.org/jira/browse/HBASE-5162 Project: HBase Issue Type: New Feature Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Fix For: 0.96.0 Attachments: java_HBASE-5162.patch The current blocking we do when we are close to some limits (memstores over the multiplier factor, too many store files, global memstore memory) is bad, too coarse and confusing. After hitting HBASE-5161, it really becomes obvious that we need something better. I did a little brainstorm with Stack, we came up quickly with two solutions: - Send some exception to the client, like OverloadedException, that's thrown when some situation happens like getting past the low memory barrier. It would be thrown when the client gets a handler and does some check while putting or deleting. The client would treat this as a retryable exception but ideally wouldn't check .META. for a new location. It could be fancy and have multiple levels of pushback, like send the exception to 25% of the clients, and then go up if the situation persists. Should be easy to implement but we'll be using a lot more IO to send the payload over and over again (but at least it wouldn't sit in the RS's memory). 
- Send a message alongside a successful put or delete to tell the client to slow down a little, this way we don't have to do back and forth with the payload between the client and the server. It's a cleaner (I think) but more involved solution. In every case the RS should do very obvious things to notify the operators of this situation, through logs, web UI, metrics, etc. Other ideas?
[jira] [Commented] (HBASE-5162) Basic client pushback mechanism
[ https://issues.apache.org/jira/browse/HBASE-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482827#comment-13482827 ] Jimmy Xiang commented on HBASE-5162: I was thinking RegionTooBusyException, so close. As long as the server releases the IPC handler in such a scenario, accessing other regions should not be blocked. The point is that we don't want a busy region to block a whole region server. As for old clients, right, the behavior is a little different. But they should not fail, since they should already expect exceptions and handle them properly, for example by retrying, although that may not be as efficient as delaying longer when the retry count is higher. As for measuring the load, I think it is a good idea. I am just concerned about spending too much effort on it without first trying the simple approach, which is known to work. Basic client pushback mechanism --- Key: HBASE-5162 URL: https://issues.apache.org/jira/browse/HBASE-5162 Project: HBase Issue Type: New Feature Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Fix For: 0.96.0 Attachments: java_HBASE-5162.patch The current blocking we do when we are close to some limits (memstores over the multiplier factor, too many store files, global memstore memory) is bad, too coarse and confusing. After hitting HBASE-5161, it really becomes obvious that we need something better. I did a little brainstorm with Stack, we came up quickly with two solutions: - Send some exception to the client, like OverloadedException, that's thrown when some situation happens like getting past the low memory barrier. It would be thrown when the client gets a handler and does some check while putting or deleting. The client would treat this as a retryable exception but ideally wouldn't check .META. for a new location. It could be fancy and have multiple levels of pushback, like send the exception to 25% of the clients, and then go up if the situation persists. 
Should be easy to implement but we'll be using a lot more IO to send the payload over and over again (but at least it wouldn't sit in the RS's memory). - Send a message alongside a successful put or delete to tell the client to slow down a little, this way we don't have to do back and forth with the payload between the client and the server. It's a cleaner (I think) but more involved solution. In every case the RS should do very obvious things to notify the operators of this situation, through logs, web UI, metrics, etc. Other ideas?
[jira] [Commented] (HBASE-5162) Basic client pushback mechanism
[ https://issues.apache.org/jira/browse/HBASE-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482829#comment-13482829 ] Jimmy Xiang commented on HBASE-5162: The extra load due to retries is mostly on the network, not so much on the server. The retry pause is configurable and depends on the number of tries: ConnectionUtils.getPauseTime(pause, tries). As long as the retries gradually slow down, is that acceptable? Basic client pushback mechanism --- Key: HBASE-5162 URL: https://issues.apache.org/jira/browse/HBASE-5162 Project: HBase Issue Type: New Feature Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Fix For: 0.96.0 Attachments: java_HBASE-5162.patch The current blocking we do when we are close to some limits (memstores over the multiplier factor, too many store files, global memstore memory) is bad, too coarse and confusing. After hitting HBASE-5161, it really becomes obvious that we need something better. I did a little brainstorm with Stack, we came up quickly with two solutions: - Send some exception to the client, like OverloadedException, that's thrown when some situation happens like getting past the low memory barrier. It would be thrown when the client gets a handler and does some check while putting or deleting. The client would treat this as a retryable exception but ideally wouldn't check .META. for a new location. It could be fancy and have multiple levels of pushback, like send the exception to 25% of the clients, and then go up if the situation persists. Should be easy to implement but we'll be using a lot more IO to send the payload over and over again (but at least it wouldn't sit in the RS's memory). - Send a message alongside a successful put or delete to tell the client to slow down a little, this way we don't have to do back and forth with the payload between the client and the server. It's a cleaner (I think) but more involved solution. 
In every case the RS should do very obvious things to notify the operators of this situation, through logs, web UI, metrics, etc. Other ideas?
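To make the retry discussion in this thread concrete, here is a minimal sketch of a client that slows down as the retry count grows, in the spirit of ConnectionUtils.getPauseTime(pause, tries). It is an illustration only: the class PushbackRetrySketch, the BACKOFF multiplier table, BusyException, and callWithRetries are all hypothetical names and values, not the HBase implementation or the attached patch.

```java
/** Illustrative sketch only; not HBase code. All names and values are assumptions. */
class PushbackRetrySketch {
    // Hypothetical multiplier table; the table HBase actually uses may differ.
    static final int[] BACKOFF = {1, 1, 2, 4, 8, 16, 32};

    /** Pause before retry number `tries` (0-based); grows with the retry count, then stays flat. */
    static long pauseTime(long basePauseMs, int tries) {
        int idx = Math.min(Math.max(tries, 0), BACKOFF.length - 1);
        return basePauseMs * BACKOFF[idx];
    }

    /** Hypothetical stand-in for a retryable "region is too busy" exception. */
    static class BusyException extends Exception {}

    /** A call that may be pushed back by the server. */
    interface Call<T> {
        T run() throws BusyException;
    }

    /** Retries the call, sleeping longer after each pushback, up to maxTries attempts. */
    static <T> T callWithRetries(Call<T> call, int maxTries, long basePauseMs)
            throws BusyException, InterruptedException {
        if (maxTries < 1) {
            throw new IllegalArgumentException("maxTries must be >= 1");
        }
        for (int tries = 0; ; tries++) {
            try {
                return call.run();
            } catch (BusyException e) {
                if (tries + 1 >= maxTries) {
                    throw e; // out of attempts: surface the pushback to the caller
                }
                Thread.sleep(pauseTime(basePauseMs, tries));
            }
        }
    }
}
```

With a table like this the payload is still resent on every retry, which is the network cost raised in the thread; the growing pause only bounds how often that happens.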
[jira] [Updated] (HBASE-6977) Multithread processing ZK assignment events
[ https://issues.apache.org/jira/browse/HBASE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6977: --- Status: Open (was: Patch Available) Will address Stack's comments and upload a new patch. Multithread processing ZK assignment events --- Key: HBASE-6977 URL: https://issues.apache.org/jira/browse/HBASE-6977 Project: HBase Issue Type: Improvement Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: trunk-6977_v1.patch, trunk-6977_v2-1.patch Related to HBASE-6976 and HBASE-6611. ZK events processing is a bottleneck for assignments, since there is only one ZK event thread. If we can use multiple threads, it should be better. With multiple threads, the order of events could be messed up. However, if we pass all events related to one region always to the same worker thread, the order should be kept. We need to play with it and find out how much performance improvement we can get.
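The idea in the description above, routing every event for a given region to the same worker thread so that per-region order is preserved, can be sketched roughly as follows. This is an assumption-laden illustration, not the attached patch: the class RegionEventDispatcherSketch and its methods are hypothetical, and regions are identified by a plain string name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Illustrative sketch only; not the HBASE-6977 patch. All names are hypothetical. */
class RegionEventDispatcherSketch {
    // One single-threaded executor per worker: events queued on the same
    // worker run one at a time, in submission order.
    private final List<ExecutorService> workers = new ArrayList<>();

    RegionEventDispatcherSketch(int numWorkers) {
        for (int i = 0; i < numWorkers; i++) {
            workers.add(Executors.newSingleThreadExecutor());
        }
    }

    /** The same region name always maps to the same worker index. */
    int workerIndexFor(String regionName) {
        return Math.floorMod(regionName.hashCode(), workers.size());
    }

    /** Queues an event for a region on that region's worker. */
    void submit(String regionName, Runnable event) {
        workers.get(workerIndexFor(regionName)).execute(event);
    }

    /** Stops accepting events and waits briefly for queued ones to finish. */
    void shutdownAndWait() {
        for (ExecutorService w : workers) {
            w.shutdown();
        }
        try {
            for (ExecutorService w : workers) {
                w.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
    }
}
```

Events for different regions may still run in any order relative to each other; only the order within one region is kept, which is the property the description asks for.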
[jira] [Updated] (HBASE-6896) sync bulk and regular assigment handling socket timeout exception
[ https://issues.apache.org/jira/browse/HBASE-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6896: --- Resolution: Fixed Status: Resolved (was: Patch Available) Integrated into trunk. Thanks all for the review. sync bulk and regular assigment handling socket timeout exception - Key: HBASE-6896 URL: https://issues.apache.org/jira/browse/HBASE-6896 Project: HBase Issue Type: Bug Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: trunk-6896.patch, trunk-6896_v2.patch In regular assignment, in case of socket network timeout, it tries to call openRegion again and again without change the region plan, ZK offline node, till the region is out of transition, in case the region server is still up. We may need to sync them up and make sure bulk assignment does the same in this case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6977) Multithread processing ZK assignment events
[ https://issues.apache.org/jira/browse/HBASE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6977: --- Attachment: trunk-6977_v3.patch Multithread processing ZK assignment events --- Key: HBASE-6977 URL: https://issues.apache.org/jira/browse/HBASE-6977 Project: HBase Issue Type: Improvement Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: trunk-6977_v1.patch, trunk-6977_v2-1.patch, trunk-6977_v3.patch Related to HBASE-6976 and HBASE-6611. ZK events processing is a bottleneck for assignments, since there is only one ZK event thread. If we can use multiple threads, it should be better. With multiple threads, the order of events could be messed up. However, if we pass all events related to one region always to the same worker thread, the order should be kept. We need to play with it and find out how much performance improvement we can get.
[jira] [Updated] (HBASE-6977) Multithread processing ZK assignment events
[ https://issues.apache.org/jira/browse/HBASE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6977: --- Status: Patch Available (was: Open) Multithread processing ZK assignment events --- Key: HBASE-6977 URL: https://issues.apache.org/jira/browse/HBASE-6977 Project: HBase Issue Type: Improvement Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: trunk-6977_v1.patch, trunk-6977_v2-1.patch, trunk-6977_v3.patch Related to HBASE-6976 and HBASE-6611. ZK events processing is a bottleneck for assignments, since there is only one ZK event thread. If we can use multiple threads, it should be better. With multiple threads, the order of events could be messed up. However, if we pass all events related to one region always to the same worker thread, the order should be kept. We need to play with it and find out how much performance improvement we can get.
[jira] [Updated] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-5179: --- Resolution: Fixed Status: Resolved (was: Patch Available) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.3 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v17.txt, 5179-90v18.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6977) Multithread processing ZK assignment events
[ https://issues.apache.org/jira/browse/HBASE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6977: --- Resolution: Fixed Fix Version/s: 0.96.0 Status: Resolved (was: Patch Available) Integrated into trunk. Thanks all for reviewing it. Multithread processing ZK assignment events --- Key: HBASE-6977 URL: https://issues.apache.org/jira/browse/HBASE-6977 Project: HBase Issue Type: Improvement Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 0.96.0 Attachments: trunk-6977_v1.patch, trunk-6977_v2-1.patch, trunk-6977_v3.patch Related to HBASE-6976 and HBASE-6611. ZK events processing is a bottleneck for assignments, since there is only one ZK event thread. If we can use multiple threads, it should be better. With multiple threads, the order of events could be messed up. However, if we pass all events related to one region always to the same worker thread, the order should be kept. We need to play with it and find out how much performance improvement we can get.
[jira] [Updated] (HBASE-6381) AssignmentManager should use the same logic for clean startup and failover
[ https://issues.apache.org/jira/browse/HBASE-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6381: --- Fix Version/s: 0.96.0 AssignmentManager should use the same logic for clean startup and failover -- Key: HBASE-6381 URL: https://issues.apache.org/jira/browse/HBASE-6381 Project: HBase Issue Type: Bug Components: master Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.0 Attachments: hbase-6381-notes.pdf, hbase-6381.pdf, trunk-6381_v5.patch, trunk-6381_v7.patch, trunk-6381_v8.patch, trunk-6381_v9.patch Currently AssignmentManager handles clean startup and failover very differently. Different logic is mingled together so it is hard to find out which is for which. We should clean it up and share the same logic so that AssignmentManager handles both cases the same way. This way, the code will much easier to understand and maintain. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-6223) Document hbck improvements: HBASE-6173, HBASE-5360
[ https://issues.apache.org/jira/browse/HBASE-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang reassigned HBASE-6223: -- Assignee: Jimmy Xiang Document hbck improvements: HBASE-6173, HBASE-5360 --- Key: HBASE-6223 URL: https://issues.apache.org/jira/browse/HBASE-6223 Project: HBase Issue Type: Task Components: documentation, hbck Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: trunk-6223.patch We had a couple hbck improvements recently: HBASE-6173 and HBASE-5360. We should document them. Especially, for HBASE-5360, it's something one normally doesn't do. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6223) Document hbck improvements: HBASE-6173, HBASE-5360
[ https://issues.apache.org/jira/browse/HBASE-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6223: --- Attachment: trunk-6223.patch Document hbck improvements: HBASE-6173, HBASE-5360 --- Key: HBASE-6223 URL: https://issues.apache.org/jira/browse/HBASE-6223 Project: HBase Issue Type: Task Components: documentation, hbck Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: trunk-6223.patch We had a couple hbck improvements recently: HBASE-6173 and HBASE-5360. We should document them. Especially, for HBASE-5360, it's something one normally doesn't do. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6223) Document hbck improvements: HBASE-6173, HBASE-5360
[ https://issues.apache.org/jira/browse/HBASE-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6223: --- Status: Patch Available (was: Open) Document hbck improvements: HBASE-6173, HBASE-5360 --- Key: HBASE-6223 URL: https://issues.apache.org/jira/browse/HBASE-6223 Project: HBase Issue Type: Task Components: documentation, hbck Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: trunk-6223.patch We had a couple hbck improvements recently: HBASE-6173 and HBASE-5360. We should document them. Especially, for HBASE-5360, it's something one normally doesn't do. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5136) Redundant MonitoredTask instances in case of distributed log splitting retry
[ https://issues.apache.org/jira/browse/HBASE-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484537#comment-13484537 ] Jimmy Xiang commented on HBASE-5136: @Ted, were you referring to 5136.txt? That's your patch. My patch is 5136-trunk.patch, which is already committed in HBASE-6357. I think we can close this one as a duplicate. Redundant MonitoredTask instances in case of distributed log splitting retry Key: HBASE-5136 URL: https://issues.apache.org/jira/browse/HBASE-5136 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Attachments: 5136-trunk.patch, 5136.txt In case of log splitting retry, the following code would be executed multiple times: {code} public long splitLogDistributed(final List<Path> logDirs) throws IOException { MonitoredTask status = TaskMonitor.get().createStatus( "Doing distributed log split in " + logDirs); {code} leading to multiple MonitoredTask instances. Users may get confused by multiple distributed log splitting entries for the same region server on the master UI.
[jira] [Updated] (HBASE-6223) Document hbck improvements: HBASE-6173, HBASE-5360
[ https://issues.apache.org/jira/browse/HBASE-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6223: --- Attachment: trunk-6223_v2.patch Document hbck improvements: HBASE-6173, HBASE-5360 --- Key: HBASE-6223 URL: https://issues.apache.org/jira/browse/HBASE-6223 Project: HBase Issue Type: Task Components: documentation, hbck Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: trunk-6223.patch, trunk-6223_v2.patch We had a couple hbck improvements recently: HBASE-6173 and HBASE-5360. We should document them. Especially, for HBASE-5360, it's something one normally doesn't do. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6223) Document hbck improvements: HBASE-6173, HBASE-5360
[ https://issues.apache.org/jira/browse/HBASE-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6223: --- Attachment: trunk-6223_v3.patch Thanks for review. I updated the patch a little bit. Now the sentence is short and easy to parse. :) Document hbck improvements: HBASE-6173, HBASE-5360 --- Key: HBASE-6223 URL: https://issues.apache.org/jira/browse/HBASE-6223 Project: HBase Issue Type: Task Components: documentation, hbck Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: trunk-6223.patch, trunk-6223_v2.patch, trunk-6223_v3.patch We had a couple hbck improvements recently: HBASE-6173 and HBASE-5360. We should document them. Especially, for HBASE-5360, it's something one normally doesn't do. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover
[ https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485092#comment-13485092 ] Jimmy Xiang commented on HBASE-6060: I think we can let it go for 0.94, since the timeout monitor can handle it and there is no better way to fix it, because the region state in 0.94 is not so reliable. For 0.96, this one is not covered yet. It still relies on the timeout monitor. Let me cook up a patch for 0.96 now. Regions's in OPENING state from failed regionservers takes a long time to recover - Key: HBASE-6060 URL: https://issues.apache.org/jira/browse/HBASE-6060 Project: HBase Issue Type: Bug Components: master, regionserver Reporter: Enis Soztutar Assignee: rajeshbabu Fix For: 0.92.3, 0.94.3, 0.96.0 Attachments: 6060-94-v3.patch, 6060-94-v4_1.patch, 6060-94-v4_1.patch, 6060-94-v4.patch, 6060_alternative_suggestion.txt, 6060_suggestion2_based_off_v3.patch, 6060_suggestion_based_off_v3.patch, 6060_suggestion_toassign_rs_wentdown_beforerequest.patch, 6060-trunk_2.patch, 6060-trunk_3.patch, 6060-trunk.patch, 6060-trunk.patch, HBASE-6060-92.patch, HBASE-6060-94.patch, HBASE-6060_latest.patch, HBASE-6060_latest.patch, HBASE-6060_latest.patch, HBASE-6060-trunk_4.patch, HBASE-6060_trunk_5.patch we have seen a pattern in tests, that the regions are stuck in OPENING state for a very long time when the region server who is opening the region fails. My understanding of the process: - master calls rs to open the region. If rs is offline, a new plan is generated (a new rs is chosen). RegionState is set to PENDING_OPEN (only in master memory, zk still shows OFFLINE). See HRegionServer.openRegion(), HMaster.assign() - RegionServer, starts opening a region, changes the state in znode. But that znode is not ephemeral. (see ZkAssign) - Rs transitions zk node from OFFLINE to OPENING. 
See OpenRegionHandler.process() - rs then opens the region, and changes znode from OPENING to OPENED - when rs is killed between OPENING and OPENED states, then zk shows OPENING state, and the master just waits for rs to change the region state, but since rs is down, that wont happen. - There is a AssignmentManager.TimeoutMonitor, which does exactly guard against these kind of conditions. It periodically checks (every 10 sec by default) the regions in transition to see whether they timedout (hbase.master.assignment.timeoutmonitor.timeout). Default timeout is 30 min, which explains what you and I are seeing. - ServerShutdownHandler in Master does not reassign regions in OPENING state, although it handles other states. Lowering that threshold from the configuration is one option, but still I think we can do better. Will investigate more. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
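The timeout check described in the issue above, periodically scanning regions in transition and flagging those stuck longer than a configured timeout, can be sketched as below. This is a simplified illustration under stated assumptions, not AssignmentManager.TimeoutMonitor itself: the class TimeoutMonitorSketch, its method names, and the use of plain string region names and caller-supplied timestamps are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative sketch only; not HBase's TimeoutMonitor. All names are hypothetical. */
class TimeoutMonitorSketch {
    // Region name -> time (ms) at which the region entered transition.
    private final Map<String, Long> transitionStartMs = new ConcurrentHashMap<>();
    private final long timeoutMs;

    TimeoutMonitorSketch(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    void regionEnteredTransition(String region, long nowMs) {
        transitionStartMs.put(region, nowMs);
    }

    void regionLeftTransition(String region) {
        transitionStartMs.remove(region);
    }

    /** One periodic pass: returns regions that have been in transition longer than the timeout. */
    List<String> findTimedOut(long nowMs) {
        List<String> timedOut = new ArrayList<>();
        for (Map.Entry<String, Long> e : transitionStartMs.entrySet()) {
            if (nowMs - e.getValue() > timeoutMs) {
                timedOut.add(e.getKey());
            }
        }
        return timedOut;
    }
}
```

With a 30-minute timeout, a region whose server died while the znode was in OPENING is only noticed on the first pass after those 30 minutes, which matches the long recovery time reported in the issue.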
[jira] [Updated] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover
[ https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6060: --- Attachment: trunk-6060.patch Regions's in OPENING state from failed regionservers takes a long time to recover - Key: HBASE-6060 URL: https://issues.apache.org/jira/browse/HBASE-6060 Project: HBase Issue Type: Bug Components: master, regionserver Reporter: Enis Soztutar Assignee: rajeshbabu Fix For: 0.92.3, 0.94.3, 0.96.0 Attachments: 6060-94-v3.patch, 6060-94-v4_1.patch, 6060-94-v4_1.patch, 6060-94-v4.patch, 6060_alternative_suggestion.txt, 6060_suggestion2_based_off_v3.patch, 6060_suggestion_based_off_v3.patch, 6060_suggestion_toassign_rs_wentdown_beforerequest.patch, 6060-trunk_2.patch, 6060-trunk_3.patch, 6060-trunk.patch, 6060-trunk.patch, HBASE-6060-92.patch, HBASE-6060-94.patch, HBASE-6060_latest.patch, HBASE-6060_latest.patch, HBASE-6060_latest.patch, HBASE-6060-trunk_4.patch, HBASE-6060_trunk_5.patch, trunk-6060.patch we have seen a pattern in tests, that the regions are stuck in OPENING state for a very long time when the region server who is opening the region fails. My understanding of the process: - master calls rs to open the region. If rs is offline, a new plan is generated (a new rs is chosen). RegionState is set to PENDING_OPEN (only in master memory, zk still shows OFFLINE). See HRegionServer.openRegion(), HMaster.assign() - RegionServer, starts opening a region, changes the state in znode. But that znode is not ephemeral. (see ZkAssign) - Rs transitions zk node from OFFLINE to OPENING. See OpenRegionHandler.process() - rs then opens the region, and changes znode from OPENING to OPENED - when rs is killed between OPENING and OPENED states, then zk shows OPENING state, and the master just waits for rs to change the region state, but since rs is down, that wont happen. - There is a AssignmentManager.TimeoutMonitor, which does exactly guard against these kind of conditions. 
It periodically checks (every 10 sec by default) the regions in transition to see whether they timedout (hbase.master.assignment.timeoutmonitor.timeout). Default timeout is 30 min, which explains what you and I are seeing. - ServerShutdownHandler in Master does not reassign regions in OPENING state, although it handles other states. Lowering that threshold from the configuration is one option, but still I think we can do better. Will investigate more. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover
[ https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13485173#comment-13485173 ] Jimmy Xiang commented on HBASE-6060: I uploaded a simple patch for 0.96: trunk-6060.patch. Could you please review? Regions's in OPENING state from failed regionservers takes a long time to recover - Key: HBASE-6060 URL: https://issues.apache.org/jira/browse/HBASE-6060 Project: HBase Issue Type: Bug Components: master, regionserver Reporter: Enis Soztutar Assignee: rajeshbabu Fix For: 0.92.3, 0.94.3, 0.96.0 Attachments: 6060-94-v3.patch, 6060-94-v4_1.patch, 6060-94-v4_1.patch, 6060-94-v4.patch, 6060_alternative_suggestion.txt, 6060_suggestion2_based_off_v3.patch, 6060_suggestion_based_off_v3.patch, 6060_suggestion_toassign_rs_wentdown_beforerequest.patch, 6060-trunk_2.patch, 6060-trunk_3.patch, 6060-trunk.patch, 6060-trunk.patch, HBASE-6060-92.patch, HBASE-6060-94.patch, HBASE-6060_latest.patch, HBASE-6060_latest.patch, HBASE-6060_latest.patch, HBASE-6060-trunk_4.patch, HBASE-6060_trunk_5.patch, trunk-6060.patch we have seen a pattern in tests, that the regions are stuck in OPENING state for a very long time when the region server who is opening the region fails. My understanding of the process: - master calls rs to open the region. If rs is offline, a new plan is generated (a new rs is chosen). RegionState is set to PENDING_OPEN (only in master memory, zk still shows OFFLINE). See HRegionServer.openRegion(), HMaster.assign() - RegionServer, starts opening a region, changes the state in znode. But that znode is not ephemeral. (see ZkAssign) - Rs transitions zk node from OFFLINE to OPENING. See OpenRegionHandler.process() - rs then opens the region, and changes znode from OPENING to OPENED - when rs is killed between OPENING and OPENED states, then zk shows OPENING state, and the master just waits for rs to change the region state, but since rs is down, that wont happen. 
- There is a AssignmentManager.TimeoutMonitor, which does exactly guard against these kind of conditions. It periodically checks (every 10 sec by default) the regions in transition to see whether they timedout (hbase.master.assignment.timeoutmonitor.timeout). Default timeout is 30 min, which explains what you and I are seeing. - ServerShutdownHandler in Master does not reassign regions in OPENING state, although it handles other states. Lowering that threshold from the configuration is one option, but still I think we can do better. Will investigate more. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6223) Document hbck improvements: HBASE-6173, HBASE-5360
[ https://issues.apache.org/jira/browse/HBASE-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6223: --- Resolution: Fixed Fix Version/s: 0.96.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Document hbck improvements: HBASE-6173, HBASE-5360 --- Key: HBASE-6223 URL: https://issues.apache.org/jira/browse/HBASE-6223 Project: HBase Issue Type: Task Components: documentation, hbck Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.0 Attachments: trunk-6223.patch, trunk-6223_v2.patch, trunk-6223_v3.patch We had a couple hbck improvements recently: HBASE-6173 and HBASE-5360. We should document them. Especially, for HBASE-5360, it's something one normally doesn't do. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover
[ https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13485209#comment-13485209 ] Jimmy Xiang commented on HBASE-6060: @Stack, here is my understanding of the problem. The master calls an rs to open a region. Now, in master memory, the region is in pending_open state with this rs's server name. Then the rs dies. When SSH starts, it goes to meta to find all the regions on this rs, minus those regions already in transition, then assigns the remaining regions. If the pending_open region (it could be opening too, depending on timing) was on this region server before, SSH will take care of it. Otherwise, if it was on a different region server, SSH will not pick it up. In this patch, I just time out the region transition so that the timeout monitor (tm) can change the state and re-assign it, instead of waiting for a long time (now, 20 minutes by default). I'd like to make sure the region states in master memory are reliable. Otherwise, they are not of much use. So I think the master always has region control. In 0.96, I think region states are very reliable now. Of course, there could be bugs I am not aware of yet. @Ted, good point. I will include the test. For EnvironmentEdgeManager, I will leave it to another jira. 
Regions's in OPENING state from failed regionservers takes a long time to recover - Key: HBASE-6060 URL: https://issues.apache.org/jira/browse/HBASE-6060 Project: HBase Issue Type: Bug Components: master, regionserver Reporter: Enis Soztutar Assignee: rajeshbabu Fix For: 0.92.3, 0.94.3, 0.96.0 Attachments: 6060-94-v3.patch, 6060-94-v4_1.patch, 6060-94-v4_1.patch, 6060-94-v4.patch, 6060_alternative_suggestion.txt, 6060_suggestion2_based_off_v3.patch, 6060_suggestion_based_off_v3.patch, 6060_suggestion_toassign_rs_wentdown_beforerequest.patch, 6060-trunk_2.patch, 6060-trunk_3.patch, 6060-trunk.patch, 6060-trunk.patch, HBASE-6060-92.patch, HBASE-6060-94.patch, HBASE-6060_latest.patch, HBASE-6060_latest.patch, HBASE-6060_latest.patch, HBASE-6060-trunk_4.patch, HBASE-6060_trunk_5.patch, trunk-6060.patch we have seen a pattern in tests, that the regions are stuck in OPENING state for a very long time when the region server who is opening the region fails. My understanding of the process: - master calls rs to open the region. If rs is offline, a new plan is generated (a new rs is chosen). RegionState is set to PENDING_OPEN (only in master memory, zk still shows OFFLINE). See HRegionServer.openRegion(), HMaster.assign() - RegionServer, starts opening a region, changes the state in znode. But that znode is not ephemeral. (see ZkAssign) - Rs transitions zk node from OFFLINE to OPENING. See OpenRegionHandler.process() - rs then opens the region, and changes znode from OPENING to OPENED - when rs is killed between OPENING and OPENED states, then zk shows OPENING state, and the master just waits for rs to change the region state, but since rs is down, that wont happen. - There is a AssignmentManager.TimeoutMonitor, which does exactly guard against these kind of conditions. It periodically checks (every 10 sec by default) the regions in transition to see whether they timedout (hbase.master.assignment.timeoutmonitor.timeout). 
Default timeout is 30 min, which explains what you and I are seeing. - ServerShutdownHandler in Master does not reassign regions in OPENING state, although it handles other states. Lowering that threshold from the configuration is one option, but still I think we can do better. Will investigate more. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
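The gap described in the comment above (a region being opened on the dead server that meta still lists on a different server) can be sketched with two maps. This is a simplified model, not the ServerShutdownHandler code: `SshCoverageSketch`, its method names, and the use of plain strings for region and server names are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of which regions a meta scan for a dead server can see. */
final class SshCoverageSketch {

    /** Regions that meta lists on the dead server (what the meta scan finds). */
    static Set<String> regionsFoundInMeta(Map<String, String> metaLocation, String deadServer) {
        Set<String> found = new TreeSet<>();
        for (Map.Entry<String, String> e : metaLocation.entrySet()) {
            if (e.getValue().equals(deadServer)) {
                found.add(e.getKey());
            }
        }
        return found;
    }

    /**
     * Regions the master asked the dead server to open (PENDING_OPEN or
     * OPENING in master memory) that meta does not list on that server.
     * A meta scan for the dead server never sees these.
     */
    static Set<String> regionsMissed(Map<String, String> metaLocation,
                                     Map<String, String> openTarget,
                                     String deadServer) {
        Set<String> missed = new TreeSet<>();
        for (Map.Entry<String, String> e : openTarget.entrySet()) {
            boolean targetsDeadServer = e.getValue().equals(deadServer);
            boolean listedOnDeadServer = deadServer.equals(metaLocation.get(e.getKey()));
            if (targetsDeadServer && !listedOnDeadServer) {
                missed.add(e.getKey());
            }
        }
        return missed;
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();
        meta.put("r1", "rsA"); // r1 was on rsA before
        meta.put("r2", "rsB"); // r2 was on rsB before
        Map<String, String> opening = new HashMap<>();
        opening.put("r1", "rsA"); // being re-opened on rsA
        opening.put("r2", "rsA"); // being moved to rsA
        System.out.println("missed when rsA dies: " + regionsMissed(meta, opening, "rsA"));
    }
}
```

In this model `r2` is the region that is left for the timeout to recover, which is why the patch discussed here times out the transition instead of waiting.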
[jira] [Updated] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover
[ https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6060: --- Attachment: trunk-6060_v2.patch Regions's in OPENING state from failed regionservers takes a long time to recover - Key: HBASE-6060 URL: https://issues.apache.org/jira/browse/HBASE-6060 Project: HBase Issue Type: Bug Components: master, regionserver Reporter: Enis Soztutar Assignee: rajeshbabu Fix For: 0.92.3, 0.94.3, 0.96.0 Attachments: 6060-94-v3.patch, 6060-94-v4_1.patch, 6060-94-v4_1.patch, 6060-94-v4.patch, 6060_alternative_suggestion.txt, 6060_suggestion2_based_off_v3.patch, 6060_suggestion_based_off_v3.patch, 6060_suggestion_toassign_rs_wentdown_beforerequest.patch, 6060-trunk_2.patch, 6060-trunk_3.patch, 6060-trunk.patch, 6060-trunk.patch, HBASE-6060-92.patch, HBASE-6060-94.patch, HBASE-6060_latest.patch, HBASE-6060_latest.patch, HBASE-6060_latest.patch, HBASE-6060-trunk_4.patch, HBASE-6060_trunk_5.patch, trunk-6060.patch, trunk-6060_v2.patch we have seen a pattern in tests, that the regions are stuck in OPENING state for a very long time when the region server who is opening the region fails. My understanding of the process: - master calls rs to open the region. If rs is offline, a new plan is generated (a new rs is chosen). RegionState is set to PENDING_OPEN (only in master memory, zk still shows OFFLINE). See HRegionServer.openRegion(), HMaster.assign() - RegionServer, starts opening a region, changes the state in znode. But that znode is not ephemeral. (see ZkAssign) - Rs transitions zk node from OFFLINE to OPENING. See OpenRegionHandler.process() - rs then opens the region, and changes znode from OPENING to OPENED - when rs is killed between OPENING and OPENED states, then zk shows OPENING state, and the master just waits for rs to change the region state, but since rs is down, that wont happen. - There is a AssignmentManager.TimeoutMonitor, which does exactly guard against these kind of conditions. 
It periodically checks (every 10 sec by default) the regions in transition to see whether they timedout (hbase.master.assignment.timeoutmonitor.timeout). Default timeout is 30 min, which explains what you and I are seeing. - ServerShutdownHandler in Master does not reassign regions in OPENING state, although it handles other states. Lowering that threshold from the configuration is one option, but still I think we can do better. Will investigate more. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover
[ https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13485515#comment-13485515 ] Jimmy Xiang commented on HBASE-6060: Added a test, and uploaded v2: trunk-6060_v2.patch. Regions's in OPENING state from failed regionservers takes a long time to recover - Key: HBASE-6060 URL: https://issues.apache.org/jira/browse/HBASE-6060 Project: HBase Issue Type: Bug Components: master, regionserver Reporter: Enis Soztutar Assignee: rajeshbabu Fix For: 0.92.3, 0.94.3, 0.96.0 Attachments: 6060-94-v3.patch, 6060-94-v4_1.patch, 6060-94-v4_1.patch, 6060-94-v4.patch, 6060_alternative_suggestion.txt, 6060_suggestion2_based_off_v3.patch, 6060_suggestion_based_off_v3.patch, 6060_suggestion_toassign_rs_wentdown_beforerequest.patch, 6060-trunk_2.patch, 6060-trunk_3.patch, 6060-trunk.patch, 6060-trunk.patch, HBASE-6060-92.patch, HBASE-6060-94.patch, HBASE-6060_latest.patch, HBASE-6060_latest.patch, HBASE-6060_latest.patch, HBASE-6060-trunk_4.patch, HBASE-6060_trunk_5.patch, trunk-6060.patch, trunk-6060_v2.patch we have seen a pattern in tests, that the regions are stuck in OPENING state for a very long time when the region server who is opening the region fails. My understanding of the process: - master calls rs to open the region. If rs is offline, a new plan is generated (a new rs is chosen). RegionState is set to PENDING_OPEN (only in master memory, zk still shows OFFLINE). See HRegionServer.openRegion(), HMaster.assign() - RegionServer, starts opening a region, changes the state in znode. But that znode is not ephemeral. (see ZkAssign) - Rs transitions zk node from OFFLINE to OPENING. See OpenRegionHandler.process() - rs then opens the region, and changes znode from OPENING to OPENED - when rs is killed between OPENING and OPENED states, then zk shows OPENING state, and the master just waits for rs to change the region state, but since rs is down, that wont happen. 
- There is a AssignmentManager.TimeoutMonitor, which does exactly guard against these kind of conditions. It periodically checks (every 10 sec by default) the regions in transition to see whether they timedout (hbase.master.assignment.timeoutmonitor.timeout). Default timeout is 30 min, which explains what you and I are seeing. - ServerShutdownHandler in Master does not reassign regions in OPENING state, although it handles other states. Lowering that threshold from the configuration is one option, but still I think we can do better. Will investigate more. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5119) Set the TimeoutMonitor's timeout back down
[ https://issues.apache.org/jira/browse/HBASE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-5119: --- Status: Patch Available (was: Open) Set the TimeoutMonitor's timeout back down -- Key: HBASE-5119 URL: https://issues.apache.org/jira/browse/HBASE-5119 Project: HBase Issue Type: Task Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Priority: Blocker Fix For: 0.96.0 Attachments: trunk_5119.patch The TimeoutMonitor used to be extremely racy and caused more troubles than it fixed, but most of this has been fixed I believe in the context of 0.92 so I think we should set it down back to a useful level. Currently it's 30 minutes, what should the new value be? I think 5 minutes should be good, will do some testing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Comment Edited] (HBASE-5119) Set the TimeoutMonitor's timeout back down
[ https://issues.apache.org/jira/browse/HBASE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481975#comment-13481975 ] Jimmy Xiang edited comment on HBASE-5119 at 10/29/12 4:45 PM: -- Based on my testing for HBASE-6611/HBASE-6977, in my testing cluster, I can easily open 10+k regions on 4 nodes in around 4 minutes. I think we are fine to set the timeout back to 5 minutes. was (Author: jxiang): Based on my testing for HBASE-6611/HBASE-6977, in my testing cluster, I can easily open 10+ regions on 4 nodes in around 4 minutes. I think we are fine to set the timeout back to 5 minutes. Set the TimeoutMonitor's timeout back down -- Key: HBASE-5119 URL: https://issues.apache.org/jira/browse/HBASE-5119 Project: HBase Issue Type: Task Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Priority: Blocker Fix For: 0.96.0 Attachments: trunk_5119.patch The TimeoutMonitor used to be extremely racy and caused more troubles than it fixed, but most of this has been fixed I believe in the context of 0.92 so I think we should set it down back to a useful level. Currently it's 30 minutes, what should the new value be? I think 5 minutes should be good, will do some testing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5119) Set the TimeoutMonitor's timeout back down
[ https://issues.apache.org/jira/browse/HBASE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-5119: --- Attachment: trunk_5119.patch Set the TimeoutMonitor's timeout back down -- Key: HBASE-5119 URL: https://issues.apache.org/jira/browse/HBASE-5119 Project: HBase Issue Type: Task Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Priority: Blocker Fix For: 0.96.0 Attachments: trunk_5119.patch The TimeoutMonitor used to be extremely racy and caused more troubles than it fixed, but most of this has been fixed I believe in the context of 0.92 so I think we should set it down back to a useful level. Currently it's 30 minutes, what should the new value be? I think 5 minutes should be good, will do some testing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5119) Set the TimeoutMonitor's timeout back down
[ https://issues.apache.org/jira/browse/HBASE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486137#comment-13486137 ] Jimmy Xiang commented on HBASE-5119: I uploaded a patch, which simply sets the timeout to 10 minutes and the checking period to 30 seconds. Set the TimeoutMonitor's timeout back down -- Key: HBASE-5119 URL: https://issues.apache.org/jira/browse/HBASE-5119 Project: HBase Issue Type: Task Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Priority: Blocker Fix For: 0.96.0 Attachments: trunk_5119.patch The TimeoutMonitor used to be extremely racy and caused more troubles than it fixed, but most of this has been fixed I believe in the context of 0.92 so I think we should set it down back to a useful level. Currently it's 30 minutes, what should the new value be? I think 5 minutes should be good, will do some testing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover
[ https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486333#comment-13486333 ] Jimmy Xiang commented on HBASE-6060: Good review. Thanks. I posted a new patch on RB: https://reviews.apache.org/r/7767/ In the new patch, I re-organized the SSH code a little bit, and handled regions in OPEN state too. It also handles regions that are in OPENING/PENDING_OPEN/OFFLINE state but were open on this region server before that. This could happen when the cluster starts up. Regions's in OPENING state from failed regionservers takes a long time to recover - Key: HBASE-6060 URL: https://issues.apache.org/jira/browse/HBASE-6060 Project: HBase Issue Type: Bug Components: master, regionserver Reporter: Enis Soztutar Assignee: rajeshbabu Fix For: 0.92.3, 0.94.3, 0.96.0 Attachments: 6060-94-v3.patch, 6060-94-v4_1.patch, 6060-94-v4_1.patch, 6060-94-v4.patch, 6060_alternative_suggestion.txt, 6060_suggestion2_based_off_v3.patch, 6060_suggestion_based_off_v3.patch, 6060_suggestion_toassign_rs_wentdown_beforerequest.patch, 6060-trunk_2.patch, 6060-trunk_3.patch, 6060-trunk.patch, 6060-trunk.patch, HBASE-6060-92.patch, HBASE-6060-94.patch, HBASE-6060_latest.patch, HBASE-6060_latest.patch, HBASE-6060_latest.patch, HBASE-6060-trunk_4.patch, HBASE-6060_trunk_5.patch, trunk-6060.patch, trunk-6060_v2.patch we have seen a pattern in tests, that the regions are stuck in OPENING state for a very long time when the region server who is opening the region fails. My understanding of the process: - master calls rs to open the region. If rs is offline, a new plan is generated (a new rs is chosen). RegionState is set to PENDING_OPEN (only in master memory, zk still shows OFFLINE). See HRegionServer.openRegion(), HMaster.assign() - RegionServer, starts opening a region, changes the state in znode. But that znode is not ephemeral. (see ZkAssign) - Rs transitions zk node from OFFLINE to OPENING. 
See OpenRegionHandler.process() - rs then opens the region, and changes znode from OPENING to OPENED - when rs is killed between OPENING and OPENED states, then zk shows OPENING state, and the master just waits for rs to change the region state, but since rs is down, that wont happen. - There is a AssignmentManager.TimeoutMonitor, which does exactly guard against these kind of conditions. It periodically checks (every 10 sec by default) the regions in transition to see whether they timedout (hbase.master.assignment.timeoutmonitor.timeout). Default timeout is 30 min, which explains what you and I are seeing. - ServerShutdownHandler in Master does not reassign regions in OPENING state, although it handles other states. Lowering that threshold from the configuration is one option, but still I think we can do better. Will investigate more. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5119) Set the TimeoutMonitor's timeout back down
[ https://issues.apache.org/jira/browse/HBASE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-5119: --- Resolution: Fixed Release Note: The region assignment timeout time is reduced to 10 minutes. The timeout check interval is reduced to 30 seconds from 60 seconds. Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Set the TimeoutMonitor's timeout back down -- Key: HBASE-5119 URL: https://issues.apache.org/jira/browse/HBASE-5119 Project: HBase Issue Type: Task Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Priority: Blocker Fix For: 0.96.0 Attachments: trunk_5119.patch The TimeoutMonitor used to be extremely racy and caused more troubles than it fixed, but most of this has been fixed I believe in the context of 0.92 so I think we should set it down back to a useful level. Currently it's 30 minutes, what should the new value be? I think 5 minutes should be good, will do some testing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
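To make the effect of the two numbers in the release note concrete, here is a minimal sketch of a periodic timeout check using a 10 minute timeout and a 30 second check period. It is not the AssignmentManager.TimeoutMonitor implementation: the class `TimeoutMonitorSketch`, the map of last-update timestamps, and the "timeout plus one period" worst-case estimate are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/** Hypothetical periodic check over regions in transition. */
final class TimeoutMonitorSketch {
    // Values taken from the release note above.
    static final long TIMEOUT_MS = TimeUnit.MINUTES.toMillis(10);
    static final long PERIOD_MS = TimeUnit.SECONDS.toMillis(30);

    /** Regions whose last state change is older than the timeout, sorted by name. */
    static List<String> timedOut(Map<String, Long> lastUpdateMs, long nowMs, long timeoutMs) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastUpdateMs.entrySet()) {
            if (nowMs - e.getValue() > timeoutMs) {
                out.add(e.getKey());
            }
        }
        Collections.sort(out);
        return out;
    }

    /**
     * Rough upper bound on how long a stuck region waits: it can cross the
     * timeout just after one check and is only noticed at the next check.
     */
    static long worstCaseDetectionMs(long timeoutMs, long periodMs) {
        return timeoutMs + periodMs;
    }

    public static void main(String[] args) {
        Map<String, Long> rit = new HashMap<>();
        rit.put("regionA", 0L);       // stuck since t=0
        rit.put("regionB", 500_000L); // changed state recently
        System.out.println(timedOut(rit, 600_001L, TIMEOUT_MS));
        System.out.println(worstCaseDetectionMs(TIMEOUT_MS, PERIOD_MS) + " ms");
    }
}
```

Under these assumptions a stuck region is picked up after at most about ten and a half minutes, compared with roughly half an hour under the old 30 minute timeout.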
[jira] [Commented] (HBASE-7055) port HBASE-6371 tier-based compaction from 0.89-fb to trunk
[ https://issues.apache.org/jira/browse/HBASE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13487214#comment-13487214 ] Jimmy Xiang commented on HBASE-7055: This issue doesn't have a component. Do we need two +1s to commit it? It also introduces a new configuration file. It seems to be a big patch to me. port HBASE-6371 tier-based compaction from 0.89-fb to trunk --- Key: HBASE-7055 URL: https://issues.apache.org/jira/browse/HBASE-7055 Project: HBase Issue Type: Task Affects Versions: 0.96.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-6371-squashed.patch, HBASE-6371-v2-squashed.patch There's divergence in the code :( See HBASE-6371 for details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7055) port HBASE-6371 tier-based compaction from 0.89-fb to trunk
[ https://issues.apache.org/jira/browse/HBASE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13487855#comment-13487855 ] Jimmy Xiang commented on HBASE-7055: I agree with Enis. If each component has its own config file, we will have tons of them, and they will be hard to manage. If the same name/value pair is defined in different files, which one should we use? port HBASE-6371 tier-based compaction from 0.89-fb to trunk --- Key: HBASE-7055 URL: https://issues.apache.org/jira/browse/HBASE-7055 Project: HBase Issue Type: Task Components: Compaction Affects Versions: 0.96.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.96.0 Attachments: HBASE-6371-squashed.patch, HBASE-6371-v2-squashed.patch There's divergence in the code :( See HBASE-6371 for details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-7080) TestHFileOutputFormat.testColumnFamilyCompression failing due to UnsatisfiedLinkError
Jimmy Xiang created HBASE-7080: -- Summary: TestHFileOutputFormat.testColumnFamilyCompression failing due to UnsatisfiedLinkError Key: HBASE-7080 URL: https://issues.apache.org/jira/browse/HBASE-7080 Project: HBase Issue Type: Test Components: test Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Due to HADOOP-8756, this test fails {noformat} java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method) at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:62) at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:127) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:104) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:118) at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:236) at org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.getSupportedCompressionAlgorithms(TestHFileOutputFormat.java:649) at org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.testColumnFamilyCompression(TestHFileOutputFormat.java:571) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
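The stack trace above shows the test helper failing while it probes which compression codecs are usable. One way such a probe could tolerate a missing native library is to treat an UnsatisfiedLinkError as "codec not supported". The sketch below is an assumption about a possible approach, not a description of what trunk_7080.patch does; `CodecProbeSketch` and the probe suppliers are hypothetical stand-ins for the real codec lookup.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical probe that skips codecs whose native code cannot be loaded. */
final class CodecProbeSketch {

    /** Names of the codecs whose probe returns a non-null compressor. */
    static List<String> usableCodecs(Map<String, Supplier<Object>> probes) {
        List<String> usable = new ArrayList<>();
        for (Map.Entry<String, Supplier<Object>> e : probes.entrySet()) {
            try {
                if (e.getValue().get() != null) {
                    usable.add(e.getKey());
                }
            } catch (UnsatisfiedLinkError | RuntimeException ex) {
                // Native library missing or codec misconfigured: skip it
                // instead of failing the whole test.
            }
        }
        return usable;
    }

    public static void main(String[] args) {
        Map<String, Supplier<Object>> probes = new LinkedHashMap<>();
        probes.put("gz", Object::new);
        probes.put("snappy", () -> { throw new UnsatisfiedLinkError("buildSupportsSnappy"); });
        System.out.println(usableCodecs(probes));
    }
}
```

Catching an Error is normally discouraged; it is used here only because the failure in the trace is an Error rather than an Exception, so an ordinary `catch (Exception e)` would not see it.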
[jira] [Updated] (HBASE-7080) TestHFileOutputFormat.testColumnFamilyCompression failing due to UnsatisfiedLinkError
[ https://issues.apache.org/jira/browse/HBASE-7080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7080: --- Attachment: trunk_7080.patch TestHFileOutputFormat.testColumnFamilyCompression failing due to UnsatisfiedLinkError - Key: HBASE-7080 URL: https://issues.apache.org/jira/browse/HBASE-7080 Project: HBase Issue Type: Test Components: test Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 0.96.0 Attachments: trunk_7080.patch Due to HADOOP-8756, this test fails {noformat} java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method) at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:62) at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:127) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:104) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:118) at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:236) at org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.getSupportedCompressionAlgorithms(TestHFileOutputFormat.java:649) at org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.testColumnFamilyCompression(TestHFileOutputFormat.java:571) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7080) TestHFileOutputFormat.testColumnFamilyCompression failing due to UnsatisfiedLinkError
[ https://issues.apache.org/jira/browse/HBASE-7080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7080: --- Fix Version/s: 0.96.0 Status: Patch Available (was: Open) TestHFileOutputFormat.testColumnFamilyCompression failing due to UnsatisfiedLinkError - Key: HBASE-7080 URL: https://issues.apache.org/jira/browse/HBASE-7080 Project: HBase Issue Type: Test Components: test Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 0.96.0 Attachments: trunk_7080.patch Due to HADOOP-8756, this test fails {noformat} java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method) at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:62) at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:127) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:104) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:118) at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:236) at org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.getSupportedCompressionAlgorithms(TestHFileOutputFormat.java:649) at org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.testColumnFamilyCompression(TestHFileOutputFormat.java:571) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6305) TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds.
[ https://issues.apache.org/jira/browse/HBASE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13488032#comment-13488032 ] Jimmy Xiang commented on HBASE-6305: If the scheme is not null, the path will have the scheme twice. The failure seems to be caused by the null scheme. Is there a better way to fix it? TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds. Key: HBASE-6305 URL: https://issues.apache.org/jira/browse/HBASE-6305 Project: HBase Issue Type: Sub-task Components: test Affects Versions: 0.92.2, 0.94.1 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Fix For: 0.94.3 Attachments: hbase-6305-94.patch, HBASE-6305-94-v2.patch, HBASE-6305-v1.patch trunk: mvn clean test -Dhadoop.profile=2.0 -Dtest=TestLocalHBaseCluster 0.94: mvn clean test -Dhadoop.profile=23 -Dtest=TestLocalHBaseCluster {code} testLocalHBaseCluster(org.apache.hadoop.hbase.TestLocalHBaseCluster) Time elapsed: 0.022 sec ERROR! java.lang.RuntimeException: Master not initialized after 200 seconds at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:208) at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:424) at org.apache.hadoop.hbase.TestLocalHBaseCluster.testLocalHBaseCluster(TestLocalHBaseCluster.java:66) ... {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6305) TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds.
[ https://issues.apache.org/jira/browse/HBASE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13488053#comment-13488053 ] Jimmy Xiang commented on HBASE-6305: Path.toString() will add the scheme if it's not null. We add it one more time here, so it is added twice. TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds. Key: HBASE-6305 URL: https://issues.apache.org/jira/browse/HBASE-6305 Project: HBase Issue Type: Sub-task Components: test Affects Versions: 0.92.2, 0.94.1 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Fix For: 0.94.3 Attachments: hbase-6305-94.patch, HBASE-6305-94-v2.patch, HBASE-6305-v1.patch trunk: mvn clean test -Dhadoop.profile=2.0 -Dtest=TestLocalHBaseCluster 0.94: mvn clean test -Dhadoop.profile=23 -Dtest=TestLocalHBaseCluster {code} testLocalHBaseCluster(org.apache.hadoop.hbase.TestLocalHBaseCluster) Time elapsed: 0.022 sec ERROR! java.lang.RuntimeException: Master not initialized after 200 seconds at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:208) at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:424) at org.apache.hadoop.hbase.TestLocalHBaseCluster.testLocalHBaseCluster(TestLocalHBaseCluster.java:66) ... {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6305) TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds.
[ https://issues.apache.org/jira/browse/HBASE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13488086#comment-13488086 ] Jimmy Xiang commented on HBASE-6305: Can you do something like the following? conf.set(HConstants.HBASE_DIR, TEST_UTIL.getDataTestDir("hbase.rootdir").makeQualified(TEST_UTIL.getTestFileSystem()).toString()); TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds. Key: HBASE-6305 URL: https://issues.apache.org/jira/browse/HBASE-6305 Project: HBase Issue Type: Sub-task Components: test Affects Versions: 0.92.2, 0.94.1 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Fix For: 0.94.3 Attachments: hbase-6305-94.patch, HBASE-6305-94-v2.patch, HBASE-6305-v1.patch trunk: mvn clean test -Dhadoop.profile=2.0 -Dtest=TestLocalHBaseCluster 0.94: mvn clean test -Dhadoop.profile=23 -Dtest=TestLocalHBaseCluster {code} testLocalHBaseCluster(org.apache.hadoop.hbase.TestLocalHBaseCluster) Time elapsed: 0.022 sec ERROR! java.lang.RuntimeException: Master not initialized after 200 seconds at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:208) at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:424) at org.apache.hadoop.hbase.TestLocalHBaseCluster.testLocalHBaseCluster(TestLocalHBaseCluster.java:66) ... {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
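The HBASE-6305 comments above describe a scheme being added to a path string that may already carry one. The sketch below illustrates that pitfall using only the JDK's java.net.URI as a stand-in for Hadoop's Path; it is not the HBase patch, and the class and helper names (SchemeTwiceSketch, prependScheme, qualifyOnce) are hypothetical. It assumes the behaviour described in the thread: the string form of a path includes the scheme whenever the scheme is not null.

```java
import java.net.URI;

class SchemeTwiceSketch {
    // Naive approach (hypothetical): always prepend the scheme.
    // If the path already has a scheme, the result contains it twice.
    static String prependScheme(String scheme, URI path) {
        return scheme + "://" + path.toString();
    }

    // Safer approach (hypothetical): qualify only when the scheme is missing,
    // which is roughly what qualifying a path against a file system achieves.
    static String qualifyOnce(String scheme, URI path) {
        if (path.getScheme() != null) {
            return path.toString();
        }
        return scheme + "://" + path.getPath();
    }

    public static void main(String[] args) {
        URI unqualified = URI.create("/tmp/hbase");
        URI qualified = URI.create("file:///tmp/hbase");

        System.out.println(prependScheme("file", unqualified)); // file:///tmp/hbase
        System.out.println(prependScheme("file", qualified));   // file://file:///tmp/hbase
        System.out.println(qualifyOnce("file", unqualified));   // file:///tmp/hbase
        System.out.println(qualifyOnce("file", qualified));     // file:///tmp/hbase
    }
}
```

Qualifying once and then calling toString(), as the suggested conf.set(...) call does, avoids having to reason about whether the scheme is null at the call site.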
[jira] [Updated] (HBASE-7080) TestHFileOutputFormat.testColumnFamilyCompression failing due to UnsatisfiedLinkError
[ https://issues.apache.org/jira/browse/HBASE-7080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7080: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Integrated into trunk. Thanks Greg for reviewing it. TestHFileOutputFormat.testColumnFamilyCompression failing due to UnsatisfiedLinkError - Key: HBASE-7080 URL: https://issues.apache.org/jira/browse/HBASE-7080 Project: HBase Issue Type: Test Components: test Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 0.96.0 Attachments: trunk_7080.patch Due to HADOOP-8756, this test fails {noformat} java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method) at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:62) at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:127) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:104) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:118) at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:236) at org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.getSupportedCompressionAlgorithms(TestHFileOutputFormat.java:649) at org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.testColumnFamilyCompression(TestHFileOutputFormat.java:571) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
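The HBASE-7080 stack trace above shows an UnsatisfiedLinkError thrown from a native method while the test probes for supported compression codecs. As a rough illustration only (the contents of trunk_7080.patch are not shown in this thread, so this is not claimed to be the actual fix), a probe can treat that error as "codec unavailable" instead of letting it fail the test. The class NativeProbeSketch and the method isCodecUsable are hypothetical; the native method is deliberately declared without any library being loaded, so that calling it reproduces the same kind of error using only the JDK.

```java
class NativeProbeSketch {
    // Declared native, but no native library is ever loaded, so invoking it
    // throws UnsatisfiedLinkError, mimicking the failure in the trace.
    private static native boolean buildSupportsSnappy();

    // Hypothetical probe: report the codec as unusable when the native
    // method cannot be linked, rather than propagating the error.
    static boolean isCodecUsable() {
        try {
            return buildSupportsSnappy();
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("snappy usable: " + isCodecUsable());
    }
}
```

Note that UnsatisfiedLinkError is an Error, not an Exception, so a catch (Exception e) block around such a probe would not intercept it.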
[jira] [Created] (HBASE-7082) TestHFileCleaner#testHFileCleaning fails due to cleaner is reset
Jimmy Xiang created HBASE-7082: -- Summary: TestHFileCleaner#testHFileCleaning fails due to cleaner is reset Key: HBASE-7082 URL: https://issues.apache.org/jira/browse/HBASE-7082 Project: HBase Issue Type: Test Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Trivial Attachments: trunk-7082.patch TestHFileCleaner#testHFileCleaning fails if it runs after testRemovesEmptyDirectories which resets the cleaner. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7082) TestHFileCleaner#testHFileCleaning fails due to cleaner is reset
[ https://issues.apache.org/jira/browse/HBASE-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7082: --- Attachment: trunk-7082.patch TestHFileCleaner#testHFileCleaning fails due to cleaner is reset Key: HBASE-7082 URL: https://issues.apache.org/jira/browse/HBASE-7082 Project: HBase Issue Type: Test Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Trivial Attachments: trunk-7082.patch TestHFileCleaner#testHFileCleaning fails if it runs after testRemovesEmptyDirectories which resets the cleaner. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7082) TestHFileCleaner#testHFileCleaning fails due to cleaner is reset
[ https://issues.apache.org/jira/browse/HBASE-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7082: --- Status: Patch Available (was: Open) TestHFileCleaner#testHFileCleaning fails due to cleaner is reset Key: HBASE-7082 URL: https://issues.apache.org/jira/browse/HBASE-7082 Project: HBase Issue Type: Test Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Trivial Attachments: trunk-7082.patch TestHFileCleaner#testHFileCleaning fails if it runs after testRemovesEmptyDirectories which resets the cleaner. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6305) TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds.
[ https://issues.apache.org/jira/browse/HBASE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13488280#comment-13488280 ] Jimmy Xiang commented on HBASE-6305: +1 TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds. Key: HBASE-6305 URL: https://issues.apache.org/jira/browse/HBASE-6305 Project: HBase Issue Type: Sub-task Components: test Affects Versions: 0.92.2, 0.94.1 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Fix For: 0.94.3 Attachments: hbase-6305-94.patch, HBASE-6305-94-v2.patch, HBASE-6305-94-v2.patch, HBASE-6305-v1.patch trunk: mvn clean test -Dhadoop.profile=2.0 -Dtest=TestLocalHBaseCluster 0.94: mvn clean test -Dhadoop.profile=23 -Dtest=TestLocalHBaseCluster {code} testLocalHBaseCluster(org.apache.hadoop.hbase.TestLocalHBaseCluster) Time elapsed: 0.022 sec ERROR! java.lang.RuntimeException: Master not initialized after 200 seconds at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:208) at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:424) at org.apache.hadoop.hbase.TestLocalHBaseCluster.testLocalHBaseCluster(TestLocalHBaseCluster.java:66) ... {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-7083) SSH#fixupDaughter should force re-assign missing daughter
Jimmy Xiang created HBASE-7083: -- Summary: SSH#fixupDaughter should force re-assign missing daughter Key: HBASE-7083 URL: https://issues.apache.org/jira/browse/HBASE-7083 Project: HBase Issue Type: Bug Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor In looking into flaky test TestSplitTransactionOnCluster#testShutdownSimpleFixup, I found out that a missing daughter is not assigned by SSH properly. It could be open on the dead server. We need to force re-assign it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7083) SSH#fixupDaughter should force re-assign missing daughter
[ https://issues.apache.org/jira/browse/HBASE-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7083: --- Fix Version/s: 0.96.0 Status: Patch Available (was: Open) SSH#fixupDaughter should force re-assign missing daughter - Key: HBASE-7083 URL: https://issues.apache.org/jira/browse/HBASE-7083 Project: HBase Issue Type: Bug Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 0.96.0 Attachments: trunk-7083.patch In looking into flaky test TestSplitTransactionOnCluster#testShutdownSimpleFixup, I found out that a missing daughter is not assigned by SSH properly. It could be open on the dead server. We need to force re-assign it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7083) SSH#fixupDaughter should force re-assign missing daughter
[ https://issues.apache.org/jira/browse/HBASE-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7083: --- Attachment: trunk-7083.patch SSH#fixupDaughter should force re-assign missing daughter - Key: HBASE-7083 URL: https://issues.apache.org/jira/browse/HBASE-7083 Project: HBase Issue Type: Bug Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 0.96.0 Attachments: trunk-7083.patch In looking into flaky test TestSplitTransactionOnCluster#testShutdownSimpleFixup, I found out that a missing daughter is not assigned by SSH properly. It could be open on the dead server. We need to force re-assign it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7083) SSH#fixupDaughter should force re-assign missing daughter
[ https://issues.apache.org/jira/browse/HBASE-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7083: --- Status: Open (was: Patch Available) SSH#fixupDaughter should force re-assign missing daughter - Key: HBASE-7083 URL: https://issues.apache.org/jira/browse/HBASE-7083 Project: HBase Issue Type: Bug Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 0.96.0 Attachments: trunk-7083.patch In looking into flaky test TestSplitTransactionOnCluster#testShutdownSimpleFixup, I found out that a missing daughter is not assigned by SSH properly. It could be open on the dead server. We need to force re-assign it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira