[jira] [Updated] (HBASE-28438) Add support splitting region into multiple regions(more than 2)
[ https://issues.apache.org/jira/browse/HBASE-28438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-28438: --- Summary: Add support splitting region into multiple regions(more than 2) (was: Add support spitting region into multiple regions(more than 2)) > Add support splitting region into multiple regions(more than 2) > --- > > Key: HBASE-28438 > URL: https://issues.apache.org/jira/browse/HBASE-28438 > Project: HBase > Issue Type: Improvement >Reporter: Rajeshbabu Chintaguntla >Assignee: Rajeshbabu Chintaguntla >Priority: Major > > We have a requirement to split one region into hundreds of regions at a time in order to distribute the load of hot data. Today we have to split a region, wait for the split to complete, and then split the two daughter regions again, and so on, which is a time-consuming activity. > It would be better to support splitting a region into more than two regions so that the whole split can be done in a single operation. > To do that we need to take care of: > 1) Supporting admin APIs that take multiple split keys > 2) Implementing a new procedure to create the new regions, creating the meta entries and updating them in meta > 3) Closing the parent region and opening the split regions > 4) Updating post-split compaction and readers to use a portion store file reader based on the range to scan, rather than the half store file reader > 5) Making sure the catalog janitor also cleans up the parent region once all the split regions are in place -- This message was sent by Atlassian Jira (v8.20.10#820010)
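To make the scope of item 1 concrete, below is a minimal sketch of what a multi-split request could look like from the client's perspective. Only the single-point {{splitRegionAsync(byte[], byte[])}} call is an existing Admin method; the multi-key overload, the class name, and the row keys are assumptions for illustration, not part of the current HBase API or of the HBASE-28438 design.

{code}
// A minimal sketch, assuming the HBase 2.x client API for the single-split case.
// The multi-key overload at the bottom is hypothetical -- it only illustrates
// what item 1 of this issue asks for.
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.util.Bytes;

public class MultiSplitSketch {
  static void splitToday(Admin admin, byte[] regionName) throws Exception {
    // Existing behaviour: one split point per call, producing exactly two daughters.
    admin.splitRegionAsync(regionName, Bytes.toBytes("row-0500")).get();
    // The parent is now gone; reaching N regions means locating each daughter
    // and splitting again, waiting for every intermediate split to finish.
  }

  // Hypothetical API per this issue: all split keys in one call, so the master
  // could create N+1 daughters, write the meta entries, close the parent and
  // open the new regions in a single procedure.
  // Future<Void> splitRegionAsync(byte[] regionName, byte[][] splitKeys);
}
{code}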
[jira] [Updated] (HBASE-19389) Limit concurrency of put with dense (hundreds) columns to prevent write handler exhausted
[ https://issues.apache.org/jira/browse/HBASE-19389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-19389: --- Summary: Limit concurrency of put with dense (hundreds) columns to prevent write handler exhausted (was: Limit concurrency of put with dense (hundreds) columns to prevent write hander exhausted) > Limit concurrency of put with dense (hundreds) columns to prevent write > handler exhausted > - > > Key: HBASE-19389 > URL: https://issues.apache.org/jira/browse/HBASE-19389 > Project: HBase > Issue Type: Improvement > Components: Performance >Affects Versions: 2.0.0 > Environment: 2000+ Region Servers > PCI-E ssd >Reporter: Chance Li >Assignee: Chance Li > Fix For: 2.0.0 > > Attachments: CSLM-concurrent-write.png, > HBASE-19389-branch-2-V2.patch, HBASE-19389-branch-2.patch, metrics-1.png, > ycsb-result.png > > > In a large cluster with a large number of clients, we found that the RS's > handlers were sometimes all busy. After investigation we found the root > cause to be the CSLM (ConcurrentSkipListMap), e.g. heavy load on its compare function. We reviewed the > related WALs and found that many columns (more than 1000) > were being written at that time. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
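For context, the write pattern described above looks roughly like the sketch below. The table name, family, and column count are made-up illustration values; the point is that every qualifier in a wide Put becomes a separate insert into the memstore's ConcurrentSkipListMap, so a handful of such requests can keep many RPC handlers busy inside the CSLM compare function.

{code}
// Illustration only: a single Put carrying well over 1000 qualifiers, the kind
// of dense-column write the issue describes. Table/family/row names are made up.
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class DenseColumnPut {
  static void writeWideRow(Connection conn) throws Exception {
    try (Table table = conn.getTable(TableName.valueOf("demo"))) {
      Put put = new Put(Bytes.toBytes("row-1"));
      for (int i = 0; i < 1500; i++) {
        // each qualifier becomes one cell, i.e. one memstore (CSLM) insert
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q" + i), Bytes.toBytes(i));
      }
      table.put(put); // one RPC, ~1500 skip-list insertions on the server side
    }
  }
}
{code}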
[jira] [Commented] (HBASE-14620) Procedure V2: Update HBCK to incorporate the Proc-V2-based assignment
[ https://issues.apache.org/jira/browse/HBASE-14620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125787#comment-16125787 ] Stephen Yuan Jiang commented on HBASE-14620: [~stack], made a few changes in AM / HBCK code to make HBCK UT run. > Procedure V2: Update HBCK to incorporate the Proc-V2-based assignment > - > > Key: HBASE-14620 > URL: https://issues.apache.org/jira/browse/HBASE-14620 > Project: HBase > Issue Type: Sub-task > Components: hbck, proc-v2 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Fix For: 2.0.0 > > Attachments: HBASE-14620.v1-master.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-14620) Procedure V2: Update HBCK to incorporate the Proc-V2-based assignment
[ https://issues.apache.org/jira/browse/HBASE-14620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-14620: --- Status: Patch Available (was: Open) > Procedure V2: Update HBCK to incorporate the Proc-V2-based assignment > - > > Key: HBASE-14620 > URL: https://issues.apache.org/jira/browse/HBASE-14620 > Project: HBase > Issue Type: Sub-task > Components: hbck, proc-v2 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Fix For: 2.0.0 > > Attachments: HBASE-14620.v1-master.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-14620) Procedure V2: Update HBCK to incorporate the Proc-V2-based assignment
[ https://issues.apache.org/jira/browse/HBASE-14620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-14620: --- Attachment: HBASE-14620.v1-master.patch > Procedure V2: Update HBCK to incorporate the Proc-V2-based assignment > - > > Key: HBASE-14620 > URL: https://issues.apache.org/jira/browse/HBASE-14620 > Project: HBase > Issue Type: Sub-task > Components: hbck, proc-v2 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Fix For: 2.0.0 > > Attachments: HBASE-14620.v1-master.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HBASE-18350) RSGroups are broken under AMv2
[ https://issues.apache.org/jira/browse/HBASE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang reassigned HBASE-18350: -- Assignee: Thiruvel Thirumoolan (was: Stephen Yuan Jiang) > RSGroups are broken under AMv2 > -- > > Key: HBASE-18350 > URL: https://issues.apache.org/jira/browse/HBASE-18350 > Project: HBase > Issue Type: Bug > Components: rsgroup >Affects Versions: 2.0.0-alpha-1 >Reporter: Stephen Yuan Jiang >Assignee: Thiruvel Thirumoolan >Priority: Blocker > Fix For: 2.0.0-beta-2 > > > The following RSGroups tests were disabled by Core Proc-V2 AM in HBASE-14614: > - Disabled/Ignore TestRSGroupsOfflineMode#testOffline; need to dig in on what > offline is. > - Disabled/Ignore TestRSGroups. > This JIRA tracks the work to enable them (or remove/modify if not applicable). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18528) DON'T allow user to modify the passed table/column descriptor
[ https://issues.apache.org/jira/browse/HBASE-18528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120488#comment-16120488 ] Stephen Yuan Jiang commented on HBASE-18528: +1 > DON'T allow user to modify the passed table/column descriptor > - > > Key: HBASE-18528 > URL: https://issues.apache.org/jira/browse/HBASE-18528 > Project: HBase > Issue Type: Sub-task > Components: Coprocessors, master >Reporter: Chia-Ping Tsai >Assignee: Chia-Ping Tsai >Priority: Critical > Fix For: 3.0.0, 2.0.0-alpha-2 > > Attachments: HBASE-18528.v0.patch > > > We are replacing HTableDescriptor with TableDescriptor throughout the code base. The > TableDescriptor is designed to be a read-only object, so users can't modify it > through MasterObserver. HBASE-18502 changed many methods of MasterObserver to > use TableDescriptor, but some deprecated methods still accept the > HTableDescriptor. Users may be confused about why some methods can't modify the > table descriptor. > In short, should we allow users to modify the passed table descriptor? > # if yes, we should introduce a mechanism by which users can return a modified > table descriptor > # if no, we should pass ImmutableHTableDescriptor to users, or just remove > all methods accepting the HTableDescriptor > Ditto for HColumnDescriptor. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
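As a side note, the read-only nature of TableDescriptor means any "modification" has to go through a builder copy rather than in-place mutation. The sketch below shows that copy-and-rebuild pattern with the 2.0 builder API; whether a MasterObserver hook could actually return such a copy is exactly the open question in this issue, so treat the helper as illustrative only.

{code}
// A minimal sketch, assuming the HBase 2.0 TableDescriptorBuilder API: the
// passed descriptor cannot be mutated, so a "change" means copying it into a
// builder and building a new read-only instance. How (or whether) that copy
// would flow back through a MasterObserver hook is the question in this issue.
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class DescriptorCopySketch {
  static TableDescriptor withMaxFileSize(TableDescriptor passed, long maxFileSize) {
    return TableDescriptorBuilder.newBuilder(passed) // copy the immutable descriptor
        .setMaxFileSize(maxFileSize)                 // modify the copy
        .build();                                    // new read-only instance
  }
}
{code}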
[jira] [Commented] (HBASE-18353) Enable TestCorruptedRegionStoreFile that were disabled by Proc-V2 AM in HBASE-14614
[ https://issues.apache.org/jira/browse/HBASE-18353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120208#comment-16120208 ] Stephen Yuan Jiang commented on HBASE-18353: We already have a 'move' method to move a region to a different RS. I was wondering why [~Apache9] did not consider this when working on HBASE-17712. We have an 'offline' method to completely offline the region; 'unassign' would just close the region on the RS, after which we can manually assign the region, or the region would be re-assigned when ServerCrashProcedure processes the closed region. Also note that we can only deprecate the Admin method in 2.0; we cannot remove it. > Enable TestCorruptedRegionStoreFile that were disabled by Proc-V2 AM in > HBASE-14614 > --- > > Key: HBASE-18353 > URL: https://issues.apache.org/jira/browse/HBASE-18353 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha-1 >Reporter: Stephen Yuan Jiang >Assignee: Vladimir Rodionov > Attachments: HBASE-18353-v1.patch, HBASE-18353-v2.patch > > > HBASE-14614 disabled TestCorruptedRegionStoreFile, as it depends on a > half-implemented reopen of a region when a store file goes missing. > This JIRA tracks the work to fix/enable the test. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
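For readers less familiar with the Admin calls contrasted in that comment, the sketch below shows them side by side. Region and server names are placeholders, and the signatures follow the 2.x Admin interface as used around this time; treat the details as assumptions rather than authoritative documentation.

{code}
// Placeholder region/server names; signatures per the HBase 2.x Admin interface
// as I understand it (assumptions, not documentation).
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionMoveSketch {
  static void examples(Admin admin) throws Exception {
    byte[] encodedRegionName = Bytes.toBytes("d3c0ffee");                // placeholder
    byte[] regionName = Bytes.toBytes("t1,,1500000000000.d3c0ffee.");    // placeholder
    byte[] destServer = Bytes.toBytes("rs1.example.com,16020,1");        // placeholder

    // 'move': close the region on its current RS and reopen it on destServer.
    admin.move(encodedRegionName, destServer);

    // 'unassign': just close the region; it stays CLOSED until it is assigned
    // again, either manually or when ServerCrashProcedure processes it.
    admin.unassign(regionName, false);
  }
}
{code}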
[jira] [Comment Edited] (HBASE-18353) Enable TestCorruptedRegionStoreFile that were disabled by Proc-V2 AM in HBASE-14614
[ https://issues.apache.org/jira/browse/HBASE-18353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117914#comment-16117914 ] Stephen Yuan Jiang edited comment on HBASE-18353 at 8/8/17 6:14 AM: [~Apache9], the comments are incorrect (at least in branch-2, I have not checked branch-1). unassign asks RS to close the region and mark the region in CLOSED state once RS successfully closes the region. It would not automatically reopen the region. was (Author: syuanjiang): [~Apache9], the comments are incorrect (at least in branch-2, I have not checked branch-1). unassign asks RS to close the region and mark the region in CLOSED state once RS successfully closes the region. > Enable TestCorruptedRegionStoreFile that were disabled by Proc-V2 AM in > HBASE-14614 > --- > > Key: HBASE-18353 > URL: https://issues.apache.org/jira/browse/HBASE-18353 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha-1 >Reporter: Stephen Yuan Jiang >Assignee: Vladimir Rodionov > Attachments: HBASE-18353-v1.patch, HBASE-18353-v2.patch > > > HBASE-14614 disabled TestCorruptedRegionStoreFile, as it depends on a > half-implemented reopen of a region when a store file goes missing. > This JIRA tracks the work to fix/enable the test. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18353) Enable TestCorruptedRegionStoreFile that were disabled by Proc-V2 AM in HBASE-14614
[ https://issues.apache.org/jira/browse/HBASE-18353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117914#comment-16117914 ] Stephen Yuan Jiang commented on HBASE-18353: [~Apache9], the comments are incorrect (at least in branch-2, I have not checked branch-1). unassign asks RS to close the region and mark the region in CLOSED state once RS successfully closes the region. > Enable TestCorruptedRegionStoreFile that were disabled by Proc-V2 AM in > HBASE-14614 > --- > > Key: HBASE-18353 > URL: https://issues.apache.org/jira/browse/HBASE-18353 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha-1 >Reporter: Stephen Yuan Jiang >Assignee: Vladimir Rodionov > Attachments: HBASE-18353-v1.patch, HBASE-18353-v2.patch > > > HBASE-14614 disabled TestCorruptedRegionStoreFile, as it depends on a > half-implemented reopen of a region when a store file goes missing. > This JIRA tracks the work to fix/enable the test. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18353) Enable TestCorruptedRegionStoreFile that were disabled by Proc-V2 AM in HBASE-14614
[ https://issues.apache.org/jira/browse/HBASE-18353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117525#comment-16117525 ] Stephen Yuan Jiang commented on HBASE-18353: [~vrodionov], reading HBASE-17712, I am unsure whether your approach is the correct action. Let us wait for Duo's comment on the change. For the patch, I think you should at least rename the {{RegionUnassigner.java}} file to {{RegionReassigner.java}} > Enable TestCorruptedRegionStoreFile that were disabled by Proc-V2 AM in > HBASE-14614 > --- > > Key: HBASE-18353 > URL: https://issues.apache.org/jira/browse/HBASE-18353 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha-1 >Reporter: Stephen Yuan Jiang >Assignee: Vladimir Rodionov > Attachments: HBASE-18353-v1.patch, HBASE-18353-v2.patch > > > HBASE-14614 disabled TestCorruptedRegionStoreFile, as it depends on a > half-implemented reopen of a region when a store file goes missing. > This JIRA tracks the work to fix/enable the test. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18353) Enable TestCorruptedRegionStoreFile that were disabled by Proc-V2 AM in HBASE-14614
[ https://issues.apache.org/jira/browse/HBASE-18353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117522#comment-16117522 ] Stephen Yuan Jiang commented on HBASE-18353: [~Apache9], in HBASE-17712, you mentioned that {{"Unassign region asynchronously when hitting FNFE. Can pass TestCorruptedRegionStoreFile. And also fix a problem in TestCorruptedRegionStoreFile that the already opened DFSInputStream may still work after we deleted the storefile because the block replica deletion is asynchronous. We should wait until all the replicas have been removed from DNs."}} You also talked about implementing some reassign logic. [~vrodionov] implemented this by doing unassign/assign of the region if FNFE happens. What do you think? > Enable TestCorruptedRegionStoreFile that were disabled by Proc-V2 AM in > HBASE-14614 > --- > > Key: HBASE-18353 > URL: https://issues.apache.org/jira/browse/HBASE-18353 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha-1 >Reporter: Stephen Yuan Jiang >Assignee: Vladimir Rodionov > Attachments: HBASE-18353-v1.patch, HBASE-18353-v2.patch > > > HBASE-14614 disabled TestCorruptedRegionStoreFile, as it depends on a > half-implemented reopen of a region when a store file goes missing. > This JIRA tracks the work to fix/enable the test. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-14618) Procedure V2: Implement move shell command to use Proc-V2 assignment
[ https://issues.apache.org/jira/browse/HBASE-14618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16114965#comment-16114965 ] Stephen Yuan Jiang commented on HBASE-14618: No work in this item. > Procedure V2: Implement move shell command to use Proc-V2 assignment > > > Key: HBASE-14618 > URL: https://issues.apache.org/jira/browse/HBASE-14618 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Affects Versions: 2.0.0 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18424) Fix TestAsyncTableGetMultiThreaded
[ https://issues.apache.org/jira/browse/HBASE-18424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16111890#comment-16111890 ] Stephen Yuan Jiang commented on HBASE-18424: [~Apache9], the original test in TestAsyncTableGetMultiThreaded is to split a user table region and move the meta region to a different RS, then try to access the user table. Since the Async table feature is only in 2.0+, moving meta does not make a lot of sense, as meta can only be on the master in 2.0 (at least). [~vrodionov] changed the test to move the user region instead of meta, which makes more sense to me. [~Apache9], if you have no objection, we will commit the patch. > Fix TestAsyncTableGetMultiThreaded > -- > > Key: HBASE-18424 > URL: https://issues.apache.org/jira/browse/HBASE-18424 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov > Attachments: HBASE-18424-v1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18424) Fix TestAsyncTableGetMultiThreaded
[ https://issues.apache.org/jira/browse/HBASE-18424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16111892#comment-16111892 ] Stephen Yuan Jiang commented on HBASE-18424: +1 from code logic. > Fix TestAsyncTableGetMultiThreaded > -- > > Key: HBASE-18424 > URL: https://issues.apache.org/jira/browse/HBASE-18424 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov > Attachments: HBASE-18424-v1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18491) [AMv2] Fail UnassignProcedure if source Region Server is not online.
[ https://issues.apache.org/jira/browse/HBASE-18491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16109645#comment-16109645 ] Stephen Yuan Jiang commented on HBASE-18491: Looks good. > [AMv2] Fail UnassignProcedure if source Region Server is not online. > > > Key: HBASE-18491 > URL: https://issues.apache.org/jira/browse/HBASE-18491 > Project: HBase > Issue Type: Bug > Components: amv2 >Affects Versions: 2.0.0 >Reporter: Umesh Agashe >Assignee: Umesh Agashe > Fix For: 2.0.0 > > Attachments: hbase-18491.master.001.patch > > > Currently UnassignProcedure returns success when the server carrying a region is > NOT online. The assumption here is that ServerCrashProcedure will handle > splitting logs, etc., for these regions. When UnassignProcedure completes, > MoveRegionProcedure resumes with AssignProcedure. AssignProcedure can sometimes assign regions without prerequisite steps (done either by > UnassignProcedure or ServerCrashProcedure). The fix is to fail UnassignProcedure > and its parent MoveRegionProcedure if the source server is not online. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-18458) Refactor TestRegionServerHostname to make it robust (Port HBASE-17922 to branch-1)
[ https://issues.apache.org/jira/browse/HBASE-18458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18458: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 1.4.0 Status: Resolved (was: Patch Available) > Refactor TestRegionServerHostname to make it robust (Port HBASE-17922 to > branch-1) > -- > > Key: HBASE-18458 > URL: https://issues.apache.org/jira/browse/HBASE-18458 > Project: HBase > Issue Type: Sub-task > Components: hadoop3 >Affects Versions: 1.4.0 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang >Priority: Minor > Fix For: 1.4.0 > > Attachments: HBASE-17922.v1-branch-1.patch > > > The TestRegionServerHostname is passing in branch-1; however, it always fails > locally. Running tests individually always pass. Failing to start RS in > some combination of test run indicates some resource leak. > {code} > Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname > Tests run: 4, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 46.042 sec > <<< FAILURE! - in > org.apache.hadoop.hbase.regionserver.TestRegionServerHostname > testRegionServerHostnameReportedToMaster(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname) > Time elapsed: 30.095 sec <<< ERROR! > org.junit.runners.model.TestTimedOutException: test timed out after 3 > milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221) > at > org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445) > at > org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225) > at > org.apache.hadoop.hbase.MiniHBaseCluster.(MiniHBaseCluster.java:94) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1072) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1028) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:900) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:894) > at > org.apache.hadoop.hbase.regionserver.TestRegionServerHostname.testRegionServerHostnameReportedToMaster(TestRegionServerHostname.java:158) > {code} > When running the testRegionServerHostnameReportedToMaster alone or with > another newly added test, the test passed without problem. 
> When running the {{testRegionServerHostnameReportedToMaster}} test with > {{testInvalidRegionServerHostnameAbortsServer}} in the same test suite > {{TestRegionServerHostname}}, the region server failed to start: > {noformat} > 2017-07-25 15:34:24,132 FATAL [RS:0;192.168.1.7:64317] > regionserver.HRegionServer(2182): ABORTING region server > 192.168.1.7,64317,1501022063917: Unhandled: Failed suppression of fs shutdown > hook: org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@668e0f60 > java.lang.RuntimeException: Failed suppression of fs shutdown hook: > org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@668e0f60 > at > org.apache.hadoop.hbase.regionserver.ShutdownHook.suppressHdfsShutdownHook(ShutdownHook.java:204) > at > org.apache.hadoop.hbase.regionserver.ShutdownHook.install(ShutdownHook.java:84) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:940) > at > org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156) > at > org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108) > at > org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:360) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1846) > at > org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:307) > at > org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138) > at java.lang.Thread.run(Thread.java:745) > {noformat} > HBASE-17922 addressed similar issue in Hadoop 3. I think this change is more > robust than the one in branch-1 right now. Porting the change to branch-1 > (with small modification due to code difference between branch-1 and > branch-2) is a good idea. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18458) Refactor TestRegionServerHostname to make it robust (Port HBASE-17922 to branch-1)
[ https://issues.apache.org/jira/browse/HBASE-18458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102546#comment-16102546 ] Stephen Yuan Jiang commented on HBASE-18458: [~mdrob], almost straightforward, only slight difference in testRegionServerHostnameReportedToMaster due to branch-1 and branch-2 different checking. > Refactor TestRegionServerHostname to make it robust (Port HBASE-17922 to > branch-1) > -- > > Key: HBASE-18458 > URL: https://issues.apache.org/jira/browse/HBASE-18458 > Project: HBase > Issue Type: Sub-task > Components: hadoop3 >Affects Versions: 1.4.0 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang >Priority: Minor > Attachments: HBASE-17922.v1-branch-1.patch > > > The TestRegionServerHostname is passing in branch-1; however, it always fails > locally. Running tests individually always pass. Failing to start RS in > some combination of test run indicates some resource leak. > {code} > Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname > Tests run: 4, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 46.042 sec > <<< FAILURE! - in > org.apache.hadoop.hbase.regionserver.TestRegionServerHostname > testRegionServerHostnameReportedToMaster(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname) > Time elapsed: 30.095 sec <<< ERROR! > org.junit.runners.model.TestTimedOutException: test timed out after 3 > milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221) > at > org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445) > at > org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225) > at > org.apache.hadoop.hbase.MiniHBaseCluster.(MiniHBaseCluster.java:94) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1072) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1028) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:900) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:894) > at > org.apache.hadoop.hbase.regionserver.TestRegionServerHostname.testRegionServerHostnameReportedToMaster(TestRegionServerHostname.java:158) > {code} > When running the testRegionServerHostnameReportedToMaster alone or with > another newly added test, the test passed without problem. 
> When running the {{testRegionServerHostnameReportedToMaster}} test with > {{testInvalidRegionServerHostnameAbortsServer}} in the same test suite > {{TestRegionServerHostname}}, the region server failed to start: > {noformat} > 2017-07-25 15:34:24,132 FATAL [RS:0;192.168.1.7:64317] > regionserver.HRegionServer(2182): ABORTING region server > 192.168.1.7,64317,1501022063917: Unhandled: Failed suppression of fs shutdown > hook: org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@668e0f60 > java.lang.RuntimeException: Failed suppression of fs shutdown hook: > org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@668e0f60 > at > org.apache.hadoop.hbase.regionserver.ShutdownHook.suppressHdfsShutdownHook(ShutdownHook.java:204) > at > org.apache.hadoop.hbase.regionserver.ShutdownHook.install(ShutdownHook.java:84) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:940) > at > org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156) > at > org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108) > at > org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:360) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1846) > at > org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:307) > at > org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138) > at java.lang.Thread.run(Thread.java:745) > {noformat} > HBASE-17922 addressed similar issue in Hadoop 3. I think this change is more > robust than the one in branch-1 right now. Porting the change to branch-1 > (with small modification due to code difference between branch-1 and > branch-2) is a good idea. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18458) Refactor TestRegionServerHostname to make it robust (Port HBASE-17922 to branch-1)
[ https://issues.apache.org/jira/browse/HBASE-18458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102547#comment-16102547 ] Stephen Yuan Jiang commented on HBASE-18458: This is test only change (one test suite affected), the failed UTs are unrelated to this change. > Refactor TestRegionServerHostname to make it robust (Port HBASE-17922 to > branch-1) > -- > > Key: HBASE-18458 > URL: https://issues.apache.org/jira/browse/HBASE-18458 > Project: HBase > Issue Type: Sub-task > Components: hadoop3 >Affects Versions: 1.4.0 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang >Priority: Minor > Attachments: HBASE-17922.v1-branch-1.patch > > > The TestRegionServerHostname is passing in branch-1; however, it always fails > locally. Running tests individually always pass. Failing to start RS in > some combination of test run indicates some resource leak. > {code} > Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname > Tests run: 4, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 46.042 sec > <<< FAILURE! - in > org.apache.hadoop.hbase.regionserver.TestRegionServerHostname > testRegionServerHostnameReportedToMaster(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname) > Time elapsed: 30.095 sec <<< ERROR! > org.junit.runners.model.TestTimedOutException: test timed out after 3 > milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221) > at > org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445) > at > org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225) > at > org.apache.hadoop.hbase.MiniHBaseCluster.(MiniHBaseCluster.java:94) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1072) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1028) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:900) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:894) > at > org.apache.hadoop.hbase.regionserver.TestRegionServerHostname.testRegionServerHostnameReportedToMaster(TestRegionServerHostname.java:158) > {code} > When running the testRegionServerHostnameReportedToMaster alone or with > another newly added test, the test passed without problem. 
> When running the {{testRegionServerHostnameReportedToMaster}} test with > {{testInvalidRegionServerHostnameAbortsServer}} in the same test suite > {{TestRegionServerHostname}}, the region server failed to start: > {noformat} > 2017-07-25 15:34:24,132 FATAL [RS:0;192.168.1.7:64317] > regionserver.HRegionServer(2182): ABORTING region server > 192.168.1.7,64317,1501022063917: Unhandled: Failed suppression of fs shutdown > hook: org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@668e0f60 > java.lang.RuntimeException: Failed suppression of fs shutdown hook: > org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@668e0f60 > at > org.apache.hadoop.hbase.regionserver.ShutdownHook.suppressHdfsShutdownHook(ShutdownHook.java:204) > at > org.apache.hadoop.hbase.regionserver.ShutdownHook.install(ShutdownHook.java:84) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:940) > at > org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156) > at > org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108) > at > org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:360) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1846) > at > org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:307) > at > org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138) > at java.lang.Thread.run(Thread.java:745) > {noformat} > HBASE-17922 addressed similar issue in Hadoop 3. I think this change is more > robust than the one in branch-1 right now. Porting the change to branch-1 > (with small modification due to code difference between branch-1 and > branch-2) is a good idea. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-18458) Refactor TestRegionServerHostname to make it robust (Port HBASE-17922 to branch-1)
[ https://issues.apache.org/jira/browse/HBASE-18458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18458: --- Status: Patch Available (was: Open) > Refactor TestRegionServerHostname to make it robust (Port HBASE-17922 to > branch-1) > -- > > Key: HBASE-18458 > URL: https://issues.apache.org/jira/browse/HBASE-18458 > Project: HBase > Issue Type: Sub-task > Components: hadoop3 >Affects Versions: 1.4.0 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang >Priority: Minor > Attachments: HBASE-17922.v1-branch-1.patch > > > The TestRegionServerHostname is passing in branch-1; however, it always fails > locally. Running tests individually always pass. Failing to start RS in > some combination of test run indicates some resource leak. > {code} > Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname > Tests run: 4, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 46.042 sec > <<< FAILURE! - in > org.apache.hadoop.hbase.regionserver.TestRegionServerHostname > testRegionServerHostnameReportedToMaster(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname) > Time elapsed: 30.095 sec <<< ERROR! > org.junit.runners.model.TestTimedOutException: test timed out after 3 > milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221) > at > org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445) > at > org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225) > at > org.apache.hadoop.hbase.MiniHBaseCluster.(MiniHBaseCluster.java:94) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1072) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1028) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:900) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:894) > at > org.apache.hadoop.hbase.regionserver.TestRegionServerHostname.testRegionServerHostnameReportedToMaster(TestRegionServerHostname.java:158) > {code} > When running the testRegionServerHostnameReportedToMaster alone or with > another newly added test, the test passed without problem. 
> When running the {{testRegionServerHostnameReportedToMaster}} test with > {{testInvalidRegionServerHostnameAbortsServer}} in the same test suite > {{TestRegionServerHostname}}, the region server failed to start: > {noformat} > 2017-07-25 15:34:24,132 FATAL [RS:0;192.168.1.7:64317] > regionserver.HRegionServer(2182): ABORTING region server > 192.168.1.7,64317,1501022063917: Unhandled: Failed suppression of fs shutdown > hook: org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@668e0f60 > java.lang.RuntimeException: Failed suppression of fs shutdown hook: > org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@668e0f60 > at > org.apache.hadoop.hbase.regionserver.ShutdownHook.suppressHdfsShutdownHook(ShutdownHook.java:204) > at > org.apache.hadoop.hbase.regionserver.ShutdownHook.install(ShutdownHook.java:84) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:940) > at > org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156) > at > org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108) > at > org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:360) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1846) > at > org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:307) > at > org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138) > at java.lang.Thread.run(Thread.java:745) > {noformat} > HBASE-17922 addressed similar issue in Hadoop 3. I think this change is more > robust than the one in branch-1 right now. Porting the change to branch-1 > (with small modification due to code difference between branch-1 and > branch-2) is a good idea. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-18458) Refactor TestRegionServerHostname to make it robust (Port HBASE-17922 to branch-1)
[ https://issues.apache.org/jira/browse/HBASE-18458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18458: --- Summary: Refactor TestRegionServerHostname to make it robust (Port HBASE-17922 to branch-1) (was: Refactor TestRegionServerHostname to make it robust (Port HBASE-17922)) > Refactor TestRegionServerHostname to make it robust (Port HBASE-17922 to > branch-1) > -- > > Key: HBASE-18458 > URL: https://issues.apache.org/jira/browse/HBASE-18458 > Project: HBase > Issue Type: Sub-task > Components: hadoop3 >Affects Versions: 1.4.0 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang >Priority: Minor > Attachments: HBASE-17922.v1-branch-1.patch > > > The TestRegionServerHostname is passing in branch-1; however, it always fails > locally. Running tests individually always pass. Failing to start RS in > some combination of test run indicates some resource leak. > {code} > Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname > Tests run: 4, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 46.042 sec > <<< FAILURE! - in > org.apache.hadoop.hbase.regionserver.TestRegionServerHostname > testRegionServerHostnameReportedToMaster(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname) > Time elapsed: 30.095 sec <<< ERROR! > org.junit.runners.model.TestTimedOutException: test timed out after 3 > milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221) > at > org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445) > at > org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225) > at > org.apache.hadoop.hbase.MiniHBaseCluster.(MiniHBaseCluster.java:94) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1072) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1028) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:900) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:894) > at > org.apache.hadoop.hbase.regionserver.TestRegionServerHostname.testRegionServerHostnameReportedToMaster(TestRegionServerHostname.java:158) > {code} > When running the testRegionServerHostnameReportedToMaster alone or with > another newly added test, the test passed without problem. 
> When running the {{testRegionServerHostnameReportedToMaster}} test with > {{testInvalidRegionServerHostnameAbortsServer}} in the same test suite > {{TestRegionServerHostname}}, the region server failed to start: > {noformat} > 2017-07-25 15:34:24,132 FATAL [RS:0;192.168.1.7:64317] > regionserver.HRegionServer(2182): ABORTING region server > 192.168.1.7,64317,1501022063917: Unhandled: Failed suppression of fs shutdown > hook: org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@668e0f60 > java.lang.RuntimeException: Failed suppression of fs shutdown hook: > org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@668e0f60 > at > org.apache.hadoop.hbase.regionserver.ShutdownHook.suppressHdfsShutdownHook(ShutdownHook.java:204) > at > org.apache.hadoop.hbase.regionserver.ShutdownHook.install(ShutdownHook.java:84) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:940) > at > org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156) > at > org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108) > at > org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:360) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1846) > at > org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:307) > at > org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138) > at java.lang.Thread.run(Thread.java:745) > {noformat} > HBASE-17922 addressed similar issue in Hadoop 3. I think this change is more > robust than the one in branch-1 right now. Porting the change to branch-1 > (with small modification due to code difference between branch-1 and > branch-2) is a good idea. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-18458) Refactor TestRegionServerHostname to make it robust (Port HBASE-17922)
[ https://issues.apache.org/jira/browse/HBASE-18458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18458: --- Attachment: HBASE-17922.v1-branch-1.patch > Refactor TestRegionServerHostname to make it robust (Port HBASE-17922) > -- > > Key: HBASE-18458 > URL: https://issues.apache.org/jira/browse/HBASE-18458 > Project: HBase > Issue Type: Sub-task > Components: hadoop3 >Affects Versions: 1.4.0 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang >Priority: Minor > Attachments: HBASE-17922.v1-branch-1.patch > > > The TestRegionServerHostname is passing in branch-1; however, it always fails > locally. Running tests individually always pass. Failing to start RS in > some combination of test run indicates some resource leak. > {code} > Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname > Tests run: 4, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 46.042 sec > <<< FAILURE! - in > org.apache.hadoop.hbase.regionserver.TestRegionServerHostname > testRegionServerHostnameReportedToMaster(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname) > Time elapsed: 30.095 sec <<< ERROR! > org.junit.runners.model.TestTimedOutException: test timed out after 3 > milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221) > at > org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445) > at > org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225) > at > org.apache.hadoop.hbase.MiniHBaseCluster.(MiniHBaseCluster.java:94) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1072) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1028) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:900) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:894) > at > org.apache.hadoop.hbase.regionserver.TestRegionServerHostname.testRegionServerHostnameReportedToMaster(TestRegionServerHostname.java:158) > {code} > When running the testRegionServerHostnameReportedToMaster alone or with > another newly added test, the test passed without problem. 
> When running the {{testRegionServerHostnameReportedToMaster}} test with > {{testInvalidRegionServerHostnameAbortsServer}} in the same test suite > {{TestRegionServerHostname}}, the region server failed to start: > {noformat} > 2017-07-25 15:34:24,132 FATAL [RS:0;192.168.1.7:64317] > regionserver.HRegionServer(2182): ABORTING region server > 192.168.1.7,64317,1501022063917: Unhandled: Failed suppression of fs shutdown > hook: org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@668e0f60 > java.lang.RuntimeException: Failed suppression of fs shutdown hook: > org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@668e0f60 > at > org.apache.hadoop.hbase.regionserver.ShutdownHook.suppressHdfsShutdownHook(ShutdownHook.java:204) > at > org.apache.hadoop.hbase.regionserver.ShutdownHook.install(ShutdownHook.java:84) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:940) > at > org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156) > at > org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108) > at > org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:360) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1846) > at > org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:307) > at > org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138) > at java.lang.Thread.run(Thread.java:745) > {noformat} > HBASE-17922 addressed similar issue in Hadoop 3. I think this change is more > robust than the one in branch-1 right now. Porting the change to branch-1 > (with small modification due to code difference between branch-1 and > branch-2) is a good idea. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-18458) Refactor TestRegionServerHostname to make it robust (Port HBASE-17922)
[ https://issues.apache.org/jira/browse/HBASE-18458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18458: --- Affects Version/s: (was: 2.0.0) 1.4.0 Priority: Minor (was: Major) Fix Version/s: (was: 2.0.0-alpha-2) (was: 3.0.0) > Refactor TestRegionServerHostname to make it robust (Port HBASE-17922) > -- > > Key: HBASE-18458 > URL: https://issues.apache.org/jira/browse/HBASE-18458 > Project: HBase > Issue Type: Sub-task > Components: hadoop3 >Affects Versions: 1.4.0 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang >Priority: Minor > Attachments: HBASE-17922.v1-branch-1.patch > > > The TestRegionServerHostname is passing in branch-1; however, it always fails > locally. Running tests individually always pass. Failing to start RS in > some combination of test run indicates some resource leak. > {code} > Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname > Tests run: 4, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 46.042 sec > <<< FAILURE! - in > org.apache.hadoop.hbase.regionserver.TestRegionServerHostname > testRegionServerHostnameReportedToMaster(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname) > Time elapsed: 30.095 sec <<< ERROR! > org.junit.runners.model.TestTimedOutException: test timed out after 3 > milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221) > at > org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445) > at > org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225) > at > org.apache.hadoop.hbase.MiniHBaseCluster.(MiniHBaseCluster.java:94) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1072) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1028) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:900) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:894) > at > org.apache.hadoop.hbase.regionserver.TestRegionServerHostname.testRegionServerHostnameReportedToMaster(TestRegionServerHostname.java:158) > {code} > When running the testRegionServerHostnameReportedToMaster alone or with > another newly added test, the test passed without problem. 
> When running the {{testRegionServerHostnameReportedToMaster}} test with > {{testInvalidRegionServerHostnameAbortsServer}} in the same test suite > {{TestRegionServerHostname}}, the region server failed to start: > {noformat} > 2017-07-25 15:34:24,132 FATAL [RS:0;192.168.1.7:64317] > regionserver.HRegionServer(2182): ABORTING region server > 192.168.1.7,64317,1501022063917: Unhandled: Failed suppression of fs shutdown > hook: org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@668e0f60 > java.lang.RuntimeException: Failed suppression of fs shutdown hook: > org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@668e0f60 > at > org.apache.hadoop.hbase.regionserver.ShutdownHook.suppressHdfsShutdownHook(ShutdownHook.java:204) > at > org.apache.hadoop.hbase.regionserver.ShutdownHook.install(ShutdownHook.java:84) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:940) > at > org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156) > at > org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108) > at > org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:360) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1846) > at > org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:307) > at > org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138) > at java.lang.Thread.run(Thread.java:745) > {noformat} > HBASE-17922 addressed similar issue in Hadoop 3. I think this change is more > robust than the one in branch-1 right now. Porting the change to branch-1 > (with small modification due to code difference between branch-1 and > branch-2) is a good idea. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-18458) Refactor TestRegionServerHostname to make it robust (Port HBASE-17922)
[ https://issues.apache.org/jira/browse/HBASE-18458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18458: --- Description: The TestRegionServerHostname is passing in branch-1; however, it always fails locally. Running tests individually always pass. Failing to start RS in some combination of test run indicates some resource leak. {code} Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname Tests run: 4, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 46.042 sec <<< FAILURE! - in org.apache.hadoop.hbase.regionserver.TestRegionServerHostname testRegionServerHostnameReportedToMaster(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname) Time elapsed: 30.095 sec <<< ERROR! org.junit.runners.model.TestTimedOutException: test timed out after 3 milliseconds at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221) at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445) at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225) at org.apache.hadoop.hbase.MiniHBaseCluster.(MiniHBaseCluster.java:94) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1072) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1028) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:900) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:894) at org.apache.hadoop.hbase.regionserver.TestRegionServerHostname.testRegionServerHostnameReportedToMaster(TestRegionServerHostname.java:158) {code} When running the testRegionServerHostnameReportedToMaster alone or with another newly added test, the test passed without problem. When running the {{testRegionServerHostnameReportedToMaster}} test with {{testInvalidRegionServerHostnameAbortsServer}} in the same test suite {{TestRegionServerHostname}}, the region server failed to start: {noformat} 2017-07-25 15:34:24,132 FATAL [RS:0;192.168.1.7:64317] regionserver.HRegionServer(2182): ABORTING region server 192.168.1.7,64317,1501022063917: Unhandled: Failed suppression of fs shutdown hook: org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@668e0f60 java.lang.RuntimeException: Failed suppression of fs shutdown hook: org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@668e0f60 at org.apache.hadoop.hbase.regionserver.ShutdownHook.suppressHdfsShutdownHook(ShutdownHook.java:204) at org.apache.hadoop.hbase.regionserver.ShutdownHook.install(ShutdownHook.java:84) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:940) at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156) at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108) at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:360) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1846) at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:307) at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138) at java.lang.Thread.run(Thread.java:745) {noformat} HBASE-17922 addressed similar issue in Hadoop 3. 
I think this change is more robust than the one in branch-1 right now. Porting the change to branch-1 (with small modification due to code difference between branch-1 and branch-2) is a good idea. was: The {code} Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname Tests run: 4, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 46.042 sec <<< FAILURE! - in org.apache.hadoop.hbase.regionserver.TestRegionServerHostname testRegionServerHostnameReportedToMaster(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname) Time elapsed: 30.095 sec <<< ERROR! org.junit.runners.model.TestTimedOutException: test timed out after 3 milliseconds at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221) at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445) at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225) at org.apache.hadoop.hbase.MiniHBaseCluster.(MiniHBaseCluster.java:94) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1072)
[jira] [Updated] (HBASE-18458) Refactor TestRegionServerHostname to make it robust (Port HBASE-17922)
[ https://issues.apache.org/jira/browse/HBASE-18458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18458: --- Description: The {code} Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname Tests run: 4, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 46.042 sec <<< FAILURE! - in org.apache.hadoop.hbase.regionserver.TestRegionServerHostname testRegionServerHostnameReportedToMaster(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname) Time elapsed: 30.095 sec <<< ERROR! org.junit.runners.model.TestTimedOutException: test timed out after 3 milliseconds at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221) at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445) at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225) at org.apache.hadoop.hbase.MiniHBaseCluster.(MiniHBaseCluster.java:94) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1072) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1028) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:900) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:894) at org.apache.hadoop.hbase.regionserver.TestRegionServerHostname.testRegionServerHostnameReportedToMaster(TestRegionServerHostname.java:158) {code} was: {code} Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname Tests run: 4, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 46.042 sec <<< FAILURE! - in org.apache.hadoop.hbase.regionserver.TestRegionServerHostname testRegionServerHostnameReportedToMaster(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname) Time elapsed: 30.095 sec <<< ERROR! org.junit.runners.model.TestTimedOutException: test timed out after 3 milliseconds at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221) at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445) at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225) at org.apache.hadoop.hbase.MiniHBaseCluster.(MiniHBaseCluster.java:94) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1072) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1028) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:900) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:894) at org.apache.hadoop.hbase.regionserver.TestRegionServerHostname.testRegionServerHostnameReportedToMaster(TestRegionServerHostname.java:158) {code} > Refactor TestRegionServerHostname to make it robust (Port HBASE-17922) > -- > > Key: HBASE-18458 > URL: https://issues.apache.org/jira/browse/HBASE-18458 > Project: HBase > Issue Type: Sub-task > Components: hadoop3 >Affects Versions: 2.0.0 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Fix For: 3.0.0, 2.0.0-alpha-2 > > > The > {code} > Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname > Tests run: 4, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 46.042 sec > <<< FAILURE! 
- in > org.apache.hadoop.hbase.regionserver.TestRegionServerHostname > testRegionServerHostnameReportedToMaster(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname) > Time elapsed: 30.095 sec <<< ERROR! > org.junit.runners.model.TestTimedOutException: test timed out after 3 > milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221) > at > org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445) > at > org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225) > at > org.apache.hadoop.hbase.MiniHBaseCluster.(MiniHBaseCluster.java:94) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1072) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1028) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:900) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:894) > at > org.apache.hadoop.hbase.regionserver.TestRegion
[jira] [Assigned] (HBASE-18458) Refactor TestRegionServerHostname to make it robust (Port HBASE-17922)
[ https://issues.apache.org/jira/browse/HBASE-18458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang reassigned HBASE-18458: -- Assignee: Stephen Yuan Jiang (was: Mike Drob) > Refactor TestRegionServerHostname to make it robust (Port HBASE-17922) > -- > > Key: HBASE-18458 > URL: https://issues.apache.org/jira/browse/HBASE-18458 > Project: HBase > Issue Type: Sub-task > Components: hadoop3 >Affects Versions: 2.0.0 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Fix For: 3.0.0, 2.0.0-alpha-2 > > > {code} > Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname > Tests run: 4, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 46.042 sec > <<< FAILURE! - in > org.apache.hadoop.hbase.regionserver.TestRegionServerHostname > testRegionServerHostnameReportedToMaster(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname) > Time elapsed: 30.095 sec <<< ERROR! > org.junit.runners.model.TestTimedOutException: test timed out after 3 > milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221) > at > org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445) > at > org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225) > at > org.apache.hadoop.hbase.MiniHBaseCluster.(MiniHBaseCluster.java:94) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1072) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1028) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:900) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:894) > at > org.apache.hadoop.hbase.regionserver.TestRegionServerHostname.testRegionServerHostnameReportedToMaster(TestRegionServerHostname.java:158) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-18458) Refactor TestRegionServerHostname to make it robust (Port HBASE-17922)
[ https://issues.apache.org/jira/browse/HBASE-18458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18458: --- Description: {code} Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname Tests run: 4, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 46.042 sec <<< FAILURE! - in org.apache.hadoop.hbase.regionserver.TestRegionServerHostname testRegionServerHostnameReportedToMaster(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname) Time elapsed: 30.095 sec <<< ERROR! org.junit.runners.model.TestTimedOutException: test timed out after 3 milliseconds at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221) at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445) at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225) at org.apache.hadoop.hbase.MiniHBaseCluster.(MiniHBaseCluster.java:94) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1072) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1028) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:900) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:894) at org.apache.hadoop.hbase.regionserver.TestRegionServerHostname.testRegionServerHostnameReportedToMaster(TestRegionServerHostname.java:158) {code} was: {code} Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 126.363 sec <<< FAILURE! - in org.apache.hadoop.hbase.regionserver.TestRegionServerHostname testRegionServerHostname(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname) Time elapsed: 120.029 sec <<< ERROR! org.junit.runners.model.TestTimedOutException: test timed out after 12 milliseconds at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221) at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:405) at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225) at org.apache.hadoop.hbase.MiniHBaseCluster.(MiniHBaseCluster.java:94) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1123) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1077) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:948) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:942) at org.apache.hadoop.hbase.regionserver.TestRegionServerHostname.testRegionServerHostname(TestRegionServerHostname.java:88) Results : Tests in error: TestRegionServerHostname.testRegionServerHostname:88 » TestTimedOut test timed... Tests run: 2, Failures: 0, Errors: 1, Skipped: 0 {code} > Refactor TestRegionServerHostname to make it robust (Port HBASE-17922) > -- > > Key: HBASE-18458 > URL: https://issues.apache.org/jira/browse/HBASE-18458 > Project: HBase > Issue Type: Sub-task > Components: hadoop3 >Affects Versions: 2.0.0 >Reporter: Stephen Yuan Jiang >Assignee: Mike Drob > Fix For: 3.0.0, 2.0.0-alpha-2 > > > {code} > Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname > Tests run: 4, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 46.042 sec > <<< FAILURE! 
- in > org.apache.hadoop.hbase.regionserver.TestRegionServerHostname > testRegionServerHostnameReportedToMaster(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname) > Time elapsed: 30.095 sec <<< ERROR! > org.junit.runners.model.TestTimedOutException: test timed out after 3 > milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221) > at > org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445) > at > org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225) > at > org.apache.hadoop.hbase.MiniHBaseCluster.(MiniHBaseCluster.java:94) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1072) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1028) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:900) > at > org.apache.hadoop.hbase.HBaseTestingUt
[jira] [Created] (HBASE-18458) Refactor TestRegionServerHostname to make it robust (Port HBASE-17922)
Stephen Yuan Jiang created HBASE-18458: -- Summary: Refactor TestRegionServerHostname to make it robust (Port HBASE-17922) Key: HBASE-18458 URL: https://issues.apache.org/jira/browse/HBASE-18458 Project: HBase Issue Type: Sub-task Components: hadoop3 Affects Versions: 2.0.0 Reporter: Stephen Yuan Jiang Assignee: Mike Drob Fix For: 3.0.0, 2.0.0-alpha-2 {code} Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 126.363 sec <<< FAILURE! - in org.apache.hadoop.hbase.regionserver.TestRegionServerHostname testRegionServerHostname(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname) Time elapsed: 120.029 sec <<< ERROR! org.junit.runners.model.TestTimedOutException: test timed out after 12 milliseconds at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221) at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:405) at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225) at org.apache.hadoop.hbase.MiniHBaseCluster.(MiniHBaseCluster.java:94) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1123) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1077) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:948) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:942) at org.apache.hadoop.hbase.regionserver.TestRegionServerHostname.testRegionServerHostname(TestRegionServerHostname.java:88) Results : Tests in error: TestRegionServerHostname.testRegionServerHostname:88 » TestTimedOut test timed... Tests run: 2, Failures: 0, Errors: 1, Skipped: 0 {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-18354) Fix TestMasterMetrics that were disabled by Proc-V2 AM in HBASE-14614
[ https://issues.apache.org/jira/browse/HBASE-18354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18354: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) > Fix TestMasterMetrics that were disabled by Proc-V2 AM in HBASE-14614 > - > > Key: HBASE-18354 > URL: https://issues.apache.org/jira/browse/HBASE-18354 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha-1 >Reporter: Stephen Yuan Jiang >Assignee: Vladimir Rodionov > Fix For: 2.0.0, 3.0.0 > > Attachments: HBASE-18354-v1.patch, HBASE-18354-v2.patch > > > With Core Proc-V2 AM change in HBASE-14614, stuff is different now around > startup which messes up the TestMasterMetrics test. HBASE-14614 disabled two > of three tests. > This JIRA tracks work to fix the disabled tests. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-18354) Fix TestMasterMetrics that were disabled by Proc-V2 AM in HBASE-14614
[ https://issues.apache.org/jira/browse/HBASE-18354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18354: --- Fix Version/s: 3.0.0 2.0.0 > Fix TestMasterMetrics that were disabled by Proc-V2 AM in HBASE-14614 > - > > Key: HBASE-18354 > URL: https://issues.apache.org/jira/browse/HBASE-18354 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha-1 >Reporter: Stephen Yuan Jiang >Assignee: Vladimir Rodionov > Fix For: 2.0.0, 3.0.0 > > Attachments: HBASE-18354-v1.patch, HBASE-18354-v2.patch > > > With Core Proc-V2 AM change in HBASE-14614, stuff is different now around > startup which messes up the TestMasterMetrics test. HBASE-14614 disabled two > of three tests. > This JIRA tracks work to fix the disabled tests. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HBASE-18350) Enable RSGroups UT that were disabled by Proc-V2 AM in HBASE-14614
[ https://issues.apache.org/jira/browse/HBASE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang reassigned HBASE-18350: -- Assignee: Stephen Yuan Jiang > Enable RSGroups UT that were disabled by Proc-V2 AM in HBASE-14614 > -- > > Key: HBASE-18350 > URL: https://issues.apache.org/jira/browse/HBASE-18350 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha-1 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > > The following RSGroups tests were disabled by Core Proc-V2 AM in HBASE-14614: > - Disabled/Ignore TestRSGroupsOfflineMode#testOffline; need to dig in on what > offline is. > - Disabled/Ignore TestRSGroups. > This JIRA tracks the work to enable them (or remove/modify if not applicable). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18354) Fix TestMasterMetrics that were disabled by Proc-V2 AM in HBASE-14614
[ https://issues.apache.org/jira/browse/HBASE-18354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095427#comment-16095427 ] Stephen Yuan Jiang commented on HBASE-18354: +1. Looks good to me. > Fix TestMasterMetrics that were disabled by Proc-V2 AM in HBASE-14614 > - > > Key: HBASE-18354 > URL: https://issues.apache.org/jira/browse/HBASE-18354 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha-1 >Reporter: Stephen Yuan Jiang >Assignee: Vladimir Rodionov > Attachments: HBASE-18354-v1.patch, HBASE-18354-v2.patch > > > With Core Proc-V2 AM change in HBASE-14614, stuff is different now around > startup which messes up the TestMasterMetrics test. HBASE-14614 disabled two > of three tests. > This JIRA tracks work to fix the disabled tests. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18406) In ServerCrashProcedure.java start(MasterProcedureEnv) is a no-op
[ https://issues.apache.org/jira/browse/HBASE-18406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092399#comment-16092399 ] Stephen Yuan Jiang commented on HBASE-18406: Looks good. > In ServerCrashProcedure.java start(MasterProcedureEnv) is a no-op > - > > Key: HBASE-18406 > URL: https://issues.apache.org/jira/browse/HBASE-18406 > Project: HBase > Issue Type: Bug >Reporter: Alex Leblang >Assignee: Alex Leblang > Attachments: HBASE-18406.master.001.patch > > > The comments above this method explain that it exists to set configs and > return, however, no configs are set in the method. > As you can see here: > https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L210-L214 > > It is only ever called here: > https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L142 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
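A minimal sketch of the reported shape, not the actual ServerCrashProcedure source; the class name and the Object stand-in for MasterProcedureEnv are illustrative only:
{code}
// Hypothetical illustration of the issue: a start() hook whose javadoc
// promises to "set configs and return" while the body touches nothing.
final class NoOpStartSketch {
  /** Documented as setting configs, yet no configuration is read or written. */
  void start(final Object env) {  // Object stands in for MasterProcedureEnv
    // no-op
  }

  void executeFromState(final Object env) {
    start(env);  // the single call site; inlining or removing start() is one obvious cleanup
  }
}
{code}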
[jira] [Commented] (HBASE-18403) [Shell]Truncate permission required
[ https://issues.apache.org/jira/browse/HBASE-18403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092288#comment-16092288 ] Stephen Yuan Jiang commented on HBASE-18403: Which version of HBase are you seeing this problem in? At least in the TruncateTableProcedure code (1.1+), we check permissions at the beginning and then truncate the table (delete and then create); I don't see any logic that would abort in the middle of the procedure and leave the task half done. > [Shell]Truncate permission required > --- > > Key: HBASE-18403 > URL: https://issues.apache.org/jira/browse/HBASE-18403 > Project: HBase > Issue Type: Improvement > Components: shell >Reporter: Yun Zhao >Assignee: Yun Zhao >Priority: Trivial > Attachments: HBASE-18403.patch > > > When a user has only (Create) permission to execute truncate, the table will > be deleted and not re-created -- This message was sent by Atlassian JIRA (v6.4.14#64029)
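A hedged illustration of the ordering the comment argues for; the permission helper below is hypothetical and does not mirror HBase's AccessController API. The point is to verify everything the whole operation needs before the destructive delete step, so a caller with only CREATE cannot leave the table deleted but never re-created:
{code}
final class TruncateGuardSketch {
  /** Hypothetical permission oracle, not an HBase interface. */
  interface Acl {
    boolean has(String user, String table, String action);
  }

  void truncate(Acl acl, String user, String table, Runnable delete, Runnable recreate) {
    // check the full set of required permissions up front
    if (!acl.has(user, table, "ADMIN")) {
      throw new SecurityException(user + " lacks ADMIN on " + table + "; refusing to truncate");
    }
    delete.run();    // only reached once the whole operation is known to be allowed
    recreate.run();
  }
}
{code}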
[jira] [Commented] (HBASE-16488) Starting namespace and quota services in master startup asynchronizely
[ https://issues.apache.org/jira/browse/HBASE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16091568#comment-16091568 ] Stephen Yuan Jiang commented on HBASE-16488: The V10 patch for branch-1 was approved by [~enis]. Most tests passed in pre-commit. For the failed UTs, I checked the source code and don't think they are related to this change. I re-ran those tests locally, and all except one passed. The only test that fails consistently on my local machine is {{org.apache.hadoop.hbase.regionserver.TestRSKilledWhenInitializing.testRSTerminationAfterRegisteringToMasterBeforeCreatingEphemeralNode}} - I spent some time debugging it and don't think it is related to this change. The test kills one RS and asserts that the server manager considers this RS no longer online. Without any change, the test passes consistently on my local machine. I added some logging to the test (just some LOG.info statements inside the test, no other changes) to see what is going on, and it then fails consistently because the server manager still thinks the RS is online. If I add some waiting before the assert, the test passes with about a 600ms wait on my local machine. This is with only log info messages in the test and no real change. There seems to be a delay between "the mini cluster's live-server view thinks the RS is dead" and "the master's server manager removes the RS from the online server list". With the patch, the same is true: with about a 600ms delay (which has nothing to do with namespace), the test passes. I think this is a test issue; if it reproduces consistently in pre-commit, I will fix the test in a separate JIRA. > Starting namespace and quota services in master startup asynchronizely > -- > > Key: HBASE-16488 > URL: https://issues.apache.org/jira/browse/HBASE-16488 > Project: HBase > Issue Type: Improvement > Components: master >Affects Versions: 2.0.0, 1.3.0, 1.0.3, 1.4.0, 1.1.5, 1.2.2 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Attachments: HBASE-16488.v10-branch-1.patch, > HBASE-16488.v1-branch-1.patch, HBASE-16488.v1-master.patch, > HBASE-16488.v2-branch-1.patch, HBASE-16488.v2-branch-1.patch, > HBASE-16488.v3-branch-1.patch, HBASE-16488.v3-branch-1.patch, > HBASE-16488.v4-branch-1.patch, HBASE-16488.v5-branch-1.patch, > HBASE-16488.v6-branch-1.patch, HBASE-16488.v7-branch-1.patch, > HBASE-16488.v8-branch-1.patch, HBASE-16488.v9-branch-1.patch > > > From time to time, during internal IT test and from customer, we often see > master initialization failed due to namespace table region takes long time to > assign (eg. sometimes split log takes long time or hanging; or sometimes RS > is temporarily not available; sometimes due to some unknown assignment > issue). In the past, there was some proposal to improve this situation, eg. > HBASE-13556 / HBASE-14190 (Assign system tables ahead of user region > assignment) or HBASE-13557 (Special WAL handling for system tables) or > HBASE-14623 (Implement dedicated WAL for system tables). > This JIRA proposes another way to solve this master initialization fail > issue: namespace service is only used by a handful operations (eg. create > table / namespace DDL / get namespace API / some RS group DDL). Only quota > manager depends on it and quota management is off by default. Therefore, > namespace service is not really needed for master to be functional. So we > could start namespace service asynchronizely without blocking master startup. > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
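The wait-before-assert workaround described in the comment can be expressed as a bounded poll instead of a fixed sleep; this is a self-contained sketch, not the real test code, and the predicate in the usage comment is a placeholder:
{code}
// Poll a condition with a deadline so the test tolerates the delay between
// the RS dying and the ServerManager dropping it from the online list.
final class BoundedWaitSketch {
  interface Condition { boolean holds() throws Exception; }

  static void waitFor(long timeoutMs, long intervalMs, Condition c) throws Exception {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (c.holds()) {
        return;
      }
      Thread.sleep(intervalMs);
    }
    throw new AssertionError("condition not met within " + timeoutMs + " ms");
  }

  // Usage in the flaky test would look roughly like:
  //   waitFor(30000, 100, () -> !serverManager.getOnlineServersList().contains(killedRs));
}
{code}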
[jira] [Updated] (HBASE-16488) Starting namespace and quota services in master startup asynchronizely
[ https://issues.apache.org/jira/browse/HBASE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-16488: --- Attachment: HBASE-16488.v10-branch-1.patch > Starting namespace and quota services in master startup asynchronizely > -- > > Key: HBASE-16488 > URL: https://issues.apache.org/jira/browse/HBASE-16488 > Project: HBase > Issue Type: Improvement > Components: master >Affects Versions: 2.0.0, 1.3.0, 1.0.3, 1.4.0, 1.1.5, 1.2.2 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Attachments: HBASE-16488.v10-branch-1.patch, > HBASE-16488.v1-branch-1.patch, HBASE-16488.v1-master.patch, > HBASE-16488.v2-branch-1.patch, HBASE-16488.v2-branch-1.patch, > HBASE-16488.v3-branch-1.patch, HBASE-16488.v3-branch-1.patch, > HBASE-16488.v4-branch-1.patch, HBASE-16488.v5-branch-1.patch, > HBASE-16488.v6-branch-1.patch, HBASE-16488.v7-branch-1.patch, > HBASE-16488.v8-branch-1.patch, HBASE-16488.v9-branch-1.patch > > > From time to time, during internal IT test and from customer, we often see > master initialization failed due to namespace table region takes long time to > assign (eg. sometimes split log takes long time or hanging; or sometimes RS > is temporarily not available; sometimes due to some unknown assignment > issue). In the past, there was some proposal to improve this situation, eg. > HBASE-13556 / HBASE-14190 (Assign system tables ahead of user region > assignment) or HBASE-13557 (Special WAL handling for system tables) or > HBASE-14623 (Implement dedicated WAL for system tables). > This JIRA proposes another way to solve this master initialization fail > issue: namespace service is only used by a handful operations (eg. create > table / namespace DDL / get namespace API / some RS group DDL). Only quota > manager depends on it and quota management is off by default. Therefore, > namespace service is not really needed for master to be functional. So we > could start namespace service asynchronizely without blocking master startup. > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
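A rough sketch of the idea in the description, not HBase's actual implementation: master startup schedules namespace/quota initialization on a background thread, and only the few operations that need the namespace service block on the resulting future. All names here are illustrative:
{code}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

final class AsyncNamespaceStartupSketch {
  private CompletableFuture<Void> namespaceReady;

  void finishMasterInitialization() {
    // do not block master startup on namespace table assignment
    namespaceReady = CompletableFuture.runAsync(this::initNamespaceAndQuota);
    // ... master is considered initialized from here on ...
  }

  void createTable(String tableName) throws Exception {
    // only namespace-dependent operations wait (with a bound) for the service
    namespaceReady.get(5, TimeUnit.MINUTES);
    // ... proceed with the DDL ...
  }

  private void initNamespaceAndQuota() {
    // assign hbase:namespace, then start the quota manager if it is enabled
  }
}
{code}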
[jira] [Created] (HBASE-18357) Enable disabled tests in TestHCM that were disabled by Proc-V2 AM in HBASE-14614
Stephen Yuan Jiang created HBASE-18357: -- Summary: Enable disabled tests in TestHCM that were disabled by Proc-V2 AM in HBASE-14614 Key: HBASE-18357 URL: https://issues.apache.org/jira/browse/HBASE-18357 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha-1 Reporter: Stephen Yuan Jiang The Core Proc-V2 AM change in HBASE-14614 disabled two tests in TestHCM: testMulti and testRegionCaching. This JIRA tracks the work to enable them. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18356) Enable TestFavoredStochasticBalancerPickers#testPickers that was disabled by Proc-V2 AM in HBASE-14614
Stephen Yuan Jiang created HBASE-18356: -- Summary: Enable TestFavoredStochasticBalancerPickers#testPickers that was disabled by Proc-V2 AM in HBASE-14614 Key: HBASE-18356 URL: https://issues.apache.org/jira/browse/HBASE-18356 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha-1 Reporter: Stephen Yuan Jiang The testPickers in TestFavoredStochasticBalancerPickers hangs after applying the change in Core Proc-V2 AM in HBASE-14614. It was disabled. This JIRA tracks the work to enable it. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18355) Enable export snapshot tests that were disabled by Proc-V2 AM in HBASE-14614
Stephen Yuan Jiang created HBASE-18355: -- Summary: Enable export snapshot tests that were disabled by Proc-V2 AM in HBASE-14614 Key: HBASE-18355 URL: https://issues.apache.org/jira/browse/HBASE-18355 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha-1 Reporter: Stephen Yuan Jiang The Proc-V2 AM in HBASE-14614 disabled the following tests: - Disabled TestExportSnapshot (hangs). - Disabled TestSecureExportSnapshot - Disabled TestMobSecureExportSnapshot and TestMobExportSnapshot This JIRA tracks the work to enable them. If MOB requires more work, we could split this into 2 tickets. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-18352) Enable Replica tests that were disabled by Proc-V2 AM in HBASE-14614
[ https://issues.apache.org/jira/browse/HBASE-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18352: --- Description: The following replica tests were disabled by Core Proc-V2 AM in HBASE-14614: - Disabled parts of...testCreateTableWithMultipleReplicas in TestMasterOperationsForRegionReplicas There is an issue w/ assigning more replicas if number of replicas is changed on us. See '/* DISABLED! FOR NOW'. - Disabled testRegionReplicasOnMidClusterHighReplication in TestStochasticLoadBalancer2 This JIRA tracks the work to enable them (or modify/remove if not applicable). was: The following replica tests were disabled by Core Proc-V2 AM in HBASE-14614: - Disabled parts of...testCreateTableWithMultipleReplicas in TestMasterOperationsForRegionReplicas There is an issue w/ assigning more replicas if number of replicas is changed on us. See '/* DISABLED! FOR NOW'. This JIRA tracks the work to enable them (or modify/remove if not applicable). > Enable Replica tests that were disabled by Proc-V2 AM in HBASE-14614 > > > Key: HBASE-18352 > URL: https://issues.apache.org/jira/browse/HBASE-18352 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha-1 >Reporter: Stephen Yuan Jiang > > The following replica tests were disabled by Core Proc-V2 AM in HBASE-14614: > - Disabled parts of...testCreateTableWithMultipleReplicas in > TestMasterOperationsForRegionReplicas There is an issue w/ assigning more > replicas if number of replicas is changed on us. See '/* DISABLED! FOR > NOW'. > - Disabled testRegionReplicasOnMidClusterHighReplication in > TestStochasticLoadBalancer2 > This JIRA tracks the work to enable them (or modify/remove if not applicable). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-18352) Enable Replica tests that were disabled by Proc-V2 AM in HBASE-14614
[ https://issues.apache.org/jira/browse/HBASE-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18352: --- Description: The following replica tests were disabled by Core Proc-V2 AM in HBASE-14614: - Disabled parts of...testCreateTableWithMultipleReplicas in TestMasterOperationsForRegionReplicas There is an issue w/ assigning more replicas if number of replicas is changed on us. See '/* DISABLED! FOR NOW'. - Disabled testRegionReplicasOnMidClusterHighReplication in TestStochasticLoadBalancer2 - Disabled testFlushAndCompactionsInPrimary in TestRegionReplicas This JIRA tracks the work to enable them (or modify/remove if not applicable). was: The following replica tests were disabled by Core Proc-V2 AM in HBASE-14614: - Disabled parts of...testCreateTableWithMultipleReplicas in TestMasterOperationsForRegionReplicas There is an issue w/ assigning more replicas if number of replicas is changed on us. See '/* DISABLED! FOR NOW'. - Disabled testRegionReplicasOnMidClusterHighReplication in TestStochasticLoadBalancer2 This JIRA tracks the work to enable them (or modify/remove if not applicable). > Enable Replica tests that were disabled by Proc-V2 AM in HBASE-14614 > > > Key: HBASE-18352 > URL: https://issues.apache.org/jira/browse/HBASE-18352 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha-1 >Reporter: Stephen Yuan Jiang > > The following replica tests were disabled by Core Proc-V2 AM in HBASE-14614: > - Disabled parts of...testCreateTableWithMultipleReplicas in > TestMasterOperationsForRegionReplicas There is an issue w/ assigning more > replicas if number of replicas is changed on us. See '/* DISABLED! FOR > NOW'. > - Disabled testRegionReplicasOnMidClusterHighReplication in > TestStochasticLoadBalancer2 > - Disabled testFlushAndCompactionsInPrimary in TestRegionReplicas > This JIRA tracks the work to enable them (or modify/remove if not applicable). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18350) Enable RSGroups UT that were disabled by Proc-V2 AM in HBASE-14614
Stephen Yuan Jiang created HBASE-18350: -- Summary: Enable RSGroups UT that were disabled by Proc-V2 AM in HBASE-14614 Key: HBASE-18350 URL: https://issues.apache.org/jira/browse/HBASE-18350 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha-1 Reporter: Stephen Yuan Jiang The following RSGroups tests were disabled by Core Proc-V2 AM in HBASE-14614: - Disabled/Ignore TestRSGroupsOfflineMode#testOffline; need to dig in on what offline is. - Disabled/Ignore TestRSGroups. This JIRA tracks the work to enable them (or remove/modify if not applicable). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18353) Enable TestCorruptedRegionStoreFile that were disabled by Proc-V2 AM in HBASE-14614
Stephen Yuan Jiang created HBASE-18353: -- Summary: Enable TestCorruptedRegionStoreFile that were disabled by Proc-V2 AM in HBASE-14614 Key: HBASE-18353 URL: https://issues.apache.org/jira/browse/HBASE-18353 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha-1 Reporter: Stephen Yuan Jiang HBASE-14614 disabled TestCorruptedRegionStoreFile, as it depends on a half-implemented reopen of a region when a store file goes missing. This JIRA tracks the work to fix/enable the test. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-18351) Fix tests that carry meta in Master that were disabled by Proc-V2 AM in HBASE-14614
[ https://issues.apache.org/jira/browse/HBASE-18351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18351: --- Description: The following tests were disabled as part of Core Proc-V2 AM in HBASE-14614 - TestRegionRebalancing is disabled because doesn't consider the fact that Master carries system tables only (fix of average in RegionStates brought out the issue). - Disabled testMetaAddressChange in TestMetaWithReplicas because presumes can move meta... you can't - TestAsyncTableGetMultiThreaded wants to move hbase:meta...Balancer does NPEs. AMv2 won't let you move hbase:meta off Master. - TestMasterFailover needs to be rewritten for AMv2. It uses tricks not ordained when up on AMv2. The test is also hobbled by fact that we religiously enforce that only master can carry meta, something we are lose about in old AM This JIRA is tracking the work to enable/modify them. was: The following tests were disabled as part of Core Proc-V2 AM in HBASE-14614 - TestRegionRebalancing is disabled because doesn't consider the fact that Master carries system tables only (fix of average in RegionStates brought out the issue). - Disabled testMetaAddressChange in TestMetaWithReplicas because presumes can move meta... you can't - TestAsyncTableGetMultiThreaded wants to move hbase:meta...Balancer does NPEs. AMv2 won't let you move hbase:meta off Master. This JIRA is tracking the work to enable/modify them. > Fix tests that carry meta in Master that were disabled by Proc-V2 AM in > HBASE-14614 > --- > > Key: HBASE-18351 > URL: https://issues.apache.org/jira/browse/HBASE-18351 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha-1 >Reporter: Stephen Yuan Jiang > > The following tests were disabled as part of Core Proc-V2 AM in HBASE-14614 > - TestRegionRebalancing is disabled because doesn't consider the fact that > Master carries system tables only (fix of average in RegionStates brought out > the issue). > - Disabled testMetaAddressChange in TestMetaWithReplicas because presumes can > move meta... you can't > - TestAsyncTableGetMultiThreaded wants to move hbase:meta...Balancer does > NPEs. AMv2 won't let you move hbase:meta off Master. > - TestMasterFailover needs to be rewritten for AMv2. It uses tricks not > ordained when up on AMv2. The test is also hobbled by fact that we > religiously enforce that only master can carry meta, something we are lose > about in old AM > This JIRA is tracking the work to enable/modify them. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18354) Fix TestMasterMetrics that were disabled by Proc-V2 AM in HBASE-14614
Stephen Yuan Jiang created HBASE-18354: -- Summary: Fix TestMasterMetrics that were disabled by Proc-V2 AM in HBASE-14614 Key: HBASE-18354 URL: https://issues.apache.org/jira/browse/HBASE-18354 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha-1 Reporter: Stephen Yuan Jiang With Core Proc-V2 AM change in HBASE-14614, stuff is different now around startup which messes up the TestMasterMetrics test. HBASE-14614 disabled two of three tests. This JIRA tracks work to fix the disabled tests. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18352) Enable Replica tests that were disabled by Proc-V2 AM in HBASE-14614
Stephen Yuan Jiang created HBASE-18352: -- Summary: Enable Replica tests that were disabled by Proc-V2 AM in HBASE-14614 Key: HBASE-18352 URL: https://issues.apache.org/jira/browse/HBASE-18352 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha-1 Reporter: Stephen Yuan Jiang The following replica tests were disabled by Core Proc-V2 AM in HBASE-14614: - Disabled parts of...testCreateTableWithMultipleReplicas in TestMasterOperationsForRegionReplicas There is an issue w/ assigning more replicas if number of replicas is changed on us. See '/* DISABLED! FOR NOW'. This JIRA tracks the work to enable them (or modify/remove if not applicable). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-18349) Enable disabled tests in TestFavoredStochasticLoadBalancer that were disabled by Proc-V2 AM in HBASE-14614
[ https://issues.apache.org/jira/browse/HBASE-18349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18349: --- Summary: Enable disabled tests in TestFavoredStochasticLoadBalancer that were disabled by Proc-V2 AM in HBASE-14614 (was: Enable disabled tests in TestFavoredStochasticLoadBalancer that was disabled by Proc-V2 AM in HBASE-14614) > Enable disabled tests in TestFavoredStochasticLoadBalancer that were disabled > by Proc-V2 AM in HBASE-14614 > -- > > Key: HBASE-18349 > URL: https://issues.apache.org/jira/browse/HBASE-18349 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha-1 >Reporter: Stephen Yuan Jiang > > The following 3 tests in TestFavoredStochasticLoadBalancer were disabled by > HBASE-14614 (Core Proc-V2 AM): > - testAllFavoredNodesDead > - testAllFavoredNodesDeadMasterRestarted > - testMisplacedRegions > This JIRA is tracking the necessary work to re-enable (or remove/change if not > applicable) these UTs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18351) Fix tests that carry meta in Master that were disabled by Proc-V2 AM in HBASE-14614
Stephen Yuan Jiang created HBASE-18351: -- Summary: Fix tests that carry meta in Master that were disabled by Proc-V2 AM in HBASE-14614 Key: HBASE-18351 URL: https://issues.apache.org/jira/browse/HBASE-18351 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha-1 Reporter: Stephen Yuan Jiang The following tests were disabled as part of Core Proc-V2 AM in HBASE-14614 - TestRegionRebalancing is disabled because doesn't consider the fact that Master carries system tables only (fix of average in RegionStates brought out the issue). - Disabled testMetaAddressChange in TestMetaWithReplicas because presumes can move meta... you can't - TestAsyncTableGetMultiThreaded wants to move hbase:meta...Balancer does NPEs. AMv2 won't let you move hbase:meta off Master. This JIRA is tracking the work to enable/modify them. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18349) Enable disabled tests in TestFavoredStochasticLoadBalancer that was disabled by Proc-V2 AM in HBASE-14614
Stephen Yuan Jiang created HBASE-18349: -- Summary: Enable disabled tests in TestFavoredStochasticLoadBalancer that was disabled by Proc-V2 AM in HBASE-14614 Key: HBASE-18349 URL: https://issues.apache.org/jira/browse/HBASE-18349 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha-1 Reporter: Stephen Yuan Jiang The following 3 tests in TestFavoredStochasticLoadBalancer were disabled by HBASE-14614 (Core Proc-V2 AM): - testAllFavoredNodesDead - testAllFavoredNodesDeadMasterRestarted - testMisplacedRegions This JIRA is tracking the necessary work to re-enable (or remove/change if not applicable) these UTs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-16488) Starting namespace and quota services in master startup asynchronizely
[ https://issues.apache.org/jira/browse/HBASE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-16488: --- Attachment: HBASE-16488.v9-branch-1.patch > Starting namespace and quota services in master startup asynchronizely > -- > > Key: HBASE-16488 > URL: https://issues.apache.org/jira/browse/HBASE-16488 > Project: HBase > Issue Type: Improvement > Components: master >Affects Versions: 2.0.0, 1.3.0, 1.0.3, 1.4.0, 1.1.5, 1.2.2 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Attachments: HBASE-16488.v1-branch-1.patch, > HBASE-16488.v1-master.patch, HBASE-16488.v2-branch-1.patch, > HBASE-16488.v2-branch-1.patch, HBASE-16488.v3-branch-1.patch, > HBASE-16488.v3-branch-1.patch, HBASE-16488.v4-branch-1.patch, > HBASE-16488.v5-branch-1.patch, HBASE-16488.v6-branch-1.patch, > HBASE-16488.v7-branch-1.patch, HBASE-16488.v8-branch-1.patch, > HBASE-16488.v9-branch-1.patch > > > From time to time, during internal IT test and from customer, we often see > master initialization failed due to namespace table region takes long time to > assign (eg. sometimes split log takes long time or hanging; or sometimes RS > is temporarily not available; sometimes due to some unknown assignment > issue). In the past, there was some proposal to improve this situation, eg. > HBASE-13556 / HBASE-14190 (Assign system tables ahead of user region > assignment) or HBASE-13557 (Special WAL handling for system tables) or > HBASE-14623 (Implement dedicated WAL for system tables). > This JIRA proposes another way to solve this master initialization fail > issue: namespace service is only used by a handful operations (eg. create > table / namespace DDL / get namespace API / some RS group DDL). Only quota > manager depends on it and quota management is off by default. Therefore, > namespace service is not really needed for master to be functional. So we > could start namespace service asynchronizely without blocking master startup. > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18107) [AMv2] Rename DispatchMergingRegionsRequest & DispatchMergingRegions
[ https://issues.apache.org/jira/browse/HBASE-18107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16075619#comment-16075619 ] Stephen Yuan Jiang commented on HBASE-18107: Yeah, we don't need DispatchMergingRegionsProcedure in 2.0.0. Sorry that I did not catch this issue in HBASE-14614 when it sneaked this old procedure back in. > [AMv2] Rename DispatchMergingRegionsRequest & DispatchMergingRegions > > > Key: HBASE-18107 > URL: https://issues.apache.org/jira/browse/HBASE-18107 > Project: HBase > Issue Type: Sub-task > Components: Region Assignment >Affects Versions: 2.0.0 >Reporter: stack > Fix For: 2.0.0 > > > They don't align with how we have named the Split equivalents; i.e. > SplitRegion (so should be MergeRegion...). They probably have these awkward > names because the obvious slots are occupied... so this may not be fixable > but filing issue anyways. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-18301) Enable TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster that was disabled by Proc-V2 AM in HBASE-14614
[ https://issues.apache.org/jira/browse/HBASE-18301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18301: --- Fix Version/s: 3.0.0 > Enable > TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster > that was disabled by Proc-V2 AM in HBASE-14614 > --- > > Key: HBASE-18301 > URL: https://issues.apache.org/jira/browse/HBASE-18301 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha-1 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Fix For: 2.0.0, 3.0.0 > > Attachments: HBASE-18301.v1-master.patch, HBASE-18301.v1-master.patch > > > Enable > TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster > that was temporally disabled by HBASE-14614 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-18301) Enable TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster that was disabled by Proc-V2 AM in HBASE-14614
[ https://issues.apache.org/jira/browse/HBASE-18301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18301: --- Hadoop Flags: Reviewed > Enable > TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster > that was disabled by Proc-V2 AM in HBASE-14614 > --- > > Key: HBASE-18301 > URL: https://issues.apache.org/jira/browse/HBASE-18301 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha-1 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Fix For: 2.0.0, 3.0.0 > > Attachments: HBASE-18301.v1-master.patch, HBASE-18301.v1-master.patch > > > Enable > TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster > that was temporally disabled by HBASE-14614 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-18301) Enable TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster that was disabled by Proc-V2 AM in HBASE-14614
[ https://issues.apache.org/jira/browse/HBASE-18301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18301: --- Resolution: Fixed Status: Resolved (was: Patch Available) > Enable > TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster > that was disabled by Proc-V2 AM in HBASE-14614 > --- > > Key: HBASE-18301 > URL: https://issues.apache.org/jira/browse/HBASE-18301 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha-1 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Fix For: 2.0.0, 3.0.0 > > Attachments: HBASE-18301.v1-master.patch, HBASE-18301.v1-master.patch > > > Enable > TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster > that was temporally disabled by HBASE-14614 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-18301) Enable TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster that was disabled by Proc-V2 AM in HBASE-14614
[ https://issues.apache.org/jira/browse/HBASE-18301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18301: --- Summary: Enable TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster that was disabled by Proc-V2 AM in HBASE-14614 (was: Procedure V2 (AM) - Enable TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster that was disabled by HBASE-14614) > Enable > TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster > that was disabled by Proc-V2 AM in HBASE-14614 > --- > > Key: HBASE-18301 > URL: https://issues.apache.org/jira/browse/HBASE-18301 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha-1 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Fix For: 2.0.0 > > Attachments: HBASE-18301.v1-master.patch, HBASE-18301.v1-master.patch > > > Enable > TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster > that was temporally disabled by HBASE-14614 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-18301) Procedure V2 (AM) - Enable TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster that was disabled by HBASE-14614
[ https://issues.apache.org/jira/browse/HBASE-18301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18301: --- Attachment: HBASE-18301.v1-master.patch > Procedure V2 (AM) - Enable > TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster > that was disabled by HBASE-14614 > - > > Key: HBASE-18301 > URL: https://issues.apache.org/jira/browse/HBASE-18301 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha-1 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Fix For: 2.0.0 > > Attachments: HBASE-18301.v1-master.patch > > > Enable > TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster > that was temporally disabled by HBASE-14614 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-18301) Procedure V2 (AM) - Enable TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster that was disabled by HBASE-14614
[ https://issues.apache.org/jira/browse/HBASE-18301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18301: --- Attachment: (was: HBASE-18301.v1-master.patch) > Procedure V2 (AM) - Enable > TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster > that was disabled by HBASE-14614 > - > > Key: HBASE-18301 > URL: https://issues.apache.org/jira/browse/HBASE-18301 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha-1 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Fix For: 2.0.0 > > > Enable > TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster > that was temporally disabled by HBASE-14614 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-18301) Procedure V2 (AM) - Enable TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster that was disabled by HBASE-14614
[ https://issues.apache.org/jira/browse/HBASE-18301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18301: --- Attachment: HBASE-18301.v1-master.patch > Procedure V2 (AM) - Enable > TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster > that was disabled by HBASE-14614 > - > > Key: HBASE-18301 > URL: https://issues.apache.org/jira/browse/HBASE-18301 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha-1 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Fix For: 2.0.0 > > Attachments: HBASE-18301.v1-master.patch > > > Enable > TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster > that was temporally disabled by HBASE-14614 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-18301) Procedure V2 (AM) - Enable TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster that was disabled by HBASE-14614
[ https://issues.apache.org/jira/browse/HBASE-18301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18301: --- Status: Patch Available (was: Open) > Procedure V2 (AM) - Enable > TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster > that was disabled by HBASE-14614 > - > > Key: HBASE-18301 > URL: https://issues.apache.org/jira/browse/HBASE-18301 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 2.0.0-alpha-1 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Fix For: 2.0.0 > > Attachments: HBASE-18301.v1-master.patch > > > Enable > TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster > that was temporally disabled by HBASE-14614 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18301) Procedure V2 (AM) - Enable TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster that was disabled by HBASE-14614
Stephen Yuan Jiang created HBASE-18301: -- Summary: Procedure V2 (AM) - Enable TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster that was disabled by HBASE-14614 Key: HBASE-18301 URL: https://issues.apache.org/jira/browse/HBASE-18301 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha-1 Reporter: Stephen Yuan Jiang Assignee: Stephen Yuan Jiang Fix For: 2.0.0 Enable TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster that was temporally disabled by HBASE-14614 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-16488) Starting namespace and quota services in master startup asynchronizely
[ https://issues.apache.org/jira/browse/HBASE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-16488: --- Attachment: HBASE-16488.v8-branch-1.patch > Starting namespace and quota services in master startup asynchronizely > -- > > Key: HBASE-16488 > URL: https://issues.apache.org/jira/browse/HBASE-16488 > Project: HBase > Issue Type: Improvement > Components: master >Affects Versions: 2.0.0, 1.3.0, 1.0.3, 1.4.0, 1.1.5, 1.2.2 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Attachments: HBASE-16488.v1-branch-1.patch, > HBASE-16488.v1-master.patch, HBASE-16488.v2-branch-1.patch, > HBASE-16488.v2-branch-1.patch, HBASE-16488.v3-branch-1.patch, > HBASE-16488.v3-branch-1.patch, HBASE-16488.v4-branch-1.patch, > HBASE-16488.v5-branch-1.patch, HBASE-16488.v6-branch-1.patch, > HBASE-16488.v7-branch-1.patch, HBASE-16488.v8-branch-1.patch > > > From time to time, during internal IT test and from customer, we often see > master initialization failed due to namespace table region takes long time to > assign (eg. sometimes split log takes long time or hanging; or sometimes RS > is temporarily not available; sometimes due to some unknown assignment > issue). In the past, there was some proposal to improve this situation, eg. > HBASE-13556 / HBASE-14190 (Assign system tables ahead of user region > assignment) or HBASE-13557 (Special WAL handling for system tables) or > HBASE-14623 (Implement dedicated WAL for system tables). > This JIRA proposes another way to solve this master initialization fail > issue: namespace service is only used by a handful operations (eg. create > table / namespace DDL / get namespace API / some RS group DDL). Only quota > manager depends on it and quota management is off by default. Therefore, > namespace service is not really needed for master to be functional. So we > could start namespace service asynchronizely without blocking master startup. > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18226) Disable reverse DNS lookup at HMaster and use the hostname provided by RegionServer
[ https://issues.apache.org/jira/browse/HBASE-18226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058541#comment-16058541 ] Stephen Yuan Jiang commented on HBASE-18226: [~onpduo], would you mind porting this to branch-1 as well? > Disable reverse DNS lookup at HMaster and use the hostname provided by > RegionServer > --- > > Key: HBASE-18226 > URL: https://issues.apache.org/jira/browse/HBASE-18226 > Project: HBase > Issue Type: New Feature >Reporter: Duo Xu >Assignee: Duo Xu > Fix For: 3.0.0, 2.0.0-alpha-2 > > Attachments: HBASE-18226.001.patch, HBASE-18226.002.patch, > HBASE-18226.003.patch, HBASE-18226.004.patch, HBASE-18226.005.patch, > HBASE-18226.006.patch > > > Description updated: > In some unusual network environments, forward DNS lookup is supported while > reverse DNS lookup may not work properly. > This JIRA is to address that HMaster uses the hostname passed from RS instead > of doing reverse DNS lookup to tell RS which hostname to use during > reportForDuty(). This has already been implemented by HBASE-12954 by adding > the "useThisHostnameInstead" field in RegionServerStatusProtos. > Currently "useThisHostnameInstead" is optional and RS by default only passes > port, server start code and server current time info to HMaster during RS > reportForDuty(). In order to use this field, users currently need to specify > "hbase.regionserver.hostname" in every regionserver node's hbase-site.xml. > This causes some trouble in > 1. some deployments managed by management tools like Ambari, which > maintains the same copy of hbase-site.xml across all the nodes. > 2. HBASE-12954 targets multihomed hosts, where users want to manually > set the hostname value for each node. In the other cases (not multihomed), I > just want the RS to use the hostname returned by the node, set it in > useThisHostnameInstead and pass it to HMaster during reportForDuty(). > I would like to introduce a setting such that, if it is set to true, > "useThisHostnameInstead" will be set to the hostname the RS gets from the node. > Then HMaster will skip the reverse DNS lookup because it sees the > "useThisHostnameInstead" field is set in the request. > "hbase.regionserver.hostname.reported.to.master", is it a good name? > > Regarding the hostname returned by the RS node, I read the source code again > (including hadoop-common dns.java). By default the RS gets the hostname by calling > InetAddress.getLocalHost().getCanonicalHostName(). If users specify > "hbase.regionserver.dns.interface" or "hbase.regionserver.dns.nameserver", or > some underlying system configuration changes (eg. modifying > /etc/nsswitch.conf), it may first read from DNS or other sources instead of > first checking the /etc/hosts file. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
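A hedged sketch of the region-server-side logic described above; the configuration key matches the one proposed in the description, but the surrounding class and the plain Map standing in for the HBase Configuration object are illustrative only:
{code}
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;

final class ReportHostnameSketch {
  static final String REPORT_HOSTNAME_KEY = "hbase.regionserver.hostname.reported.to.master";

  /** Returns the hostname to send in useThisHostnameInstead, or null to keep the old behaviour. */
  static String hostnameForReportForDuty(Map<String, String> conf) throws UnknownHostException {
    String explicit = conf.get("hbase.regionserver.hostname");
    if (explicit != null && !explicit.isEmpty()) {
      return explicit;  // operator-provided name wins (the multihomed case from HBASE-12954)
    }
    if (Boolean.parseBoolean(conf.getOrDefault(REPORT_HOSTNAME_KEY, "false"))) {
      // report what the node itself resolves, so HMaster can skip the reverse DNS lookup
      return InetAddress.getLocalHost().getCanonicalHostName();
    }
    return null;  // legacy behaviour: master resolves the hostname via reverse DNS
  }
}
{code}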
[jira] [Updated] (HBASE-15691) Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-15691: --- Fix Version/s: (was: 1.5.0) (was: 1.4.1) 1.1.12 1.4.0 > Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to > branch-1 > - > > Key: HBASE-15691 > URL: https://issues.apache.org/jira/browse/HBASE-15691 > Project: HBase > Issue Type: Sub-task >Affects Versions: 1.3.0 >Reporter: Andrew Purtell >Assignee: Stephen Yuan Jiang > Fix For: 1.4.0, 1.3.2, 1.2.7, 1.1.12 > > Attachments: HBASE-15691-branch-1.patch, > HBASE-15691.v2-branch-1.patch, HBASE-15691.v3-branch-1.patch > > > HBASE-10205 solves the following problem: > " > The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the > RAM queue containing entries to be cached. freeSpace() in turn calls > BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), > which iterates over 'bucketList'. At the same time another WriterThread might > call BucketAllocator.allocateBlock(), which may call > BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently > cause a ConcurrentModificationException. Calls to > BucketAllocator.allocateBlock() are synchronized, but calls to > BucketAllocator.getIndexStatistics() are not, which allows this race to occur. > " > However, for some unknown reason, HBASE-10205 was only committed to master > (2.0 and beyond) and 0.98 branches only. To preserve continuity we should > commit it to branch-1. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
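A simplified model of the race described in the HBASE-10205 write-up, not the real BucketAllocator: allocateBlock() mutates bucketList under the instance lock, and if the statistics path iterates the same list without that lock, a concurrent allocation can throw ConcurrentModificationException. Making the statistics method synchronized as well (the shape of the fix) closes the race:
{code}
import java.util.ArrayList;
import java.util.List;

final class BucketAllocatorSketch {
  private final List<Integer> bucketList = new ArrayList<>();

  synchronized void allocateBlock(int bucketSize) {
    bucketList.add(bucketSize);  // writer-thread path, already synchronized
  }

  // Effectively unsynchronized in the buggy version; holding the same monitor
  // while iterating prevents the ConcurrentModificationException.
  synchronized long getIndexStatistics() {
    long totalBytes = 0;
    for (int size : bucketList) {
      totalBytes += size;
    }
    return totalBytes;
  }
}
{code}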
[jira] [Updated] (HBASE-15691) Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-15691: --- Resolution: Fixed Status: Resolved (was: Patch Available) > Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to > branch-1 > - > > Key: HBASE-15691 > URL: https://issues.apache.org/jira/browse/HBASE-15691 > Project: HBase > Issue Type: Sub-task >Affects Versions: 1.3.0 >Reporter: Andrew Purtell >Assignee: Stephen Yuan Jiang > Fix For: 1.4.0, 1.3.2, 1.2.7, 1.1.12 > > Attachments: HBASE-15691-branch-1.patch, > HBASE-15691.v2-branch-1.patch, HBASE-15691.v3-branch-1.patch > > > HBASE-10205 solves the following problem: > " > The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the > RAM queue containing entries to be cached. freeSpace() in turn calls > BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), > which iterates over 'bucketList'. At the same time another WriterThread might > call BucketAllocator.allocateBlock(), which may call > BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently > cause a ConcurrentModificationException. Calls to > BucketAllocator.allocateBlock() are synchronized, but calls to > BucketAllocator.getIndexStatistics() are not, which allows this race to occur. > " > However, for some unknown reason, HBASE-10205 was only committed to master > (2.0 and beyond) and 0.98 branches only. To preserve continuity we should > commit it to branch-1. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-15691) Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057967#comment-16057967 ] Stephen Yuan Jiang commented on HBASE-15691: Thanks, [~zjushch], for the review. > Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to > branch-1 > - > > Key: HBASE-15691 > URL: https://issues.apache.org/jira/browse/HBASE-15691 > Project: HBase > Issue Type: Sub-task >Affects Versions: 1.3.0 >Reporter: Andrew Purtell >Assignee: Stephen Yuan Jiang > Fix For: 1.3.2, 1.4.1, 1.5.0, 1.2.7 > > Attachments: HBASE-15691-branch-1.patch, > HBASE-15691.v2-branch-1.patch, HBASE-15691.v3-branch-1.patch > > > HBASE-10205 solves the following problem: > " > The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the > RAM queue containing entries to be cached. freeSpace() in turn calls > BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), > which iterates over 'bucketList'. At the same time another WriterThread might > call BucketAllocator.allocateBlock(), which may call > BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently > cause a ConcurrentModificationException. Calls to > BucketAllocator.allocateBlock() are synchronized, but calls to > BucketAllocator.getIndexStatistics() are not, which allows this race to occur. > " > However, for some unknown reason, HBASE-10205 was only committed to master > (2.0 and beyond) and 0.98 branches only. To preserve continuity we should > commit it to branch-1. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-15691) Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-15691: --- Hadoop Flags: Reviewed Status: Patch Available (was: In Progress) > Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to > branch-1 > - > > Key: HBASE-15691 > URL: https://issues.apache.org/jira/browse/HBASE-15691 > Project: HBase > Issue Type: Sub-task >Affects Versions: 1.3.0 >Reporter: Andrew Purtell >Assignee: Stephen Yuan Jiang > Fix For: 1.3.2, 1.4.1, 1.5.0, 1.2.7 > > Attachments: HBASE-15691-branch-1.patch, > HBASE-15691.v2-branch-1.patch, HBASE-15691.v3-branch-1.patch > > > HBASE-10205 solves the following problem: > " > The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the > RAM queue containing entries to be cached. freeSpace() in turn calls > BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), > which iterates over 'bucketList'. At the same time another WriterThread might > call BucketAllocator.allocateBlock(), which may call > BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently > cause a ConcurrentModificationException. Calls to > BucketAllocator.allocateBlock() are synchronized, but calls to > BucketAllocator.getIndexStatistics() are not, which allows this race to occur. > " > However, for some unknown reason, HBASE-10205 was only committed to master > (2.0 and beyond) and 0.98 branches only. To preserve continuity we should > commit it to branch-1. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-15691) Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-15691: --- Attachment: HBASE-15691.v3-branch-1.patch > Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to > branch-1 > - > > Key: HBASE-15691 > URL: https://issues.apache.org/jira/browse/HBASE-15691 > Project: HBase > Issue Type: Sub-task >Affects Versions: 1.3.0 >Reporter: Andrew Purtell >Assignee: Stephen Yuan Jiang > Fix For: 1.3.2, 1.4.1, 1.5.0, 1.2.7 > > Attachments: HBASE-15691-branch-1.patch, > HBASE-15691.v2-branch-1.patch, HBASE-15691.v3-branch-1.patch > > > HBASE-10205 solves the following problem: > " > The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the > RAM queue containing entries to be cached. freeSpace() in turn calls > BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), > which iterates over 'bucketList'. At the same time another WriterThread might > call BucketAllocator.allocateBlock(), which may call > BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently > cause a ConcurrentModificationException. Calls to > BucketAllocator.allocateBlock() are synchronized, but calls to > BucketAllocator.getIndexStatistics() are not, which allows this race to occur. > " > However, for some unknown reason, HBASE-10205 was only committed to master > (2.0 and beyond) and 0.98 branches only. To preserve continuity we should > commit it to branch-1. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HBASE-18244) org.apache.hadoop.hbase.client.rsgroup.TestShellRSGroups hangs/fails
[ https://issues.apache.org/jira/browse/HBASE-18244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang reassigned HBASE-18244: -- Assignee: Stephen Yuan Jiang > org.apache.hadoop.hbase.client.rsgroup.TestShellRSGroups hangs/fails > > > Key: HBASE-18244 > URL: https://issues.apache.org/jira/browse/HBASE-18244 > Project: HBase > Issue Type: Bug > Components: test >Reporter: Josh Elser >Assignee: Stephen Yuan Jiang > Fix For: 2.0.0 > > > Sometime in the past couple of weeks, TestShellRSGroups has started > timing-out/failing for me. > It will get stuck on a call to moveTables() > {noformat} > "main" #1 prio=5 os_prio=31 tid=0x7ff012004800 nid=0x1703 in > Object.wait() [0x7020d000] >java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at > org.apache.hadoop.hbase.ipc.BlockingRpcCallback.get(BlockingRpcCallback.java:62) > - locked <0x00078d1003f0> (a > org.apache.hadoop.hbase.ipc.BlockingRpcCallback) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:328) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$200(AbstractRpcClient.java:94) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:567) > at > org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$BlockingStub.execMasterService(MasterProtos.java) > at > org.apache.hadoop.hbase.client.ConnectionImplementation$3.execMasterService(ConnectionImplementation.java:1500) > at > org.apache.hadoop.hbase.client.HBaseAdmin$67$1.rpcCall(HBaseAdmin.java:2991) > at > org.apache.hadoop.hbase.client.HBaseAdmin$67$1.rpcCall(HBaseAdmin.java:2986) > at > org.apache.hadoop.hbase.client.MasterCallable.call(MasterCallable.java:98) > at > org.apache.hadoop.hbase.client.HBaseAdmin$67.callExecService(HBaseAdmin.java:2997) > at > org.apache.hadoop.hbase.client.SyncCoprocessorRpcChannel.callBlockingMethod(SyncCoprocessorRpcChannel.java:69) > at > org.apache.hadoop.hbase.protobuf.generated.RSGroupAdminProtos$RSGroupAdminService$BlockingStub.moveTables(RSGroupAdminProtos.java:13171) > at > org.apache.hadoop.hbase.rsgroup.RSGroupAdminClient.moveTables(RSGroupAdminClient.java:117) > {noformat} > The server-side end of the RPC is waiting on a procedure to finish: > {noformat} > "RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=64242" #289 daemon > prio=5 os_prio=31 tid=0x7ff015b7c000 nid=0x1e603 waiting on condition > [0x7dbc9000] >java.lang.Thread.State: TIMED_WAITING (sleeping) > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitFor(ProcedureSyncWait.java:184) > at > org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitFor(ProcedureSyncWait.java:171) > at > org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitForProcedureToComplete(ProcedureSyncWait.java:141) > at > org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitForProcedureToCompleteIOE(ProcedureSyncWait.java:130) > at > org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.submitAndWaitProcedure(ProcedureSyncWait.java:123) > at > org.apache.hadoop.hbase.master.assignment.AssignmentManager.unassign(AssignmentManager.java:478) > at > org.apache.hadoop.hbase.master.assignment.AssignmentManager.unassign(AssignmentManager.java:465) > at > org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.moveTables(RSGroupAdminServer.java:432) > at > 
org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint$RSGroupAdminServiceImpl.moveTables(RSGroupAdminEndpoint.java:174) > at > org.apache.hadoop.hbase.protobuf.generated.RSGroupAdminProtos$RSGroupAdminService.callMethod(RSGroupAdminProtos.java:12786) > at > org.apache.hadoop.hbase.master.MasterRpcServices.execMasterService(MasterRpcServices.java:673) > at > org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:406) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:278) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:258) >Locked ownable synchronizers: > - None > {noformat} > I don't see anything else running in the thread dump, but I do se
[jira] [Updated] (HBASE-18244) org.apache.hadoop.hbase.client.rsgroup.TestShellRSGroups hangs/fails
[ https://issues.apache.org/jira/browse/HBASE-18244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18244: --- Fix Version/s: (was: 3.0.0) 2.0.0 > org.apache.hadoop.hbase.client.rsgroup.TestShellRSGroups hangs/fails > > > Key: HBASE-18244 > URL: https://issues.apache.org/jira/browse/HBASE-18244 > Project: HBase > Issue Type: Bug > Components: test >Reporter: Josh Elser > Fix For: 2.0.0 > > > Sometime in the past couple of weeks, TestShellRSGroups has started > timing-out/failing for me. > It will get stuck on a call to moveTables() > {noformat} > "main" #1 prio=5 os_prio=31 tid=0x7ff012004800 nid=0x1703 in > Object.wait() [0x7020d000] >java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at > org.apache.hadoop.hbase.ipc.BlockingRpcCallback.get(BlockingRpcCallback.java:62) > - locked <0x00078d1003f0> (a > org.apache.hadoop.hbase.ipc.BlockingRpcCallback) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:328) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$200(AbstractRpcClient.java:94) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:567) > at > org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$BlockingStub.execMasterService(MasterProtos.java) > at > org.apache.hadoop.hbase.client.ConnectionImplementation$3.execMasterService(ConnectionImplementation.java:1500) > at > org.apache.hadoop.hbase.client.HBaseAdmin$67$1.rpcCall(HBaseAdmin.java:2991) > at > org.apache.hadoop.hbase.client.HBaseAdmin$67$1.rpcCall(HBaseAdmin.java:2986) > at > org.apache.hadoop.hbase.client.MasterCallable.call(MasterCallable.java:98) > at > org.apache.hadoop.hbase.client.HBaseAdmin$67.callExecService(HBaseAdmin.java:2997) > at > org.apache.hadoop.hbase.client.SyncCoprocessorRpcChannel.callBlockingMethod(SyncCoprocessorRpcChannel.java:69) > at > org.apache.hadoop.hbase.protobuf.generated.RSGroupAdminProtos$RSGroupAdminService$BlockingStub.moveTables(RSGroupAdminProtos.java:13171) > at > org.apache.hadoop.hbase.rsgroup.RSGroupAdminClient.moveTables(RSGroupAdminClient.java:117) > {noformat} > The server-side end of the RPC is waiting on a procedure to finish: > {noformat} > "RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=64242" #289 daemon > prio=5 os_prio=31 tid=0x7ff015b7c000 nid=0x1e603 waiting on condition > [0x7dbc9000] >java.lang.Thread.State: TIMED_WAITING (sleeping) > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitFor(ProcedureSyncWait.java:184) > at > org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitFor(ProcedureSyncWait.java:171) > at > org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitForProcedureToComplete(ProcedureSyncWait.java:141) > at > org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitForProcedureToCompleteIOE(ProcedureSyncWait.java:130) > at > org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.submitAndWaitProcedure(ProcedureSyncWait.java:123) > at > org.apache.hadoop.hbase.master.assignment.AssignmentManager.unassign(AssignmentManager.java:478) > at > org.apache.hadoop.hbase.master.assignment.AssignmentManager.unassign(AssignmentManager.java:465) > at > org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.moveTables(RSGroupAdminServer.java:432) > at > 
org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint$RSGroupAdminServiceImpl.moveTables(RSGroupAdminEndpoint.java:174) > at > org.apache.hadoop.hbase.protobuf.generated.RSGroupAdminProtos$RSGroupAdminService.callMethod(RSGroupAdminProtos.java:12786) > at > org.apache.hadoop.hbase.master.MasterRpcServices.execMasterService(MasterRpcServices.java:673) > at > org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:406) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:278) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:258) >Locked ownable synchronizers: > - None > {noformat} > I don't see anything else running in the thread dump, but I do see that meta > was cl
[jira] [Commented] (HBASE-15691) Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056708#comment-16056708 ] Stephen Yuan Jiang commented on HBASE-15691: [~zjushch], you reviewed the original patch in HBASE-10205. Could you help review the V2 patch? > Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to > branch-1 > - > > Key: HBASE-15691 > URL: https://issues.apache.org/jira/browse/HBASE-15691 > Project: HBase > Issue Type: Sub-task >Affects Versions: 1.3.0 >Reporter: Andrew Purtell >Assignee: Stephen Yuan Jiang > Fix For: 1.3.2, 1.4.1, 1.5.0, 1.2.7 > > Attachments: HBASE-15691-branch-1.patch, HBASE-15691.v2-branch-1.patch > > > HBASE-10205 solves the following problem: > " > The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the > RAM queue containing entries to be cached. freeSpace() in turn calls > BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), > which iterates over 'bucketList'. At the same time another WriterThread might > call BucketAllocator.allocateBlock(), which may call > BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently > cause a ConcurrentModificationException. Calls to > BucketAllocator.allocateBlock() are synchronized, but calls to > BucketAllocator.getIndexStatistics() are not, which allows this race to occur. > " > However, for some unknown reason, HBASE-10205 was only committed to master > (2.0 and beyond) and 0.98 branches only. To preserve continuity we should > commit it to branch-1. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-15691) Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-15691: --- Description: HBASE-10205 solves the following problem: " The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the RAM queue containing entries to be cached. freeSpace() in turn calls BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), which iterates over 'bucketList'. At the same time another WriterThread might call BucketAllocator.allocateBlock(), which may call BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently cause a ConcurrentModificationException. Calls to BucketAllocator.allocateBlock() are synchronized, but calls to BucketAllocator.getIndexStatistics() are not, which allows this race to occur. " However, for some unknown reason, HBASE-10205 was only committed to master (2.0 and beyond) and 0.98 branches only. To preserve continuity we should commit it to branch-1. was:HBASE-10205 was committed to trunk and 0.98 branches only. To preserve continuity we should commit it to branch-1. The change requires more than nontrivial fixups so I will attach a backport of the change from trunk to current branch-1 here. > Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to > branch-1 > - > > Key: HBASE-15691 > URL: https://issues.apache.org/jira/browse/HBASE-15691 > Project: HBase > Issue Type: Sub-task >Affects Versions: 1.3.0 >Reporter: Andrew Purtell >Assignee: Stephen Yuan Jiang > Fix For: 1.3.2, 1.4.1, 1.5.0, 1.2.7 > > Attachments: HBASE-15691-branch-1.patch, HBASE-15691.v2-branch-1.patch > > > HBASE-10205 solves the following problem: > " > The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the > RAM queue containing entries to be cached. freeSpace() in turn calls > BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), > which iterates over 'bucketList'. At the same time another WriterThread might > call BucketAllocator.allocateBlock(), which may call > BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently > cause a ConcurrentModificationException. Calls to > BucketAllocator.allocateBlock() are synchronized, but calls to > BucketAllocator.getIndexStatistics() are not, which allows this race to occur. > " > However, for some unknown reason, HBASE-10205 was only committed to master > (2.0 and beyond) and 0.98 branches only. To preserve continuity we should > commit it to branch-1. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-18036) HBase 1.x : Data locality is not maintained after cluster restart or SSH
[ https://issues.apache.org/jira/browse/HBASE-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18036: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) > HBase 1.x : Data locality is not maintained after cluster restart or SSH > > > Key: HBASE-18036 > URL: https://issues.apache.org/jira/browse/HBASE-18036 > Project: HBase > Issue Type: Bug > Components: Region Assignment >Affects Versions: 1.4.0, 1.3.1, 1.2.5, 1.1.10 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Fix For: 1.4.0, 1.3.2, 1.1.11, 1.2.7 > > Attachments: HBASE-18036.v0-branch-1.1.patch, > HBASE-18036.v0-branch-1.patch, HBASE-18036.v1-branch-1.1.patch, > HBASE-18036.v2-branch-1.1.patch > > > After HBASE-2896 / HBASE-4402, we think data locality is maintained after > cluster restart. However, we have seem some complains about data locality > loss when cluster restart (eg. HBASE-17963). > Examining the AssignmentManager#processDeadServersAndRegionsInTransition() > code, for cluster start, I expected to hit the following code path: > {code} > if (!failover) { > // Fresh cluster startup. > LOG.info("Clean cluster startup. Assigning user regions"); > assignAllUserRegions(allRegions); > } > {code} > where assignAllUserRegions would use retainAssignment() call in LoadBalancer; > however, from master log, we usually hit the failover code path: > {code} > // If we found user regions out on cluster, its a failover. > if (failover) { > LOG.info("Found regions out on cluster or in RIT; presuming failover"); > // Process list of dead servers and regions in RIT. > // See HBASE-4580 for more information. > processDeadServersAndRecoverLostRegions(deadServers); > } > {code} > where processDeadServersAndRecoverLostRegions() would put dead servers in SSH > and SSH uses roundRobinAssignment() in LoadBalancer. That is why we would > see loss locality more often than retaining locality during cluster restart. > Note: the code I was looking at is close to branch-1 and branch-1.1. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
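As a rough illustration of why the two code paths above behave so differently, here is a simplified retain-style plan (this is not the HBase LoadBalancer API): each region goes back to the host that served it before the restart when that host is still live, and only otherwise falls back to a random live host, which is what a round-robin plan does for every region.
{code}
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class RetainAssignmentSketch {
  // previousHosts: region name -> host that served it before the restart
  static Map<String, String> retain(Map<String, String> previousHosts, List<String> liveHosts) {
    Map<String, String> plan = new HashMap<>();
    Random random = new Random();
    for (Map.Entry<String, String> entry : previousHosts.entrySet()) {
      String oldHost = entry.getValue();
      // Keep the region where its HFiles already are, if that host came back.
      String target = liveHosts.contains(oldHost)
          ? oldHost
          : liveHosts.get(random.nextInt(liveHosts.size()));
      plan.put(entry.getKey(), target);
    }
    return plan;
  }
}
{code}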
[jira] [Updated] (HBASE-18036) HBase 1.x : Data locality is not maintained after cluster restart or SSH
[ https://issues.apache.org/jira/browse/HBASE-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18036: --- Summary: HBase 1.x : Data locality is not maintained after cluster restart or SSH (was: Data locality is not maintained after cluster restart or SSH) > HBase 1.x : Data locality is not maintained after cluster restart or SSH > > > Key: HBASE-18036 > URL: https://issues.apache.org/jira/browse/HBASE-18036 > Project: HBase > Issue Type: Bug > Components: Region Assignment >Affects Versions: 1.4.0, 1.3.1, 1.2.5, 1.1.10 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Fix For: 1.4.0, 1.3.2, 1.1.11, 1.2.7 > > Attachments: HBASE-18036.v0-branch-1.1.patch, > HBASE-18036.v0-branch-1.patch, HBASE-18036.v1-branch-1.1.patch, > HBASE-18036.v2-branch-1.1.patch > > > After HBASE-2896 / HBASE-4402, we think data locality is maintained after > cluster restart. However, we have seem some complains about data locality > loss when cluster restart (eg. HBASE-17963). > Examining the AssignmentManager#processDeadServersAndRegionsInTransition() > code, for cluster start, I expected to hit the following code path: > {code} > if (!failover) { > // Fresh cluster startup. > LOG.info("Clean cluster startup. Assigning user regions"); > assignAllUserRegions(allRegions); > } > {code} > where assignAllUserRegions would use retainAssignment() call in LoadBalancer; > however, from master log, we usually hit the failover code path: > {code} > // If we found user regions out on cluster, its a failover. > if (failover) { > LOG.info("Found regions out on cluster or in RIT; presuming failover"); > // Process list of dead servers and regions in RIT. > // See HBASE-4580 for more information. > processDeadServersAndRecoverLostRegions(deadServers); > } > {code} > where processDeadServersAndRecoverLostRegions() would put dead servers in SSH > and SSH uses roundRobinAssignment() in LoadBalancer. That is why we would > see loss locality more often than retaining locality during cluster restart. > Note: the code I was looking at is close to branch-1 and branch-1.1. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-18036) Data locality is not maintained after cluster restart or SSH
[ https://issues.apache.org/jira/browse/HBASE-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18036: --- Fix Version/s: 1.4.0 > Data locality is not maintained after cluster restart or SSH > > > Key: HBASE-18036 > URL: https://issues.apache.org/jira/browse/HBASE-18036 > Project: HBase > Issue Type: Bug > Components: Region Assignment >Affects Versions: 1.4.0, 1.3.1, 1.2.5, 1.1.10 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Fix For: 1.4.0, 1.3.2, 1.1.11, 1.2.7 > > Attachments: HBASE-18036.v0-branch-1.1.patch, > HBASE-18036.v0-branch-1.patch, HBASE-18036.v1-branch-1.1.patch, > HBASE-18036.v2-branch-1.1.patch > > > After HBASE-2896 / HBASE-4402, we think data locality is maintained after > cluster restart. However, we have seem some complains about data locality > loss when cluster restart (eg. HBASE-17963). > Examining the AssignmentManager#processDeadServersAndRegionsInTransition() > code, for cluster start, I expected to hit the following code path: > {code} > if (!failover) { > // Fresh cluster startup. > LOG.info("Clean cluster startup. Assigning user regions"); > assignAllUserRegions(allRegions); > } > {code} > where assignAllUserRegions would use retainAssignment() call in LoadBalancer; > however, from master log, we usually hit the failover code path: > {code} > // If we found user regions out on cluster, its a failover. > if (failover) { > LOG.info("Found regions out on cluster or in RIT; presuming failover"); > // Process list of dead servers and regions in RIT. > // See HBASE-4580 for more information. > processDeadServersAndRecoverLostRegions(deadServers); > } > {code} > where processDeadServersAndRecoverLostRegions() would put dead servers in SSH > and SSH uses roundRobinAssignment() in LoadBalancer. That is why we would > see loss locality more often than retaining locality during cluster restart. > Note: the code I was looking at is close to branch-1 and branch-1.1. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18036) Data locality is not maintained after cluster restart or SSH
[ https://issues.apache.org/jira/browse/HBASE-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056681#comment-16056681 ] Stephen Yuan Jiang commented on HBASE-18036: [~enis], with Proc-V2 AM, the current change is no longer applicable. Currently, with the initial commit of the new AM, SSH calls AM.createAssignProcedures() with forceNewPlan=true. Even if forceNewPlan is false, when we compare the existing plan's ServerName, it will not be equal to the dead server due to the timestamp change (ServerName is hostname+port+timestamp), and hence a new plan/server would be used for the region assignment. Hence, locality is not guaranteed to be retained. The potential change would be more involved than what we have now in the 1.x code base. I opened HBASE-18246 to track it (FYI, [~stack]). > Data locality is not maintained after cluster restart or SSH > > > Key: HBASE-18036 > URL: https://issues.apache.org/jira/browse/HBASE-18036 > Project: HBase > Issue Type: Bug > Components: Region Assignment >Affects Versions: 1.4.0, 1.3.1, 1.2.5, 1.1.10 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Fix For: 1.3.2, 1.1.11, 1.2.7 > > Attachments: HBASE-18036.v0-branch-1.1.patch, > HBASE-18036.v0-branch-1.patch, HBASE-18036.v1-branch-1.1.patch, > HBASE-18036.v2-branch-1.1.patch > > > After HBASE-2896 / HBASE-4402, we think data locality is maintained after > cluster restart. However, we have seen some complaints about data locality > loss when the cluster restarts (eg. HBASE-17963). > Examining the AssignmentManager#processDeadServersAndRegionsInTransition() > code, for cluster start, I expected to hit the following code path: > {code} > if (!failover) { > // Fresh cluster startup. > LOG.info("Clean cluster startup. Assigning user regions"); > assignAllUserRegions(allRegions); > } > {code} > where assignAllUserRegions would use the retainAssignment() call in the LoadBalancer; > however, from the master log, we usually hit the failover code path: > {code} > // If we found user regions out on cluster, its a failover. > if (failover) { > LOG.info("Found regions out on cluster or in RIT; presuming failover"); > // Process list of dead servers and regions in RIT. > // See HBASE-4580 for more information. > processDeadServersAndRecoverLostRegions(deadServers); > } > {code} > where processDeadServersAndRecoverLostRegions() would put dead servers in SSH > and SSH uses roundRobinAssignment() in the LoadBalancer. That is why we would > see locality lost more often than retained during cluster restart. > Note: the code I was looking at is close to branch-1 and branch-1.1. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
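A small sketch of the ServerName identity problem mentioned in the comment above (illustrative code, not the Proc-V2 AM itself): equality on ServerName includes the start code, so a restarted server never matches its pre-crash identity, and a locality-preserving plan would have to compare host and port only.
{code}
import org.apache.hadoop.hbase.ServerName;

public class ServerNameIdentitySketch {
  public static void main(String[] args) {
    ServerName before = ServerName.valueOf("rs1.example.com", 16020, 1000L);
    ServerName after  = ServerName.valueOf("rs1.example.com", 16020, 2000L); // same host, restarted

    System.out.println(before.equals(after)); // false: the start code differs
    boolean sameAddress = before.getHostname().equals(after.getHostname())
        && before.getPort() == after.getPort();
    System.out.println(sameAddress);          // true: same host/port, so locality could be kept
  }
}
{code}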
[jira] [Updated] (HBASE-18246) Proc-V2 AM: Maintain Data locality in ServerCrashProcedure
[ https://issues.apache.org/jira/browse/HBASE-18246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18246: --- Summary: Proc-V2 AM: Maintain Data locality in ServerCrashProcedure (was: Maintain Data locality in ServerCrashProcedure) > Proc-V2 AM: Maintain Data locality in ServerCrashProcedure > -- > > Key: HBASE-18246 > URL: https://issues.apache.org/jira/browse/HBASE-18246 > Project: HBase > Issue Type: Sub-task > Components: Region Assignment >Affects Versions: 2.0.0 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Fix For: 2.0.0 > > > Before HBASE-18036, SSH would use round-robin to re-distribute regions during > processing. Round-robin assignment would lose data locality. HBASE-18036 > retains data locality if the dead region server has already restarted when > the dead RS is being processed. > With Proc-V2 based AM, the change of HBASE-18036 in Apache HBase 1.x releases > is no longer possible. We need to implement the same logic under Proc-V2 > based AM. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18246) Maintain Data locality in ServerCrashProcedure
Stephen Yuan Jiang created HBASE-18246: -- Summary: Maintain Data locality in ServerCrashProcedure Key: HBASE-18246 URL: https://issues.apache.org/jira/browse/HBASE-18246 Project: HBase Issue Type: Sub-task Components: Region Assignment Affects Versions: 2.0.0 Reporter: Stephen Yuan Jiang Assignee: Stephen Yuan Jiang Before HBASE-18036, SSH would use round-robin to re-distribute regions during processing. Round-robin assignment would lose data locality. HBASE-18036 retains data locality if the dead region server has already restarted when the dead RS is being processed. With Proc-V2 based AM, the change of HBASE-18036 in Apache HBase 1.x releases is no longer possible. We need to implement the same logic under Proc-V2 based AM. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-18036) Data locality is not maintained after cluster restart or SSH
[ https://issues.apache.org/jira/browse/HBASE-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18036: --- Fix Version/s: 1.2.7 1.1.11 1.3.2 > Data locality is not maintained after cluster restart or SSH > > > Key: HBASE-18036 > URL: https://issues.apache.org/jira/browse/HBASE-18036 > Project: HBase > Issue Type: Bug > Components: Region Assignment >Affects Versions: 1.4.0, 1.3.1, 1.2.5, 1.1.10 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Fix For: 1.3.2, 1.1.11, 1.2.7 > > Attachments: HBASE-18036.v0-branch-1.1.patch, > HBASE-18036.v0-branch-1.patch, HBASE-18036.v1-branch-1.1.patch, > HBASE-18036.v2-branch-1.1.patch > > > After HBASE-2896 / HBASE-4402, we think data locality is maintained after > cluster restart. However, we have seem some complains about data locality > loss when cluster restart (eg. HBASE-17963). > Examining the AssignmentManager#processDeadServersAndRegionsInTransition() > code, for cluster start, I expected to hit the following code path: > {code} > if (!failover) { > // Fresh cluster startup. > LOG.info("Clean cluster startup. Assigning user regions"); > assignAllUserRegions(allRegions); > } > {code} > where assignAllUserRegions would use retainAssignment() call in LoadBalancer; > however, from master log, we usually hit the failover code path: > {code} > // If we found user regions out on cluster, its a failover. > if (failover) { > LOG.info("Found regions out on cluster or in RIT; presuming failover"); > // Process list of dead servers and regions in RIT. > // See HBASE-4580 for more information. > processDeadServersAndRecoverLostRegions(deadServers); > } > {code} > where processDeadServersAndRecoverLostRegions() would put dead servers in SSH > and SSH uses roundRobinAssignment() in LoadBalancer. That is why we would > see loss locality more often than retaining locality during cluster restart. > Note: the code I was looking at is close to branch-1 and branch-1.1. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18225) Fix findbugs regression calling toString() on an array
[ https://issues.apache.org/jira/browse/HBASE-18225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051356#comment-16051356 ] Stephen Yuan Jiang commented on HBASE-18225: Looks good to me. > Fix findbugs regression calling toString() on an array > -- > > Key: HBASE-18225 > URL: https://issues.apache.org/jira/browse/HBASE-18225 > Project: HBase > Issue Type: Bug >Reporter: Josh Elser >Assignee: Josh Elser >Priority: Trivial > Fix For: 2.0.0, 3.0.0 > > Attachments: HBASE-18225.001.patch > > > Looks like we got a findbugs warning as a result of HBASE-18166 > {code} > diff --git > a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java > > b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java > index 1d04944250..b7e0244aa2 100644 > --- > a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java > +++ > b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java > @@ -2807,8 +2807,8 @@ public class RSRpcServices implements > HBaseRPCErrorHandler, > HRegionInfo hri = rsh.s.getRegionInfo(); > // Yes, should be the same instance > if (regionServer.getOnlineRegion(hri.getRegionName()) != rsh.r) { > - String msg = "Region was re-opened after the scanner" + scannerName + > " was created: " > - + hri.getRegionNameAsString(); > + String msg = "Region has changed on the scanner " + scannerName + ": > regionName=" > + + hri.getRegionName() + ", scannerRegionName=" + rsh.r; > {code} > Looks like {{hri.getRegionNameAsString()}} was unintentionally changed to > {{hri.getRegionName()}}, [~syuanjiang]/[~stack]? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
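For context on the warning itself, a tiny self-contained illustration (not the RSRpcServices code): concatenating a byte[] into a String goes through Object.toString() and prints the array's identity (something like [B@1a2b3c4d) rather than the region name, which is exactly what findbugs flags; converting explicitly, e.g. with Bytes.toStringBinary() or the getRegionNameAsString() accessor, is the fix.
{code}
import org.apache.hadoop.hbase.util.Bytes;

public class ByteArrayInStringSketch {
  public static void main(String[] args) {
    byte[] regionName = Bytes.toBytes("t1,,1497000000000.abcdef0123456789.");

    String wrong = "regionName=" + regionName;                       // prints [B@... (array identity)
    String right = "regionName=" + Bytes.toStringBinary(regionName); // prints the readable name

    System.out.println(wrong);
    System.out.println(right);
  }
}
{code}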
[jira] [Commented] (HBASE-18166) [AMv2] We are splitting already-split files
[ https://issues.apache.org/jira/browse/HBASE-18166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16038258#comment-16038258 ] Stephen Yuan Jiang commented on HBASE-18166: [~stack], when I implemented the SplitTableRegionProcedure, I copied the logic from SplitTransactionImpl.java: {code} /** * Creates reference files for top and bottom half of the * @param hstoreFilesToSplit map of store files to create half file references for. * @return the number of reference files that were created. * @throws IOException */ private Pair<Integer, Integer> splitStoreFiles( final Map<byte[], List<StoreFile>> hstoreFilesToSplit) throws IOException { if (hstoreFilesToSplit == null) { // Could be null because close didn't succeed -- for now consider it fatal throw new IOException("Close returned empty list of StoreFiles"); } // The following code sets up a thread pool executor with as many slots as // there's files to split. It then fires up everything, waits for // completion and finally checks for any exception int nbFiles = 0; for (Map.Entry<byte[], List<StoreFile>> entry: hstoreFilesToSplit.entrySet()) { nbFiles += entry.getValue().size(); ===> possible to have reference files } {code} I just wonder whether we should change the logic in SplitTransactionImpl in branch-1 to skip splitting reference files (I checked HRegion#doClose() and did not see the logic to skip reference files on the region server side). > [AMv2] We are splitting already-split files > --- > > Key: HBASE-18166 > URL: https://issues.apache.org/jira/browse/HBASE-18166 > Project: HBase > Issue Type: Bug > Components: Region Assignment >Affects Versions: 2.0.0 >Reporter: stack >Assignee: stack > Fix For: 2.0.0 > > Attachments: HBASE-18166.master.001.patch, > HBASE-18166.master.002.patch > > > Interesting issue. The below adds a lag cleaning up files after a compaction > in case of on-going Scanners (for read replicas/offheap). > HBASE-14970 Backport HBASE-13082 and its sub-jira to branch-1 - recommit (Ram) > What the lag means is that now that split is run from the HMaster in master > branch, when it goes to get a listing of the files to split, it can pick up > files that are for archiving but that have not been archived yet. When it > does, it goes ahead and splits them... making references of references. > It's a mess. > I added asking the Region if it is splittable a while back. The Master calls > this from SplitTableRegionProcedure during preparation. If the RegionServer > asked for the split, it is sort of redundant work given the RS asks itself if > any references still remain; if any, it'll wait before asking for a split. But if a > user/client asks, then this isSplittable over RPC comes in handy. > I was thinking that isSplittable could return a list of files. > Or, easier, given we know a region is Splittable by the time we go to split > the files, then I think master-side we can just skip any references found, > presuming read-for-archive. > Will be back with a patch. Want to test on cluster first (Side-effect is > regions are offline because the file at the end of the reference to a reference is > removed ... and so the open fails). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
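The skip-the-reference-files idea discussed above could look roughly like the following (a sketch against the branch-1 types, not a committed patch): when collecting store files for the split, files that are themselves references are left out so we never create a reference of a reference.
{code}
import java.util.List;
import java.util.Map;
import org.apache.hadoop.hbase.regionserver.StoreFile;

public class SkipReferenceFilesSketch {
  static int countSplittableFiles(Map<byte[], List<StoreFile>> filesByFamily) {
    int nbFiles = 0;
    for (Map.Entry<byte[], List<StoreFile>> entry : filesByFamily.entrySet()) {
      for (StoreFile sf : entry.getValue()) {
        if (sf.isReference()) {
          continue; // half-file left over from the parent; do not split it again
        }
        nbFiles++;
      }
    }
    return nbFiles;
  }
}
{code}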
[jira] [Updated] (HBASE-16488) Starting namespace and quota services in master startup asynchronizely
[ https://issues.apache.org/jira/browse/HBASE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-16488: --- Attachment: HBASE-16488.v7-branch-1.patch > Starting namespace and quota services in master startup asynchronizely > -- > > Key: HBASE-16488 > URL: https://issues.apache.org/jira/browse/HBASE-16488 > Project: HBase > Issue Type: Improvement > Components: master >Affects Versions: 2.0.0, 1.3.0, 1.0.3, 1.4.0, 1.1.5, 1.2.2 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Attachments: HBASE-16488.v1-branch-1.patch, > HBASE-16488.v1-master.patch, HBASE-16488.v2-branch-1.patch, > HBASE-16488.v2-branch-1.patch, HBASE-16488.v3-branch-1.patch, > HBASE-16488.v3-branch-1.patch, HBASE-16488.v4-branch-1.patch, > HBASE-16488.v5-branch-1.patch, HBASE-16488.v6-branch-1.patch, > HBASE-16488.v7-branch-1.patch > > > From time to time, during internal IT test and from customer, we often see > master initialization failed due to namespace table region takes long time to > assign (eg. sometimes split log takes long time or hanging; or sometimes RS > is temporarily not available; sometimes due to some unknown assignment > issue). In the past, there was some proposal to improve this situation, eg. > HBASE-13556 / HBASE-14190 (Assign system tables ahead of user region > assignment) or HBASE-13557 (Special WAL handling for system tables) or > HBASE-14623 (Implement dedicated WAL for system tables). > This JIRA proposes another way to solve this master initialization fail > issue: namespace service is only used by a handful operations (eg. create > table / namespace DDL / get namespace API / some RS group DDL). Only quota > manager depends on it and quota management is off by default. Therefore, > namespace service is not really needed for master to be functional. So we > could start namespace service asynchronizely without blocking master startup. > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
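As a rough illustration of the approach described in this issue (simplified, not the actual HMaster wiring): the namespace/quota initialization is handed to a background task so master startup no longer blocks on the namespace table assignment, and only the few operations that actually need the namespace manager wait on it.
{code}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class AsyncServiceInitSketch {
  private volatile CompletableFuture<Void> namespaceInit;

  // Called from master startup; returns immediately instead of blocking.
  void startServices(Runnable initNamespaceAndQuota) {
    namespaceInit = CompletableFuture.runAsync(initNamespaceAndQuota);
  }

  // Only callers that need the namespace manager (DDL, quota ops, ...) wait, with a timeout.
  void ensureNamespaceReady(long timeoutMs) throws Exception {
    namespaceInit.get(timeoutMs, TimeUnit.MILLISECONDS);
  }
}
{code}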
[jira] [Updated] (HBASE-18093) Overloading the meaning of 'enabled' in Quota Manager to indicate either quota disabled or quota manager not ready is not good
[ https://issues.apache.org/jira/browse/HBASE-18093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18093: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 1.1.11 1.3.2 1.2.6 1.4.0 2.0.0 Status: Resolved (was: Patch Available) > Overloading the meaning of 'enabled' in Quota Manager to indicate either > quota disabled or quota manager not ready is not good > -- > > Key: HBASE-18093 > URL: https://issues.apache.org/jira/browse/HBASE-18093 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 1.1.10 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang >Priority: Minor > Fix For: 2.0.0, 1.4.0, 1.2.6, 1.3.2, 1.1.11 > > Attachments: HBASE-18093.v1-branch-1.patch, > HBASE-18093.v1-master.patch, HBASE-18093.v2-master.patch, > HBASE-18093.v3-master.patch > > > In MasterQuotaManager, a member 'enabled' is used to indicate either quota > feature is disabled or quota manager is not fully initialized. This would > create confusion whether caller should wait for quota manager to be > initialized or change configuration to enable quota. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
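To show the distinction this issue is after (a sketch only, not the MasterQuotaManager fix): a single 'enabled' boolean cannot tell callers whether quotas are switched off by configuration or simply not initialized yet, so the two cases are modeled explicitly and surface as different errors.
{code}
public class QuotaStateSketch {
  enum State { DISABLED_BY_CONF, INITIALIZING, READY }

  private volatile State state = State.INITIALIZING;

  void markDisabled() { state = State.DISABLED_BY_CONF; }
  void markReady()    { state = State.READY; }

  // Callers can now react differently: change config vs. retry after initialization.
  void checkQuotaSupport() {
    if (state == State.DISABLED_BY_CONF) {
      throw new UnsupportedOperationException("quota support is disabled by configuration");
    }
    if (state == State.INITIALIZING) {
      throw new IllegalStateException("quota manager is not yet initialized; retry later");
    }
  }
}
{code}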
[jira] [Commented] (HBASE-18093) Overloading the meaning of 'enabled' in Quota Manager to indicate either quota disabled or quota manager not ready is not good
[ https://issues.apache.org/jira/browse/HBASE-18093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16021752#comment-16021752 ] Stephen Yuan Jiang commented on HBASE-18093: Test failures in branch-1 run are not related to the change. > Overloading the meaning of 'enabled' in Quota Manager to indicate either > quota disabled or quota manager not ready is not good > -- > > Key: HBASE-18093 > URL: https://issues.apache.org/jira/browse/HBASE-18093 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 1.1.10 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang >Priority: Minor > Attachments: HBASE-18093.v1-branch-1.patch, > HBASE-18093.v1-master.patch, HBASE-18093.v2-master.patch, > HBASE-18093.v3-master.patch > > > In MasterQuotaManager, a member 'enabled' is used to indicate either quota > feature is disabled or quota manager is not fully initialized. This would > create confusion whether caller should wait for quota manager to be > initialized or change configuration to enable quota. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HBASE-18093) Overloading the meaning of 'enabled' in Quota Manager to indicate either quota disabled or quota manager not ready is not good
[ https://issues.apache.org/jira/browse/HBASE-18093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18093: --- Attachment: HBASE-18093.v1-branch-1.patch > Overloading the meaning of 'enabled' in Quota Manager to indicate either > quota disabled or quota manager not ready is not good > -- > > Key: HBASE-18093 > URL: https://issues.apache.org/jira/browse/HBASE-18093 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 1.1.10 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang >Priority: Minor > Attachments: HBASE-18093.v1-branch-1.patch, > HBASE-18093.v1-master.patch, HBASE-18093.v2-master.patch, > HBASE-18093.v3-master.patch > > > In MasterQuotaManager, a member 'enabled' is used to indicate either quota > feature is disabled or quota manager is not fully initialized. This would > create confusion whether caller should wait for quota manager to be > initialized or change configuration to enable quota. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-18093) Overloading the meaning of 'enabled' in Quota Manager to indicate either quota disabled or quota manager not ready is not good
[ https://issues.apache.org/jira/browse/HBASE-18093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16021201#comment-16021201 ] Stephen Yuan Jiang commented on HBASE-18093: Pre-commit failure unrelated to change; seems env issue. (Also re-run test locally and no problem) > Overloading the meaning of 'enabled' in Quota Manager to indicate either > quota disabled or quota manager not ready is not good > -- > > Key: HBASE-18093 > URL: https://issues.apache.org/jira/browse/HBASE-18093 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 1.1.10 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang >Priority: Minor > Attachments: HBASE-18093.v1-master.patch, > HBASE-18093.v2-master.patch, HBASE-18093.v3-master.patch > > > In MasterQuotaManager, a member 'enabled' is used to indicate either quota > feature is disabled or quota manager is not fully initialized. This would > create confusion whether caller should wait for quota manager to be > initialized or change configuration to enable quota. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-18093) Overloading the meaning of 'enabled' in Quota Manager to indicate either quota disabled or quota manager not ready is not good
[ https://issues.apache.org/jira/browse/HBASE-18093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020378#comment-16020378 ] Stephen Yuan Jiang commented on HBASE-18093: Rebase the latest change in master in V3 patch. > Overloading the meaning of 'enabled' in Quota Manager to indicate either > quota disabled or quota manager not ready is not good > -- > > Key: HBASE-18093 > URL: https://issues.apache.org/jira/browse/HBASE-18093 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 1.1.10 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang >Priority: Minor > Attachments: HBASE-18093.v1-master.patch, > HBASE-18093.v2-master.patch, HBASE-18093.v3-master.patch > > > In MasterQuotaManager, a member 'enabled' is used to indicate either quota > feature is disabled or quota manager is not fully initialized. This would > create confusion whether caller should wait for quota manager to be > initialized or change configuration to enable quota. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HBASE-18093) Overloading the meaning of 'enabled' in Quota Manager to indicate either quota disabled or quota manager not ready is not good
[ https://issues.apache.org/jira/browse/HBASE-18093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18093: --- Attachment: HBASE-18093.v3-master.patch > Overloading the meaning of 'enabled' in Quota Manager to indicate either > quota disabled or quota manager not ready is not good > -- > > Key: HBASE-18093 > URL: https://issues.apache.org/jira/browse/HBASE-18093 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 1.1.10 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang >Priority: Minor > Attachments: HBASE-18093.v1-master.patch, > HBASE-18093.v2-master.patch, HBASE-18093.v3-master.patch > > > In MasterQuotaManager, a member 'enabled' is used to indicate either quota > feature is disabled or quota manager is not fully initialized. This would > create confusion whether caller should wait for quota manager to be > initialized or change configuration to enable quota. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-18093) Overloading the meaning of 'enabled' in Quota Manager to indicate either quota disabled or quota manager not ready is not good
[ https://issues.apache.org/jira/browse/HBASE-18093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020259#comment-16020259 ] Stephen Yuan Jiang commented on HBASE-18093: The failure from the V1 patch does not make sense; things are good locally. Attaching a V2 patch addressing the typo issue found by [~te...@apache.org] > Overloading the meaning of 'enabled' in Quota Manager to indicate either > quota disabled or quota manager not ready is not good > -- > > Key: HBASE-18093 > URL: https://issues.apache.org/jira/browse/HBASE-18093 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 1.1.10 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang >Priority: Minor > Attachments: HBASE-18093.v1-master.patch, HBASE-18093.v2-master.patch > > > In MasterQuotaManager, a member 'enabled' is used to indicate either quota > feature is disabled or quota manager is not fully initialized. This would > create confusion whether caller should wait for quota manager to be > initialized or change configuration to enable quota. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HBASE-18093) Overloading the meaning of 'enabled' in Quota Manager to indicate either quota disabled or quota manager not ready is not good
[ https://issues.apache.org/jira/browse/HBASE-18093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18093: --- Attachment: HBASE-18093.v2-master.patch > Overloading the meaning of 'enabled' in Quota Manager to indicate either > quota disabled or quota manager not ready is not good > -- > > Key: HBASE-18093 > URL: https://issues.apache.org/jira/browse/HBASE-18093 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 1.1.10 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang >Priority: Minor > Attachments: HBASE-18093.v1-master.patch, HBASE-18093.v2-master.patch > > > In MasterQuotaManager, a member 'enabled' is used to indicate either quota > feature is disabled or quota manager is not fully initialized. This would > create confusion whether caller should wait for quota manager to be > initialized or change configuration to enable quota. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-16488) Starting namespace and quota services in master startup asynchronizely
[ https://issues.apache.org/jira/browse/HBASE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020184#comment-16020184 ] Stephen Yuan Jiang commented on HBASE-16488: hadoop.hbase.quotas.TestQuotaAdmin failure should be addressed in a generic fix in HBASE-18093 > Starting namespace and quota services in master startup asynchronizely > -- > > Key: HBASE-16488 > URL: https://issues.apache.org/jira/browse/HBASE-16488 > Project: HBase > Issue Type: Improvement > Components: master >Affects Versions: 2.0.0, 1.3.0, 1.0.3, 1.4.0, 1.1.5, 1.2.2 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Attachments: HBASE-16488.v1-branch-1.patch, > HBASE-16488.v1-master.patch, HBASE-16488.v2-branch-1.patch, > HBASE-16488.v2-branch-1.patch, HBASE-16488.v3-branch-1.patch, > HBASE-16488.v3-branch-1.patch, HBASE-16488.v4-branch-1.patch, > HBASE-16488.v5-branch-1.patch, HBASE-16488.v6-branch-1.patch > > > From time to time, during internal IT test and from customer, we often see > master initialization failed due to namespace table region takes long time to > assign (eg. sometimes split log takes long time or hanging; or sometimes RS > is temporarily not available; sometimes due to some unknown assignment > issue). In the past, there was some proposal to improve this situation, eg. > HBASE-13556 / HBASE-14190 (Assign system tables ahead of user region > assignment) or HBASE-13557 (Special WAL handling for system tables) or > HBASE-14623 (Implement dedicated WAL for system tables). > This JIRA proposes another way to solve this master initialization fail > issue: namespace service is only used by a handful operations (eg. create > table / namespace DDL / get namespace API / some RS group DDL). Only quota > manager depends on it and quota management is off by default. Therefore, > namespace service is not really needed for master to be functional. So we > could start namespace service asynchronizely without blocking master startup. > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-18093) Overloading the meaning of 'enabled' in Quota Manager to indicate either quota disabled or quota manager not ready is not good
[ https://issues.apache.org/jira/browse/HBASE-18093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020180#comment-16020180 ] Stephen Yuan Jiang commented on HBASE-18093: V1 patch to distinguish whether quota is disabled or quota manger is uninitialized. > Overloading the meaning of 'enabled' in Quota Manager to indicate either > quota disabled or quota manager not ready is not good > -- > > Key: HBASE-18093 > URL: https://issues.apache.org/jira/browse/HBASE-18093 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 1.1.10 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang >Priority: Minor > Attachments: HBASE-18093.v1-master.patch > > > In MasterQuotaManager, a member 'enabled' is used to indicate either quota > feature is disabled or quota manager is not fully initialized. This would > create confusion whether caller should wait for quota manager to be > initialized or change configuration to enable quota. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HBASE-18093) Overloading 'enabled' in Quota Manager to indicate either quota disabled or quota manager not ready is not good
[ https://issues.apache.org/jira/browse/HBASE-18093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18093: --- Summary: Overloading 'enabled' in Quota Manager to indicate either quota disabled or quota manager not ready is not good (was: Overload 'enabled' in Quota Manager) > Overloading 'enabled' in Quota Manager to indicate either quota disabled or > quota manager not ready is not good > --- > > Key: HBASE-18093 > URL: https://issues.apache.org/jira/browse/HBASE-18093 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 1.1.10 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang >Priority: Minor > Attachments: HBASE-18093.v1-master.patch > > > In MasterQuotaManager, a member 'enabled' is used to indicate either quota > feature is disabled or quota manager is not fully initialized. This would > create confusion whether caller should wait for quota manager to be > initialized or change configuration to enable quota. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HBASE-18093) Overloading the meaning of 'enabled' in Quota Manager to indicate either quota disabled or quota manager not ready is not good
[ https://issues.apache.org/jira/browse/HBASE-18093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18093: --- Status: Patch Available (was: Open) > Overloading the meaning of 'enabled' in Quota Manager to indicate either > quota disabled or quota manager not ready is not good > -- > > Key: HBASE-18093 > URL: https://issues.apache.org/jira/browse/HBASE-18093 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 1.1.10 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang >Priority: Minor > Attachments: HBASE-18093.v1-master.patch > > > In MasterQuotaManager, a member 'enabled' is used to indicate either quota > feature is disabled or quota manager is not fully initialized. This would > create confusion whether caller should wait for quota manager to be > initialized or change configuration to enable quota. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HBASE-18093) Overloading the meaning of 'enabled' in Quota Manager to indicate either quota disabled or quota manager not ready is not good
[ https://issues.apache.org/jira/browse/HBASE-18093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18093: --- Summary: Overloading the meaning of 'enabled' in Quota Manager to indicate either quota disabled or quota manager not ready is not good (was: Overloading 'enabled' in Quota Manager to indicate either quota disabled or quota manager not ready is not good) > Overloading the meaning of 'enabled' in Quota Manager to indicate either > quota disabled or quota manager not ready is not good > -- > > Key: HBASE-18093 > URL: https://issues.apache.org/jira/browse/HBASE-18093 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 1.1.10 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang >Priority: Minor > Attachments: HBASE-18093.v1-master.patch > > > In MasterQuotaManager, a member 'enabled' is used to indicate either quota > feature is disabled or quota manager is not fully initialized. This would > create confusion whether caller should wait for quota manager to be > initialized or change configuration to enable quota. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HBASE-18093) Overload 'enabled' in Quota Manager
[ https://issues.apache.org/jira/browse/HBASE-18093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18093: --- Attachment: HBASE-18093.v1-master.patch > Overload 'enabled' in Quota Manager > --- > > Key: HBASE-18093 > URL: https://issues.apache.org/jira/browse/HBASE-18093 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 1.1.10 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang >Priority: Minor > Attachments: HBASE-18093.v1-master.patch > > > In MasterQuotaManager, a member 'enabled' is used to indicate either quota > feature is disabled or quota manager is not fully initialized. This would > create confusion whether caller should wait for quota manager to be > initialized or change configuration to enable quota. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HBASE-18093) Overload 'enabled' in Quota Manager
Stephen Yuan Jiang created HBASE-18093: -- Summary: Overload 'enabled' in Quota Manager Key: HBASE-18093 URL: https://issues.apache.org/jira/browse/HBASE-18093 Project: HBase Issue Type: Bug Components: master Affects Versions: 1.1.10 Reporter: Stephen Yuan Jiang Assignee: Stephen Yuan Jiang Priority: Minor In MasterQuotaManager, a member 'enabled' is used to indicate either that the quota feature is disabled or that the quota manager is not fully initialized. This creates confusion about whether the caller should wait for the quota manager to be initialized or change the configuration to enable quotas. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
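The confusion described in HBASE-18093 comes from folding two independent states into one boolean, so a caller cannot tell whether to wait for initialization or to change its configuration. Below is a minimal sketch of one way to keep the two states separate; the class and method names are hypothetical and do not reflect the actual MasterQuotaManager code.

{code}
/**
 * Sketch only: splits "quota feature disabled" from "quota manager not yet
 * initialized" instead of overloading a single 'enabled' flag.
 * Names are hypothetical, not the real MasterQuotaManager API.
 */
public class QuotaManagerStateSketch {
  private final boolean quotaEnabled;     // from configuration, fixed at startup
  private volatile boolean initialized;   // flips to true once setup completes

  public QuotaManagerStateSketch(boolean quotaEnabledInConf) {
    this.quotaEnabled = quotaEnabledInConf;
  }

  public void markInitialized() {
    this.initialized = true;
  }

  /** Feature is on but setup is not finished: the caller should retry later. */
  public boolean isInitializing() {
    return quotaEnabled && !initialized;
  }

  /** Feature is off: the caller should change configuration if it wants quotas. */
  public boolean isQuotaDisabled() {
    return !quotaEnabled;
  }
}
{code}

With the two flags split, a caller that sees isInitializing() knows to wait and retry, while isQuotaDisabled() points it at configuration instead of an indefinite wait.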
[jira] [Commented] (HBASE-18067) Support a default converter for data read shell commands
[ https://issues.apache.org/jira/browse/HBASE-18067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16017837#comment-16017837 ] Stephen Yuan Jiang commented on HBASE-18067: +1 Good stuff! Thanks, Josh. > Support a default converter for data read shell commands > > > Key: HBASE-18067 > URL: https://issues.apache.org/jira/browse/HBASE-18067 > Project: HBase > Issue Type: Improvement > Components: shell >Reporter: Josh Elser >Assignee: Josh Elser >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-18067.001.patch, HBASE-18067.002.patch, > HBASE-18067.003.patch > > > The {{get}} and {{scan}} shell commands have the ability to specify some > complicated syntax on how to encode the bytes read from HBase on a per-column > basis. By default, bytes falling outside of a limited range of ASCII are just > printed as hex. > It seems like the intent of these converters was to support rendering > certain numeric columns as a readable string (e.g. 1234). > However, if non-ASCII encoded bytes are stored in the table (e.g. UTF-8 > encoded bytes), we may want to treat all data we read as UTF-8 instead (e.g. > if row+column+value are in Chinese). It would be onerous to require users to > enumerate every column they're reading to parse as UTF-8 instead of the > limited ASCII range. We can provide an option to encode all values retrieved > by the command. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
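The HBASE-18067 description contrasts the shell's default hex-escaped rendering of non-ASCII bytes with decoding everything as UTF-8. The small Java illustration below shows that rendering gap using HBase's Bytes utility; it is only a sketch of the difference, not the shell's converter code.

{code}
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.hbase.util.Bytes;

public class Utf8RenderingSketch {
  public static void main(String[] args) {
    // A UTF-8 encoded value, e.g. the Chinese greeting "ni hao" stored in a cell.
    byte[] value = "\u4f60\u597d".getBytes(StandardCharsets.UTF_8);

    // Default-style rendering: bytes outside printable ASCII are hex-escaped.
    System.out.println(Bytes.toStringBinary(value));               // \xE4\xBD\xA0\xE5\xA5\xBD

    // UTF-8 rendering: the readable string the user actually stored.
    System.out.println(new String(value, StandardCharsets.UTF_8)); // the original characters
  }
}
{code}

A default converter as proposed in the issue would let the second style apply to every cell read by {{get}} or {{scan}} without enumerating columns one by one.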
[jira] [Updated] (HBASE-18036) Data locality is not maintained after cluster restart or SSH
[ https://issues.apache.org/jira/browse/HBASE-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18036: --- Attachment: HBASE-18036.v0-branch-1.patch > Data locality is not maintained after cluster restart or SSH > > > Key: HBASE-18036 > URL: https://issues.apache.org/jira/browse/HBASE-18036 > Project: HBase > Issue Type: Bug > Components: Region Assignment >Affects Versions: 1.4.0, 1.3.1, 1.2.5, 1.1.10 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Attachments: HBASE-18036.v0-branch-1.1.patch, > HBASE-18036.v0-branch-1.patch, HBASE-18036.v1-branch-1.1.patch, > HBASE-18036.v2-branch-1.1.patch > > > After HBASE-2896 / HBASE-4402, we think data locality is maintained after > cluster restart. However, we have seen some complaints about data locality > loss when the cluster restarts (e.g. HBASE-17963). > Examining the AssignmentManager#processDeadServersAndRegionsInTransition() > code, for cluster startup, I expected to hit the following code path: > {code} > if (!failover) { > // Fresh cluster startup. > LOG.info("Clean cluster startup. Assigning user regions"); > assignAllUserRegions(allRegions); > } > {code} > where assignAllUserRegions would use the retainAssignment() call in the LoadBalancer; > however, from the master log, we usually hit the failover code path: > {code} > // If we found user regions out on cluster, its a failover. > if (failover) { > LOG.info("Found regions out on cluster or in RIT; presuming failover"); > // Process list of dead servers and regions in RIT. > // See HBASE-4580 for more information. > processDeadServersAndRecoverLostRegions(deadServers); > } > {code} > where processDeadServersAndRecoverLostRegions() would put dead servers in SSH, > and SSH uses roundRobinAssignment() in the LoadBalancer. That is why we would > see locality loss more often than retained locality during cluster restart. > Note: the code I was looking at is close to branch-1 and branch-1.1. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HBASE-18036) Data locality is not maintained after cluster restart or SSH
[ https://issues.apache.org/jira/browse/HBASE-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-18036: --- Attachment: HBASE-18036.v2-branch-1.1.patch > Data locality is not maintained after cluster restart or SSH > > > Key: HBASE-18036 > URL: https://issues.apache.org/jira/browse/HBASE-18036 > Project: HBase > Issue Type: Bug > Components: Region Assignment >Affects Versions: 1.4.0, 1.3.1, 1.2.5, 1.1.10 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Attachments: HBASE-18036.v0-branch-1.1.patch, > HBASE-18036.v1-branch-1.1.patch, HBASE-18036.v2-branch-1.1.patch > > > After HBASE-2896 / HBASE-4402, we think data locality is maintained after > cluster restart. However, we have seen some complaints about data locality > loss when the cluster restarts (e.g. HBASE-17963). > Examining the AssignmentManager#processDeadServersAndRegionsInTransition() > code, for cluster startup, I expected to hit the following code path: > {code} > if (!failover) { > // Fresh cluster startup. > LOG.info("Clean cluster startup. Assigning user regions"); > assignAllUserRegions(allRegions); > } > {code} > where assignAllUserRegions would use the retainAssignment() call in the LoadBalancer; > however, from the master log, we usually hit the failover code path: > {code} > // If we found user regions out on cluster, its a failover. > if (failover) { > LOG.info("Found regions out on cluster or in RIT; presuming failover"); > // Process list of dead servers and regions in RIT. > // See HBASE-4580 for more information. > processDeadServersAndRecoverLostRegions(deadServers); > } > {code} > where processDeadServersAndRecoverLostRegions() would put dead servers in SSH, > and SSH uses roundRobinAssignment() in the LoadBalancer. That is why we would > see locality loss more often than retained locality during cluster restart. > Note: the code I was looking at is close to branch-1 and branch-1.1. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HBASE-18036) Data locality is not maintained after cluster restart or SSH
[ https://issues.apache.org/jira/browse/HBASE-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008684#comment-16008684 ] Stephen Yuan Jiang edited comment on HBASE-18036 at 5/12/17 9:01 PM: - The V1 patch has minor changes based on [~elserj]'s feedback. It also adds some logging to make the change clear. The V1 change was tested in a small cluster: I used Ambari to restart the cluster and saw the new code path get hit, regions assigned back to their original region servers, and locality preserved. Next up: I will use the same logic in branch-1 and other child branches. Based on [~devaraj]'s offline feedback, I will remove the newly introduced "hbase.master.retain.assignment" config in branch-1, but keep the config in other branches (this config is just in case of regression: the user has a way to revert to the original round-robin behavior, since patch releases usually don't get full testing). was (Author: syuanjiang): The V1 patch has minor changes based on [~elserj]'s feedback. It also adds some logging to make the change clear. Next up: I will use the same logic in branch-1 and other child branches. Based on [~devaraj]'s offline feedback, I will remove the newly introduced "hbase.master.retain.assignment" config in branch-1, but keep the config in other branches (this config is just in case of regression: the user has a way to revert to the original round-robin behavior, since patch releases usually don't get full testing). > Data locality is not maintained after cluster restart or SSH > > > Key: HBASE-18036 > URL: https://issues.apache.org/jira/browse/HBASE-18036 > Project: HBase > Issue Type: Bug > Components: Region Assignment >Affects Versions: 1.4.0, 1.3.1, 1.2.5, 1.1.10 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Attachments: HBASE-18036.v0-branch-1.1.patch, > HBASE-18036.v1-branch-1.1.patch > > > After HBASE-2896 / HBASE-4402, we think data locality is maintained after > cluster restart. However, we have seen some complaints about data locality > loss when the cluster restarts (e.g. HBASE-17963). > Examining the AssignmentManager#processDeadServersAndRegionsInTransition() > code, for cluster startup, I expected to hit the following code path: > {code} > if (!failover) { > // Fresh cluster startup. > LOG.info("Clean cluster startup. Assigning user regions"); > assignAllUserRegions(allRegions); > } > {code} > where assignAllUserRegions would use the retainAssignment() call in the LoadBalancer; > however, from the master log, we usually hit the failover code path: > {code} > // If we found user regions out on cluster, its a failover. > if (failover) { > LOG.info("Found regions out on cluster or in RIT; presuming failover"); > // Process list of dead servers and regions in RIT. > // See HBASE-4580 for more information. > processDeadServersAndRecoverLostRegions(deadServers); > } > {code} > where processDeadServersAndRecoverLostRegions() would put dead servers in SSH, > and SSH uses roundRobinAssignment() in the LoadBalancer. That is why we would > see locality loss more often than retained locality during cluster restart. > Note: the code I was looking at is close to branch-1 and branch-1.1. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
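The two quoted code paths in HBASE-18036 end up in different LoadBalancer calls: the clean-startup path uses retainAssignment(), which tries to put each region back on its last-known host, while the failover/SSH path uses roundRobinAssignment(), which ignores history. Below is a self-contained sketch of the retain-then-fallback idea with simplified types; it is not the actual LoadBalancer interface, which works with HRegionInfo and ServerName rather than strings.

{code}
import java.util.*;

public class RetainAssignmentSketch {
  /**
   * Sketch: assign each region back to its last-known server when that server
   * is still online; otherwise fall back to round-robin over the live servers.
   */
  static Map<String, String> retainAssignment(Map<String, String> lastKnownServer,
                                              List<String> liveServers) {
    Map<String, String> plan = new LinkedHashMap<>();
    Set<String> live = new HashSet<>(liveServers);
    int next = 0;
    for (Map.Entry<String, String> e : lastKnownServer.entrySet()) {
      String region = e.getKey();
      String previousHost = e.getValue();
      if (previousHost != null && live.contains(previousHost)) {
        plan.put(region, previousHost);                                  // locality preserved
      } else {
        plan.put(region, liveServers.get(next++ % liveServers.size()));  // round-robin fallback
      }
    }
    return plan;
  }

  public static void main(String[] args) {
    Map<String, String> lastKnown = new LinkedHashMap<>();
    lastKnown.put("region-a", "rs1");
    lastKnown.put("region-b", "rs2");
    lastKnown.put("region-c", "rs9"); // rs9 did not come back after the restart
    System.out.println(retainAssignment(lastKnown, Arrays.asList("rs1", "rs2", "rs3")));
    // {region-a=rs1, region-b=rs2, region-c=rs1}
  }
}
{code}

The "hbase.master.retain.assignment" config discussed in the comment is, per the thread, a safety valve: if the retained assignment regresses, operators can switch back to plain round-robin placement.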