[jira] [Updated] (HBASE-28438) Add support splitting region into multiple regions(more than 2)

2024-06-11 Thread Stephen Yuan Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-28438:
---
Summary: Add support splitting region into multiple regions(more than 2)  
(was: Add support spitting region into multiple regions(more than 2))

> Add support splitting region into multiple regions(more than 2)
> ---
>
> Key: HBASE-28438
> URL: https://issues.apache.org/jira/browse/HBASE-28438
> Project: HBase
>  Issue Type: Improvement
>Reporter: Rajeshbabu Chintaguntla
>Assignee: Rajeshbabu Chintaguntla
>Priority: Major
>
> We have a requirement to split one region into several hundred regions 
> at a time to distribute the load of hot data. To do that today, we need to 
> split a region, wait for the split to complete, then split the two resulting 
> regions, and so on, which is time consuming. 
> It would be better to support splitting a region into more than two regions, 
> so that the region can be split in a single operation.
> To do that, we need to take care of the following:
> 1) Support admin APIs that take multiple split keys.
> 2) Implement a new procedure that creates the new regions, creates their 
> meta entries, and updates them in meta.
> 3) Close the parent region and open the split regions.
> 4) Update post-split compaction and the readers to use a portion 
> store file reader based on the range to scan, rather than the half store reader.
> 5) Make sure the catalog janitor also cleans up the parent region once 
> all the regions have been split properly.
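
As a rough illustration of the request above, the sketch below models what a split on multiple keys would produce, and how many rounds of two-way splits are needed to reach the same region count. It is not HBase code: MultiSplitSketch, daughterRanges, and binarySplitRounds are hypothetical names, keys are plain strings rather than byte arrays, and an empty string stands for an unbounded start or end key.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model only; none of these names exist in HBase. */
final class MultiSplitSketch {

  /**
   * Returns the [start, end) key ranges of the daughter regions produced by
   * splitting the parent range at every key in splitKeys.
   */
  static List<String[]> daughterRanges(String start, String end, List<String> splitKeys) {
    List<String[]> ranges = new ArrayList<>();
    String previous = start;
    for (String key : splitKeys) {
      // Split keys must be strictly increasing and lie inside the parent range.
      boolean notAfterPrevious = key.compareTo(previous) <= 0;
      boolean pastEnd = !end.isEmpty() && key.compareTo(end) >= 0;
      if (notAfterPrevious || pastEnd) {
        throw new IllegalArgumentException("bad split key: " + key);
      }
      ranges.add(new String[] {previous, key});
      previous = key;
    }
    // The last daughter runs from the final split key to the parent's end key.
    ranges.add(new String[] {previous, end});
    return ranges;
  }

  /** Rounds of two-way splits needed to grow one region into at least target regions. */
  static int binarySplitRounds(int target) {
    int rounds = 0;
    for (long regions = 1; regions < target; regions *= 2) {
      rounds++;
    }
    return rounds;
  }
}
```

Under these assumptions, reaching 200 regions takes 8 rounds of two-way splits, each waiting on the previous round, whereas a list of 199 split keys describes the same layout in a single request.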



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-19389) Limit concurrency of put with dense (hundreds) columns to prevent write handler exhausted

2017-12-05 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-19389:
---
Summary: Limit concurrency of put with dense (hundreds) columns to prevent 
write handler exhausted  (was: Limit concurrency of put with dense (hundreds) 
columns to prevent write hander exhausted)

> Limit concurrency of put with dense (hundreds) columns to prevent write 
> handler exhausted
> -
>
> Key: HBASE-19389
> URL: https://issues.apache.org/jira/browse/HBASE-19389
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Affects Versions: 2.0.0
> Environment: 2000+ Region Servers
> PCI-E ssd
>Reporter: Chance Li
>Assignee: Chance Li
> Fix For: 2.0.0
>
> Attachments: CSLM-concurrent-write.png, 
> HBASE-19389-branch-2-V2.patch, HBASE-19389-branch-2.patch, metrics-1.png, 
> ycsb-result.png
>
>
> In a large cluster with a large number of clients, we found that the RS's 
> handlers are sometimes all busy. After investigation, we found that the root 
> cause is related to CSLM, such as the heavy load of its compare function. We 
> reviewed the related WALs and found that many columns (more than 1000 columns) 
> were being written at that time.
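
One way to picture limiting the concurrency of dense puts is a permit counter that only wide puts have to acquire, so that they cannot occupy every write handler at once. The sketch below is an assumption for illustration, not the mechanism in the attached patches: DensePutLimiterSketch, denseColumnThreshold, and maxConcurrentDensePuts are hypothetical names, and any threshold values are made up.

```java
import java.util.concurrent.Semaphore;

/** Illustrative model only; not taken from the HBASE-19389 patches. */
final class DensePutLimiterSketch {
  private final int denseColumnThreshold;
  private final Semaphore densePermits;

  DensePutLimiterSketch(int denseColumnThreshold, int maxConcurrentDensePuts) {
    this.denseColumnThreshold = denseColumnThreshold;
    this.densePermits = new Semaphore(maxConcurrentDensePuts);
  }

  /**
   * Returns true if the put may proceed now. Puts below the threshold are
   * never throttled; dense puts need one of the limited permits.
   */
  boolean tryStart(int columnCount) {
    if (columnCount < denseColumnThreshold) {
      return true;
    }
    return densePermits.tryAcquire();
  }

  /** Must be called once after every put for which tryStart returned true. */
  void finish(int columnCount) {
    if (columnCount >= denseColumnThreshold) {
      densePermits.release();
    }
  }
}
```

A caller that gets false from tryStart could reject the put or queue it; which of the two is appropriate is left open here.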



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-14620) Procedure V2: Update HBCK to incorporate the Proc-V2-based assignment

2017-08-14 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125787#comment-16125787
 ] 

Stephen Yuan Jiang commented on HBASE-14620:


[~stack], I made a few changes in the AM / HBCK code to make the HBCK UTs run.

> Procedure V2: Update HBCK to incorporate the Proc-V2-based assignment
> -
>
> Key: HBASE-14620
> URL: https://issues.apache.org/jira/browse/HBASE-14620
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbck, proc-v2
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 2.0.0
>
> Attachments: HBASE-14620.v1-master.patch
>
>






[jira] [Updated] (HBASE-14620) Procedure V2: Update HBCK to incorporate the Proc-V2-based assignment

2017-08-13 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-14620:
---
Status: Patch Available  (was: Open)

> Procedure V2: Update HBCK to incorporate the Proc-V2-based assignment
> -
>
> Key: HBASE-14620
> URL: https://issues.apache.org/jira/browse/HBASE-14620
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbck, proc-v2
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 2.0.0
>
> Attachments: HBASE-14620.v1-master.patch
>
>






[jira] [Updated] (HBASE-14620) Procedure V2: Update HBCK to incorporate the Proc-V2-based assignment

2017-08-13 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-14620:
---
Attachment: HBASE-14620.v1-master.patch

> Procedure V2: Update HBCK to incorporate the Proc-V2-based assignment
> -
>
> Key: HBASE-14620
> URL: https://issues.apache.org/jira/browse/HBASE-14620
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbck, proc-v2
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 2.0.0
>
> Attachments: HBASE-14620.v1-master.patch
>
>






[jira] [Assigned] (HBASE-18350) RSGroups are broken under AMv2

2017-08-11 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang reassigned HBASE-18350:
--

Assignee: Thiruvel Thirumoolan  (was: Stephen Yuan Jiang)

> RSGroups are broken under AMv2
> --
>
> Key: HBASE-18350
> URL: https://issues.apache.org/jira/browse/HBASE-18350
> Project: HBase
>  Issue Type: Bug
>  Components: rsgroup
>Affects Versions: 2.0.0-alpha-1
>Reporter: Stephen Yuan Jiang
>Assignee: Thiruvel Thirumoolan
>Priority: Blocker
> Fix For: 2.0.0-beta-2
>
>
> The following RSGroups tests were disabled by Core Proc-V2 AM in HBASE-14614:
> - Disabled/Ignore TestRSGroupsOfflineMode#testOffline; need to dig in on what 
> offline is.
> - Disabled/Ignore TestRSGroups.
> This JIRA tracks the work to enable them (or remove/modify if not applicable).





[jira] [Commented] (HBASE-18528) DON'T allow user to modify the passed table/column descriptor

2017-08-09 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120488#comment-16120488
 ] 

Stephen Yuan Jiang commented on HBASE-18528:


+1

> DON'T allow user to modify the passed table/column descriptor
> -
>
> Key: HBASE-18528
> URL: https://issues.apache.org/jira/browse/HBASE-18528
> Project: HBase
>  Issue Type: Sub-task
>  Components: Coprocessors, master
>Reporter: Chia-Ping Tsai
>Assignee: Chia-Ping Tsai
>Priority: Critical
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: HBASE-18528.v0.patch
>
>
> We are replacing HTableDescriptor with TableDescriptor throughout the code 
> base. TableDescriptor is designed to be a read-only object, so users can't 
> modify it through MasterObserver. HBASE-18502 changed many methods of 
> MasterObserver to use TableDescriptor, but some deprecated methods still accept 
> HTableDescriptor. Users may be confused about why some methods can't modify the 
> table descriptor.
> In short, should we allow users to modify the passed table descriptor?
> # If yes, we should introduce a mechanism by which users can return a modified 
> table descriptor.
> # If no, we should pass ImmutableHTableDescriptor to users, or just remove 
> all methods accepting HTableDescriptor.
> Ditto for HColumnDescriptor.
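
The "if no" option above amounts to handing observers a read-only copy whose mutators throw. The sketch below shows that pattern in isolation; MutableDescriptorSketch and ImmutableDescriptorSketch are hypothetical stand-ins and do not reproduce the real HTableDescriptor or ImmutableHTableDescriptor classes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a mutable descriptor; not the real HTableDescriptor. */
class MutableDescriptorSketch {
  protected final Map<String, String> values = new HashMap<>();

  void setValue(String key, String value) {
    values.put(key, value);
  }

  String getValue(String key) {
    return values.get(key);
  }
}

/** Read-only copy that could be passed to an observer; every mutator throws. */
final class ImmutableDescriptorSketch extends MutableDescriptorSketch {
  ImmutableDescriptorSketch(MutableDescriptorSketch source) {
    // Copy the state so that later changes to the source are not visible here.
    values.putAll(source.values);
  }

  @Override
  void setValue(String key, String value) {
    throw new UnsupportedOperationException("descriptor is read-only");
  }
}
```

Because the copy still has the mutable type, deprecated methods could keep their signatures while any attempt to modify the descriptor fails loudly instead of being silently ignored.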





[jira] [Commented] (HBASE-18353) Enable TestCorruptedRegionStoreFile that were disabled by Proc-V2 AM in HBASE-14614

2017-08-09 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120208#comment-16120208
 ] 

Stephen Yuan Jiang commented on HBASE-18353:


We already have a 'move' method to move a region to a different RS.  I wonder 
why [~Apache9] did not consider this when working on HBASE-17712.  

We have an 'offline' method to completely offline the region; 'unassign' would 
just close the region in the RS.  We can then manually assign the region, or 
the region would be re-assigned when ServerCrashProcedure processes the closed 
region.  

Also note that we can only deprecate an Admin method for 2.0; we cannot remove it. 
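
The difference between the three methods can be pictured with a small state model. This is a sketch of the behavior described in the comment above, not of the actual Admin or assignment manager code; RegionAssignmentSketch and its members are hypothetical names, and modeling 'move' as a close followed by an open on the target is an assumption.

```java
/** Illustrative model only; not the HBase Admin API. */
final class RegionAssignmentSketch {
  enum State { OPEN, CLOSED, OFFLINE }

  private State state = State.OPEN;
  private String server;

  RegionAssignmentSketch(String server) {
    this.server = server;
  }

  State state() {
    return state;
  }

  String server() {
    return server;
  }

  /** Closes the region on its server; it stays CLOSED until something assigns it. */
  void unassign() {
    state = State.CLOSED;
    server = null;
  }

  /** Opens the region on the given server, for example after a manual assign. */
  void assign(String target) {
    state = State.OPEN;
    server = target;
  }

  /** Moves the region by closing it and opening it on another server. */
  void move(String target) {
    unassign();
    assign(target);
  }

  /** Takes the region completely offline. */
  void offline() {
    state = State.OFFLINE;
    server = null;
  }
}
```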
 

> Enable TestCorruptedRegionStoreFile that were disabled by Proc-V2 AM in 
> HBASE-14614
> ---
>
> Key: HBASE-18353
> URL: https://issues.apache.org/jira/browse/HBASE-18353
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha-1
>Reporter: Stephen Yuan Jiang
>Assignee: Vladimir Rodionov
> Attachments: HBASE-18353-v1.patch, HBASE-18353-v2.patch
>
>
> HBASE-14614 disabled TestCorruptedRegionStoreFile, as it depends on a 
> half-implemented reopen of a region when a store file goes missing.
> This JIRA tracks the work to fix/enable the test.





[jira] [Comment Edited] (HBASE-18353) Enable TestCorruptedRegionStoreFile that were disabled by Proc-V2 AM in HBASE-14614

2017-08-07 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117914#comment-16117914
 ] 

Stephen Yuan Jiang edited comment on HBASE-18353 at 8/8/17 6:14 AM:


[~Apache9], the comments are incorrect (at least in branch-2; I have not 
checked branch-1).  unassign asks the RS to close the region and marks the 
region as CLOSED once the RS successfully closes it.  It does not 
automatically reopen the region.


was (Author: syuanjiang):
[~Apache9], the comments are incorrect (at least in branch-2, I have not 
checked branch-1).  unassign asks RS to close the region and mark the region in 
CLOSED state once RS successfully closes the region.

> Enable TestCorruptedRegionStoreFile that were disabled by Proc-V2 AM in 
> HBASE-14614
> ---
>
> Key: HBASE-18353
> URL: https://issues.apache.org/jira/browse/HBASE-18353
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha-1
>Reporter: Stephen Yuan Jiang
>Assignee: Vladimir Rodionov
> Attachments: HBASE-18353-v1.patch, HBASE-18353-v2.patch
>
>
> HBASE-14614 disabled TestCorruptedRegionStoreFile, as it depends on a 
> half-implemented reopen of a region when a store file goes missing.
> This JIRA tracks the work to fix/enable the test.





[jira] [Commented] (HBASE-18353) Enable TestCorruptedRegionStoreFile that were disabled by Proc-V2 AM in HBASE-14614

2017-08-07 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117914#comment-16117914
 ] 

Stephen Yuan Jiang commented on HBASE-18353:


[~Apache9], the comments are incorrect (at least in branch-2; I have not 
checked branch-1).  unassign asks the RS to close the region and marks the 
region as CLOSED once the RS successfully closes it.

> Enable TestCorruptedRegionStoreFile that were disabled by Proc-V2 AM in 
> HBASE-14614
> ---
>
> Key: HBASE-18353
> URL: https://issues.apache.org/jira/browse/HBASE-18353
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha-1
>Reporter: Stephen Yuan Jiang
>Assignee: Vladimir Rodionov
> Attachments: HBASE-18353-v1.patch, HBASE-18353-v2.patch
>
>
> HBASE-14614 disabled TestCorruptedRegionStoreFile, as it depends on a 
> half-implemented reopen of a region when a store file goes missing.
> This JIRA tracks the work to fix/enable the test.





[jira] [Commented] (HBASE-18353) Enable TestCorruptedRegionStoreFile that were disabled by Proc-V2 AM in HBASE-14614

2017-08-07 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117525#comment-16117525
 ] 

Stephen Yuan Jiang commented on HBASE-18353:


[~vrodionov], reading HBASE-17712, I am unsure whether your approach is the 
correct action.  Let us wait for Duo's comment on the change.  For the patch, I 
think you should at least rename the {{RegionUnassigner.java}} file to 
{{RegionReassigner.java}}

> Enable TestCorruptedRegionStoreFile that were disabled by Proc-V2 AM in 
> HBASE-14614
> ---
>
> Key: HBASE-18353
> URL: https://issues.apache.org/jira/browse/HBASE-18353
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha-1
>Reporter: Stephen Yuan Jiang
>Assignee: Vladimir Rodionov
> Attachments: HBASE-18353-v1.patch, HBASE-18353-v2.patch
>
>
> HBASE-14614 disabled TestCorruptedRegionStoreFile, as it depends on a 
> half-implemented reopen of a region when a store file goes missing.
> This JIRA tracks the work to fix/enable the test.





[jira] [Commented] (HBASE-18353) Enable TestCorruptedRegionStoreFile that were disabled by Proc-V2 AM in HBASE-14614

2017-08-07 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117522#comment-16117522
 ] 

Stephen Yuan Jiang commented on HBASE-18353:


[~Apache9], in HBASE-17712, you mentioned that {{"Unassign region 
asynchronously when hitting FNFE. Can pass TestCorruptedRegionStoreFile. And 
also fix a problem in TestCorruptedRegionStoreFile that the already opened 
DFSInputStream may still work after we deleted the storefile because the block 
replica deletion is asynchronous. We should wait until all the replicas have 
been removed from DNs."}}  You also talked about implementing some reassign 
logic.  

[~vrodionov] implemented this by doing an unassign/assign of the region if an 
FNFE happens.  What do you think?

> Enable TestCorruptedRegionStoreFile that were disabled by Proc-V2 AM in 
> HBASE-14614
> ---
>
> Key: HBASE-18353
> URL: https://issues.apache.org/jira/browse/HBASE-18353
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha-1
>Reporter: Stephen Yuan Jiang
>Assignee: Vladimir Rodionov
> Attachments: HBASE-18353-v1.patch, HBASE-18353-v2.patch
>
>
> HBASE-14614 disabled TestCorruptedRegionStoreFile, as it depends on a 
> half-implemented reopen of a region when a store file goes missing.
> This JIRA tracks the work to fix/enable the test.





[jira] [Commented] (HBASE-14618) Procedure V2: Implement move shell command to use Proc-V2 assignment

2017-08-04 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16114965#comment-16114965
 ] 

Stephen Yuan Jiang commented on HBASE-14618:


No work in this item.

> Procedure V2: Implement move shell command to use Proc-V2 assignment
> 
>
> Key: HBASE-14618
> URL: https://issues.apache.org/jira/browse/HBASE-14618
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Affects Versions: 2.0.0
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 2.0.0
>
>






[jira] [Commented] (HBASE-18424) Fix TestAsyncTableGetMultiThreaded

2017-08-02 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16111890#comment-16111890
 ] 

Stephen Yuan Jiang commented on HBASE-18424:


[~Apache9], the original test in TestAsyncTableGetMultiThreaded splits a 
user table region, moves the meta region to a different RS, and then tries to 
access the user table.  Since the async table feature exists only in 2.0+, 
moving meta does not make much sense, as meta can only be on the master in 2.0 
(at least).  

[~vrodionov] changed the test to move the user region instead of meta; this 
makes more sense to me.  

[~Apache9], if you have no objection, we will commit the patch.

> Fix TestAsyncTableGetMultiThreaded
> --
>
> Key: HBASE-18424
> URL: https://issues.apache.org/jira/browse/HBASE-18424
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Attachments: HBASE-18424-v1.patch
>
>






[jira] [Commented] (HBASE-18424) Fix TestAsyncTableGetMultiThreaded

2017-08-02 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16111892#comment-16111892
 ] 

Stephen Yuan Jiang commented on HBASE-18424:


+1 from code logic.

> Fix TestAsyncTableGetMultiThreaded
> --
>
> Key: HBASE-18424
> URL: https://issues.apache.org/jira/browse/HBASE-18424
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Attachments: HBASE-18424-v1.patch
>
>






[jira] [Commented] (HBASE-18491) [AMv2] Fail UnassignProcedure if source Region Server is not online.

2017-08-01 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16109645#comment-16109645
 ] 

Stephen Yuan Jiang commented on HBASE-18491:


Looks good.

> [AMv2] Fail UnassignProcedure if source Region Server is not online.
> 
>
> Key: HBASE-18491
> URL: https://issues.apache.org/jira/browse/HBASE-18491
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 2.0.0
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
> Fix For: 2.0.0
>
> Attachments: hbase-18491.master.001.patch
>
>
> Currently, UnassignProcedure returns success when the server carrying a region 
> is NOT online. The assumption here is that ServerCrashProcedure will handle 
> splitting logs etc. for these regions. When UnassignProcedure completes, 
> MoveRegionProcedure resumes with AssignProcedure. AssignProcedure can 
> sometimes assign regions without the prerequisite steps (done either by 
> UnassignProcedure or ServerCrashProcedure). The fix is to fail 
> UnassignProcedure and the parent MoveRegionProcedure if the source server is 
> not online.
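
The proposed control flow can be sketched as follows. This is a simplified model of the fix described above, not the procedure framework itself: MoveRegionSketch, Result, unassign, and move are hypothetical names, and the assign step is assumed to succeed whenever it is reached.

```java
import java.util.Set;

/** Illustrative model only; not the Proc-V2 procedure classes. */
final class MoveRegionSketch {
  enum Result { SUCCESS, FAILED }

  /** Proposed behavior: unassign fails when the source server is not online. */
  static Result unassign(String sourceServer, Set<String> onlineServers) {
    return onlineServers.contains(sourceServer) ? Result.SUCCESS : Result.FAILED;
  }

  /**
   * A move runs unassign first and continues to assign only if unassign
   * succeeded, so the region is never assigned before the prerequisite steps.
   */
  static Result move(String sourceServer, Set<String> onlineServers) {
    if (unassign(sourceServer, onlineServers) == Result.FAILED) {
      // Leave the region to crash handling instead of assigning it here.
      return Result.FAILED;
    }
    // Assumption: the assign step always succeeds in this model.
    return Result.SUCCESS;
  }
}
```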





[jira] [Updated] (HBASE-18458) Refactor TestRegionServerHostname to make it robust (Port HBASE-17922 to branch-1)

2017-07-26 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18458:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 1.4.0
   Status: Resolved  (was: Patch Available)

> Refactor TestRegionServerHostname to make it robust (Port HBASE-17922 to 
> branch-1)
> --
>
> Key: HBASE-18458
> URL: https://issues.apache.org/jira/browse/HBASE-18458
> Project: HBase
>  Issue Type: Sub-task
>  Components: hadoop3
>Affects Versions: 1.4.0
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
>Priority: Minor
> Fix For: 1.4.0
>
> Attachments: HBASE-17922.v1-branch-1.patch
>
>
> TestRegionServerHostname is passing in branch-1; however, it always fails 
> locally.  Running the tests individually always passes.  Failing to start the 
> RS in some combinations of test runs indicates a resource leak.  
> {code}
> Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
> Tests run: 4, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 46.042 sec <<< FAILURE! - in org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
> testRegionServerHostnameReportedToMaster(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname)  Time elapsed: 30.095 sec  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 30000 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221)
>   at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445)
>   at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225)
>   at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:94)
>   at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1072)
>   at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1028)
>   at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:900)
>   at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:894)
>   at org.apache.hadoop.hbase.regionserver.TestRegionServerHostname.testRegionServerHostnameReportedToMaster(TestRegionServerHostname.java:158)
> {code}
> When {{testRegionServerHostnameReportedToMaster}} runs alone or with 
> another newly added test, it passes without problems.
> When it runs together with {{testInvalidRegionServerHostnameAbortsServer}} 
> in the same test suite, {{TestRegionServerHostname}}, the region server fails 
> to start:
> {noformat}
> 2017-07-25 15:34:24,132 FATAL [RS:0;192.168.1.7:64317] regionserver.HRegionServer(2182): ABORTING region server 192.168.1.7,64317,1501022063917: Unhandled: Failed suppression of fs shutdown hook: org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@668e0f60
> java.lang.RuntimeException: Failed suppression of fs shutdown hook: org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@668e0f60
>   at org.apache.hadoop.hbase.regionserver.ShutdownHook.suppressHdfsShutdownHook(ShutdownHook.java:204)
>   at org.apache.hadoop.hbase.regionserver.ShutdownHook.install(ShutdownHook.java:84)
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:940)
>   at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156)
>   at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108)
>   at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1846)
>   at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:307)
>   at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> HBASE-17922 addressed a similar issue in Hadoop 3.  I think that change is 
> more robust than what is in branch-1 right now.  Porting the change to 
> branch-1 (with small modifications due to code differences between branch-1 
> and branch-2) is a good idea.





[jira] [Commented] (HBASE-18458) Refactor TestRegionServerHostname to make it robust (Port HBASE-17922 to branch-1)

2017-07-26 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102546#comment-16102546
 ] 

Stephen Yuan Jiang commented on HBASE-18458:


[~mdrob], the port is almost straightforward; there is only a slight difference 
in testRegionServerHostnameReportedToMaster, because branch-1 and branch-2 do 
the checking differently.

> Refactor TestRegionServerHostname to make it robust (Port HBASE-17922 to 
> branch-1)
> --
>
> Key: HBASE-18458
> URL: https://issues.apache.org/jira/browse/HBASE-18458
> Project: HBase
>  Issue Type: Sub-task
>  Components: hadoop3
>Affects Versions: 1.4.0
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
>Priority: Minor
> Attachments: HBASE-17922.v1-branch-1.patch
>
>





[jira] [Commented] (HBASE-18458) Refactor TestRegionServerHostname to make it robust (Port HBASE-17922 to branch-1)

2017-07-26 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102547#comment-16102547
 ] 

Stephen Yuan Jiang commented on HBASE-18458:


This is a test-only change (one test suite affected); the failed UTs are 
unrelated to this change.

> Refactor TestRegionServerHostname to make it robust (Port HBASE-17922 to 
> branch-1)
> --
>
> Key: HBASE-18458
> URL: https://issues.apache.org/jira/browse/HBASE-18458
> Project: HBase
>  Issue Type: Sub-task
>  Components: hadoop3
>Affects Versions: 1.4.0
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
>Priority: Minor
> Attachments: HBASE-17922.v1-branch-1.patch
>
>
> The TestRegionServerHostname is passing in branch-1; however, it always fails 
> locally.  Running tests individually always pass.  Failing to start RS in 
> some combination of test run indicates some resource leak.  
> {code}
> Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
> Tests run: 4, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 46.042 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
> testRegionServerHostnameReportedToMaster(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname)
>   Time elapsed: 30.095 sec  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 3 
> milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221)
>   at 
> org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.(MiniHBaseCluster.java:94)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1072)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1028)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:900)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:894)
>   at 
> org.apache.hadoop.hbase.regionserver.TestRegionServerHostname.testRegionServerHostnameReportedToMaster(TestRegionServerHostname.java:158)
> {code}
> When running the testRegionServerHostnameReportedToMaster alone or with 
> another newly added test, the test passed without problem.
> When running the {{testRegionServerHostnameReportedToMaster}} test with 
> {{testInvalidRegionServerHostnameAbortsServer}} in the same test suite 
> {{TestRegionServerHostname}}, the region server failed to start:
> {noformat}
> 2017-07-25 15:34:24,132 FATAL [RS:0;192.168.1.7:64317] 
> regionserver.HRegionServer(2182): ABORTING region server 
> 192.168.1.7,64317,1501022063917: Unhandled: Failed suppression of fs shutdown 
> hook: org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@668e0f60
> java.lang.RuntimeException: Failed suppression of fs shutdown hook: 
> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@668e0f60
>   at 
> org.apache.hadoop.hbase.regionserver.ShutdownHook.suppressHdfsShutdownHook(ShutdownHook.java:204)
>   at 
> org.apache.hadoop.hbase.regionserver.ShutdownHook.install(ShutdownHook.java:84)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:940)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1846)
>   at 
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:307)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> HBASE-17922 addressed a similar issue in Hadoop 3.  I think that change is more 
> robust than what is in branch-1 right now.  Porting it to branch-1 
> (with small modifications due to code differences between branch-1 and 
> branch-2) is a good idea.
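
For readers unfamiliar with this failure mode, here is a minimal, self-contained sketch of how a "failed suppression" error can show up only when two tests share one JVM. It does not use any HBase or Hadoop code: the class name, the {{suppress}} helper and the stand-in hook are hypothetical, and the idea that a second region server trips over a hook that an earlier one already removed is an illustration, not a confirmed diagnosis of this issue. The only real API it relies on is the JDK's {{Runtime.removeShutdownHook}}, which returns false when the given hook is not currently registered.

```java
public class ShutdownHookSuppressionSketch {

    /** Removes a JVM shutdown hook and fails loudly if it is no longer registered. */
    public static void suppress(Thread hook) {
        // removeShutdownHook returns false when the hook is not registered.
        if (!Runtime.getRuntime().removeShutdownHook(hook)) {
            throw new RuntimeException(
                "Failed suppression of shutdown hook: " + hook.getName());
        }
    }

    /** Returns true if suppressing the same hook a second time fails. */
    public static boolean secondSuppressionFails() {
        // Hypothetical stand-in for a hook registered once per JVM.
        Thread hook = new Thread(() -> { }, "stand-in-fs-hook");
        Runtime.getRuntime().addShutdownHook(hook);

        suppress(hook); // first "server" in this JVM: the hook is still registered
        try {
            suppress(hook); // second "server" in the same JVM: the hook is already gone
            return false;
        } catch (RuntimeException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("second suppression fails: " + secondSuppressionFails());
    }
}
```

Running {{main}} prints "second suppression fails: true". In this simplified model, a test that passes alone can still fail when another test in the same JVM has already consumed the shared hook, which matches the symptom described above.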



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18458) Refactor TestRegionServerHostname to make it robust (Port HBASE-17922 to branch-1)

2017-07-26 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18458:
---
Status: Patch Available  (was: Open)

> Refactor TestRegionServerHostname to make it robust (Port HBASE-17922 to 
> branch-1)
> --
>
> Key: HBASE-18458
> URL: https://issues.apache.org/jira/browse/HBASE-18458
> Project: HBase
>  Issue Type: Sub-task
>  Components: hadoop3
>Affects Versions: 1.4.0
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
>Priority: Minor
> Attachments: HBASE-17922.v1-branch-1.patch
>
>
> TestRegionServerHostname is passing in branch-1; however, it always fails 
> locally.  Running the tests individually always passes.  Failing to start the 
> RS in some combinations of test runs indicates a resource leak.
> {code}
> Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
> Tests run: 4, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 46.042 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
> testRegionServerHostnameReportedToMaster(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname)
>   Time elapsed: 30.095 sec  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 30000 
> milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221)
>   at 
> org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:94)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1072)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1028)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:900)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:894)
>   at 
> org.apache.hadoop.hbase.regionserver.TestRegionServerHostname.testRegionServerHostnameReportedToMaster(TestRegionServerHostname.java:158)
> {code}
> When {{testRegionServerHostnameReportedToMaster}} ran alone or with 
> another newly added test, it passed without problems.
> When running the {{testRegionServerHostnameReportedToMaster}} test with 
> {{testInvalidRegionServerHostnameAbortsServer}} in the same test suite 
> {{TestRegionServerHostname}}, the region server failed to start:
> {noformat}
> 2017-07-25 15:34:24,132 FATAL [RS:0;192.168.1.7:64317] 
> regionserver.HRegionServer(2182): ABORTING region server 
> 192.168.1.7,64317,1501022063917: Unhandled: Failed suppression of fs shutdown 
> hook: org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@668e0f60
> java.lang.RuntimeException: Failed suppression of fs shutdown hook: 
> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@668e0f60
>   at 
> org.apache.hadoop.hbase.regionserver.ShutdownHook.suppressHdfsShutdownHook(ShutdownHook.java:204)
>   at 
> org.apache.hadoop.hbase.regionserver.ShutdownHook.install(ShutdownHook.java:84)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:940)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1846)
>   at 
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:307)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> HBASE-17922 addressed a similar issue in Hadoop 3.  I think that change is more 
> robust than what is in branch-1 right now.  Porting it to branch-1 
> (with small modifications due to code differences between branch-1 and 
> branch-2) is a good idea.





[jira] [Updated] (HBASE-18458) Refactor TestRegionServerHostname to make it robust (Port HBASE-17922 to branch-1)

2017-07-26 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18458:
---
Summary: Refactor TestRegionServerHostname to make it robust (Port 
HBASE-17922 to branch-1)  (was: Refactor TestRegionServerHostname to make it 
robust (Port HBASE-17922))

> Refactor TestRegionServerHostname to make it robust (Port HBASE-17922 to 
> branch-1)
> --
>
> Key: HBASE-18458
> URL: https://issues.apache.org/jira/browse/HBASE-18458
> Project: HBase
>  Issue Type: Sub-task
>  Components: hadoop3
>Affects Versions: 1.4.0
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
>Priority: Minor
> Attachments: HBASE-17922.v1-branch-1.patch
>
>
> TestRegionServerHostname is passing in branch-1; however, it always fails 
> locally.  Running the tests individually always passes.  Failing to start the 
> RS in some combinations of test runs indicates a resource leak.
> {code}
> Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
> Tests run: 4, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 46.042 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
> testRegionServerHostnameReportedToMaster(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname)
>   Time elapsed: 30.095 sec  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 30000 
> milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221)
>   at 
> org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:94)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1072)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1028)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:900)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:894)
>   at 
> org.apache.hadoop.hbase.regionserver.TestRegionServerHostname.testRegionServerHostnameReportedToMaster(TestRegionServerHostname.java:158)
> {code}
> When {{testRegionServerHostnameReportedToMaster}} ran alone or with 
> another newly added test, it passed without problems.
> When running the {{testRegionServerHostnameReportedToMaster}} test with 
> {{testInvalidRegionServerHostnameAbortsServer}} in the same test suite 
> {{TestRegionServerHostname}}, the region server failed to start:
> {noformat}
> 2017-07-25 15:34:24,132 FATAL [RS:0;192.168.1.7:64317] 
> regionserver.HRegionServer(2182): ABORTING region server 
> 192.168.1.7,64317,1501022063917: Unhandled: Failed suppression of fs shutdown 
> hook: org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@668e0f60
> java.lang.RuntimeException: Failed suppression of fs shutdown hook: 
> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@668e0f60
>   at 
> org.apache.hadoop.hbase.regionserver.ShutdownHook.suppressHdfsShutdownHook(ShutdownHook.java:204)
>   at 
> org.apache.hadoop.hbase.regionserver.ShutdownHook.install(ShutdownHook.java:84)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:940)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1846)
>   at 
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:307)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> HBASE-17922 addressed a similar issue in Hadoop 3.  I think that change is more 
> robust than what is in branch-1 right now.  Porting it to branch-1 
> (with small modifications due to code differences between branch-1 and 
> branch-2) is a good idea.





[jira] [Updated] (HBASE-18458) Refactor TestRegionServerHostname to make it robust (Port HBASE-17922)

2017-07-26 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18458:
---
Attachment: HBASE-17922.v1-branch-1.patch

> Refactor TestRegionServerHostname to make it robust (Port HBASE-17922)
> --
>
> Key: HBASE-18458
> URL: https://issues.apache.org/jira/browse/HBASE-18458
> Project: HBase
>  Issue Type: Sub-task
>  Components: hadoop3
>Affects Versions: 1.4.0
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
>Priority: Minor
> Attachments: HBASE-17922.v1-branch-1.patch
>
>
> TestRegionServerHostname is passing in branch-1; however, it always fails 
> locally.  Running the tests individually always passes.  Failing to start the 
> RS in some combinations of test runs indicates a resource leak.
> {code}
> Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
> Tests run: 4, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 46.042 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
> testRegionServerHostnameReportedToMaster(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname)
>   Time elapsed: 30.095 sec  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 30000 
> milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221)
>   at 
> org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:94)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1072)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1028)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:900)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:894)
>   at 
> org.apache.hadoop.hbase.regionserver.TestRegionServerHostname.testRegionServerHostnameReportedToMaster(TestRegionServerHostname.java:158)
> {code}
> When {{testRegionServerHostnameReportedToMaster}} ran alone or with 
> another newly added test, it passed without problems.
> When running the {{testRegionServerHostnameReportedToMaster}} test with 
> {{testInvalidRegionServerHostnameAbortsServer}} in the same test suite 
> {{TestRegionServerHostname}}, the region server failed to start:
> {noformat}
> 2017-07-25 15:34:24,132 FATAL [RS:0;192.168.1.7:64317] 
> regionserver.HRegionServer(2182): ABORTING region server 
> 192.168.1.7,64317,1501022063917: Unhandled: Failed suppression of fs shutdown 
> hook: org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@668e0f60
> java.lang.RuntimeException: Failed suppression of fs shutdown hook: 
> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@668e0f60
>   at 
> org.apache.hadoop.hbase.regionserver.ShutdownHook.suppressHdfsShutdownHook(ShutdownHook.java:204)
>   at 
> org.apache.hadoop.hbase.regionserver.ShutdownHook.install(ShutdownHook.java:84)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:940)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1846)
>   at 
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:307)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> HBASE-17922 addressed a similar issue in Hadoop 3.  I think that change is more 
> robust than what is in branch-1 right now.  Porting it to branch-1 
> (with small modifications due to code differences between branch-1 and 
> branch-2) is a good idea.





[jira] [Updated] (HBASE-18458) Refactor TestRegionServerHostname to make it robust (Port HBASE-17922)

2017-07-26 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18458:
---
Affects Version/s: (was: 2.0.0)
   1.4.0
 Priority: Minor  (was: Major)
Fix Version/s: (was: 2.0.0-alpha-2)
   (was: 3.0.0)

> Refactor TestRegionServerHostname to make it robust (Port HBASE-17922)
> --
>
> Key: HBASE-18458
> URL: https://issues.apache.org/jira/browse/HBASE-18458
> Project: HBase
>  Issue Type: Sub-task
>  Components: hadoop3
>Affects Versions: 1.4.0
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
>Priority: Minor
> Attachments: HBASE-17922.v1-branch-1.patch
>
>
> TestRegionServerHostname is passing in branch-1; however, it always fails 
> locally.  Running the tests individually always passes.  Failing to start the 
> RS in some combinations of test runs indicates a resource leak.
> {code}
> Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
> Tests run: 4, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 46.042 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
> testRegionServerHostnameReportedToMaster(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname)
>   Time elapsed: 30.095 sec  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 30000 
> milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221)
>   at 
> org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:94)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1072)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1028)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:900)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:894)
>   at 
> org.apache.hadoop.hbase.regionserver.TestRegionServerHostname.testRegionServerHostnameReportedToMaster(TestRegionServerHostname.java:158)
> {code}
> When {{testRegionServerHostnameReportedToMaster}} ran alone or with 
> another newly added test, it passed without problems.
> When running the {{testRegionServerHostnameReportedToMaster}} test with 
> {{testInvalidRegionServerHostnameAbortsServer}} in the same test suite 
> {{TestRegionServerHostname}}, the region server failed to start:
> {noformat}
> 2017-07-25 15:34:24,132 FATAL [RS:0;192.168.1.7:64317] 
> regionserver.HRegionServer(2182): ABORTING region server 
> 192.168.1.7,64317,1501022063917: Unhandled: Failed suppression of fs shutdown 
> hook: org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@668e0f60
> java.lang.RuntimeException: Failed suppression of fs shutdown hook: 
> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@668e0f60
>   at 
> org.apache.hadoop.hbase.regionserver.ShutdownHook.suppressHdfsShutdownHook(ShutdownHook.java:204)
>   at 
> org.apache.hadoop.hbase.regionserver.ShutdownHook.install(ShutdownHook.java:84)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:940)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1846)
>   at 
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:307)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> HBASE-17922 addressed a similar issue in Hadoop 3.  I think that change is more 
> robust than what is in branch-1 right now.  Porting it to branch-1 
> (with small modifications due to code differences between branch-1 and 
> branch-2) is a good idea.





[jira] [Updated] (HBASE-18458) Refactor TestRegionServerHostname to make it robust (Port HBASE-17922)

2017-07-26 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18458:
---
Description: 
TestRegionServerHostname is passing in branch-1; however, it always fails 
locally.  Running the tests individually always passes.  Failing to start the RS 
in some combinations of test runs indicates a resource leak.

{code}
Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
Tests run: 4, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 46.042 sec <<< 
FAILURE! - in org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
testRegionServerHostnameReportedToMaster(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname)
  Time elapsed: 30.095 sec  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 30000 
milliseconds
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221)
at 
org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445)
at 
org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225)
at 
org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:94)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1072)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1028)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:900)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:894)
at 
org.apache.hadoop.hbase.regionserver.TestRegionServerHostname.testRegionServerHostnameReportedToMaster(TestRegionServerHostname.java:158)
{code}

When {{testRegionServerHostnameReportedToMaster}} ran alone or with another 
newly added test, it passed without problems.
When running the {{testRegionServerHostnameReportedToMaster}} test with 
{{testInvalidRegionServerHostnameAbortsServer}} in the same test suite 
{{TestRegionServerHostname}}, the region server failed to start:

{noformat}
2017-07-25 15:34:24,132 FATAL [RS:0;192.168.1.7:64317] 
regionserver.HRegionServer(2182): ABORTING region server 
192.168.1.7,64317,1501022063917: Unhandled: Failed suppression of fs shutdown 
hook: org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@668e0f60
java.lang.RuntimeException: Failed suppression of fs shutdown hook: 
org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@668e0f60
at 
org.apache.hadoop.hbase.regionserver.ShutdownHook.suppressHdfsShutdownHook(ShutdownHook.java:204)
at 
org.apache.hadoop.hbase.regionserver.ShutdownHook.install(ShutdownHook.java:84)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:940)
at 
org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156)
at 
org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108)
at 
org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1846)
at 
org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:307)
at 
org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138)
at java.lang.Thread.run(Thread.java:745)
{noformat}

HBASE-17922 addressed a similar issue in Hadoop 3.  I think that change is more 
robust than what is in branch-1 right now.  Porting it to branch-1 
(with small modifications due to code differences between branch-1 and branch-2) 
is a good idea.

  was:
The 

{code}
Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
Tests run: 4, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 46.042 sec <<< 
FAILURE! - in org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
testRegionServerHostnameReportedToMaster(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname)
  Time elapsed: 30.095 sec  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 30000 
milliseconds
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221)
at 
org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445)
at 
org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225)
at 
org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:94)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1072)

[jira] [Updated] (HBASE-18458) Refactor TestRegionServerHostname to make it robust (Port HBASE-17922)

2017-07-26 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18458:
---
Description: 
The 

{code}
Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
Tests run: 4, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 46.042 sec <<< 
FAILURE! - in org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
testRegionServerHostnameReportedToMaster(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname)
  Time elapsed: 30.095 sec  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 30000 
milliseconds
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221)
at 
org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445)
at 
org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225)
at 
org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:94)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1072)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1028)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:900)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:894)
at 
org.apache.hadoop.hbase.regionserver.TestRegionServerHostname.testRegionServerHostnameReportedToMaster(TestRegionServerHostname.java:158)
{code}

  was:


{code}
Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
Tests run: 4, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 46.042 sec <<< 
FAILURE! - in org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
testRegionServerHostnameReportedToMaster(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname)
  Time elapsed: 30.095 sec  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 30000 
milliseconds
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221)
at 
org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445)
at 
org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225)
at 
org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:94)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1072)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1028)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:900)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:894)
at 
org.apache.hadoop.hbase.regionserver.TestRegionServerHostname.testRegionServerHostnameReportedToMaster(TestRegionServerHostname.java:158)
{code}


> Refactor TestRegionServerHostname to make it robust (Port HBASE-17922)
> --
>
> Key: HBASE-18458
> URL: https://issues.apache.org/jira/browse/HBASE-18458
> Project: HBase
>  Issue Type: Sub-task
>  Components: hadoop3
>Affects Versions: 2.0.0
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 3.0.0, 2.0.0-alpha-2
>
>
> The 
> {code}
> Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
> Tests run: 4, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 46.042 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
> testRegionServerHostnameReportedToMaster(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname)
>   Time elapsed: 30.095 sec  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 30000 
> milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221)
>   at 
> org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:94)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1072)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1028)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:900)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:894)
>   at 
> org.apache.hadoop.hbase.regionserver.TestRegion

[jira] [Assigned] (HBASE-18458) Refactor TestRegionServerHostname to make it robust (Port HBASE-17922)

2017-07-26 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang reassigned HBASE-18458:
--

Assignee: Stephen Yuan Jiang  (was: Mike Drob)

> Refactor TestRegionServerHostname to make it robust (Port HBASE-17922)
> --
>
> Key: HBASE-18458
> URL: https://issues.apache.org/jira/browse/HBASE-18458
> Project: HBase
>  Issue Type: Sub-task
>  Components: hadoop3
>Affects Versions: 2.0.0
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 3.0.0, 2.0.0-alpha-2
>
>
> {code}
> Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
> Tests run: 4, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 46.042 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
> testRegionServerHostnameReportedToMaster(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname)
>   Time elapsed: 30.095 sec  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 30000 
> milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221)
>   at 
> org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:94)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1072)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1028)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:900)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:894)
>   at 
> org.apache.hadoop.hbase.regionserver.TestRegionServerHostname.testRegionServerHostnameReportedToMaster(TestRegionServerHostname.java:158)
> {code}





[jira] [Updated] (HBASE-18458) Refactor TestRegionServerHostname to make it robust (Port HBASE-17922)

2017-07-26 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18458:
---
Description: 


{code}
Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
Tests run: 4, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 46.042 sec <<< 
FAILURE! - in org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
testRegionServerHostnameReportedToMaster(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname)
  Time elapsed: 30.095 sec  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 3 
milliseconds
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221)
at 
org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445)
at 
org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225)
at 
org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:94)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1072)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1028)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:900)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:894)
at 
org.apache.hadoop.hbase.regionserver.TestRegionServerHostname.testRegionServerHostnameReportedToMaster(TestRegionServerHostname.java:158)
{code}

  was:
{code}
Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 126.363 sec <<< 
FAILURE! - in org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
testRegionServerHostname(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname)
  Time elapsed: 120.029 sec  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 12 
milliseconds
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221)
at 
org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:405)
at 
org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225)
at 
org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:94)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1123)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1077)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:948)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:942)
at 
org.apache.hadoop.hbase.regionserver.TestRegionServerHostname.testRegionServerHostname(TestRegionServerHostname.java:88)


Results :

Tests in error: 
  TestRegionServerHostname.testRegionServerHostname:88 » TestTimedOut test 
timed...

Tests run: 2, Failures: 0, Errors: 1, Skipped: 0

{code}


> Refactor TestRegionServerHostname to make it robust (Port HBASE-17922)
> --
>
> Key: HBASE-18458
> URL: https://issues.apache.org/jira/browse/HBASE-18458
> Project: HBase
>  Issue Type: Sub-task
>  Components: hadoop3
>Affects Versions: 2.0.0
>Reporter: Stephen Yuan Jiang
>Assignee: Mike Drob
> Fix For: 3.0.0, 2.0.0-alpha-2
>
>
> {code}
> Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
> Tests run: 4, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 46.042 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
> testRegionServerHostnameReportedToMaster(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname)
>   Time elapsed: 30.095 sec  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 3 
> milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221)
>   at 
> org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:94)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1072)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1028)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:900)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUt

[jira] [Created] (HBASE-18458) Refactor TestRegionServerHostname to make it robust (Port HBASE-17922)

2017-07-26 Thread Stephen Yuan Jiang (JIRA)
Stephen Yuan Jiang created HBASE-18458:
--

 Summary: Refactor TestRegionServerHostname to make it robust (Port 
HBASE-17922)
 Key: HBASE-18458
 URL: https://issues.apache.org/jira/browse/HBASE-18458
 Project: HBase
  Issue Type: Sub-task
  Components: hadoop3
Affects Versions: 2.0.0
Reporter: Stephen Yuan Jiang
Assignee: Mike Drob
 Fix For: 3.0.0, 2.0.0-alpha-2


{code}
Running org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 126.363 sec <<< 
FAILURE! - in org.apache.hadoop.hbase.regionserver.TestRegionServerHostname
testRegionServerHostname(org.apache.hadoop.hbase.regionserver.TestRegionServerHostname)
  Time elapsed: 120.029 sec  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 12 
milliseconds
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:221)
at 
org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:405)
at 
org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:225)
at 
org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:94)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1123)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1077)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:948)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:942)
at 
org.apache.hadoop.hbase.regionserver.TestRegionServerHostname.testRegionServerHostname(TestRegionServerHostname.java:88)


Results :

Tests in error: 
  TestRegionServerHostname.testRegionServerHostname:88 » TestTimedOut test 
timed...

Tests run: 2, Failures: 0, Errors: 1, Skipped: 0

{code}





[jira] [Updated] (HBASE-18354) Fix TestMasterMetrics that were disabled by Proc-V2 AM in HBASE-14614

2017-07-24 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18354:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> Fix TestMasterMetrics that were disabled by Proc-V2 AM in HBASE-14614
> -
>
> Key: HBASE-18354
> URL: https://issues.apache.org/jira/browse/HBASE-18354
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha-1
>Reporter: Stephen Yuan Jiang
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0, 3.0.0
>
> Attachments: HBASE-18354-v1.patch, HBASE-18354-v2.patch
>
>
> With Core Proc-V2 AM change in HBASE-14614, stuff is different now around 
> startup which messes up the TestMasterMetrics test. HBASE-14614 disabled two 
> of three tests.
> This JIRA tracks work to fix the disabled tests.





[jira] [Updated] (HBASE-18354) Fix TestMasterMetrics that were disabled by Proc-V2 AM in HBASE-14614

2017-07-24 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18354:
---
Fix Version/s: 3.0.0
   2.0.0

> Fix TestMasterMetrics that were disabled by Proc-V2 AM in HBASE-14614
> -
>
> Key: HBASE-18354
> URL: https://issues.apache.org/jira/browse/HBASE-18354
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha-1
>Reporter: Stephen Yuan Jiang
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0, 3.0.0
>
> Attachments: HBASE-18354-v1.patch, HBASE-18354-v2.patch
>
>
> With Core Proc-V2 AM change in HBASE-14614, stuff is different now around 
> startup which messes up the TestMasterMetrics test. HBASE-14614 disabled two 
> of three tests.
> This JIRA tracks work to fix the disabled tests.





[jira] [Assigned] (HBASE-18350) Enable RSGroups UT that were disabled by Proc-V2 AM in HBASE-14614

2017-07-20 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang reassigned HBASE-18350:
--

Assignee: Stephen Yuan Jiang

> Enable RSGroups UT that were disabled by Proc-V2 AM in HBASE-14614
> --
>
> Key: HBASE-18350
> URL: https://issues.apache.org/jira/browse/HBASE-18350
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha-1
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
>
> The following RSGroups tests were disabled by Core Proc-V2 AM in HBASE-14614:
> - Disabled/Ignore TestRSGroupsOfflineMode#testOffline; need to dig in on what 
> offline is.
> - Disabled/Ignore TestRSGroups.
> This JIRA tracks the work to enable them (or remove/modify if not applicable).





[jira] [Commented] (HBASE-18354) Fix TestMasterMetrics that were disabled by Proc-V2 AM in HBASE-14614

2017-07-20 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095427#comment-16095427
 ] 

Stephen Yuan Jiang commented on HBASE-18354:


+1. Looks good to me.

> Fix TestMasterMetrics that were disabled by Proc-V2 AM in HBASE-14614
> -
>
> Key: HBASE-18354
> URL: https://issues.apache.org/jira/browse/HBASE-18354
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha-1
>Reporter: Stephen Yuan Jiang
>Assignee: Vladimir Rodionov
> Attachments: HBASE-18354-v1.patch, HBASE-18354-v2.patch
>
>
> With Core Proc-V2 AM change in HBASE-14614, stuff is different now around 
> startup which messes up the TestMasterMetrics test. HBASE-14614 disabled two 
> of three tests.
> This JIRA tracks work to fix the disabled tests.





[jira] [Commented] (HBASE-18406) In ServerCrashProcedure.java start(MasterProcedureEnv) is a no-op

2017-07-18 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092399#comment-16092399
 ] 

Stephen Yuan Jiang commented on HBASE-18406:


Looks good.

> In ServerCrashProcedure.java start(MasterProcedureEnv) is a no-op
> -
>
> Key: HBASE-18406
> URL: https://issues.apache.org/jira/browse/HBASE-18406
> Project: HBase
>  Issue Type: Bug
>Reporter: Alex Leblang
>Assignee: Alex Leblang
> Attachments: HBASE-18406.master.001.patch
>
>
> The comments above this method explain that it exists to set configs and 
> return; however, no configs are set in the method.  
> As you can see here:
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L210-L214
>  
> It is only ever called here:
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L142





[jira] [Commented] (HBASE-18403) [Shell]Truncate permission required

2017-07-18 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092288#comment-16092288
 ] 

Stephen Yuan Jiang commented on HBASE-18403:


In which version of HBase do you see this problem?  At least in the 
TruncateTableProcedure code (1.1+), we check permission at the beginning and 
then truncate the table (delete and then create). I don't see any logic that 
would abort in the middle of the procedure and leave the task half done. 
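
To illustrate the check-first ordering described above, here is a minimal 
sketch in plain Java. It is not the actual TruncateTableProcedure code: the 
class TruncateSketch, the in-memory table map, and the permission names are 
all made up for illustration. The only point is that when authorization runs 
before any mutation, a denied request leaves the table untouched.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of "check permission, then delete and re-create". */
final class TruncateSketch {
  /**
   * Truncates a table held in an in-memory map (table name to rows).
   * "granted" and "required" are placeholder permission names, not HBase's.
   */
  static void truncate(Map<String, List<String>> tables, String table,
                       Set<String> granted, Set<String> required) {
    // Step 1: authorize before touching any state.
    if (!granted.containsAll(required)) {
      throw new SecurityException("missing permission to truncate " + table);
    }
    // Step 2: delete the table, then re-create it empty.
    tables.remove(table);
    tables.put(table, new ArrayList<>());
  }

  public static void main(String[] args) {
    Map<String, List<String>> tables = new HashMap<>();
    tables.put("t", new ArrayList<>(List.of("row1", "row2")));
    try {
      // The caller lacks one of the (assumed) required permissions.
      truncate(tables, "t", Set.of("CREATE"), Set.of("CREATE", "ADMIN"));
    } catch (SecurityException e) {
      System.out.println("denied, rows still present: " + tables.get("t").size());
    }
  }
}
```

If the reported behaviour (table deleted but not re-created) is real, the 
permission failure would have to happen somewhere after step 1, which is what 
the question about the HBase version is trying to narrow down.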

> [Shell]Truncate permission required
> ---
>
> Key: HBASE-18403
> URL: https://issues.apache.org/jira/browse/HBASE-18403
> Project: HBase
>  Issue Type: Improvement
>  Components: shell
>Reporter: Yun Zhao
>Assignee: Yun Zhao
>Priority: Trivial
> Attachments: HBASE-18403.patch
>
>
> When a user has only (Create) permission to execute truncate, the table will 
> be deleted and not re-created





[jira] [Commented] (HBASE-16488) Starting namespace and quota services in master startup asynchronizely

2017-07-18 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16091568#comment-16091568
 ] 

Stephen Yuan Jiang commented on HBASE-16488:


The V10 patch for branch-1 is approved by [~enis].  

Most tests passed in pre-commit.  For the failed UTs, I checked the source code 
and don't think they are related to this change.  I re-ran those tests locally, 
and all except one passed.  

The only test that fails consistently on my local machine is 
{{org.apache.hadoop.hbase.regionserver.TestRSKilledWhenInitializing.testRSTerminationAfterRegisteringToMasterBeforeCreatingEphemeralNode}}
 - I spent some time debugging it and don't think it is related to this 
change.  The test kills one RS and asserts that the server manager thinks this 
RS is not online.  Without any change, the test passes on my local machine 
consistently.  When I added some logging to the test (just some LOG.info 
statements inside the test, no other changes) to see what is going on, it 
failed consistently because the server manager thinks the RS is still online.  
If I add a wait before the assert, the test passes with about a 600ms wait on 
my local machine.  This is with only log info messages in the test and no real 
change.  There seems to be a delay between "mini cluster get live server thinks 
the RS is dead" and "master server manager removes the RS from the online 
server list".  With the patch the same is true: with about a 600ms delay (which 
has nothing to do with namespace), the test passes.  I think this is a test 
issue; if it consistently repros in pre-commit, I will fix the test in a 
separate JIRA.
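
One way to make such an assertion tolerant of the delay is to poll for the 
condition with a timeout instead of asserting immediately or sleeping a fixed 
600ms. A minimal sketch in plain Java follows; WaitForCondition and waitFor 
are hypothetical names, not an HBase API, and the AtomicBoolean merely stands 
in for the server manager's online-server list.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical test helper: poll a condition until it holds or a deadline passes. */
final class WaitForCondition {
  /** Returns true if the condition became true within timeoutMs, false otherwise. */
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out without seeing the condition
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for "server manager still lists the RS as online".
    AtomicBoolean rsOnline = new AtomicBoolean(true);
    // Simulates the master removing the RS from its online list after a delay.
    Thread remover = new Thread(() -> {
      try {
        Thread.sleep(200);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      rsOnline.set(false);
    });
    remover.start();
    boolean removed = waitFor(5_000, 50, () -> !rsOnline.get());
    remover.join();
    System.out.println("RS removed from online list: " + removed);
  }
}
```

The test would then assert on the return value of waitFor, so it passes as 
soon as the RS is removed and only fails if the removal never happens within 
the timeout.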

> Starting namespace and quota services in master startup asynchronizely
> --
>
> Key: HBASE-16488
> URL: https://issues.apache.org/jira/browse/HBASE-16488
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 2.0.0, 1.3.0, 1.0.3, 1.4.0, 1.1.5, 1.2.2
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Attachments: HBASE-16488.v10-branch-1.patch, 
> HBASE-16488.v1-branch-1.patch, HBASE-16488.v1-master.patch, 
> HBASE-16488.v2-branch-1.patch, HBASE-16488.v2-branch-1.patch, 
> HBASE-16488.v3-branch-1.patch, HBASE-16488.v3-branch-1.patch, 
> HBASE-16488.v4-branch-1.patch, HBASE-16488.v5-branch-1.patch, 
> HBASE-16488.v6-branch-1.patch, HBASE-16488.v7-branch-1.patch, 
> HBASE-16488.v8-branch-1.patch, HBASE-16488.v9-branch-1.patch
>
>
> From time to time, during internal IT tests and from customers, we see 
> master initialization fail because the namespace table region takes a long 
> time to assign (e.g. sometimes log splitting takes a long time or hangs; 
> sometimes an RS is temporarily unavailable; sometimes there is an unknown 
> assignment issue).  In the past, there were some proposals to improve this 
> situation, e.g. HBASE-13556 / HBASE-14190 (Assign system tables ahead of 
> user region assignment), HBASE-13557 (Special WAL handling for system 
> tables), or HBASE-14623 (Implement dedicated WAL for system tables).  
> This JIRA proposes another way to solve this master initialization failure: 
> the namespace service is only used by a handful of operations (e.g. create 
> table / namespace DDL / get namespace API / some RS group DDL).  Only the 
> quota manager depends on it, and quota management is off by default.  
> Therefore, the namespace service is not really needed for the master to be 
> functional, so we could start the namespace service asynchronously without 
> blocking master startup.
>  
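
As a rough illustration of the proposal in the description (not the patch 
itself), the sketch below starts a slow initialization on a background thread 
so that startup returns immediately, and makes the dependent operation fail 
fast until initialization completes. AsyncServiceStartup and all of its 
methods are hypothetical names, and the fail-fast behaviour is an assumption 
about how callers might be handled.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch; class and method names are illustrative, not HBase's. */
final class AsyncServiceStartup {
  // Daemon thread so a stuck initialization cannot keep the JVM alive.
  private final ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
    Thread t = new Thread(r, "namespace-init");
    t.setDaemon(true);
    return t;
  });
  private volatile CompletableFuture<Void> namespaceReady;

  /** Kicks off the slow initialization and returns without waiting for it. */
  void startMaster(Runnable slowNamespaceInit) {
    namespaceReady = CompletableFuture.runAsync(slowNamespaceInit, pool);
    // The rest of master startup would continue here.
  }

  boolean isNamespaceReady() {
    CompletableFuture<Void> f = namespaceReady;
    return f != null && f.isDone() && !f.isCompletedExceptionally();
  }

  /** An operation that needs the namespace service is rejected until it is ready. */
  String createTable(String name) {
    if (!isNamespaceReady()) {
      throw new IllegalStateException("namespace service not ready yet");
    }
    return "created " + name;
  }

  /** Blocks until the background initialization has finished. */
  void awaitNamespace() {
    namespaceReady.join();
  }

  void shutdown() {
    pool.shutdown();
  }

  public static void main(String[] args) {
    CountDownLatch regionAssigned = new CountDownLatch(1);
    AsyncServiceStartup master = new AsyncServiceStartup();
    master.startMaster(() -> {
      try {
        regionAssigned.await(); // stands in for a slow namespace region assignment
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    System.out.println("master started, namespace ready: " + master.isNamespaceReady());
    regionAssigned.countDown();
    master.awaitNamespace();
    System.out.println(master.createTable("t1"));
    master.shutdown();
  }
}
```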





[jira] [Updated] (HBASE-16488) Starting namespace and quota services in master startup asynchronizely

2017-07-14 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-16488:
---
Attachment: HBASE-16488.v10-branch-1.patch

> Starting namespace and quota services in master startup asynchronizely
> --
>
> Key: HBASE-16488
> URL: https://issues.apache.org/jira/browse/HBASE-16488
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 2.0.0, 1.3.0, 1.0.3, 1.4.0, 1.1.5, 1.2.2
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Attachments: HBASE-16488.v10-branch-1.patch, 
> HBASE-16488.v1-branch-1.patch, HBASE-16488.v1-master.patch, 
> HBASE-16488.v2-branch-1.patch, HBASE-16488.v2-branch-1.patch, 
> HBASE-16488.v3-branch-1.patch, HBASE-16488.v3-branch-1.patch, 
> HBASE-16488.v4-branch-1.patch, HBASE-16488.v5-branch-1.patch, 
> HBASE-16488.v6-branch-1.patch, HBASE-16488.v7-branch-1.patch, 
> HBASE-16488.v8-branch-1.patch, HBASE-16488.v9-branch-1.patch
>
>
> From time to time, during internal IT tests and from customers, we see 
> master initialization fail because the namespace table region takes a long 
> time to assign (e.g. sometimes log splitting takes a long time or hangs; 
> sometimes an RS is temporarily unavailable; sometimes there is an unknown 
> assignment issue).  In the past, there were some proposals to improve this 
> situation, e.g. HBASE-13556 / HBASE-14190 (Assign system tables ahead of 
> user region assignment), HBASE-13557 (Special WAL handling for system 
> tables), or HBASE-14623 (Implement dedicated WAL for system tables).  
> This JIRA proposes another way to solve this master initialization failure: 
> the namespace service is only used by a handful of operations (e.g. create 
> table / namespace DDL / get namespace API / some RS group DDL).  Only the 
> quota manager depends on it, and quota management is off by default.  
> Therefore, the namespace service is not really needed for the master to be 
> functional, so we could start the namespace service asynchronously without 
> blocking master startup.
>  





[jira] [Created] (HBASE-18357) Enable disabled tests in TestHCM that were disabled by Proc-V2 AM in HBASE-14614

2017-07-10 Thread Stephen Yuan Jiang (JIRA)
Stephen Yuan Jiang created HBASE-18357:
--

 Summary: Enable disabled tests in TestHCM that were disabled by 
Proc-V2 AM in HBASE-14614
 Key: HBASE-18357
 URL: https://issues.apache.org/jira/browse/HBASE-18357
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0-alpha-1
Reporter: Stephen Yuan Jiang


The Core Proc-V2 AM change in HBASE-14614 disabled two tests in TestHCM: 
testMulti and testRegionCaching

This JIRA tracks the work to enable them.





[jira] [Created] (HBASE-18356) Enable TestFavoredStochasticBalancerPickers#testPickers that was disabled by Proc-V2 AM in HBASE-14614

2017-07-10 Thread Stephen Yuan Jiang (JIRA)
Stephen Yuan Jiang created HBASE-18356:
--

 Summary: Enable TestFavoredStochasticBalancerPickers#testPickers 
that was disabled by Proc-V2 AM in HBASE-14614
 Key: HBASE-18356
 URL: https://issues.apache.org/jira/browse/HBASE-18356
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0-alpha-1
Reporter: Stephen Yuan Jiang


The testPickers in TestFavoredStochasticBalancerPickers hangs after applying 
the change in Core Proc-V2 AM in HBASE-14614.  It was disabled.

This JIRA tracks the work to enable it.





[jira] [Created] (HBASE-18355) Enable export snapshot tests that were disabled by Proc-V2 AM in HBASE-14614

2017-07-10 Thread Stephen Yuan Jiang (JIRA)
Stephen Yuan Jiang created HBASE-18355:
--

 Summary: Enable export snapshot tests that were disabled by 
Proc-V2 AM in HBASE-14614
 Key: HBASE-18355
 URL: https://issues.apache.org/jira/browse/HBASE-18355
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0-alpha-1
Reporter: Stephen Yuan Jiang


The Proc-V2 AM in HBASE-14614 disabled the following tests:
- Disabled TestExportSnapshot (it hangs). 
- Disabled TestSecureExportSnapshot
- Disabled TestMobSecureExportSnapshot and TestMobExportSnapshot

This JIRA tracks the work to enable them.  If MOB requires more work, we could 
split to 2 tickets.






[jira] [Updated] (HBASE-18352) Enable Replica tests that were disabled by Proc-V2 AM in HBASE-14614

2017-07-10 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18352:
---
Description: 
The following replica tests were disabled by Core Proc-V2 AM in HBASE-14614:
- Disabled parts of...testCreateTableWithMultipleReplicas in 
TestMasterOperationsForRegionReplicas. There is an issue w/ assigning more 
replicas if number of replicas is changed on us. See '/* DISABLED! FOR 
NOW'.
- Disabled testRegionReplicasOnMidClusterHighReplication in 
TestStochasticLoadBalancer2

This JIRA tracks the work to enable them (or modify/remove if not applicable).

  was:
The following replica tests were disabled by Core Proc-V2 AM in HBASE-14614:
- Disabled parts of...testCreateTableWithMultipleReplicas in 
TestMasterOperationsForRegionReplicas There is an issue w/ assigning more 
replicas if number of replicas is changed on us. See '/* DISABLED! FOR 
NOW'.

This JIRA tracks the work to enable them (or modify/remove if not applicable).


> Enable Replica tests that were disabled by Proc-V2 AM in HBASE-14614
> 
>
> Key: HBASE-18352
> URL: https://issues.apache.org/jira/browse/HBASE-18352
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha-1
>Reporter: Stephen Yuan Jiang
>
> The following replica tests were disabled by Core Proc-V2 AM in HBASE-14614:
> - Disabled parts of...testCreateTableWithMultipleReplicas in 
> TestMasterOperationsForRegionReplicas. There is an issue w/ assigning more 
> replicas if number of replicas is changed on us. See '/* DISABLED! FOR 
> NOW'.
> - Disabled testRegionReplicasOnMidClusterHighReplication in 
> TestStochasticLoadBalancer2
> This JIRA tracks the work to enable them (or modify/remove if not applicable).





[jira] [Updated] (HBASE-18352) Enable Replica tests that were disabled by Proc-V2 AM in HBASE-14614

2017-07-10 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18352:
---
Description: 
The following replica tests were disabled by Core Proc-V2 AM in HBASE-14614:
- Disabled parts of...testCreateTableWithMultipleReplicas in 
TestMasterOperationsForRegionReplicas. There is an issue w/ assigning more 
replicas if number of replicas is changed on us. See '/* DISABLED! FOR 
NOW'.
- Disabled testRegionReplicasOnMidClusterHighReplication in 
TestStochasticLoadBalancer2
- Disabled testFlushAndCompactionsInPrimary in TestRegionReplicas

This JIRA tracks the work to enable them (or modify/remove if not applicable).

  was:
The following replica tests were disabled by Core Proc-V2 AM in HBASE-14614:
- Disabled parts of...testCreateTableWithMultipleReplicas in 
TestMasterOperationsForRegionReplicas There is an issue w/ assigning more 
replicas if number of replicas is changed on us. See '/* DISABLED! FOR 
NOW'.
- Disabled testRegionReplicasOnMidClusterHighReplication in 
TestStochasticLoadBalancer2

This JIRA tracks the work to enable them (or modify/remove if not applicable).


> Enable Replica tests that were disabled by Proc-V2 AM in HBASE-14614
> 
>
> Key: HBASE-18352
> URL: https://issues.apache.org/jira/browse/HBASE-18352
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha-1
>Reporter: Stephen Yuan Jiang
>
> The following replica tests were disabled by Core Proc-V2 AM in HBASE-14614:
> - Disabled parts of...testCreateTableWithMultipleReplicas in 
> TestMasterOperationsForRegionReplicas. There is an issue w/ assigning more 
> replicas if number of replicas is changed on us. See '/* DISABLED! FOR 
> NOW'.
> - Disabled testRegionReplicasOnMidClusterHighReplication in 
> TestStochasticLoadBalancer2
> - Disabled testFlushAndCompactionsInPrimary in TestRegionReplicas
> This JIRA tracks the work to enable them (or modify/remove if not applicable).





[jira] [Created] (HBASE-18350) Enable RSGroups UT that were disabled by Proc-V2 AM in HBASE-14614

2017-07-10 Thread Stephen Yuan Jiang (JIRA)
Stephen Yuan Jiang created HBASE-18350:
--

 Summary: Enable RSGroups UT that were disabled by Proc-V2 AM in 
HBASE-14614
 Key: HBASE-18350
 URL: https://issues.apache.org/jira/browse/HBASE-18350
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0-alpha-1
Reporter: Stephen Yuan Jiang


The following RSGroups tests were disabled by Core Proc-V2 AM in HBASE-14614:
- Disabled/Ignore TestRSGroupsOfflineMode#testOffline; need to dig in on what 
offline is.
- Disabled/Ignore TestRSGroups.

This JIRA tracks the work to enable them (or remove/modify if not applicable).





[jira] [Created] (HBASE-18353) Enable TestCorruptedRegionStoreFile that were disabled by Proc-V2 AM in HBASE-14614

2017-07-10 Thread Stephen Yuan Jiang (JIRA)
Stephen Yuan Jiang created HBASE-18353:
--

 Summary: Enable TestCorruptedRegionStoreFile that were disabled by 
Proc-V2 AM in HBASE-14614
 Key: HBASE-18353
 URL: https://issues.apache.org/jira/browse/HBASE-18353
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0-alpha-1
Reporter: Stephen Yuan Jiang


HBASE-14614 disabled TestCorruptedRegionStoreFile, as it depends on a 
half-implemented reopen of a region when a store file goes missing.

This JIRA tracks the work to fix/enable the test.





[jira] [Updated] (HBASE-18351) Fix tests that carry meta in Master that were disabled by Proc-V2 AM in HBASE-14614

2017-07-10 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18351:
---
Description: 
The following tests were disabled as part of Core Proc-V2 AM in HBASE-14614
- TestRegionRebalancing is disabled because doesn't consider the fact that 
Master carries system tables only (fix of average in RegionStates brought out 
the issue).
- Disabled testMetaAddressChange in TestMetaWithReplicas because presumes can 
move meta... you can't
- TestAsyncTableGetMultiThreaded wants to move hbase:meta...Balancer does NPEs. 
AMv2 won't let you move hbase:meta off Master.
- TestMasterFailover needs to be rewritten for AMv2. It uses tricks not 
ordained when up on AMv2. The test is also hobbled by the fact that we 
religiously enforce that only master can carry meta, something we are loose 
about in old AM

This JIRA is tracking the work to enable/modify them.

  was:
The following tests were disabled as part of Core Proc-V2 AM in HBASE-14614
- TestRegionRebalancing is disabled because doesn't consider the fact that 
Master carries system tables only (fix of average in RegionStates brought out 
the issue).
- Disabled testMetaAddressChange in TestMetaWithReplicas because presumes can 
move meta... you can't
- TestAsyncTableGetMultiThreaded wants to move hbase:meta...Balancer does NPEs. 
AMv2 won't let you move hbase:meta off Master.

This JIRA is tracking the work to enable/modify them.


> Fix tests that carry meta in Master that were disabled by Proc-V2 AM in 
> HBASE-14614
> ---
>
> Key: HBASE-18351
> URL: https://issues.apache.org/jira/browse/HBASE-18351
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha-1
>Reporter: Stephen Yuan Jiang
>
> The following tests were disabled as part of Core Proc-V2 AM in HBASE-14614
> - TestRegionRebalancing is disabled because doesn't consider the fact that 
> Master carries system tables only (fix of average in RegionStates brought out 
> the issue).
> - Disabled testMetaAddressChange in TestMetaWithReplicas because presumes can 
> move meta... you can't
> - TestAsyncTableGetMultiThreaded wants to move hbase:meta...Balancer does 
> NPEs. AMv2 won't let you move hbase:meta off Master.
> - TestMasterFailover needs to be rewritten for AMv2. It uses tricks not 
> ordained when up on AMv2. The test is also hobbled by the fact that we 
> religiously enforce that only master can carry meta, something we are loose 
> about in old AM
> This JIRA is tracking the work to enable/modify them.





[jira] [Created] (HBASE-18354) Fix TestMasterMetrics that were disabled by Proc-V2 AM in HBASE-14614

2017-07-10 Thread Stephen Yuan Jiang (JIRA)
Stephen Yuan Jiang created HBASE-18354:
--

 Summary: Fix TestMasterMetrics that were disabled by Proc-V2 AM in 
HBASE-14614
 Key: HBASE-18354
 URL: https://issues.apache.org/jira/browse/HBASE-18354
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0-alpha-1
Reporter: Stephen Yuan Jiang


With Core Proc-V2 AM change in HBASE-14614, stuff is different now around 
startup which messes up the TestMasterMetrics test. HBASE-14614 disabled two of 
three tests.

This JIRA tracks work to fix the disabled tests.





[jira] [Created] (HBASE-18352) Enable Replica tests that were disabled by Proc-V2 AM in HBASE-14614

2017-07-10 Thread Stephen Yuan Jiang (JIRA)
Stephen Yuan Jiang created HBASE-18352:
--

 Summary: Enable Replica tests that were disabled by Proc-V2 AM in 
HBASE-14614
 Key: HBASE-18352
 URL: https://issues.apache.org/jira/browse/HBASE-18352
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0-alpha-1
Reporter: Stephen Yuan Jiang


The following replica tests were disabled by Core Proc-V2 AM in HBASE-14614:
- Disabled parts of...testCreateTableWithMultipleReplicas in 
TestMasterOperationsForRegionReplicas. There is an issue w/ assigning more 
replicas if number of replicas is changed on us. See '/* DISABLED! FOR 
NOW'.

This JIRA tracks the work to enable them (or modify/remove if not applicable).





[jira] [Updated] (HBASE-18349) Enable disabled tests in TestFavoredStochasticLoadBalancer that were disabled by Proc-V2 AM in HBASE-14614

2017-07-10 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18349:
---
Summary: Enable disabled tests in TestFavoredStochasticLoadBalancer that 
were disabled by Proc-V2 AM in HBASE-14614  (was: Enable disabled tests in 
TestFavoredStochasticLoadBalancer that was disabled by Proc-V2 AM in 
HBASE-14614)

> Enable disabled tests in TestFavoredStochasticLoadBalancer that were disabled 
> by Proc-V2 AM in HBASE-14614
> --
>
> Key: HBASE-18349
> URL: https://issues.apache.org/jira/browse/HBASE-18349
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha-1
>Reporter: Stephen Yuan Jiang
>
> The following 3 tests in TestFavoredStochasticLoadBalancer were disabled by 
> HBASE-14614 (Core Proc-V2 AM):
> - testAllFavoredNodesDead
> - testAllFavoredNodesDeadMasterRestarted
> - testMisplacedRegions
> This JIRA is tracking necessary work to re-enable (or remove/change if not 
> applicable) these UTs





[jira] [Created] (HBASE-18351) Fix tests that carry meta in Master that were disabled by Proc-V2 AM in HBASE-14614

2017-07-10 Thread Stephen Yuan Jiang (JIRA)
Stephen Yuan Jiang created HBASE-18351:
--

 Summary: Fix tests that carry meta in Master that were disabled by 
Proc-V2 AM in HBASE-14614
 Key: HBASE-18351
 URL: https://issues.apache.org/jira/browse/HBASE-18351
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0-alpha-1
Reporter: Stephen Yuan Jiang


The following tests were disabled as part of Core Proc-V2 AM in HBASE-14614:
- TestRegionRebalancing is disabled because it doesn't consider the fact that 
Master carries system tables only (fix of average in RegionStates brought out 
the issue).
- testMetaAddressChange in TestMetaWithReplicas is disabled because it presumes 
you can move meta... you can't.
- TestAsyncTableGetMultiThreaded wants to move hbase:meta... Balancer does NPEs. 
AMv2 won't let you move hbase:meta off Master.

This JIRA is tracking the work to enable/modify them.





[jira] [Created] (HBASE-18349) Enable disabled tests in TestFavoredStochasticLoadBalancer that was disabled by Proc-V2 AM in HBASE-14614

2017-07-10 Thread Stephen Yuan Jiang (JIRA)
Stephen Yuan Jiang created HBASE-18349:
--

 Summary: Enable disabled tests in 
TestFavoredStochasticLoadBalancer that was disabled by Proc-V2 AM in HBASE-14614
 Key: HBASE-18349
 URL: https://issues.apache.org/jira/browse/HBASE-18349
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0-alpha-1
Reporter: Stephen Yuan Jiang


The following 3 tests in TestFavoredStochasticLoadBalancer were disabled by 
HBASE-14614 (Core Proc-V2 AM):
- testAllFavoredNodesDead
- testAllFavoredNodesDeadMasterRestarted
- testMisplacedRegions

This JIRA is tracking necessary work to re-enable (or remove/change if not 
applicable) these UTs





[jira] [Updated] (HBASE-16488) Starting namespace and quota services in master startup asynchronizely

2017-07-07 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-16488:
---
Attachment: HBASE-16488.v9-branch-1.patch

> Starting namespace and quota services in master startup asynchronizely
> --
>
> Key: HBASE-16488
> URL: https://issues.apache.org/jira/browse/HBASE-16488
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 2.0.0, 1.3.0, 1.0.3, 1.4.0, 1.1.5, 1.2.2
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Attachments: HBASE-16488.v1-branch-1.patch, 
> HBASE-16488.v1-master.patch, HBASE-16488.v2-branch-1.patch, 
> HBASE-16488.v2-branch-1.patch, HBASE-16488.v3-branch-1.patch, 
> HBASE-16488.v3-branch-1.patch, HBASE-16488.v4-branch-1.patch, 
> HBASE-16488.v5-branch-1.patch, HBASE-16488.v6-branch-1.patch, 
> HBASE-16488.v7-branch-1.patch, HBASE-16488.v8-branch-1.patch, 
> HBASE-16488.v9-branch-1.patch
>
>
> From time to time, during internal IT testing and from customers, we see 
> master initialization fail because the namespace table region takes a long 
> time to assign (eg. sometimes log splitting takes a long time or hangs; 
> sometimes the RS is temporarily not available; sometimes due to some unknown 
> assignment issue).  In the past, there were some proposals to improve this 
> situation, eg. HBASE-13556 / HBASE-14190 (Assign system tables ahead of user 
> region assignment) or HBASE-13557 (Special WAL handling for system tables) or  
> HBASE-14623 (Implement dedicated WAL for system tables).  
> This JIRA proposes another way to solve this master initialization failure: 
> the namespace service is only used by a handful of operations (eg. create 
> table / namespace DDL / get namespace API / some RS group DDL).  Only the quota 
> manager depends on it and quota management is off by default.  Therefore, 
> the namespace service is not really needed for master to be functional.  So we 
> could start the namespace service asynchronously without blocking master startup.
>  
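The proposal above can be sketched in Java roughly as follows. This is only an illustration of starting a dependent service in the background, not the patch attached to this issue: the class `AsyncStartupSketch` and all of its methods are hypothetical names, and the slow namespace initialization is modelled as a plain `Runnable`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch; none of these names come from HBase. */
final class AsyncStartupSketch {
    private final AtomicBoolean namespaceReady = new AtomicBoolean(false);
    private volatile boolean masterInitialized = false;

    /** Kicks off namespace initialization in the background and returns without waiting for it. */
    CompletableFuture<Void> start(Runnable slowNamespaceInit) {
        CompletableFuture<Void> namespaceInit = CompletableFuture.runAsync(() -> {
            slowNamespaceInit.run();      // e.g. waiting for the namespace region to be assigned
            namespaceReady.set(true);
        });
        masterInitialized = true;         // master startup is not blocked on the namespace service
        return namespaceInit;
    }

    boolean isMasterInitialized() { return masterInitialized; }

    boolean isNamespaceReady() { return namespaceReady.get(); }

    /** Assumed policy: operations that need the namespace service fail fast until it is ready. */
    void requireNamespaceService() {
        if (!namespaceReady.get()) {
            throw new IllegalStateException("namespace service is not ready yet");
        }
    }

    public static void main(String[] args) {
        AsyncStartupSketch master = new AsyncStartupSketch();
        CompletableFuture<Void> namespaceInit = master.start(() -> { /* assume a slow init here */ });
        System.out.println("master initialized: " + master.isMasterInitialized());
        namespaceInit.join();             // only callers that need the service wait for it
        master.requireNamespaceService();
        System.out.println("namespace ready: " + master.isNamespaceReady());
    }
}
```

Failing fast in `requireNamespaceService()` is one possible policy for the handful of operations that depend on the namespace service; blocking those callers with a timeout would be another.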





[jira] [Commented] (HBASE-18107) [AMv2] Rename DispatchMergingRegionsRequest & DispatchMergingRegions

2017-07-05 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16075619#comment-16075619
 ] 

Stephen Yuan Jiang commented on HBASE-18107:


Yeah, we don't need DispatchMergingRegionsProcedure in 2.0.0.  Sorry that I did 
not catch this in HBASE-14614 when it sneaked this old procedure back in. 

> [AMv2] Rename DispatchMergingRegionsRequest & DispatchMergingRegions
> 
>
> Key: HBASE-18107
> URL: https://issues.apache.org/jira/browse/HBASE-18107
> Project: HBase
>  Issue Type: Sub-task
>  Components: Region Assignment
>Affects Versions: 2.0.0
>Reporter: stack
> Fix For: 2.0.0
>
>
> They don't align with how we have named the Split equivalents; i.e. 
> SplitRegion (so should be MergeRegion...). They probably have these awkward 
> names because the obvious slots are occupied... so this may not be fixable 
> but filing issue anyways.





[jira] [Updated] (HBASE-18301) Enable TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster that was disabled by Proc-V2 AM in HBASE-14614

2017-07-05 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18301:
---
Fix Version/s: 3.0.0

> Enable 
> TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster 
> that was disabled by Proc-V2 AM in HBASE-14614
> ---
>
> Key: HBASE-18301
> URL: https://issues.apache.org/jira/browse/HBASE-18301
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha-1
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 2.0.0, 3.0.0
>
> Attachments: HBASE-18301.v1-master.patch, HBASE-18301.v1-master.patch
>
>
> Enable 
> TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster 
> that was temporarily disabled by HBASE-14614





[jira] [Updated] (HBASE-18301) Enable TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster that was disabled by Proc-V2 AM in HBASE-14614

2017-07-05 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18301:
---
Hadoop Flags: Reviewed

> Enable 
> TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster 
> that was disabled by Proc-V2 AM in HBASE-14614
> ---
>
> Key: HBASE-18301
> URL: https://issues.apache.org/jira/browse/HBASE-18301
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha-1
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 2.0.0, 3.0.0
>
> Attachments: HBASE-18301.v1-master.patch, HBASE-18301.v1-master.patch
>
>
> Enable 
> TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster 
> that was temporarily disabled by HBASE-14614





[jira] [Updated] (HBASE-18301) Enable TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster that was disabled by Proc-V2 AM in HBASE-14614

2017-07-05 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18301:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Enable 
> TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster 
> that was disabled by Proc-V2 AM in HBASE-14614
> ---
>
> Key: HBASE-18301
> URL: https://issues.apache.org/jira/browse/HBASE-18301
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha-1
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 2.0.0, 3.0.0
>
> Attachments: HBASE-18301.v1-master.patch, HBASE-18301.v1-master.patch
>
>
> Enable 
> TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster 
> that was temporarily disabled by HBASE-14614





[jira] [Updated] (HBASE-18301) Enable TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster that was disabled by Proc-V2 AM in HBASE-14614

2017-07-05 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18301:
---
Summary: Enable 
TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster that 
was disabled by Proc-V2 AM in HBASE-14614  (was: Procedure V2 (AM) - Enable 
TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster that 
was disabled by HBASE-14614)

> Enable 
> TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster 
> that was disabled by Proc-V2 AM in HBASE-14614
> ---
>
> Key: HBASE-18301
> URL: https://issues.apache.org/jira/browse/HBASE-18301
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha-1
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 2.0.0
>
> Attachments: HBASE-18301.v1-master.patch, HBASE-18301.v1-master.patch
>
>
> Enable 
> TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster 
> that was temporarily disabled by HBASE-14614





[jira] [Updated] (HBASE-18301) Procedure V2 (AM) - Enable TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster that was disabled by HBASE-14614

2017-07-01 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18301:
---
Attachment: HBASE-18301.v1-master.patch

> Procedure V2 (AM) - Enable 
> TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster 
> that was disabled by HBASE-14614
> -
>
> Key: HBASE-18301
> URL: https://issues.apache.org/jira/browse/HBASE-18301
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha-1
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 2.0.0
>
> Attachments: HBASE-18301.v1-master.patch
>
>
> Enable 
> TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster 
> that was temporarily disabled by HBASE-14614





[jira] [Updated] (HBASE-18301) Procedure V2 (AM) - Enable TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster that was disabled by HBASE-14614

2017-07-01 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18301:
---
Attachment: (was: HBASE-18301.v1-master.patch)

> Procedure V2 (AM) - Enable 
> TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster 
> that was disabled by HBASE-14614
> -
>
> Key: HBASE-18301
> URL: https://issues.apache.org/jira/browse/HBASE-18301
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha-1
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 2.0.0
>
>
> Enable 
> TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster 
> that was temporarily disabled by HBASE-14614





[jira] [Updated] (HBASE-18301) Procedure V2 (AM) - Enable TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster that was disabled by HBASE-14614

2017-06-30 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18301:
---
Attachment: HBASE-18301.v1-master.patch

> Procedure V2 (AM) - Enable 
> TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster 
> that was disabled by HBASE-14614
> -
>
> Key: HBASE-18301
> URL: https://issues.apache.org/jira/browse/HBASE-18301
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha-1
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 2.0.0
>
> Attachments: HBASE-18301.v1-master.patch
>
>
> Enable 
> TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster 
> that was temporarily disabled by HBASE-14614





[jira] [Updated] (HBASE-18301) Procedure V2 (AM) - Enable TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster that was disabled by HBASE-14614

2017-06-30 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18301:
---
Status: Patch Available  (was: Open)

> Procedure V2 (AM) - Enable 
> TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster 
> that was disabled by HBASE-14614
> -
>
> Key: HBASE-18301
> URL: https://issues.apache.org/jira/browse/HBASE-18301
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha-1
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 2.0.0
>
> Attachments: HBASE-18301.v1-master.patch
>
>
> Enable 
> TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster 
> that was temporarily disabled by HBASE-14614





[jira] [Created] (HBASE-18301) Procedure V2 (AM) - Enable TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster that was disabled by HBASE-14614

2017-06-30 Thread Stephen Yuan Jiang (JIRA)
Stephen Yuan Jiang created HBASE-18301:
--

 Summary: Procedure V2 (AM) - Enable 
TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster that 
was disabled by HBASE-14614
 Key: HBASE-18301
 URL: https://issues.apache.org/jira/browse/HBASE-18301
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0-alpha-1
Reporter: Stephen Yuan Jiang
Assignee: Stephen Yuan Jiang
 Fix For: 2.0.0


Enable 
TestSimpleRegionNormalizerOnCluster#testRegionNormalizationMergeOnCluster that 
was temporarily disabled by HBASE-14614





[jira] [Updated] (HBASE-16488) Starting namespace and quota services in master startup asynchronizely

2017-06-28 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-16488:
---
Attachment: HBASE-16488.v8-branch-1.patch

> Starting namespace and quota services in master startup asynchronizely
> --
>
> Key: HBASE-16488
> URL: https://issues.apache.org/jira/browse/HBASE-16488
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 2.0.0, 1.3.0, 1.0.3, 1.4.0, 1.1.5, 1.2.2
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Attachments: HBASE-16488.v1-branch-1.patch, 
> HBASE-16488.v1-master.patch, HBASE-16488.v2-branch-1.patch, 
> HBASE-16488.v2-branch-1.patch, HBASE-16488.v3-branch-1.patch, 
> HBASE-16488.v3-branch-1.patch, HBASE-16488.v4-branch-1.patch, 
> HBASE-16488.v5-branch-1.patch, HBASE-16488.v6-branch-1.patch, 
> HBASE-16488.v7-branch-1.patch, HBASE-16488.v8-branch-1.patch
>
>
> From time to time, during internal IT testing and from customers, we see 
> master initialization fail because the namespace table region takes a long 
> time to assign (eg. sometimes log splitting takes a long time or hangs; 
> sometimes the RS is temporarily not available; sometimes due to some unknown 
> assignment issue).  In the past, there were some proposals to improve this 
> situation, eg. HBASE-13556 / HBASE-14190 (Assign system tables ahead of user 
> region assignment) or HBASE-13557 (Special WAL handling for system tables) or  
> HBASE-14623 (Implement dedicated WAL for system tables).  
> This JIRA proposes another way to solve this master initialization failure: 
> the namespace service is only used by a handful of operations (eg. create 
> table / namespace DDL / get namespace API / some RS group DDL).  Only the quota 
> manager depends on it and quota management is off by default.  Therefore, 
> the namespace service is not really needed for master to be functional.  So we 
> could start the namespace service asynchronously without blocking master startup.
>  





[jira] [Commented] (HBASE-18226) Disable reverse DNS lookup at HMaster and use the hostname provided by RegionServer

2017-06-21 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058541#comment-16058541
 ] 

Stephen Yuan Jiang commented on HBASE-18226:


[~onpduo], would you mind porting this to branch-1 as well?

> Disable reverse DNS lookup at HMaster and use the hostname provided by 
> RegionServer
> ---
>
> Key: HBASE-18226
> URL: https://issues.apache.org/jira/browse/HBASE-18226
> Project: HBase
>  Issue Type: New Feature
>Reporter: Duo Xu
>Assignee: Duo Xu
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: HBASE-18226.001.patch, HBASE-18226.002.patch, 
> HBASE-18226.003.patch, HBASE-18226.004.patch, HBASE-18226.005.patch, 
> HBASE-18226.006.patch
>
>
> Description updated:
> In some unusual network environment, forward DNS lookup is supported while 
> reverse DNS lookup may not work properly.
> This JIRA is to address that HMaster uses the hostname passed from RS instead 
> of doing reverse DNS lookup to tells RS which hostname to use during 
> reportForDuty() . This has already been implemented by HBASE-12954 by adding 
> "useThisHostnameInstead" field in RegionServerStatusProtos.
> Currently "useThisHostnameInstead" is optional and RS by default only passes 
> port, server start code and server current time info to HMaster during RS 
> reportForDuty(). In order to use this field, users currently need to specify 
> "hbase.regionserver.hostname" on every regionserver node's hbase-site.xml. 
> This causes some trouble:
> 1. Some deployments are managed by management tools like Ambari, which 
> maintain the same copy of hbase-site.xml across all the nodes.
> 2. HBASE-12954 targets multihomed hosts, where users want to manually 
> set the hostname value for each node. In the other cases (not multihomed), I 
> just want RS to use the hostname returned by the node, set it in 
> useThisHostnameInstead, and pass it to HMaster during reportForDuty().
> I would like to introduce a setting that if the setting is set to true, 
> "useThisHostnameInstead" will be set to the hostname RS gets from the node. 
> Then HMaster will skip reverse DNS lookup because it sees 
> "useThisHostnameInstead" field is set in the request.
> "hbase.regionserver.hostname.reported.to.master", is it a good name?
> 
> Regarding the hostname returned by the RS node, I read the source code again 
> (including hadoop-common dns.java). By default RS gets hostname by calling 
> InetAddress.getLocalHost().getCanonicalHostName(). If users specify 
> "hbase.regionserver.dns.interface" or "hbase.regionserver.dns.nameserver" or 
> some underlying system configuration changes (eg. modifying 
> /etc/nsswitch.conf), it may first read from DNS or other sources instead of 
> first checking /etc/hosts file.
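The selection logic described above can be sketched as follows. This is a hedged illustration, not HBase code: `ReportedHostnameSketch` and `hostnameToReport` are hypothetical names, the boolean parameter stands in for the proposed setting, and letting an explicitly configured hostname take precedence is an assumption. Only `InetAddress.getLocalHost().getCanonicalHostName()` is taken from the issue text.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical sketch; the class and method names are not HBase APIs. */
final class ReportedHostnameSketch {
    /**
     * Picks the value for "useThisHostnameInstead", or returns null to leave the field
     * unset (in which case the master would do its own reverse DNS lookup).
     */
    static String hostnameToReport(String configuredHostname,
                                   boolean reportLocalHostname,
                                   String localHostname) {
        if (configuredHostname != null && !configuredHostname.isEmpty()) {
            return configuredHostname;    // assumed: an explicit hbase.regionserver.hostname wins
        }
        return reportLocalHostname ? localHostname : null;
    }

    public static void main(String[] args) throws UnknownHostException {
        // Default lookup described in the issue text.
        String local = InetAddress.getLocalHost().getCanonicalHostName();
        System.out.println(hostnameToReport(null, true, local));
    }
}
```

With the flag off and no configured hostname the method returns null, which corresponds to today's default where the field is not sent.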





[jira] [Updated] (HBASE-15691) Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1

2017-06-21 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-15691:
---
Fix Version/s: (was: 1.5.0)
   (was: 1.4.1)
   1.1.12
   1.4.0

> Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to 
> branch-1
> -
>
> Key: HBASE-15691
> URL: https://issues.apache.org/jira/browse/HBASE-15691
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 1.3.0
>Reporter: Andrew Purtell
>Assignee: Stephen Yuan Jiang
> Fix For: 1.4.0, 1.3.2, 1.2.7, 1.1.12
>
> Attachments: HBASE-15691-branch-1.patch, 
> HBASE-15691.v2-branch-1.patch, HBASE-15691.v3-branch-1.patch
>
>
> HBASE-10205 solves the following problem:
> "
> The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the 
> RAM queue containing entries to be cached. freeSpace() in turn calls 
> BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), 
> which iterates over 'bucketList'. At the same time another WriterThread might 
> call BucketAllocator.allocateBlock(), which may call 
> BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently 
> cause a ConcurrentModificationException. Calls to 
> BucketAllocator.allocateBlock() are synchronized, but calls to 
> BucketAllocator.getIndexStatistics() are not, which allows this race to occur.
> "
> However, for some unknown reason, HBASE-10205 was committed only to the master 
> (2.0 and beyond) and 0.98 branches. To preserve continuity we should 
> commit it to branch-1.
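The race quoted above, where one thread iterates 'bucketList' while another appends to it, can be illustrated with a small stand-in. This is a sketch under assumptions, not the HBASE-10205 patch: `BucketListSketch` and its methods are hypothetical, and guarding the reader with the same lock as the writer is just one way to close such a race.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a shared bucket list; not the HBase BucketAllocator. */
final class BucketListSketch {
    private final List<Integer> bucketList = new ArrayList<>();

    /** Writer path: appends a bucket. Synchronized, like allocateBlock() in the description. */
    synchronized void allocate(int bucketId) {
        bucketList.add(bucketId);
    }

    /**
     * Reader path: iterates the list. Without "synchronized" here, a concurrent allocate()
     * could trigger a ConcurrentModificationException during the loop.
     */
    synchronized int statistics() {
        int count = 0;
        for (int ignored : bucketList) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        BucketListSketch sketch = new BucketListSketch();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 100_000; i++) sketch.allocate(i);
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < 1_000; i++) sketch.statistics();
        });
        writer.start();
        reader.start();
        writer.join();
        reader.join();
        System.out.println("buckets: " + sketch.statistics());
    }
}
```

Sharing one lock keeps the sketch simple at the cost of readers blocking writers; a copy-on-write list or a snapshot taken under the lock would be alternatives when statistics are read often.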





[jira] [Updated] (HBASE-15691) Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1

2017-06-21 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-15691:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to 
> branch-1
> -
>
> Key: HBASE-15691
> URL: https://issues.apache.org/jira/browse/HBASE-15691
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 1.3.0
>Reporter: Andrew Purtell
>Assignee: Stephen Yuan Jiang
> Fix For: 1.4.0, 1.3.2, 1.2.7, 1.1.12
>
> Attachments: HBASE-15691-branch-1.patch, 
> HBASE-15691.v2-branch-1.patch, HBASE-15691.v3-branch-1.patch
>
>
> HBASE-10205 solves the following problem:
> "
> The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the 
> RAM queue containing entries to be cached. freeSpace() in turn calls 
> BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), 
> which iterates over 'bucketList'. At the same time another WriterThread might 
> call BucketAllocator.allocateBlock(), which may call 
> BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently 
> cause a ConcurrentModificationException. Calls to 
> BucketAllocator.allocateBlock() are synchronized, but calls to 
> BucketAllocator.getIndexStatistics() are not, which allows this race to occur.
> "
> However, for some unknown reason, HBASE-10205 was committed only to the master 
> (2.0 and beyond) and 0.98 branches. To preserve continuity we should 
> commit it to branch-1.





[jira] [Commented] (HBASE-15691) Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1

2017-06-21 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057967#comment-16057967
 ] 

Stephen Yuan Jiang commented on HBASE-15691:


Thanks, [~zjushch], for the review.

> Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to 
> branch-1
> -
>
> Key: HBASE-15691
> URL: https://issues.apache.org/jira/browse/HBASE-15691
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 1.3.0
>Reporter: Andrew Purtell
>Assignee: Stephen Yuan Jiang
> Fix For: 1.3.2, 1.4.1, 1.5.0, 1.2.7
>
> Attachments: HBASE-15691-branch-1.patch, 
> HBASE-15691.v2-branch-1.patch, HBASE-15691.v3-branch-1.patch
>
>
> HBASE-10205 solves the following problem:
> "
> The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the 
> RAM queue containing entries to be cached. freeSpace() in turn calls 
> BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), 
> which iterates over 'bucketList'. At the same time another WriterThread might 
> call BucketAllocator.allocateBlock(), which may call 
> BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently 
> cause a ConcurrentModificationException. Calls to 
> BucketAllocator.allocateBlock() are synchronized, but calls to 
> BucketAllocator.getIndexStatistics() are not, which allows this race to occur.
> "
> However, for some unknown reason, HBASE-10205 was committed only to the master 
> (2.0 and beyond) and 0.98 branches. To preserve continuity we should 
> commit it to branch-1.





[jira] [Updated] (HBASE-15691) Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1

2017-06-21 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-15691:
---
Hadoop Flags: Reviewed
  Status: Patch Available  (was: In Progress)

> Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to 
> branch-1
> -
>
> Key: HBASE-15691
> URL: https://issues.apache.org/jira/browse/HBASE-15691
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 1.3.0
>Reporter: Andrew Purtell
>Assignee: Stephen Yuan Jiang
> Fix For: 1.3.2, 1.4.1, 1.5.0, 1.2.7
>
> Attachments: HBASE-15691-branch-1.patch, 
> HBASE-15691.v2-branch-1.patch, HBASE-15691.v3-branch-1.patch
>
>
> HBASE-10205 solves the following problem:
> "
> The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the 
> RAM queue containing entries to be cached. freeSpace() in turn calls 
> BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), 
> which iterates over 'bucketList'. At the same time another WriterThread might 
> call BucketAllocator.allocateBlock(), which may call 
> BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently 
> cause a ConcurrentModificationException. Calls to 
> BucketAllocator.allocateBlock() are synchronized, but calls to 
> BucketAllocator.getIndexStatistics() are not, which allows this race to occur.
> "
> However, for some unknown reason, HBASE-10205 was committed only to the master 
> (2.0 and beyond) and 0.98 branches. To preserve continuity we should 
> commit it to branch-1.





[jira] [Updated] (HBASE-15691) Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1

2017-06-21 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-15691:
---
Attachment: HBASE-15691.v3-branch-1.patch

> Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to 
> branch-1
> -
>
> Key: HBASE-15691
> URL: https://issues.apache.org/jira/browse/HBASE-15691
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 1.3.0
>Reporter: Andrew Purtell
>Assignee: Stephen Yuan Jiang
> Fix For: 1.3.2, 1.4.1, 1.5.0, 1.2.7
>
> Attachments: HBASE-15691-branch-1.patch, 
> HBASE-15691.v2-branch-1.patch, HBASE-15691.v3-branch-1.patch
>
>
> HBASE-10205 solves the following problem:
> "
> The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the 
> RAM queue containing entries to be cached. freeSpace() in turn calls 
> BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), 
> which iterates over 'bucketList'. At the same time another WriterThread might 
> call BucketAllocator.allocateBlock(), which may call 
> BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently 
> cause a ConcurrentModificationException. Calls to 
> BucketAllocator.allocateBlock() are synchronized, but calls to 
> BucketAllocator.getIndexStatistics() are not, which allows this race to occur.
> "
> However, for some unknown reason, HBASE-10205 was committed only to the master 
> (2.0 and beyond) and 0.98 branches. To preserve continuity we should 
> commit it to branch-1.





[jira] [Assigned] (HBASE-18244) org.apache.hadoop.hbase.client.rsgroup.TestShellRSGroups hangs/fails

2017-06-20 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang reassigned HBASE-18244:
--

Assignee: Stephen Yuan Jiang

> org.apache.hadoop.hbase.client.rsgroup.TestShellRSGroups hangs/fails
> 
>
> Key: HBASE-18244
> URL: https://issues.apache.org/jira/browse/HBASE-18244
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Josh Elser
>Assignee: Stephen Yuan Jiang
> Fix For: 2.0.0
>
>
> Sometime in the past couple of weeks, TestShellRSGroups has started 
> timing-out/failing for me.
> It will get stuck on a call to moveTables()
> {noformat}
> "main" #1 prio=5 os_prio=31 tid=0x7ff012004800 nid=0x1703 in Object.wait() [0x7020d000]
>java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:502)
> at org.apache.hadoop.hbase.ipc.BlockingRpcCallback.get(BlockingRpcCallback.java:62)
> - locked <0x00078d1003f0> (a org.apache.hadoop.hbase.ipc.BlockingRpcCallback)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:328)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$200(AbstractRpcClient.java:94)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:567)
> at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$BlockingStub.execMasterService(MasterProtos.java)
> at org.apache.hadoop.hbase.client.ConnectionImplementation$3.execMasterService(ConnectionImplementation.java:1500)
> at org.apache.hadoop.hbase.client.HBaseAdmin$67$1.rpcCall(HBaseAdmin.java:2991)
> at org.apache.hadoop.hbase.client.HBaseAdmin$67$1.rpcCall(HBaseAdmin.java:2986)
> at org.apache.hadoop.hbase.client.MasterCallable.call(MasterCallable.java:98)
> at org.apache.hadoop.hbase.client.HBaseAdmin$67.callExecService(HBaseAdmin.java:2997)
> at org.apache.hadoop.hbase.client.SyncCoprocessorRpcChannel.callBlockingMethod(SyncCoprocessorRpcChannel.java:69)
> at org.apache.hadoop.hbase.protobuf.generated.RSGroupAdminProtos$RSGroupAdminService$BlockingStub.moveTables(RSGroupAdminProtos.java:13171)
> at org.apache.hadoop.hbase.rsgroup.RSGroupAdminClient.moveTables(RSGroupAdminClient.java:117)
> {noformat}
> The server-side end of the RPC is waiting on a procedure to finish:
> {noformat}
> "RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=64242" #289 daemon prio=5 os_prio=31 tid=0x7ff015b7c000 nid=0x1e603 waiting on condition [0x7dbc9000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>     at java.lang.Thread.sleep(Native Method)
>     at org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitFor(ProcedureSyncWait.java:184)
>     at org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitFor(ProcedureSyncWait.java:171)
>     at org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitForProcedureToComplete(ProcedureSyncWait.java:141)
>     at org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitForProcedureToCompleteIOE(ProcedureSyncWait.java:130)
>     at org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.submitAndWaitProcedure(ProcedureSyncWait.java:123)
>     at org.apache.hadoop.hbase.master.assignment.AssignmentManager.unassign(AssignmentManager.java:478)
>     at org.apache.hadoop.hbase.master.assignment.AssignmentManager.unassign(AssignmentManager.java:465)
>     at org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.moveTables(RSGroupAdminServer.java:432)
>     at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint$RSGroupAdminServiceImpl.moveTables(RSGroupAdminEndpoint.java:174)
>     at org.apache.hadoop.hbase.protobuf.generated.RSGroupAdminProtos$RSGroupAdminService.callMethod(RSGroupAdminProtos.java:12786)
>     at org.apache.hadoop.hbase.master.MasterRpcServices.execMasterService(MasterRpcServices.java:673)
>     at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:406)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:278)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:258)
>    Locked ownable synchronizers:
>     - None
> {noformat}
> I don't see anything else running in the thread dump, but I do se

[jira] [Updated] (HBASE-18244) org.apache.hadoop.hbase.client.rsgroup.TestShellRSGroups hangs/fails

2017-06-20 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18244:
---
Fix Version/s: (was: 3.0.0)
   2.0.0

> org.apache.hadoop.hbase.client.rsgroup.TestShellRSGroups hangs/fails
> 
>
> Key: HBASE-18244
> URL: https://issues.apache.org/jira/browse/HBASE-18244
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Josh Elser
> Fix For: 2.0.0
>
>
> Sometime in the past couple of weeks, TestShellRSGroups has started 
> timing-out/failing for me.
> It will get stuck on a call to moveTables()
> {noformat}
> "main" #1 prio=5 os_prio=31 tid=0x7ff012004800 nid=0x1703 in Object.wait() [0x7020d000]
>    java.lang.Thread.State: WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     at java.lang.Object.wait(Object.java:502)
>     at org.apache.hadoop.hbase.ipc.BlockingRpcCallback.get(BlockingRpcCallback.java:62)
>     - locked <0x00078d1003f0> (a org.apache.hadoop.hbase.ipc.BlockingRpcCallback)
>     at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:328)
>     at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$200(AbstractRpcClient.java:94)
>     at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:567)
>     at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$BlockingStub.execMasterService(MasterProtos.java)
>     at org.apache.hadoop.hbase.client.ConnectionImplementation$3.execMasterService(ConnectionImplementation.java:1500)
>     at org.apache.hadoop.hbase.client.HBaseAdmin$67$1.rpcCall(HBaseAdmin.java:2991)
>     at org.apache.hadoop.hbase.client.HBaseAdmin$67$1.rpcCall(HBaseAdmin.java:2986)
>     at org.apache.hadoop.hbase.client.MasterCallable.call(MasterCallable.java:98)
>     at org.apache.hadoop.hbase.client.HBaseAdmin$67.callExecService(HBaseAdmin.java:2997)
>     at org.apache.hadoop.hbase.client.SyncCoprocessorRpcChannel.callBlockingMethod(SyncCoprocessorRpcChannel.java:69)
>     at org.apache.hadoop.hbase.protobuf.generated.RSGroupAdminProtos$RSGroupAdminService$BlockingStub.moveTables(RSGroupAdminProtos.java:13171)
>     at org.apache.hadoop.hbase.rsgroup.RSGroupAdminClient.moveTables(RSGroupAdminClient.java:117)
> {noformat}
> The server-side end of the RPC is waiting on a procedure to finish:
> {noformat}
> "RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=64242" #289 daemon prio=5 os_prio=31 tid=0x7ff015b7c000 nid=0x1e603 waiting on condition [0x7dbc9000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>     at java.lang.Thread.sleep(Native Method)
>     at org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitFor(ProcedureSyncWait.java:184)
>     at org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitFor(ProcedureSyncWait.java:171)
>     at org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitForProcedureToComplete(ProcedureSyncWait.java:141)
>     at org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitForProcedureToCompleteIOE(ProcedureSyncWait.java:130)
>     at org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.submitAndWaitProcedure(ProcedureSyncWait.java:123)
>     at org.apache.hadoop.hbase.master.assignment.AssignmentManager.unassign(AssignmentManager.java:478)
>     at org.apache.hadoop.hbase.master.assignment.AssignmentManager.unassign(AssignmentManager.java:465)
>     at org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.moveTables(RSGroupAdminServer.java:432)
>     at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint$RSGroupAdminServiceImpl.moveTables(RSGroupAdminEndpoint.java:174)
>     at org.apache.hadoop.hbase.protobuf.generated.RSGroupAdminProtos$RSGroupAdminService.callMethod(RSGroupAdminProtos.java:12786)
>     at org.apache.hadoop.hbase.master.MasterRpcServices.execMasterService(MasterRpcServices.java:673)
>     at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:406)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:278)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:258)
>    Locked ownable synchronizers:
>     - None
> {noformat}
> I don't see anything else running in the thread dump, but I do see that meta 
> was cl

[jira] [Commented] (HBASE-15691) Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1

2017-06-20 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056708#comment-16056708
 ] 

Stephen Yuan Jiang commented on HBASE-15691:


[~zjushch], you reviewed the original patch in HBASE-10205.  Could you help 
review the V2 patch?

> Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to 
> branch-1
> -
>
> Key: HBASE-15691
> URL: https://issues.apache.org/jira/browse/HBASE-15691
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 1.3.0
>Reporter: Andrew Purtell
>Assignee: Stephen Yuan Jiang
> Fix For: 1.3.2, 1.4.1, 1.5.0, 1.2.7
>
> Attachments: HBASE-15691-branch-1.patch, HBASE-15691.v2-branch-1.patch
>
>
> HBASE-10205 solves the following problem:
> "
> The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the 
> RAM queue containing entries to be cached. freeSpace() in turn calls 
> BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), 
> which iterates over 'bucketList'. At the same time another WriterThread might 
> call BucketAllocator.allocateBlock(), which may call 
> BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently 
> cause a ConcurrentModificationException. Calls to 
> BucketAllocator.allocateBlock() are synchronized, but calls to 
> BucketAllocator.getIndexStatistics() are not, which allows this race to occur.
> "
> However, for some unknown reason, HBASE-10205 was committed only to the master 
> (2.0 and beyond) and 0.98 branches. To preserve continuity we should 
> commit it to branch-1.
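
The race in the quoted description can be illustrated with a minimal sketch. It assumes a hypothetical class, BucketListSketch, whose method names only mirror the ones mentioned above; it is not the BucketAllocator code and not necessarily what the HBASE-10205 patch does. It shows one way to close such a race: the iterating reader takes the same monitor as the writer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the allocator described above; this is not HBase code.
class BucketListSketch {
  private final List<Integer> bucketList = new ArrayList<>();

  // Writer path: adds a bucket while holding this object's monitor.
  synchronized void allocateBlock(int size) {
    bucketList.add(size);
  }

  // Reader path: iterates under the same monitor, so no add can interleave.
  // Dropping "synchronized" here reintroduces the race from the description.
  synchronized long getIndexStatistics() {
    long total = 0;
    for (int size : bucketList) {
      total += size;
    }
    return total;
  }

  synchronized int bucketCount() {
    return bucketList.size();
  }

  // Runs one writer and one reader concurrently; returns the first failure, or null.
  static Throwable stress(BucketListSketch sketch, int writes, int reads) {
    AtomicReference<Throwable> failure = new AtomicReference<>();
    Thread writer = new Thread(() -> {
      for (int i = 0; i < writes; i++) {
        sketch.allocateBlock(1);
      }
    });
    Thread reader = new Thread(() -> {
      try {
        for (int i = 0; i < reads; i++) {
          sketch.getIndexStatistics();
        }
      } catch (RuntimeException e) {
        failure.set(e);
      }
    });
    writer.start();
    reader.start();
    try {
      writer.join();
      reader.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return e;
    }
    return failure.get();
  }
}
```

With both methods synchronized, stress() should return null. Whether an unsynchronized reader actually fails on a given run depends on thread timing, which is why the original bug showed up only intermittently.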



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-15691) Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to branch-1

2017-06-20 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-15691:
---
Description: 
HBASE-10205 solves the following problem:
"
The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the 
RAM queue containing entries to be cached. freeSpace() in turn calls 
BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), which 
iterates over 'bucketList'. At the same time another WriterThread might call 
BucketAllocator.allocateBlock(), which may call BucketSizeInfo.allocateBlock(), 
add a bucket to 'bucketList' and consequently cause a 
ConcurrentModificationException. Calls to BucketAllocator.allocateBlock() are 
synchronized, but calls to BucketAllocator.getIndexStatistics() are not, which 
allows this race to occur.
"

However, for some unknown reason, HBASE-10205 was committed only to the master 
(2.0 and beyond) and 0.98 branches. To preserve continuity we should commit it 
to branch-1.

  was:HBASE-10205 was committed to trunk and 0.98 branches only. To preserve 
continuity we should commit it to branch-1. The change requires more than 
nontrivial fixups so I will attach a backport of the change from trunk to 
current branch-1 here. 


> Port HBASE-10205 (ConcurrentModificationException in BucketAllocator) to 
> branch-1
> -
>
> Key: HBASE-15691
> URL: https://issues.apache.org/jira/browse/HBASE-15691
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 1.3.0
>Reporter: Andrew Purtell
>Assignee: Stephen Yuan Jiang
> Fix For: 1.3.2, 1.4.1, 1.5.0, 1.2.7
>
> Attachments: HBASE-15691-branch-1.patch, HBASE-15691.v2-branch-1.patch
>
>
> HBASE-10205 solves the following problem:
> "
> The BucketCache WriterThread calls BucketCache.freeSpace() upon draining the 
> RAM queue containing entries to be cached. freeSpace() in turn calls 
> BucketSizeInfo.statistics() through BucketAllocator.getIndexStatistics(), 
> which iterates over 'bucketList'. At the same time another WriterThread might 
> call BucketAllocator.allocateBlock(), which may call 
> BucketSizeInfo.allocateBlock(), add a bucket to 'bucketList' and consequently 
> cause a ConcurrentModificationException. Calls to 
> BucketAllocator.allocateBlock() are synchronized, but calls to 
> BucketAllocator.getIndexStatistics() are not, which allows this race to occur.
> "
> However, for some unknown reason, HBASE-10205 was committed only to the master 
> (2.0 and beyond) and 0.98 branches. To preserve continuity we should 
> commit it to branch-1.





[jira] [Updated] (HBASE-18036) HBase 1.x : Data locality is not maintained after cluster restart or SSH

2017-06-20 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18036:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> HBase 1.x : Data locality is not maintained after cluster restart or SSH
> 
>
> Key: HBASE-18036
> URL: https://issues.apache.org/jira/browse/HBASE-18036
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.4.0, 1.3.1, 1.2.5, 1.1.10
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 1.4.0, 1.3.2, 1.1.11, 1.2.7
>
> Attachments: HBASE-18036.v0-branch-1.1.patch, 
> HBASE-18036.v0-branch-1.patch, HBASE-18036.v1-branch-1.1.patch, 
> HBASE-18036.v2-branch-1.1.patch
>
>
> After HBASE-2896 / HBASE-4402, we believed data locality was maintained after 
> cluster restart.  However, we have seen some complaints about data locality 
> loss on cluster restart (e.g. HBASE-17963).  
> Examining the AssignmentManager#processDeadServersAndRegionsInTransition() 
> code,  for cluster start, I expected to hit the following code path:
> {code}
> if (!failover) {
>   // Fresh cluster startup.
>   LOG.info("Clean cluster startup. Assigning user regions");
>   assignAllUserRegions(allRegions);
> }
> {code}
> where assignAllUserRegions would use retainAssignment() call in LoadBalancer; 
> however, from master log,  we usually hit the failover code path:
> {code}
> // If we found user regions out on cluster, its a failover.
> if (failover) {
>   LOG.info("Found regions out on cluster or in RIT; presuming failover");
>   // Process list of dead servers and regions in RIT.
>   // See HBASE-4580 for more information.
>   processDeadServersAndRecoverLostRegions(deadServers);
> }
> {code}
> where processDeadServersAndRecoverLostRegions() would put dead servers in SSH 
> and SSH uses roundRobinAssignment() in LoadBalancer.  That is why we would 
> see locality lost more often than retained during cluster restart.
> Note: the code I was looking at is close to branch-1 and branch-1.1.
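
To make the difference between the two code paths concrete, here is a minimal sketch. It assumes a hypothetical AssignmentSketch class with simplified retain() and roundRobin() helpers over plain strings; the real LoadBalancer methods take HBase types and handle many more cases.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the two strategies; not the LoadBalancer API.
// Both helpers assume the list of live servers is non-empty.
class AssignmentSketch {
  // Round-robin ignores where a region used to live.
  static Map<String, String> roundRobin(List<String> regions, List<String> servers) {
    Map<String, String> plan = new LinkedHashMap<>();
    for (int i = 0; i < regions.size(); i++) {
      plan.put(regions.get(i), servers.get(i % servers.size()));
    }
    return plan;
  }

  // Retain keeps a region on its previous server when that server is still live,
  // and falls back to round-robin only for regions whose old server is gone.
  static Map<String, String> retain(Map<String, String> previous, List<String> servers) {
    Map<String, String> plan = new LinkedHashMap<>();
    int next = 0;
    for (Map.Entry<String, String> entry : previous.entrySet()) {
      String oldServer = entry.getValue();
      if (servers.contains(oldServer)) {
        plan.put(entry.getKey(), oldServer);
      } else {
        plan.put(entry.getKey(), servers.get(next++ % servers.size()));
      }
    }
    return plan;
  }
}
```

In this model a region that lived on hostB can land on hostA under round-robin even though hostB is back, which is the locality loss the description reports.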





[jira] [Updated] (HBASE-18036) HBase 1.x : Data locality is not maintained after cluster restart or SSH

2017-06-20 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18036:
---
Summary: HBase 1.x : Data locality is not maintained after cluster restart 
or SSH  (was: Data locality is not maintained after cluster restart or SSH)

> HBase 1.x : Data locality is not maintained after cluster restart or SSH
> 
>
> Key: HBASE-18036
> URL: https://issues.apache.org/jira/browse/HBASE-18036
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.4.0, 1.3.1, 1.2.5, 1.1.10
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 1.4.0, 1.3.2, 1.1.11, 1.2.7
>
> Attachments: HBASE-18036.v0-branch-1.1.patch, 
> HBASE-18036.v0-branch-1.patch, HBASE-18036.v1-branch-1.1.patch, 
> HBASE-18036.v2-branch-1.1.patch
>
>
> After HBASE-2896 / HBASE-4402, we believed data locality was maintained after 
> cluster restart.  However, we have seen some complaints about data locality 
> loss on cluster restart (e.g. HBASE-17963).  
> Examining the AssignmentManager#processDeadServersAndRegionsInTransition() 
> code,  for cluster start, I expected to hit the following code path:
> {code}
> if (!failover) {
>   // Fresh cluster startup.
>   LOG.info("Clean cluster startup. Assigning user regions");
>   assignAllUserRegions(allRegions);
> }
> {code}
> where assignAllUserRegions would use retainAssignment() call in LoadBalancer; 
> however, from master log,  we usually hit the failover code path:
> {code}
> // If we found user regions out on cluster, its a failover.
> if (failover) {
>   LOG.info("Found regions out on cluster or in RIT; presuming failover");
>   // Process list of dead servers and regions in RIT.
>   // See HBASE-4580 for more information.
>   processDeadServersAndRecoverLostRegions(deadServers);
> }
> {code}
> where processDeadServersAndRecoverLostRegions() would put dead servers in SSH 
> and SSH uses roundRobinAssignment() in LoadBalancer.  That is why we would 
> see locality lost more often than retained during cluster restart.
> Note: the code I was looking at is close to branch-1 and branch-1.1.





[jira] [Updated] (HBASE-18036) Data locality is not maintained after cluster restart or SSH

2017-06-20 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18036:
---
Fix Version/s: 1.4.0

> Data locality is not maintained after cluster restart or SSH
> 
>
> Key: HBASE-18036
> URL: https://issues.apache.org/jira/browse/HBASE-18036
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.4.0, 1.3.1, 1.2.5, 1.1.10
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 1.4.0, 1.3.2, 1.1.11, 1.2.7
>
> Attachments: HBASE-18036.v0-branch-1.1.patch, 
> HBASE-18036.v0-branch-1.patch, HBASE-18036.v1-branch-1.1.patch, 
> HBASE-18036.v2-branch-1.1.patch
>
>
> After HBASE-2896 / HBASE-4402, we believed data locality was maintained after 
> cluster restart.  However, we have seen some complaints about data locality 
> loss on cluster restart (e.g. HBASE-17963).  
> Examining the AssignmentManager#processDeadServersAndRegionsInTransition() 
> code,  for cluster start, I expected to hit the following code path:
> {code}
> if (!failover) {
>   // Fresh cluster startup.
>   LOG.info("Clean cluster startup. Assigning user regions");
>   assignAllUserRegions(allRegions);
> }
> {code}
> where assignAllUserRegions would use retainAssignment() call in LoadBalancer; 
> however, from master log,  we usually hit the failover code path:
> {code}
> // If we found user regions out on cluster, its a failover.
> if (failover) {
>   LOG.info("Found regions out on cluster or in RIT; presuming failover");
>   // Process list of dead servers and regions in RIT.
>   // See HBASE-4580 for more information.
>   processDeadServersAndRecoverLostRegions(deadServers);
> }
> {code}
> where processDeadServersAndRecoverLostRegions() would put dead servers in SSH 
> and SSH uses roundRobinAssignment() in LoadBalancer.  That is why we would 
> see locality lost more often than retained during cluster restart.
> Note: the code I was looking at is close to branch-1 and branch-1.1.





[jira] [Commented] (HBASE-18036) Data locality is not maintained after cluster restart or SSH

2017-06-20 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056681#comment-16056681
 ] 

Stephen Yuan Jiang commented on HBASE-18036:


[~enis], with the Proc-V2 AM, the current change is no longer applicable.  
Currently, with the initial commit of the new AM, SSH calls 
AM.createAssignProcedures() with forceNewPlan=true.  Even if forceNewPlan is 
false, when we compare the existing plan's ServerName, it will not be equal to 
the dead server's because the timestamp changes (ServerName is 
hostname+port+timestamp), and hence a new plan/server would be used for the 
region assignment.  Hence, locality is not guaranteed to be retained.  The 
potential change would be more involved than what we have now in the 1.x code 
base.  I opened HBASE-18246 to track it (FYI, [~stack]).  
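
The ServerName comparison problem can be sketched as follows. This assumes a hypothetical ServerIdentitySketch class that models only the three parts named above (hostname, port, timestamp); it is not the HBase ServerName class.

```java
import java.util.Objects;

// Hypothetical, simplified model of a server identity; not the HBase ServerName class.
final class ServerIdentitySketch {
  final String hostname;
  final int port;
  final long startTimestamp;

  ServerIdentitySketch(String hostname, int port, long startTimestamp) {
    this.hostname = hostname;
    this.port = port;
    this.startTimestamp = startTimestamp;
  }

  // Full equality includes the start timestamp, so a restarted server never
  // equals its previous incarnation.
  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof ServerIdentitySketch)) {
      return false;
    }
    ServerIdentitySketch other = (ServerIdentitySketch) o;
    return port == other.port
        && startTimestamp == other.startTimestamp
        && hostname.equals(other.hostname);
  }

  @Override
  public int hashCode() {
    return Objects.hash(hostname, port, startTimestamp);
  }

  // Locality-preserving logic would need a comparison like this one instead.
  boolean sameHostAndPort(ServerIdentitySketch other) {
    return port == other.port && hostname.equals(other.hostname);
  }
}
```

A plan that points at the dead incarnation fails an equals() check against the restarted server, yet sameHostAndPort() still identifies the same machine, which is the information needed to retain locality.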

> Data locality is not maintained after cluster restart or SSH
> 
>
> Key: HBASE-18036
> URL: https://issues.apache.org/jira/browse/HBASE-18036
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.4.0, 1.3.1, 1.2.5, 1.1.10
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 1.3.2, 1.1.11, 1.2.7
>
> Attachments: HBASE-18036.v0-branch-1.1.patch, 
> HBASE-18036.v0-branch-1.patch, HBASE-18036.v1-branch-1.1.patch, 
> HBASE-18036.v2-branch-1.1.patch
>
>
> After HBASE-2896 / HBASE-4402, we believed data locality was maintained after 
> cluster restart.  However, we have seen some complaints about data locality 
> loss on cluster restart (e.g. HBASE-17963).  
> Examining the AssignmentManager#processDeadServersAndRegionsInTransition() 
> code,  for cluster start, I expected to hit the following code path:
> {code}
> if (!failover) {
>   // Fresh cluster startup.
>   LOG.info("Clean cluster startup. Assigning user regions");
>   assignAllUserRegions(allRegions);
> }
> {code}
> where assignAllUserRegions would use retainAssignment() call in LoadBalancer; 
> however, from master log,  we usually hit the failover code path:
> {code}
> // If we found user regions out on cluster, its a failover.
> if (failover) {
>   LOG.info("Found regions out on cluster or in RIT; presuming failover");
>   // Process list of dead servers and regions in RIT.
>   // See HBASE-4580 for more information.
>   processDeadServersAndRecoverLostRegions(deadServers);
> }
> {code}
> where processDeadServersAndRecoverLostRegions() would put dead servers in SSH 
> and SSH uses roundRobinAssignment() in LoadBalancer.  That is why we would 
> see locality lost more often than retained during cluster restart.
> Note: the code I was looking at is close to branch-1 and branch-1.1.





[jira] [Updated] (HBASE-18246) Proc-V2 AM: Maintain Data locality in ServerCrashProcedure

2017-06-20 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18246:
---
Summary: Proc-V2 AM: Maintain Data locality in ServerCrashProcedure  (was: 
Maintain Data locality in ServerCrashProcedure)

> Proc-V2 AM: Maintain Data locality in ServerCrashProcedure
> --
>
> Key: HBASE-18246
> URL: https://issues.apache.org/jira/browse/HBASE-18246
> Project: HBase
>  Issue Type: Sub-task
>  Components: Region Assignment
>Affects Versions: 2.0.0
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 2.0.0
>
>
> Before HBASE-18036, SSH would use round-robin to re-distribute regions during 
> processing.  Round-robin assignment would lose data locality.  HBASE-18036 
> retains data locality if the dead region server has already restarted when 
> the dead RS is being processed.  
> With the Proc-V2 based AM, the change of HBASE-18036 in Apache HBase 1.x 
> releases is no longer possible.  We need to implement the same logic under the 
> Proc-V2 based AM.





[jira] [Created] (HBASE-18246) Maintain Data locality in ServerCrashProcedure

2017-06-20 Thread Stephen Yuan Jiang (JIRA)
Stephen Yuan Jiang created HBASE-18246:
--

 Summary: Maintain Data locality in ServerCrashProcedure
 Key: HBASE-18246
 URL: https://issues.apache.org/jira/browse/HBASE-18246
 Project: HBase
  Issue Type: Sub-task
  Components: Region Assignment
Affects Versions: 2.0.0
Reporter: Stephen Yuan Jiang
Assignee: Stephen Yuan Jiang


Before HBASE-18036, SSH would use round-robin to re-distribute regions during 
processing.  Round-robin assignment would lose data locality.  HBASE-18036 
retains data locality if the dead region server has already restarted when the 
dead RS is being processed.  

With the Proc-V2 based AM, the change of HBASE-18036 in Apache HBase 1.x 
releases is no longer possible.  We need to implement the same logic under the 
Proc-V2 based AM.





[jira] [Updated] (HBASE-18036) Data locality is not maintained after cluster restart or SSH

2017-06-20 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18036:
---
Fix Version/s: 1.2.7
   1.1.11
   1.3.2

> Data locality is not maintained after cluster restart or SSH
> 
>
> Key: HBASE-18036
> URL: https://issues.apache.org/jira/browse/HBASE-18036
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.4.0, 1.3.1, 1.2.5, 1.1.10
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 1.3.2, 1.1.11, 1.2.7
>
> Attachments: HBASE-18036.v0-branch-1.1.patch, 
> HBASE-18036.v0-branch-1.patch, HBASE-18036.v1-branch-1.1.patch, 
> HBASE-18036.v2-branch-1.1.patch
>
>
> After HBASE-2896 / HBASE-4402, we believed data locality was maintained after 
> cluster restart.  However, we have seen some complaints about data locality 
> loss on cluster restart (e.g. HBASE-17963).  
> Examining the AssignmentManager#processDeadServersAndRegionsInTransition() 
> code,  for cluster start, I expected to hit the following code path:
> {code}
> if (!failover) {
>   // Fresh cluster startup.
>   LOG.info("Clean cluster startup. Assigning user regions");
>   assignAllUserRegions(allRegions);
> }
> {code}
> where assignAllUserRegions would use retainAssignment() call in LoadBalancer; 
> however, from master log,  we usually hit the failover code path:
> {code}
> // If we found user regions out on cluster, its a failover.
> if (failover) {
>   LOG.info("Found regions out on cluster or in RIT; presuming failover");
>   // Process list of dead servers and regions in RIT.
>   // See HBASE-4580 for more information.
>   processDeadServersAndRecoverLostRegions(deadServers);
> }
> {code}
> where processDeadServersAndRecoverLostRegions() would put dead servers in SSH 
> and SSH uses roundRobinAssignment() in LoadBalancer.  That is why we would 
> see locality lost more often than retained during cluster restart.
> Note: the code I was looking at is close to branch-1 and branch-1.1.





[jira] [Commented] (HBASE-18225) Fix findbugs regression calling toString() on an array

2017-06-15 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051356#comment-16051356
 ] 

Stephen Yuan Jiang commented on HBASE-18225:


Looks good to me.  

> Fix findbugs regression calling toString() on an array
> --
>
> Key: HBASE-18225
> URL: https://issues.apache.org/jira/browse/HBASE-18225
> Project: HBase
>  Issue Type: Bug
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Trivial
> Fix For: 2.0.0, 3.0.0
>
> Attachments: HBASE-18225.001.patch
>
>
> Looks like we got a findbugs warning as a result of HBASE-18166
> {code}
> diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
> index 1d04944250..b7e0244aa2 100644
> --- a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
> +++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
> @@ -2807,8 +2807,8 @@ public class RSRpcServices implements HBaseRPCErrorHandler,
>  HRegionInfo hri = rsh.s.getRegionInfo();
>  // Yes, should be the same instance
>  if (regionServer.getOnlineRegion(hri.getRegionName()) != rsh.r) {
> -  String msg = "Region was re-opened after the scanner" + scannerName + " was created: "
> -  + hri.getRegionNameAsString();
> +  String msg = "Region has changed on the scanner " + scannerName + ": regionName="
> +  + hri.getRegionName() + ", scannerRegionName=" + rsh.r;
> Looks like {{hri.getRegionNameAsString()}} was unintentionally changed to 
> {{hri.getRegionName()}}, [~syuanjiang]/[~stack]?
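
Why findbugs flags this can be shown with a small sketch. It assumes a hypothetical ArrayToStringSketch class and decodes the bytes as UTF-8 purely for illustration; the original line used {{hri.getRegionNameAsString()}} for the same purpose.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the findbugs warning; not the RSRpcServices code.
class ArrayToStringSketch {
  // Concatenating the array itself uses Object.toString(): a type tag ("[B" for
  // byte[]) followed by '@' and an identity hash, not the region name.
  static String implicitToString(byte[] regionName) {
    return "regionName=" + regionName;
  }

  // Decoding the bytes explicitly yields a readable message.
  static String decoded(byte[] regionName) {
    return "regionName=" + new String(regionName, StandardCharsets.UTF_8);
  }
}
```

The first helper produces something like "regionName=[B@" plus a hash that changes between runs, which is useless in an error message.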





[jira] [Commented] (HBASE-18166) [AMv2] We are splitting already-split files

2017-06-05 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16038258#comment-16038258
 ] 

Stephen Yuan Jiang commented on HBASE-18166:


[~stack], when I implemented the SplitTableRegionProcedure, I copied the logic 
from SplitTransactionImpl.java:
{code}
  /**
   * Creates reference files for top and bottom half of the
   * @param hstoreFilesToSplit map of store files to create half file references for.
   * @return the number of reference files that were created.
   * @throws IOException
   */
  private Pair<Integer, Integer> splitStoreFiles(
      final Map<byte[], List<StoreFile>> hstoreFilesToSplit)
      throws IOException {
    if (hstoreFilesToSplit == null) {
      // Could be null because close didn't succeed -- for now consider it fatal
      throw new IOException("Close returned empty list of StoreFiles");
    }
    // The following code sets up a thread pool executor with as many slots as
    // there's files to split. It then fires up everything, waits for
    // completion and finally checks for any exception
    int nbFiles = 0;
    for (Map.Entry<byte[], List<StoreFile>> entry : hstoreFilesToSplit.entrySet()) {
      nbFiles += entry.getValue().size();  // ===> possible to have reference files
    }
{code}

I just wonder whether we should change the logic in SplitTransactionImpl in 
branch-1 to skip splitting reference files (I checked HRegion#doClose() and did 
not see any logic to skip reference files on the region server side).
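
A minimal sketch of what skipping reference files while counting could look like. It assumes a hypothetical SplitCountSketch class with a FileInfo stand-in that carries only a name and a reference flag; it is not the SplitTransactionImpl or SplitTableRegionProcedure code.

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration of skipping reference files; not HBase code.
class SplitCountSketch {
  // Stand-in for a store file: only the fields this sketch needs.
  static final class FileInfo {
    final String name;
    final boolean reference;

    FileInfo(String name, boolean reference) {
      this.name = name;
      this.reference = reference;
    }
  }

  // Counts only files that are not themselves references, so a split never
  // creates a reference to a reference.
  static int countFilesToSplit(Map<String, List<FileInfo>> filesByFamily) {
    int nbFiles = 0;
    for (List<FileInfo> files : filesByFamily.values()) {
      for (FileInfo file : files) {
        if (!file.reference) {
          nbFiles++;
        }
      }
    }
    return nbFiles;
  }
}
```

The same filter would also have to be applied wherever the files are actually split, otherwise the executor sizing and the work list would disagree.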

> [AMv2] We are splitting already-split files
> ---
>
> Key: HBASE-18166
> URL: https://issues.apache.org/jira/browse/HBASE-18166
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 2.0.0
>Reporter: stack
>Assignee: stack
> Fix For: 2.0.0
>
> Attachments: HBASE-18166.master.001.patch, 
> HBASE-18166.master.002.patch
>
>
> Interesting issue. The below adds a lag cleaning up files after a compaction 
> in case of on-going Scanners (for read replicas/offheap).
> HBASE-14970 Backport HBASE-13082 and its sub-jira to branch-1 - recommit (Ram)
> What the lag means is that now that split is run from the HMaster in master 
> branch, when it goes to get a listing of the files to split, it can pick up 
> files that are for archiving but that have not been archived yet.  When it 
> does, it goes ahead and splits them... making references of references.
> It's a mess.
> I added asking the Region if it is splittable a while back. The Master calls 
> this from SplitTableRegionProcedure during preparation. If the RegionServer 
> asked for the split, it is sort of redundant work given the RS asks itself 
> whether any references remain; if any do, it'll wait before asking for a 
> split. But if a user/client asks, then this isSplittable over RPC comes in handy.
> I was thinking that isSplittable could return list of files 
> Or, easier, given we know a region is Splittable by the time we go to split 
> the files, then I think master-side we can just skip any references found, 
> presuming they are ready-for-archive.
> Will be back with a patch. Want to test on cluster first (Side-effect is 
> regions are offline because file at end of the reference to a reference is 
> removed ... and so the open fails).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-16488) Starting namespace and quota services in master startup asynchronizely

2017-05-23 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-16488:
---
Attachment: HBASE-16488.v7-branch-1.patch

> Starting namespace and quota services in master startup asynchronizely
> --
>
> Key: HBASE-16488
> URL: https://issues.apache.org/jira/browse/HBASE-16488
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 2.0.0, 1.3.0, 1.0.3, 1.4.0, 1.1.5, 1.2.2
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Attachments: HBASE-16488.v1-branch-1.patch, 
> HBASE-16488.v1-master.patch, HBASE-16488.v2-branch-1.patch, 
> HBASE-16488.v2-branch-1.patch, HBASE-16488.v3-branch-1.patch, 
> HBASE-16488.v3-branch-1.patch, HBASE-16488.v4-branch-1.patch, 
> HBASE-16488.v5-branch-1.patch, HBASE-16488.v6-branch-1.patch, 
> HBASE-16488.v7-branch-1.patch
>
>
> From time to time, during internal IT tests and from customers, we see 
> master initialization fail because the namespace table region takes a long 
> time to assign (e.g. sometimes log splitting takes a long time or hangs; 
> sometimes an RS is temporarily unavailable; sometimes there is an unknown 
> assignment issue).  In the past, there were proposals to improve this 
> situation, e.g. HBASE-13556 / HBASE-14190 (Assign system tables ahead of user 
> region assignment) or HBASE-13557 (Special WAL handling for system tables) or 
> HBASE-14623 (Implement dedicated WAL for system tables).  
> This JIRA proposes another way to solve this master initialization failure: 
> the namespace service is only used by a handful of operations (e.g. create 
> table / namespace DDL / get namespace API / some RS group DDL).  Only the 
> quota manager depends on it, and quota management is off by default.  
> Therefore, the namespace service is not really needed for the master to be 
> functional.  So we could start the namespace service asynchronously without 
> blocking master startup.
>  
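
The idea in the description above can be sketched roughly as follows. This is 
only an illustration under stated assumptions: AsyncStartupSketch and its 
methods are hypothetical names, not the HBase master code or the attached 
patch.  It shows startup returning immediately while operations that need the 
namespace service check readiness themselves.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch; class and method names are not from the HBase master code. */
class AsyncStartupSketch {
  private final CompletableFuture<Void> namespaceReady;

  AsyncStartupSketch(Runnable slowNamespaceInit) {
    // Kick off namespace initialization in the background so startup can continue.
    this.namespaceReady = CompletableFuture.runAsync(slowNamespaceInit);
  }

  /** Operations that need the namespace service check readiness themselves. */
  void createNamespace(String name) {
    if (!namespaceReady.isDone()) {
      throw new IllegalStateException("namespace service not ready yet: " + name);
    }
    namespaceReady.join(); // rethrows if initialization failed
  }

  /** Blocks until initialization finishes; useful for tests and shutdown paths. */
  void awaitNamespace() {
    namespaceReady.join();
  }
}
```

With this shape, a slow namespace region assignment delays only the operations 
that depend on it, not the whole startup.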



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18093) Overloading the meaning of 'enabled' in Quota Manager to indicate either quota disabled or quota manager not ready is not good

2017-05-23 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18093:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 1.1.11, 1.3.2, 1.2.6, 1.4.0, 2.0.0
       Status: Resolved  (was: Patch Available)

> Overloading the meaning of 'enabled' in Quota Manager to indicate either 
> quota disabled or quota manager not ready is not good
> --
>
> Key: HBASE-18093
> URL: https://issues.apache.org/jira/browse/HBASE-18093
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.1.10
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
>Priority: Minor
> Fix For: 2.0.0, 1.4.0, 1.2.6, 1.3.2, 1.1.11
>
> Attachments: HBASE-18093.v1-branch-1.patch, 
> HBASE-18093.v1-master.patch, HBASE-18093.v2-master.patch, 
> HBASE-18093.v3-master.patch
>
>
> In MasterQuotaManager, the member 'enabled' is used to indicate either that 
> the quota feature is disabled or that the quota manager is not fully 
> initialized.  This makes it unclear whether a caller should wait for the quota 
> manager to be initialized or change the configuration to enable quota.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-18093) Overloading the meaning of 'enabled' in Quota Manager to indicate either quota disabled or quota manager not ready is not good

2017-05-23 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16021752#comment-16021752
 ] 

Stephen Yuan Jiang commented on HBASE-18093:


Test failures in branch-1 run are not related to the change.

> Overloading the meaning of 'enabled' in Quota Manager to indicate either 
> quota disabled or quota manager not ready is not good
> --
>
> Key: HBASE-18093
> URL: https://issues.apache.org/jira/browse/HBASE-18093
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.1.10
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
>Priority: Minor
> Attachments: HBASE-18093.v1-branch-1.patch, 
> HBASE-18093.v1-master.patch, HBASE-18093.v2-master.patch, 
> HBASE-18093.v3-master.patch
>
>
> In MasterQuotaManager, the member 'enabled' is used to indicate either that 
> the quota feature is disabled or that the quota manager is not fully 
> initialized.  This makes it unclear whether a caller should wait for the quota 
> manager to be initialized or change the configuration to enable quota.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18093) Overloading the meaning of 'enabled' in Quota Manager to indicate either quota disabled or quota manager not ready is not good

2017-05-23 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18093:
---
Attachment: HBASE-18093.v1-branch-1.patch

> Overloading the meaning of 'enabled' in Quota Manager to indicate either 
> quota disabled or quota manager not ready is not good
> --
>
> Key: HBASE-18093
> URL: https://issues.apache.org/jira/browse/HBASE-18093
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.1.10
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
>Priority: Minor
> Attachments: HBASE-18093.v1-branch-1.patch, 
> HBASE-18093.v1-master.patch, HBASE-18093.v2-master.patch, 
> HBASE-18093.v3-master.patch
>
>
> In MasterQuotaManager, the member 'enabled' is used to indicate either that 
> the quota feature is disabled or that the quota manager is not fully 
> initialized.  This makes it unclear whether a caller should wait for the quota 
> manager to be initialized or change the configuration to enable quota.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-18093) Overloading the meaning of 'enabled' in Quota Manager to indicate either quota disabled or quota manager not ready is not good

2017-05-23 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16021201#comment-16021201
 ] 

Stephen Yuan Jiang commented on HBASE-18093:


The pre-commit failure is unrelated to the change; it looks like an environment 
issue.  (I also re-ran the test locally with no problems.)

> Overloading the meaning of 'enabled' in Quota Manager to indicate either 
> quota disabled or quota manager not ready is not good
> --
>
> Key: HBASE-18093
> URL: https://issues.apache.org/jira/browse/HBASE-18093
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.1.10
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
>Priority: Minor
> Attachments: HBASE-18093.v1-master.patch, 
> HBASE-18093.v2-master.patch, HBASE-18093.v3-master.patch
>
>
> In MasterQuotaManager, the member 'enabled' is used to indicate either that 
> the quota feature is disabled or that the quota manager is not fully 
> initialized.  This makes it unclear whether a caller should wait for the quota 
> manager to be initialized or change the configuration to enable quota.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-18093) Overloading the meaning of 'enabled' in Quota Manager to indicate either quota disabled or quota manager not ready is not good

2017-05-22 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020378#comment-16020378
 ] 

Stephen Yuan Jiang commented on HBASE-18093:


The V3 patch is rebased on the latest changes in master.

> Overloading the meaning of 'enabled' in Quota Manager to indicate either 
> quota disabled or quota manager not ready is not good
> --
>
> Key: HBASE-18093
> URL: https://issues.apache.org/jira/browse/HBASE-18093
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.1.10
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
>Priority: Minor
> Attachments: HBASE-18093.v1-master.patch, 
> HBASE-18093.v2-master.patch, HBASE-18093.v3-master.patch
>
>
> In MasterQuotaManager, the member 'enabled' is used to indicate either that 
> the quota feature is disabled or that the quota manager is not fully 
> initialized.  This makes it unclear whether a caller should wait for the quota 
> manager to be initialized or change the configuration to enable quota.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18093) Overloading the meaning of 'enabled' in Quota Manager to indicate either quota disabled or quota manager not ready is not good

2017-05-22 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18093:
---
Attachment: HBASE-18093.v3-master.patch

> Overloading the meaning of 'enabled' in Quota Manager to indicate either 
> quota disabled or quota manager not ready is not good
> --
>
> Key: HBASE-18093
> URL: https://issues.apache.org/jira/browse/HBASE-18093
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.1.10
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
>Priority: Minor
> Attachments: HBASE-18093.v1-master.patch, 
> HBASE-18093.v2-master.patch, HBASE-18093.v3-master.patch
>
>
> In MasterQuotaManager, the member 'enabled' is used to indicate either that 
> the quota feature is disabled or that the quota manager is not fully 
> initialized.  This makes it unclear whether a caller should wait for the quota 
> manager to be initialized or change the configuration to enable quota.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-18093) Overloading the meaning of 'enabled' in Quota Manager to indicate either quota disabled or quota manager not ready is not good

2017-05-22 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020259#comment-16020259
 ] 

Stephen Yuan Jiang commented on HBASE-18093:


The failure from the V1 patch does not make sense; things are good locally.  
Attaching the V2 patch, which addresses the typo found by [~te...@apache.org].

> Overloading the meaning of 'enabled' in Quota Manager to indicate either 
> quota disabled or quota manager not ready is not good
> --
>
> Key: HBASE-18093
> URL: https://issues.apache.org/jira/browse/HBASE-18093
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.1.10
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
>Priority: Minor
> Attachments: HBASE-18093.v1-master.patch, HBASE-18093.v2-master.patch
>
>
> In MasterQuotaManager, the member 'enabled' is used to indicate either that 
> the quota feature is disabled or that the quota manager is not fully 
> initialized.  This makes it unclear whether a caller should wait for the quota 
> manager to be initialized or change the configuration to enable quota.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18093) Overloading the meaning of 'enabled' in Quota Manager to indicate either quota disabled or quota manager not ready is not good

2017-05-22 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18093:
---
Attachment: HBASE-18093.v2-master.patch

> Overloading the meaning of 'enabled' in Quota Manager to indicate either 
> quota disabled or quota manager not ready is not good
> --
>
> Key: HBASE-18093
> URL: https://issues.apache.org/jira/browse/HBASE-18093
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.1.10
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
>Priority: Minor
> Attachments: HBASE-18093.v1-master.patch, HBASE-18093.v2-master.patch
>
>
> In MasterQuotaManager, the member 'enabled' is used to indicate either that 
> the quota feature is disabled or that the quota manager is not fully 
> initialized.  This makes it unclear whether a caller should wait for the quota 
> manager to be initialized or change the configuration to enable quota.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16488) Starting namespace and quota services in master startup asynchronizely

2017-05-22 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020184#comment-16020184
 ] 

Stephen Yuan Jiang commented on HBASE-16488:


The hadoop.hbase.quotas.TestQuotaAdmin failure should be addressed by a generic 
fix in HBASE-18093.

> Starting namespace and quota services in master startup asynchronizely
> --
>
> Key: HBASE-16488
> URL: https://issues.apache.org/jira/browse/HBASE-16488
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 2.0.0, 1.3.0, 1.0.3, 1.4.0, 1.1.5, 1.2.2
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Attachments: HBASE-16488.v1-branch-1.patch, 
> HBASE-16488.v1-master.patch, HBASE-16488.v2-branch-1.patch, 
> HBASE-16488.v2-branch-1.patch, HBASE-16488.v3-branch-1.patch, 
> HBASE-16488.v3-branch-1.patch, HBASE-16488.v4-branch-1.patch, 
> HBASE-16488.v5-branch-1.patch, HBASE-16488.v6-branch-1.patch
>
>
> From time to time, during internal IT tests and in customer reports, we see 
> master initialization fail because the namespace table region takes a long 
> time to assign (e.g. log splitting is slow or hangs, an RS is temporarily 
> unavailable, or there is some unknown assignment issue).  Several proposals 
> have been made to improve this situation, e.g. HBASE-13556 / HBASE-14190 
> (Assign system tables ahead of user region assignment), HBASE-13557 (Special 
> WAL handling for system tables), or HBASE-14623 (Implement dedicated WAL for 
> system tables).  
> This JIRA proposes another way to solve the master initialization failure: 
> the namespace service is only used by a handful of operations (e.g. create 
> table / namespace DDL / get namespace API / some RS group DDL).  Only the 
> quota manager depends on it, and quota management is off by default.  
> Therefore, the namespace service is not really needed for the master to be 
> functional, so we could start it asynchronously without blocking master startup.
>  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-18093) Overloading the meaning of 'enabled' in Quota Manager to indicate either quota disabled or quota manager not ready is not good

2017-05-22 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020180#comment-16020180
 ] 

Stephen Yuan Jiang commented on HBASE-18093:


The V1 patch distinguishes whether quota is disabled or the quota manager is 
uninitialized.

> Overloading the meaning of 'enabled' in Quota Manager to indicate either 
> quota disabled or quota manager not ready is not good
> --
>
> Key: HBASE-18093
> URL: https://issues.apache.org/jira/browse/HBASE-18093
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.1.10
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
>Priority: Minor
> Attachments: HBASE-18093.v1-master.patch
>
>
> In MasterQuotaManager, the member 'enabled' is used to indicate either that 
> the quota feature is disabled or that the quota manager is not fully 
> initialized.  This makes it unclear whether a caller should wait for the quota 
> manager to be initialized or change the configuration to enable quota.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18093) Overloading 'enabled' in Quota Manager to indicate either quota disabled or quota manager not ready is not good

2017-05-22 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18093:
---
Summary: Overloading 'enabled' in Quota Manager to indicate either quota 
disabled or quota manager not ready is not good  (was: Overload 'enabled' in 
Quota Manager)

> Overloading 'enabled' in Quota Manager to indicate either quota disabled or 
> quota manager not ready is not good
> ---
>
> Key: HBASE-18093
> URL: https://issues.apache.org/jira/browse/HBASE-18093
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.1.10
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
>Priority: Minor
> Attachments: HBASE-18093.v1-master.patch
>
>
> In MasterQuotaManager, the member 'enabled' is used to indicate either that 
> the quota feature is disabled or that the quota manager is not fully 
> initialized.  This makes it unclear whether a caller should wait for the quota 
> manager to be initialized or change the configuration to enable quota.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18093) Overloading the meaning of 'enabled' in Quota Manager to indicate either quota disabled or quota manager not ready is not good

2017-05-22 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18093:
---
Status: Patch Available  (was: Open)

> Overloading the meaning of 'enabled' in Quota Manager to indicate either 
> quota disabled or quota manager not ready is not good
> --
>
> Key: HBASE-18093
> URL: https://issues.apache.org/jira/browse/HBASE-18093
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.1.10
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
>Priority: Minor
> Attachments: HBASE-18093.v1-master.patch
>
>
> In MasterQuotaManager, the member 'enabled' is used to indicate either that 
> the quota feature is disabled or that the quota manager is not fully 
> initialized.  This makes it unclear whether a caller should wait for the quota 
> manager to be initialized or change the configuration to enable quota.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18093) Overloading the meaning of 'enabled' in Quota Manager to indicate either quota disabled or quota manager not ready is not good

2017-05-22 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18093:
---
Summary: Overloading the meaning of 'enabled' in Quota Manager to indicate 
either quota disabled or quota manager not ready is not good  (was: Overloading 
'enabled' in Quota Manager to indicate either quota disabled or quota manager 
not ready is not good)

> Overloading the meaning of 'enabled' in Quota Manager to indicate either 
> quota disabled or quota manager not ready is not good
> --
>
> Key: HBASE-18093
> URL: https://issues.apache.org/jira/browse/HBASE-18093
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.1.10
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
>Priority: Minor
> Attachments: HBASE-18093.v1-master.patch
>
>
> In MasterQuotaManager, the member 'enabled' is used to indicate either that 
> the quota feature is disabled or that the quota manager is not fully 
> initialized.  This makes it unclear whether a caller should wait for the quota 
> manager to be initialized or change the configuration to enable quota.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18093) Overload 'enabled' in Quota Manager

2017-05-22 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18093:
---
Attachment: HBASE-18093.v1-master.patch

> Overload 'enabled' in Quota Manager
> ---
>
> Key: HBASE-18093
> URL: https://issues.apache.org/jira/browse/HBASE-18093
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.1.10
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
>Priority: Minor
> Attachments: HBASE-18093.v1-master.patch
>
>
> In MasterQuotaManager, the member 'enabled' is used to indicate either that 
> the quota feature is disabled or that the quota manager is not fully 
> initialized.  This makes it unclear whether a caller should wait for the quota 
> manager to be initialized or change the configuration to enable quota.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HBASE-18093) Overload 'enabled' in Quota Manager

2017-05-22 Thread Stephen Yuan Jiang (JIRA)
Stephen Yuan Jiang created HBASE-18093:
--

 Summary: Overload 'enabled' in Quota Manager
 Key: HBASE-18093
 URL: https://issues.apache.org/jira/browse/HBASE-18093
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 1.1.10
Reporter: Stephen Yuan Jiang
Assignee: Stephen Yuan Jiang
Priority: Minor


In MasterQuotaManager, the member 'enabled' is used to indicate either that the 
quota feature is disabled or that the quota manager is not fully initialized.  
This makes it unclear whether a caller should wait for the quota manager to be 
initialized or change the configuration to enable quota.
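
One possible way to keep the two meanings apart is sketched below.  This is an 
assumption-laden illustration, not the attached patch: QuotaState and 
QuotaManagerSketch are hypothetical names, and the real MasterQuotaManager may 
separate the states differently.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical states; not the names used in MasterQuotaManager. */
enum QuotaState { DISABLED, INITIALIZING, READY }

/** Illustrative stand-in for a quota manager that keeps the two "not enabled" cases apart. */
class QuotaManagerSketch {
  private final AtomicReference<QuotaState> state;

  QuotaManagerSketch(boolean quotaEnabledInConfig) {
    // Disabled by configuration is a terminal state; otherwise start() must still run.
    this.state = new AtomicReference<>(
        quotaEnabledInConfig ? QuotaState.INITIALIZING : QuotaState.DISABLED);
  }

  /** Marks initialization as finished; has no effect when quota is disabled. */
  void start() {
    state.compareAndSet(QuotaState.INITIALIZING, QuotaState.READY);
  }

  QuotaState getState() {
    return state.get();
  }

  /** Tells the caller which action applies instead of a bare "not enabled". */
  String adviceForCaller() {
    switch (state.get()) {
      case DISABLED:
        return "enable quota in configuration";
      case INITIALIZING:
        return "wait for quota manager to initialize";
      default:
        return "ready";
    }
  }
}
```

A caller that sees INITIALIZING can retry later, while a caller that sees 
DISABLED knows that retrying will not help.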



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-18067) Support a default converter for data read shell commands

2017-05-19 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16017837#comment-16017837
 ] 

Stephen Yuan Jiang commented on HBASE-18067:


+1 Good stuff!  Thanks, Josh.

> Support a default converter for data read shell commands
> 
>
> Key: HBASE-18067
> URL: https://issues.apache.org/jira/browse/HBASE-18067
> Project: HBase
>  Issue Type: Improvement
>  Components: shell
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-18067.001.patch, HBASE-18067.002.patch, 
> HBASE-18067.003.patch
>
>
> The {{get}} and {{scan}} shell commands have the ability to specify some 
> complicated syntax on how to encode the bytes read from HBase on a per-column 
> basis. By default, bytes falling outside of a limited range of ASCII are just 
> printed as hex.
> It seems like the intent of these converters was to support conversion of 
> certain numeric columns as a readable string (e.g. 1234).
> However, if non-ascii encoded bytes are stored in the table (e.g. UTF-8 
> encoded bytes), we may want to treat all data we read as UTF-8 instead (e.g. 
> if row+column+value are in Chinese). It would be onerous to require users to 
> enumerate every column they're reading to parse as UTF-8 instead of the 
> limited ascii range. We can provide an option to encode all values retrieved 
> by the command.
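
To make the difference concrete, here is a small sketch of the two renderings 
discussed above.  It is a simplified stand-in written for illustration only: 
ValueFormatSketch is a hypothetical class, and its escaping rule is an 
assumption about what "printed as hex" looks like, not the shell's actual 
formatter.

```java
import java.nio.charset.StandardCharsets;

/** Simplified stand-in for the shell's default rendering; not HBase's own formatter. */
class ValueFormatSketch {
  /** Keeps printable ASCII and escapes every other byte as \xNN. */
  static String asciiOrHex(byte[] value) {
    StringBuilder sb = new StringBuilder();
    for (byte b : value) {
      int ch = b & 0xFF;
      if (ch >= 0x20 && ch <= 0x7E && ch != '\\') {
        sb.append((char) ch);
      } else {
        sb.append(String.format("\\x%02X", ch));
      }
    }
    return sb.toString();
  }

  /** Decodes the whole value as UTF-8, which is what a default converter would enable. */
  static String asUtf8(byte[] value) {
    return new String(value, StandardCharsets.UTF_8);
  }
}
```

For a value holding UTF-8 encoded Chinese text, the first method yields a run 
of hex escapes while the second yields the readable characters.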



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18036) Data locality is not maintained after cluster restart or SSH

2017-05-14 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18036:
---
Attachment: HBASE-18036.v0-branch-1.patch

> Data locality is not maintained after cluster restart or SSH
> 
>
> Key: HBASE-18036
> URL: https://issues.apache.org/jira/browse/HBASE-18036
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.4.0, 1.3.1, 1.2.5, 1.1.10
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Attachments: HBASE-18036.v0-branch-1.1.patch, 
> HBASE-18036.v0-branch-1.patch, HBASE-18036.v1-branch-1.1.patch, 
> HBASE-18036.v2-branch-1.1.patch
>
>
> After HBASE-2896 / HBASE-4402, we believed data locality was maintained after 
> a cluster restart.  However, we have seen some complaints about data locality 
> loss on cluster restart (e.g. HBASE-17963).  
> Examining the AssignmentManager#processDeadServersAndRegionsInTransition() 
> code, I expected a cluster start to hit the following code path:
> {code}
> if (!failover) {
>   // Fresh cluster startup.
>   LOG.info("Clean cluster startup. Assigning user regions");
>   assignAllUserRegions(allRegions);
> }
> {code}
> where assignAllUserRegions would use retainAssignment() call in LoadBalancer; 
> however, from master log,  we usually hit the failover code path:
> {code}
> // If we found user regions out on cluster, its a failover.
> if (failover) {
>   LOG.info("Found regions out on cluster or in RIT; presuming failover");
>   // Process list of dead servers and regions in RIT.
>   // See HBASE-4580 for more information.
>   processDeadServersAndRecoverLostRegions(deadServers);
> }
> {code}
> where processDeadServersAndRecoverLostRegions() hands the dead servers to SSH, 
> and SSH uses roundRobinAssignment() in LoadBalancer.  That is why we see 
> locality lost more often than retained during cluster restart.
> Note: the code I was looking at is close to branch-1 and branch-1.1.
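
The effect of the two strategies on locality can be illustrated with a toy 
model.  The code below is a simplified sketch, not the LoadBalancer API: 
AssignmentSketch and its method signatures are hypothetical, regions and 
servers are plain strings, and the online server list is assumed non-empty.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified illustration only; these are not the HBase LoadBalancer signatures. */
class AssignmentSketch {
  /** Retain: send each region back to its previous server when that server is still online. */
  static Map<String, String> retain(Map<String, String> previous, List<String> online) {
    Map<String, String> plan = new LinkedHashMap<>();
    int next = 0;
    for (Map.Entry<String, String> e : previous.entrySet()) {
      String old = e.getValue();
      // Fall back to round robin only for regions whose old server is gone.
      String target = online.contains(old) ? old : online.get(next++ % online.size());
      plan.put(e.getKey(), target);
    }
    return plan;
  }

  /** Round robin: ignore previous locations and deal regions out in order. */
  static Map<String, String> roundRobin(List<String> regions, List<String> online) {
    Map<String, String> plan = new LinkedHashMap<>();
    for (int i = 0; i < regions.size(); i++) {
      plan.put(regions.get(i), online.get(i % online.size()));
    }
    return plan;
  }
}
```

In this model, retain() reproduces the pre-restart placement whenever the old 
servers come back, whereas roundRobin() moves regions regardless of where 
their data lives.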



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18036) Data locality is not maintained after cluster restart or SSH

2017-05-12 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-18036:
---
Attachment: HBASE-18036.v2-branch-1.1.patch

> Data locality is not maintained after cluster restart or SSH
> 
>
> Key: HBASE-18036
> URL: https://issues.apache.org/jira/browse/HBASE-18036
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.4.0, 1.3.1, 1.2.5, 1.1.10
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Attachments: HBASE-18036.v0-branch-1.1.patch, 
> HBASE-18036.v1-branch-1.1.patch, HBASE-18036.v2-branch-1.1.patch
>
>
> After HBASE-2896 / HBASE-4402, we believed data locality was maintained after 
> a cluster restart.  However, we have seen some complaints about data locality 
> loss on cluster restart (e.g. HBASE-17963).  
> Examining the AssignmentManager#processDeadServersAndRegionsInTransition() 
> code, I expected a cluster start to hit the following code path:
> {code}
> if (!failover) {
>   // Fresh cluster startup.
>   LOG.info("Clean cluster startup. Assigning user regions");
>   assignAllUserRegions(allRegions);
> }
> {code}
> where assignAllUserRegions would use retainAssignment() call in LoadBalancer; 
> however, from master log,  we usually hit the failover code path:
> {code}
> // If we found user regions out on cluster, its a failover.
> if (failover) {
>   LOG.info("Found regions out on cluster or in RIT; presuming failover");
>   // Process list of dead servers and regions in RIT.
>   // See HBASE-4580 for more information.
>   processDeadServersAndRecoverLostRegions(deadServers);
> }
> {code}
> where processDeadServersAndRecoverLostRegions() hands the dead servers to SSH, 
> and SSH uses roundRobinAssignment() in LoadBalancer.  That is why we see 
> locality lost more often than retained during cluster restart.
> Note: the code I was looking at is close to branch-1 and branch-1.1.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HBASE-18036) Data locality is not maintained after cluster restart or SSH

2017-05-12 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008684#comment-16008684
 ] 

Stephen Yuan Jiang edited comment on HBASE-18036 at 5/12/17 9:01 PM:
-

The V1 patch has minor changes based on [~elserj]'s feedback.  It also adds some 
logging to make the change clear.

The V1 change was tested in a small cluster.  I used Ambari to restart the 
cluster and saw that the new code path was hit, regions were assigned back to 
their original region servers, and locality was preserved.

Next up: I will use the same logic in branch-1 and other child branches.  Based 
on [~devaraj]'s offline feedback, I will remove the newly introduced 
"hbase.master.retain.assignment" config in branch-1, but keep the config in 
other branches (the config exists only so that, in case of a regression, the 
user has a way to revert to the original round-robin behavior, as patch 
releases usually don't get full testing).


was (Author: syuanjiang):
The V1 patch has minor changes based on [~elserj]'s feedback.  It also adds some 
logging to make the change clear.

Next up: I will use the same logic in branch-1 and other child branches.  Based 
on [~devaraj]'s offline feedback, I will remove the newly introduced 
"hbase.master.retain.assignment" config in branch-1, but keep the config in 
other branches (the config exists only so that, in case of a regression, the 
user has a way to revert to the original round-robin behavior, as patch 
releases usually don't get full testing).

> Data locality is not maintained after cluster restart or SSH
> 
>
> Key: HBASE-18036
> URL: https://issues.apache.org/jira/browse/HBASE-18036
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.4.0, 1.3.1, 1.2.5, 1.1.10
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Attachments: HBASE-18036.v0-branch-1.1.patch, 
> HBASE-18036.v1-branch-1.1.patch
>
>
> After HBASE-2896 / HBASE-4402, we believed data locality was maintained after 
> a cluster restart.  However, we have seen some complaints about data locality 
> loss on cluster restart (e.g. HBASE-17963).  
> Examining the AssignmentManager#processDeadServersAndRegionsInTransition() 
> code, I expected a cluster start to hit the following code path:
> {code}
> if (!failover) {
>   // Fresh cluster startup.
>   LOG.info("Clean cluster startup. Assigning user regions");
>   assignAllUserRegions(allRegions);
> }
> {code}
> where assignAllUserRegions would use retainAssignment() call in LoadBalancer; 
> however, from master log,  we usually hit the failover code path:
> {code}
> // If we found user regions out on cluster, its a failover.
> if (failover) {
>   LOG.info("Found regions out on cluster or in RIT; presuming failover");
>   // Process list of dead servers and regions in RIT.
>   // See HBASE-4580 for more information.
>   processDeadServersAndRecoverLostRegions(deadServers);
> }
> {code}
> where processDeadServersAndRecoverLostRegions() hands the dead servers to SSH, 
> and SSH uses roundRobinAssignment() in LoadBalancer.  That is why we see 
> locality lost more often than retained during cluster restart.
> Note: the code I was looking at is close to branch-1 and branch-1.1.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

