[jira] [Commented] (HBASE-22356) API to get hdfs block distribution from regionservers

2019-11-08 Thread Thiruvel Thirumoolan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970624#comment-16970624
 ] 

Thiruvel Thirumoolan commented on HBASE-22356:
--

[~binlijin] - I updated the patch on Review Board so you can quickly see the 
differences, and also submitted a PR against the GitHub master branch. Thanks 
for your time reviewing.

> API to get hdfs block distribution from regionservers
> -
>
> Key: HBASE-22356
> URL: https://issues.apache.org/jira/browse/HBASE-22356
> Project: HBase
>  Issue Type: Sub-task
>  Components: API, Balancer, regionserver
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
>  Labels: balancer
> Fix For: 3.0.0, 2.3.0, 1.6.0
>
> Attachments: HBASE-22356.master.001.patch, 
> HBASE-22356.master.002.patch, HBASE-22356.master.003.patch
>
>
> A RegionServer API has to be added which will return HDFSBlockDistribution 
> for all the regions it hosts. RS already has this info cached and updated 
> when flush/compaction happens. Master can query and get instead of hitting 
> the namenode and caching. The larger the cluster becomes, the more costly it 
> becomes to get this information and more stale the cached information becomes.
> This jira is only to add the API to regionserver.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-22356) API to get hdfs block distribution from regionservers

2019-11-08 Thread Thiruvel Thirumoolan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-22356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-22356:
-
Attachment: HBASE-22356.master.003.patch

> API to get hdfs block distribution from regionservers
> -
>
> Key: HBASE-22356
> URL: https://issues.apache.org/jira/browse/HBASE-22356
> Project: HBase
>  Issue Type: Sub-task
>  Components: API, Balancer, regionserver
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
>  Labels: balancer
> Fix For: 3.0.0, 2.3.0, 1.6.0
>
> Attachments: HBASE-22356.master.001.patch, 
> HBASE-22356.master.002.patch, HBASE-22356.master.003.patch
>
>
> A RegionServer API has to be added which will return HDFSBlockDistribution 
> for all the regions it hosts. RS already has this info cached and updated 
> when flush/compaction happens. Master can query and get instead of hitting 
> the namenode and caching. The larger the cluster becomes, the more costly it 
> becomes to get this information and more stale the cached information becomes.
> This jira is only to add the API to regionserver.





[jira] [Commented] (HBASE-23219) Re-enable ZKLess tests for branch-1 (Revert HBASE-14622)

2019-11-01 Thread Thiruvel Thirumoolan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16965139#comment-16965139
 ] 

Thiruvel Thirumoolan commented on HBASE-23219:
--

Thanks [~apurtell]

> Re-enable ZKLess tests for branch-1 (Revert HBASE-14622)
> 
>
> Key: HBASE-23219
> URL: https://issues.apache.org/jira/browse/HBASE-23219
> Project: HBase
>  Issue Type: Task
>  Components: test
>Affects Versions: 1.3.6
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Trivial
> Fix For: 1.6.0, 1.4.12, 1.3.7
>
> Attachments: HBASE-23219.branch-1.001.patch, 
> HBASE-23219.branch-1.3.001.patch, HBASE-23219.branch-1.4.001.patch
>
>
> Since we are using zkless in our production setup, we would like to enable 
> these tests back in apache on branch-1.





[jira] [Updated] (HBASE-23219) Re-enable ZKLess tests for branch-1 (Revert HBASE-14622)

2019-10-30 Thread Thiruvel Thirumoolan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-23219:
-
Status: Patch Available  (was: Open)

> Re-enable ZKLess tests for branch-1 (Revert HBASE-14622)
> 
>
> Key: HBASE-23219
> URL: https://issues.apache.org/jira/browse/HBASE-23219
> Project: HBase
>  Issue Type: Task
>  Components: test
>Affects Versions: 1.3.6
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Trivial
> Fix For: 1.4.12, 1.3.7
>
> Attachments: HBASE-23219.branch-1.001.patch, 
> HBASE-23219.branch-1.3.001.patch, HBASE-23219.branch-1.4.001.patch
>
>
> Since we are using zkless in our production setup, we would like to enable 
> these tests back in apache on branch-1.





[jira] [Updated] (HBASE-23219) Re-enable ZKLess tests for branch-1 (Revert HBASE-14622)

2019-10-30 Thread Thiruvel Thirumoolan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-23219:
-
Fix Version/s: (was: 1.5.1)
   (was: 1.6.0)
   Status: Open  (was: Patch Available)

> Re-enable ZKLess tests for branch-1 (Revert HBASE-14622)
> 
>
> Key: HBASE-23219
> URL: https://issues.apache.org/jira/browse/HBASE-23219
> Project: HBase
>  Issue Type: Task
>  Components: test
>Affects Versions: 1.3.6
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Trivial
> Fix For: 1.4.12, 1.3.7
>
> Attachments: HBASE-23219.branch-1.001.patch, 
> HBASE-23219.branch-1.3.001.patch, HBASE-23219.branch-1.4.001.patch
>
>
> Since we are using zkless in our production setup, we would like to enable 
> these tests back in apache on branch-1.





[jira] [Updated] (HBASE-23219) Re-enable ZKLess tests for branch-1 (Revert HBASE-14622)

2019-10-29 Thread Thiruvel Thirumoolan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-23219:
-
Attachment: HBASE-23219.branch-1.4.001.patch

> Re-enable ZKLess tests for branch-1 (Revert HBASE-14622)
> 
>
> Key: HBASE-23219
> URL: https://issues.apache.org/jira/browse/HBASE-23219
> Project: HBase
>  Issue Type: Task
>  Components: test
>Affects Versions: 1.3.6
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Trivial
> Fix For: 1.6.0, 1.4.12, 1.3.7, 1.5.1
>
> Attachments: HBASE-23219.branch-1.001.patch, 
> HBASE-23219.branch-1.3.001.patch, HBASE-23219.branch-1.4.001.patch
>
>
> Since we are using zkless in our production setup, we would like to enable 
> these tests back in apache on branch-1.





[jira] [Updated] (HBASE-23219) Re-enable ZKLess tests for branch-1 (Revert HBASE-14622)

2019-10-29 Thread Thiruvel Thirumoolan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-23219:
-
Attachment: HBASE-23219.branch-1.3.001.patch

> Re-enable ZKLess tests for branch-1 (Revert HBASE-14622)
> 
>
> Key: HBASE-23219
> URL: https://issues.apache.org/jira/browse/HBASE-23219
> Project: HBase
>  Issue Type: Task
>  Components: test
>Affects Versions: 1.3.6
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Trivial
> Fix For: 1.6.0, 1.4.12, 1.3.7, 1.5.1
>
> Attachments: HBASE-23219.branch-1.001.patch, 
> HBASE-23219.branch-1.3.001.patch
>
>
> Since we are using zkless in our production setup, we would like to enable 
> these tests back in apache on branch-1.





[jira] [Created] (HBASE-23219) Re-enable ZKLess tests for branch-1 (Revert HBASE-14622)

2019-10-25 Thread Thiruvel Thirumoolan (Jira)
Thiruvel Thirumoolan created HBASE-23219:


 Summary: Re-enable ZKLess tests for branch-1 (Revert HBASE-14622)
 Key: HBASE-23219
 URL: https://issues.apache.org/jira/browse/HBASE-23219
 Project: HBase
  Issue Type: Task
  Components: test
Affects Versions: 1.3.6
Reporter: Thiruvel Thirumoolan
Assignee: Thiruvel Thirumoolan
 Fix For: 1.4.12, 1.3.7


Since we are using zkless in our production setup, we would like to enable 
these tests back in apache on branch-1.





[jira] [Updated] (HBASE-23219) Re-enable ZKLess tests for branch-1 (Revert HBASE-14622)

2019-10-25 Thread Thiruvel Thirumoolan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-23219:
-
Status: Patch Available  (was: Open)

> Re-enable ZKLess tests for branch-1 (Revert HBASE-14622)
> 
>
> Key: HBASE-23219
> URL: https://issues.apache.org/jira/browse/HBASE-23219
> Project: HBase
>  Issue Type: Task
>  Components: test
>Affects Versions: 1.3.6
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Trivial
> Fix For: 1.4.12, 1.3.7
>
> Attachments: HBASE-23219.branch-1.001.patch
>
>
> Since we are using zkless in our production setup, we would like to enable 
> these tests back in apache on branch-1.





[jira] [Updated] (HBASE-23219) Re-enable ZKLess tests for branch-1 (Revert HBASE-14622)

2019-10-25 Thread Thiruvel Thirumoolan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-23219:
-
Fix Version/s: 1.5.1
   1.6.0

> Re-enable ZKLess tests for branch-1 (Revert HBASE-14622)
> 
>
> Key: HBASE-23219
> URL: https://issues.apache.org/jira/browse/HBASE-23219
> Project: HBase
>  Issue Type: Task
>  Components: test
>Affects Versions: 1.3.6
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Trivial
> Fix For: 1.6.0, 1.4.12, 1.3.7, 1.5.1
>
> Attachments: HBASE-23219.branch-1.001.patch
>
>
> Since we are using zkless in our production setup, we would like to enable 
> these tests back in apache on branch-1.





[jira] [Updated] (HBASE-23219) Re-enable ZKLess tests for branch-1 (Revert HBASE-14622)

2019-10-25 Thread Thiruvel Thirumoolan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-23219:
-
Attachment: HBASE-23219.branch-1.001.patch

> Re-enable ZKLess tests for branch-1 (Revert HBASE-14622)
> 
>
> Key: HBASE-23219
> URL: https://issues.apache.org/jira/browse/HBASE-23219
> Project: HBase
>  Issue Type: Task
>  Components: test
>Affects Versions: 1.3.6
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Trivial
> Fix For: 1.4.12, 1.3.7
>
> Attachments: HBASE-23219.branch-1.001.patch
>
>
> Since we are using zkless in our production setup, we would like to enable 
> these tests back in apache on branch-1.





[jira] [Updated] (HBASE-22356) API to get hdfs block distribution from regionservers

2019-05-03 Thread Thiruvel Thirumoolan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-22356:
-
Attachment: HBASE-22356.master.002.patch

> API to get hdfs block distribution from regionservers
> -
>
> Key: HBASE-22356
> URL: https://issues.apache.org/jira/browse/HBASE-22356
> Project: HBase
>  Issue Type: Sub-task
>  Components: API, Balancer, regionserver
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
>  Labels: balancer
> Fix For: 3.0.0, 2.2.1, 1.5.1
>
> Attachments: HBASE-22356.master.001.patch, 
> HBASE-22356.master.002.patch
>
>
> A RegionServer API has to be added which will return HDFSBlockDistribution 
> for all the regions it hosts. RS already has this info cached and updated 
> when flush/compaction happens. Master can query and get instead of hitting 
> the namenode and caching. The larger the cluster becomes, the more costly it 
> becomes to get this information and more stale the cached information becomes.
> This jira is only to add the API to regionserver.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-22356) API to get hdfs block distribution from regionservers

2019-05-02 Thread Thiruvel Thirumoolan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-22356:
-
Status: Patch Available  (was: Open)

Submitting precommit for master. Once patch is good for master, will work on 
branch-1 patch.

> API to get hdfs block distribution from regionservers
> -
>
> Key: HBASE-22356
> URL: https://issues.apache.org/jira/browse/HBASE-22356
> Project: HBase
>  Issue Type: Sub-task
>  Components: API, Balancer, regionserver
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 3.0.0, 2.2.1, 1.5.1
>
> Attachments: HBASE-22356.master.001.patch
>
>
> A RegionServer API has to be added which will return HDFSBlockDistribution 
> for all the regions it hosts. RS already has this info cached and updated 
> when flush/compaction happens. Master can query and get instead of hitting 
> the namenode and caching. The larger the cluster becomes, the more costly it 
> becomes to get this information and more stale the cached information becomes.
> This jira is only to add the API to regionserver.





[jira] [Updated] (HBASE-22356) API to get hdfs block distribution from regionservers

2019-05-02 Thread Thiruvel Thirumoolan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-22356:
-
Attachment: HBASE-22356.master.001.patch

> API to get hdfs block distribution from regionservers
> -
>
> Key: HBASE-22356
> URL: https://issues.apache.org/jira/browse/HBASE-22356
> Project: HBase
>  Issue Type: Sub-task
>  Components: API, Balancer, regionserver
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 3.0.0, 2.2.1, 1.5.1
>
> Attachments: HBASE-22356.master.001.patch
>
>
> A RegionServer API has to be added which will return HDFSBlockDistribution 
> for all the regions it hosts. RS already has this info cached and updated 
> when flush/compaction happens. Master can query and get instead of hitting 
> the namenode and caching. The larger the cluster becomes, the more costly it 
> becomes to get this information and more stale the cached information becomes.
> This jira is only to add the API to regionserver.





[jira] [Created] (HBASE-22356) API to get hdfs block distribution from regionservers

2019-05-02 Thread Thiruvel Thirumoolan (JIRA)
Thiruvel Thirumoolan created HBASE-22356:


 Summary: API to get hdfs block distribution from regionservers
 Key: HBASE-22356
 URL: https://issues.apache.org/jira/browse/HBASE-22356
 Project: HBase
  Issue Type: Sub-task
  Components: API, Balancer, regionserver
Reporter: Thiruvel Thirumoolan
Assignee: Thiruvel Thirumoolan
 Fix For: 3.0.0, 2.2.1, 1.5.1


A RegionServer API has to be added which will return HDFSBlockDistribution for 
all the regions it hosts. RS already has this info cached and updated when 
flush/compaction happens. Master can query and get instead of hitting the 
namenode and caching. The larger the cluster becomes, the more costly it 
becomes to get this information and more stale the cached information becomes.

This jira is only to add the API to regionserver.





[jira] [Commented] (HBASE-15533) Add RSGroup Favored Balancer

2019-04-18 Thread Thiruvel Thirumoolan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820830#comment-16820830
 ] 

Thiruvel Thirumoolan commented on HBASE-15533:
--

[~zodvik],

Thanks for the interest. Most of this patch should work as is. Since the 
majority of these tests depend on the FavoredStochastic unit tests, let me get 
HBASE-18349 in before continuing to work on this. I think I have a patch that 
fixes all but one unit test, so I will post a partial or complete patch on 
HBASE-18349 and then resume work on this.

> Add RSGroup Favored Balancer
> 
>
> Key: HBASE-15533
> URL: https://issues.apache.org/jira/browse/HBASE-15533
> Project: HBase
>  Issue Type: Sub-task
>  Components: FavoredNodes
>Reporter: Francis Liu
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Attachments: HBASE-15533.master.001.patch, 
> HBASE-15533.master.002.patch, HBASE-15533.patch, HBASE-15533.rough.draft.patch
>
>
> HBASE-16942 added favored stochastic load balancer so we can pick and choose 
> nodes to assign based on the favored nodes and load/locality. The intention 
> of this jira is to add a group based load balancer on top of the favored 
> stochastic balancer. This will ensure splits/merges will only use favored 
> nodes from that group and will inherit from the parents appropriately.





[jira] [Commented] (HBASE-20546) Improve perf of RegionLocationFinder.mapHostNameToServerName

2019-04-11 Thread Thiruvel Thirumoolan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815682#comment-16815682
 ] 

Thiruvel Thirumoolan commented on HBASE-20546:
--

Thanks Andy. I think I had a draft patch for this a while back that I shelved. 
Will resume work on it.

> Improve perf of RegionLocationFinder.mapHostNameToServerName
> 
>
> Key: HBASE-20546
> URL: https://issues.apache.org/jira/browse/HBASE-20546
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.4.4, 2.0.0
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Attachments: HBASE-20546.branch-1.4.001.patch
>
>
> RegionLocationFinder.getTopBlockLocations() is called multiple times during a 
> balancer run. While profiling a large table balance, mapHostNameToServerName() 
> seems to take a lot of time. One of the maps is recreated on each 
> iteration, while we can just initialize it once.
> Goes into both branch-1 and branch-2, although patches differ slightly.
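
The idea in the description above, building the host lookup map once and
reusing it across calls, can be sketched roughly as follows. This is a minimal
illustration only, not the actual patch: the class HostLookupSketch, its
fields, and the use of plain strings in an assumed "host,port,startcode" form
are hypothetical stand-ins for the real RegionLocationFinder and ServerName
types.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the balancer's host lookup; not the real HBase class.
final class HostLookupSketch {
    // Built once in the constructor instead of once per lookup call.
    private final Map<String, List<String>> serversByHost = new HashMap<>();

    HostLookupSketch(List<String> serverNames) {
        for (String serverName : serverNames) {
            // Assumption: server names look like "host,port,startcode".
            int comma = serverName.indexOf(',');
            String host = comma < 0 ? serverName : serverName.substring(0, comma);
            serversByHost.computeIfAbsent(host, h -> new ArrayList<>()).add(serverName);
        }
    }

    // Each call only reads the prebuilt map; unknown hosts are skipped.
    List<String> mapHostNamesToServerNames(List<String> hosts) {
        List<String> result = new ArrayList<>();
        for (String host : hosts) {
            List<String> servers = serversByHost.get(host);
            if (servers != null) {
                result.addAll(servers);
            }
        }
        return result;
    }
}
```

In this sketch the lookup object would be created once per balancer run and
rebuilt only when the set of live servers changes, so repeated calls for each
region pay only for map reads.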





[jira] [Commented] (HBASE-20643) Getting HDFSBlockDist in Master by querying RegionServers

2019-04-11 Thread Thiruvel Thirumoolan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815680#comment-16815680
 ] 

Thiruvel Thirumoolan commented on HBASE-20643:
--

We rolled part of this patch internally and it's been good. Let me split this 
into two sub-tasks and get this in. It's definitely needed for large clusters.

> Getting HDFSBlockDist in Master by querying RegionServers
> -
>
> Key: HBASE-20643
> URL: https://issues.apache.org/jira/browse/HBASE-20643
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 2.3.0
>
>
> Region locality information is needed by the balancer to generate region 
> plans. Computing HDFSBlockDistribution is expensive on larger clusters and 
> adds load to the NameNode. This also needs to be recomputed on a master 
> restart. The proposal is to get the HDFSBlockDistribution from the 
> RegionServers instead of computing it in Master. RS already has this 
> information and we could just reuse it by querying it. RS already passes 
> dataLocality info via RegionLoad today.
> Proposed Implementation: This is a high-level overview.
> # A RegionServer API has to be added which will return HDFSBlockDistribution 
> for all the regions it hosts. RS already has this info. Since ClusterStatus 
> has already become bulky and we don’t need updated locality so fast, it’s 
> better to have another API rather than add this to RegionLoad and pass it 
> along with RSReport.
> # Master will have a Chore to query all RegionServers and will cache the 
> HDFSBlockDistribution for those regions. This is easy and quick. Admins can 
> tune the frequency based on the size of the cluster. On a ~90-node cluster with 
> 500k regions, a prototype implementation, and no load, it took about 5 
> seconds to get all HDFSBlockDistribution from RS.
> # The cache will be an extension of RegionLocationFinder (a subclass), if 
> needed, to keep the implementation simple. This will probably become clearer 
> during implementation.
> # Balancer will use the new cache to get all HDFSBlockDistribution. If there 
> is a new region and the Chore didn’t get the block distribution from RS during 
> its previous run, then it will be computed by RegionLocationFinder the same 
> way it is done now. If the Chore runs more frequently, say every hour, 
> then this recomputation will be drastically reduced.
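
The cache-with-fallback behaviour in the proposal (a Chore fills the cache, and
the Balancer falls back to direct computation for regions the Chore has not
seen) can be sketched as below. This is a rough illustration under stated
assumptions, not the proposed implementation: BlockDistributionCacheSketch,
refreshFrom, and the Map<String, Long> of host to bytes (standing in for
HDFSBlockDistribution) are hypothetical names, and caching the fallback result
is an assumption the proposal does not spell out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of a master-side cache; not the real RegionLocationFinder.
final class BlockDistributionCacheSketch {
    // region name -> (host -> bytes of that region stored on the host)
    private final Map<String, Map<String, Long>> cache = new ConcurrentHashMap<>();
    // Stand-in for the existing, expensive NameNode-based computation.
    private final Function<String, Map<String, Long>> fallback;

    BlockDistributionCacheSketch(Function<String, Map<String, Long>> fallback) {
        this.fallback = fallback;
    }

    // Called by the periodic chore with what one region server reported.
    void refreshFrom(Map<String, Map<String, Long>> reportedByRegionServer) {
        cache.putAll(reportedByRegionServer);
    }

    // Used by the balancer: cached value if present, otherwise compute and keep it.
    Map<String, Long> get(String regionName) {
        return cache.computeIfAbsent(regionName, fallback);
    }
}
```

With this shape, the more often the chore calls refreshFrom, the less often
get falls through to the expensive fallback.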





[jira] [Commented] (HBASE-21883) Enhancements to Major Compaction tool from HBASE-19528

2019-04-02 Thread Thiruvel Thirumoolan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808270#comment-16808270
 ] 

Thiruvel Thirumoolan commented on HBASE-21883:
--

[~apurtell] I have a master patch up; please let me know if you have any 
feedback. Thanks!

> Enhancements to Major Compaction tool from HBASE-19528
> --
>
> Key: HBASE-21883
> URL: https://issues.apache.org/jira/browse/HBASE-21883
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Compaction, tooling
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Minor
> Fix For: 3.0.0, 2.3.0, 1.5.1
>
> Attachments: HBASE-21883.branch-1.001.patch, 
> HBASE-21883.branch-1.002.patch, HBASE-21883.master.001.patch, 
> HBASE-21883.master.002.patch
>
>
> I would like to add new compaction tools based on [~churromorales]'s tool at 
> HBASE-19528.
> We internally have tools that pick and compact regions based on multiple 
> criteria. Since Rahul already has a version in community, we would like to 
> build on top of it instead of pushing yet another tool.
> With this jira, I would like to add a tool which looks at regions beyond TTL 
> and compacts them in a rsgroup. We have time series data and those regions 
> will become dead after a while, so we compact those regions to save disk 
> space. We also merge those empty regions to reduce load, but that tool comes 
> later.
> Will prep a patch for 2.x once 1.5 gets in.





[jira] [Commented] (HBASE-21883) Enhancements to Major Compaction tool from HBASE-19528

2019-03-27 Thread Thiruvel Thirumoolan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803390#comment-16803390
 ] 

Thiruvel Thirumoolan commented on HBASE-21883:
--

[~stack] - Apologies, updated release notes.

> Enhancements to Major Compaction tool from HBASE-19528
> --
>
> Key: HBASE-21883
> URL: https://issues.apache.org/jira/browse/HBASE-21883
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Compaction, tooling
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Minor
> Fix For: 3.0.0, 2.3.0, 1.5.1
>
> Attachments: HBASE-21883.branch-1.001.patch, 
> HBASE-21883.branch-1.002.patch, HBASE-21883.master.001.patch, 
> HBASE-21883.master.002.patch
>
>
> I would like to add new compaction tools based on [~churromorales]'s tool at 
> HBASE-19528.
> We internally have tools that pick and compact regions based on multiple 
> criteria. Since Rahul already has a version in community, we would like to 
> build on top of it instead of pushing yet another tool.
> With this jira, I would like to add a tool which looks at regions beyond TTL 
> and compacts them in a rsgroup. We have time series data and those regions 
> will become dead after a while, so we compact those regions to save disk 
> space. We also merge those empty regions to reduce load, but that tool comes 
> later.
> Will prep a patch for 2.x once 1.5 gets in.





[jira] [Updated] (HBASE-21883) Enhancements to Major Compaction tool from HBASE-19528

2019-03-27 Thread Thiruvel Thirumoolan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-21883:
-
Release Note: 
The MajorCompactorTTL tool allows compacting all regions in a table that have 
been TTLed out. This saves space on DFS and is useful for tables which are 
similar to time series data. This is typically scheduled to run frequently (say 
via cron) to clean up old data on an ongoing basis.

The RSGroupMajorCompactionTTL tool is similar to MajorCompactorTTL but runs at 
a region server group level. If multiple tables in an rsgroup are similar to 
time-series data, then it runs a single command to clean them up. As more 
tables are added to or removed from the rsgroup, it's easy to have a single 
command take care of all of them.

> Enhancements to Major Compaction tool from HBASE-19528
> --
>
> Key: HBASE-21883
> URL: https://issues.apache.org/jira/browse/HBASE-21883
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Compaction, tooling
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Minor
> Fix For: 3.0.0, 2.3.0, 1.5.1
>
> Attachments: HBASE-21883.branch-1.001.patch, 
> HBASE-21883.branch-1.002.patch, HBASE-21883.master.001.patch, 
> HBASE-21883.master.002.patch
>
>
> I would like to add new compaction tools based on [~churromorales]'s tool at 
> HBASE-19528.
> We internally have tools that pick and compact regions based on multiple 
> criteria. Since Rahul already has a version in community, we would like to 
> build on top of it instead of pushing yet another tool.
> With this jira, I would like to add a tool which looks at regions beyond TTL 
> and compacts them in a rsgroup. We have time series data and those regions 
> will become dead after a while, so we compact those regions to save disk 
> space. We also merge those empty regions to reduce load, but that tool comes 
> later.
> Will prep a patch for 2.x once 1.5 gets in.





[jira] [Updated] (HBASE-21883) Enhancements to Major Compaction tool from HBASE-19528

2019-03-27 Thread Thiruvel Thirumoolan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-21883:
-
Attachment: HBASE-21883.master.002.patch

> Enhancements to Major Compaction tool from HBASE-19528
> --
>
> Key: HBASE-21883
> URL: https://issues.apache.org/jira/browse/HBASE-21883
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Compaction, tooling
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Minor
> Fix For: 3.0.0, 2.3.0, 1.5.1
>
> Attachments: HBASE-21883.branch-1.001.patch, 
> HBASE-21883.branch-1.002.patch, HBASE-21883.master.001.patch, 
> HBASE-21883.master.002.patch
>
>
> I would like to add new compaction tools based on [~churromorales]'s tool at 
> HBASE-19528.
> We internally have tools that pick and compact regions based on multiple 
> criteria. Since Rahul already has a version in community, we would like to 
> build on top of it instead of pushing yet another tool.
> With this jira, I would like to add a tool which looks at regions beyond TTL 
> and compacts them in a rsgroup. We have time series data and those regions 
> will become dead after a while, so we compact those regions to save disk 
> space. We also merge those empty regions to reduce load, but that tool comes 
> later.
> Will prep a patch for 2.x once 1.5 gets in.





[jira] [Updated] (HBASE-21883) Enhancements to Major Compaction tool from HBASE-19528

2019-03-26 Thread Thiruvel Thirumoolan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-21883:
-
Status: Patch Available  (was: Open)

Kicking off pre-commit build for master branch.

> Enhancements to Major Compaction tool from HBASE-19528
> --
>
> Key: HBASE-21883
> URL: https://issues.apache.org/jira/browse/HBASE-21883
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Compaction, tooling
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Minor
> Fix For: 3.0.0, 2.3.0, 1.5.1
>
> Attachments: HBASE-21883.branch-1.001.patch, 
> HBASE-21883.branch-1.002.patch, HBASE-21883.master.001.patch
>
>
> I would like to add new compaction tools based on [~churromorales]'s tool at 
> HBASE-19528.
> We internally have tools that pick and compact regions based on multiple 
> criteria. Since Rahul already has a version in community, we would like to 
> build on top of it instead of pushing yet another tool.
> With this jira, I would like to add a tool which looks at regions beyond TTL 
> and compacts them in a rsgroup. We have time series data and those regions 
> will become dead after a while, so we compact those regions to save disk 
> space. We also merge those empty regions to reduce load, but that tool comes 
> later.
> Will prep a patch for 2.x once 1.5 gets in.





[jira] [Updated] (HBASE-21883) Enhancements to Major Compaction tool from HBASE-19528

2019-03-26 Thread Thiruvel Thirumoolan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-21883:
-
Attachment: HBASE-21883.master.001.patch

> Enhancements to Major Compaction tool from HBASE-19528
> --
>
> Key: HBASE-21883
> URL: https://issues.apache.org/jira/browse/HBASE-21883
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Compaction, tooling
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Minor
> Fix For: 3.0.0, 2.3.0, 1.5.1
>
> Attachments: HBASE-21883.branch-1.001.patch, 
> HBASE-21883.branch-1.002.patch, HBASE-21883.master.001.patch
>
>
> I would like to add new compaction tools based on [~churromorales]'s tool at 
> HBASE-19528.
> We internally have tools that pick and compact regions based on multiple 
> criteria. Since Rahul already has a version in community, we would like to 
> build on top of it instead of pushing yet another tool.
> With this jira, I would like to add a tool which looks at regions beyond TTL 
> and compacts them in a rsgroup. We have time series data and those regions 
> will become dead after a while, so we compact those regions to save disk 
> space. We also merge those empty regions to reduce load, but that tool comes 
> later.
> Will prep a patch for 2.x once 1.5 gets in.





[jira] [Updated] (HBASE-21903) Backport major compaction tool HBASE-19528 from to 1.4 and 1.3

2019-02-14 Thread Thiruvel Thirumoolan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-21903:
-
Attachment: HBASE-21903-branch-1.3-addendum.patch

> Backport major compaction tool HBASE-19528 from to 1.4 and 1.3
> --
>
> Key: HBASE-21903
> URL: https://issues.apache.org/jira/browse/HBASE-21903
> Project: HBase
>  Issue Type: Task
>  Components: Client, Compaction, tooling
>Affects Versions: 1.3.3, 1.4.9
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.4.10, 1.3.4
>
> Attachments: HBASE-21903-branch-1.3-addendum.patch
>
>
> Our internal deployments are based on branch-1.3. We will be using the major 
> compaction tool HBASE-19528 from [~churromorales] and the enhancements on top 
> of it HBASE-21883 on our 1.3 clusters. I would like to backport HBASE-19528 
> to 1.3 and hence 1.4 as well. Since it's a standalone tool without any other 
> dependency or code changes, I believe that should be ok.





[jira] [Commented] (HBASE-21903) Backport major compaction tool HBASE-19528 from to 1.4 and 1.3

2019-02-14 Thread Thiruvel Thirumoolan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768656#comment-16768656
 ] 

Thiruvel Thirumoolan commented on HBASE-21903:
--

For 1.3, we have to cherry-pick 
https://github.com/apache/hbase/commit/673e8092506cd7c2c09093a3d085dc197ff14f53 
and then apply [^HBASE-21903-branch-1.3-addendum.patch] on top of it. The 
addendum changes only the unit test framework.

For 1.4, the patch applies as is, so a cherry-pick is enough.

> Backport major compaction tool HBASE-19528 from to 1.4 and 1.3
> --
>
> Key: HBASE-21903
> URL: https://issues.apache.org/jira/browse/HBASE-21903
> Project: HBase
>  Issue Type: Task
>  Components: Client, Compaction, tooling
>Affects Versions: 1.3.3, 1.4.9
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.4.10, 1.3.4
>
> Attachments: HBASE-21903-branch-1.3-addendum.patch
>
>
> Our internal deployments are based on branch-1.3. We will be using the major 
> compaction tool HBASE-19528 from [~churromorales] and the enhancements on top 
> of it HBASE-21883 on our 1.3 clusters. I would like to backport HBASE-19528 
> to 1.3 and hence 1.4 as well. Since it's a standalone tool without any other 
> dependency or code changes, I believe that should be ok.





[jira] [Created] (HBASE-21903) Backport major compaction tool HBASE-19528 from to 1.4 and 1.3

2019-02-14 Thread Thiruvel Thirumoolan (JIRA)
Thiruvel Thirumoolan created HBASE-21903:


 Summary: Backport major compaction tool HBASE-19528 from to 1.4 
and 1.3
 Key: HBASE-21903
 URL: https://issues.apache.org/jira/browse/HBASE-21903
 Project: HBase
  Issue Type: Task
  Components: Client, Compaction, tooling
Affects Versions: 1.4.9, 1.3.3
Reporter: Thiruvel Thirumoolan
Assignee: Thiruvel Thirumoolan
 Fix For: 1.4.10, 1.3.4


Our internal deployments are based on branch-1.3. We will be using the major 
compaction tool HBASE-19528 from [~churromorales] and the enhancements on top 
of it HBASE-21883 on our 1.3 clusters. I would like to backport HBASE-19528 to 
1.3 and hence 1.4 as well. Since it's a standalone tool without any other 
dependency or code changes, I believe that should be ok.





[jira] [Commented] (HBASE-21883) Enhancements to Major Compaction tool from HBASE-19528

2019-02-14 Thread Thiruvel Thirumoolan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768606#comment-16768606
 ] 

Thiruvel Thirumoolan commented on HBASE-21883:
--

Unit test failures are unrelated to the patch.

> Enhancements to Major Compaction tool from HBASE-19528
> --
>
> Key: HBASE-21883
> URL: https://issues.apache.org/jira/browse/HBASE-21883
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Compaction, tooling
>Affects Versions: 1.5.0
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Minor
> Fix For: 1.5.1
>
> Attachments: HBASE-21883.branch-1.001.patch, 
> HBASE-21883.branch-1.002.patch
>
>
> I would like to add new compaction tools based on [~churromorales]'s tool at 
> HBASE-19528.
> We internally have tools that pick and compact regions based on multiple 
> criteria. Since Rahul already has a version in community, we would like to 
> build on top of it instead of pushing yet another tool.
> With this jira, I would like to add a tool which looks at regions beyond TTL 
> and compacts them in a rsgroup. We have time series data and those regions 
> will become dead after a while, so we compact those regions to save disk 
> space. We also merge those empty regions to reduce load, but that tool comes 
> later.
> Will prep a patch for 2.x once 1.5 gets in.





[jira] [Updated] (HBASE-21883) Enhancements to Major Compaction tool from HBASE-19528

2019-02-14 Thread Thiruvel Thirumoolan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-21883:
-
Attachment: HBASE-21883.branch-1.002.patch

> Enhancements to Major Compaction tool from HBASE-19528
> --
>
> Key: HBASE-21883
> URL: https://issues.apache.org/jira/browse/HBASE-21883
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Compaction, tooling
>Affects Versions: 1.5.0
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Minor
> Fix For: 1.5.1
>
> Attachments: HBASE-21883.branch-1.001.patch, 
> HBASE-21883.branch-1.002.patch
>
>
> I would like to add new compaction tools based on [~churromorales]'s tool at 
> HBASE-19528.
> We internally have tools that pick and compact regions based on multiple 
> criteria. Since Rahul already has a version in community, we would like to 
> build on top of it instead of pushing yet another tool.
> With this jira, I would like to add a tool which looks at regions beyond TTL 
> and compacts them in a rsgroup. We have time series data and those regions 
> will become dead after a while, so we compact those regions to save disk 
> space. We also merge those empty regions to reduce load, but that tool comes 
> later.
> Will prep a patch for 2.x once 1.5 gets in.





[jira] [Updated] (HBASE-21883) Enhancements to Major Compaction tool from HBASE-19528

2019-02-13 Thread Thiruvel Thirumoolan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-21883:
-
Status: Patch Available  (was: Open)

Kicking off precommit build.

> Enhancements to Major Compaction tool from HBASE-19528
> --
>
> Key: HBASE-21883
> URL: https://issues.apache.org/jira/browse/HBASE-21883
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Compaction, tooling
>Affects Versions: 1.5.0
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Minor
> Fix For: 1.5.1
>
> Attachments: HBASE-21883.branch-1.001.patch
>
>
> I would like to add new compaction tools based on [~churromorales]'s tool at 
> HBASE-19528.
> We internally have tools that pick and compact regions based on multiple 
> criteria. Since Rahul already has a version in community, we would like to 
> build on top of it instead of pushing yet another tool.
> With this jira, I would like to add a tool which looks at regions beyond TTL 
> and compacts them in a rsgroup. We have time series data and those regions 
> will become dead after a while, so we compact those regions to save disk 
> space. We also merge those empty regions to reduce load, but that tool comes 
> later.
> Will prep a patch for 2.x once 1.5 gets in.





[jira] [Updated] (HBASE-21883) Enhancements to Major Compaction tool from HBASE-19528

2019-02-13 Thread Thiruvel Thirumoolan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-21883:
-
Attachment: HBASE-21883.branch-1.001.patch

> Enhancements to Major Compaction tool from HBASE-19528
> --
>
> Key: HBASE-21883
> URL: https://issues.apache.org/jira/browse/HBASE-21883
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Compaction, tooling
>Affects Versions: 1.5.0
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Minor
> Fix For: 1.5.1
>
> Attachments: HBASE-21883.branch-1.001.patch
>
>
> I would like to add new compaction tools based on [~churromorales]'s tool at 
> HBASE-19528.
> We internally have tools that pick and compact regions based on multiple 
> criteria. Since Rahul already has a version in community, we would like to 
> build on top of it instead of pushing yet another tool.
> With this jira, I would like to add a tool which looks at regions beyond TTL 
> and compacts them in a rsgroup. We have time series data and those regions 
> will become dead after a while, so we compact those regions to save disk 
> space. We also merge those empty regions to reduce load, but that tool comes 
> later.
> Will prep a patch for 2.x once 1.5 gets in.





[jira] [Updated] (HBASE-21883) Enhancements to Major Compaction tool from HBASE-19528

2019-02-12 Thread Thiruvel Thirumoolan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-21883:
-
Description: 
I would like to add new compaction tools based on [~churromorales]'s tool at 
HBASE-19528.

We internally have tools that pick and compact regions based on multiple 
criteria. Since Rahul already has a version in community, we would like to 
build on top of it instead of pushing yet another tool.

With this jira, I would like to add a tool which looks at regions beyond TTL 
and compacts them in a rsgroup. We have time series data and those regions will 
become dead after a while, so we compact those regions to save disk space. We 
also merge those empty regions to reduce load, but that tool comes later.

Will prep a patch for 2.x once 1.5 gets in.

  was:
I would like to add new compaction tools based on [~churromorales]'s tool at 
HBASE-19528.

We internally have tools that pick and compact regions based on multiple 
criteria. Since Rahul already has a version in community, we would like to 
build on top of it instead of pushing yet another tool.

With this jira, I would like to add a tool which looks at regions beyond TTL 
and compacts them in a rsgroup. We have time series data and those regions will 
become dead after a while, so we compact those regions to save disk space. We 
also merge those empty regions to reduce load, but that tool comes later.


> Enhancements to Major Compaction tool from HBASE-19528
> --
>
> Key: HBASE-21883
> URL: https://issues.apache.org/jira/browse/HBASE-21883
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Compaction, tooling
>Affects Versions: 1.5.0
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Minor
> Fix For: 1.5.1
>
>
> I would like to add new compaction tools based on [~churromorales]'s tool at 
> HBASE-19528.
> We internally have tools that pick and compact regions based on multiple 
> criteria. Since Rahul already has a version in community, we would like to 
> build on top of it instead of pushing yet another tool.
> With this jira, I would like to add a tool which looks at regions beyond TTL 
> and compacts them in a rsgroup. We have time series data and those regions 
> will become dead after a while, so we compact those regions to save disk 
> space. We also merge those empty regions to reduce load, but that tool comes 
> later.
> Will prep a patch for 2.x once 1.5 gets in.





[jira] [Created] (HBASE-21883) Enhancements to Major Compaction tool from HBASE-19528

2019-02-12 Thread Thiruvel Thirumoolan (JIRA)
Thiruvel Thirumoolan created HBASE-21883:


 Summary: Enhancements to Major Compaction tool from HBASE-19528
 Key: HBASE-21883
 URL: https://issues.apache.org/jira/browse/HBASE-21883
 Project: HBase
  Issue Type: Improvement
  Components: Client, Compaction, tooling
Affects Versions: 1.5.0
Reporter: Thiruvel Thirumoolan
Assignee: Thiruvel Thirumoolan
 Fix For: 1.5.1


I would like to add new compaction tools based on [~churromorales]'s tool at 
HBASE-19528.

We internally have tools that pick and compact regions based on multiple 
criteria. Since Rahul already has a version in community, we would like to 
build on top of it instead of pushing yet another tool.

With this jira, I would like to add a tool which looks at regions beyond TTL 
and compacts them in a rsgroup. We have time series data and those regions will 
become dead after a while, so we compact those regions to save disk space. We 
also merge those empty regions to reduce load, but that tool comes later.
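
As a rough illustration of the "regions beyond TTL" selection described above, 
here is a minimal Java sketch. It is not the patch's code: the class name, the 
method, and the idea of passing in the newest cell timestamp per region are all 
assumptions made for the example. A region is picked when even its newest data 
is older than the TTL, so a major compaction would drop everything in it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of HBase or of the HBASE-21883 patch. */
class TtlCompactionPicker {
    /**
     * Returns the regions whose newest cell timestamp is older than the TTL.
     * newestCellTsByRegion: region name -> newest cell timestamp in millis (assumed input).
     */
    static List<String> pick(Map<String, Long> newestCellTsByRegion,
                             long ttlMillis, long nowMillis) {
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, Long> e : newestCellTsByRegion.entrySet()) {
            // Strictly older than the TTL: every cell in the region has expired.
            if (nowMillis - e.getValue() > ttlMillis) {
                candidates.add(e.getKey());
            }
        }
        return candidates;
    }
}
```

Restricting the candidates to a single rsgroup would be an extra filter on the 
input map and is left out here.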





[jira] [Commented] (HBASE-20546) Improve perf of RegionLocationFinder.mapHostNameToServerName

2018-06-07 Thread Thiruvel Thirumoolan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505127#comment-16505127
 ] 

Thiruvel Thirumoolan commented on HBASE-20546:
--

I can move this map to ServerManager, so it's always updated. Would that be ok?

Since it's called from the picker, it's significant. It came up when I profiled 
the balancer on our cluster setup.

> Improve perf of RegionLocationFinder.mapHostNameToServerName
> 
>
> Key: HBASE-20546
> URL: https://issues.apache.org/jira/browse/HBASE-20546
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.4.4, 2.0.0
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 2.0.1, 1.4.6
>
> Attachments: HBASE-20546.branch-1.4.001.patch
>
>
> RegionLocationFinder.getTopBlockLocations() is called multiple times during 
> balancer. While profiling on a large table balance, mapHostNameToServerName() 
> seems to take a lot of time. One of the maps is repeatedly created for each 
> iteration, while we can just initialize it once.
> Goes into both branch-1 and branch-2, although patches differ slightly.
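
The idea in the description above (build the host-to-server map once rather 
than on every call) can be sketched in Java as follows. This is only an 
illustration under assumptions: HostToServerIndex is a made-up class, not 
RegionLocationFinder, and server names are represented as plain strings in the 
"host,port,startcode" form.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the finder; not the HBase class. */
class HostToServerIndex {
    // host name -> server names on that host, built once per cluster snapshot
    private final Map<String, List<String>> byHost = new HashMap<>();

    /** Builds the index a single time from the known server names. */
    HostToServerIndex(List<String> serverNames) {
        for (String sn : serverNames) {
            int comma = sn.indexOf(',');
            String host = comma < 0 ? sn : sn.substring(0, comma);
            byHost.computeIfAbsent(host, h -> new ArrayList<>()).add(sn);
        }
    }

    /** Maps hosts to server names using the prebuilt index; unknown hosts are skipped. */
    List<String> mapHostNameToServerName(List<String> hosts) {
        List<String> result = new ArrayList<>();
        for (String host : hosts) {
            List<String> servers = byHost.get(host);
            if (servers != null) {
                result.addAll(servers);
            }
        }
        return result;
    }
}
```

Each lookup then costs one hash probe per host instead of a rebuild of the 
whole map. The open question from the comments, when to rebuild the index after 
the cluster status changes, is not addressed by this sketch.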





[jira] [Created] (HBASE-20643) Getting HDFSBlockDist in Master by querying RegionServers

2018-05-24 Thread Thiruvel Thirumoolan (JIRA)
Thiruvel Thirumoolan created HBASE-20643:


 Summary: Getting HDFSBlockDist in Master by querying RegionServers
 Key: HBASE-20643
 URL: https://issues.apache.org/jira/browse/HBASE-20643
 Project: HBase
  Issue Type: Improvement
  Components: Balancer
Reporter: Thiruvel Thirumoolan
Assignee: Thiruvel Thirumoolan
 Fix For: 2.1.0, 1.5.0, 1.4.5


Region locality information is needed by the balancer to generate region plans. 
Computing HDFSBlockDistribution is expensive on larger clusters and adds load 
to the NameNode. This also needs to be recomputed on a master restart. The 
proposal is to get the HDFSBlockDistribution from the RegionServers instead of 
computing it in Master. RS already has this information and we could just reuse 
it by querying it. RS already passes dataLocality info via RegionLoad today.

Proposed Implementation: This is a high-level overview.

# A RegionServer API has to be added which will return HDFSBlockDistribution 
for all the regions it hosts. RS already has this info. Since ClusterStatus has 
already become bulky and we don’t need updated locality so fast, it’s better to 
have another API rather than add this to RegionLoad and pass it along with 
RSReport.
# Master will have a Chore to query all RegionServers and will cache the 
HDFSBlockDistribution for those regions. This is easy and quick. Admins can 
tune the frequency based on the size of the cluster. On a ~90-node cluster with 
500k regions, a prototype implementation under no load took about 5 seconds 
to get all HDFSBlockDistribution from the RegionServers.
# The cache will be an extension (subclass) of RegionLocationFinder, if needed, 
to keep the implementation simple. This will probably become clearer during 
implementation.
# Balancer will use the new cache to get all HDFSBlockDistribution. If there is 
a new region and the Chore didn’t get its block distribution from the RS during 
its previous run, then it will be computed by RegionLocationFinder the same way 
it is done now. If the Chore runs more frequently, say every hour, then this 
recomputation will be drastically reduced.
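
To make steps 2 to 4 concrete, here is a minimal Java sketch of a cache that is 
filled from RegionServer reports and falls back to the expensive computation 
only for regions it has not seen. All names are assumptions for illustration: 
BlockDistributionCache is not an HBase class, and the block distribution is 
simplified to a map from host name to bytes stored on that host.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache; a simplified stand-in for the proposed RegionLocationFinder extension. */
class BlockDistributionCache {
    // region name -> (host -> bytes of the region's data stored on that host)
    private final Map<String, Map<String, Long>> cache = new ConcurrentHashMap<>();
    // Expensive path, e.g. asking the NameNode; used only on a cache miss.
    private final Function<String, Map<String, Long>> fallback;

    BlockDistributionCache(Function<String, Map<String, Long>> fallback) {
        this.fallback = fallback;
    }

    /** Called by the periodic chore with the report from one RegionServer. */
    void update(Map<String, Map<String, Long>> reportFromRegionServer) {
        cache.putAll(reportFromRegionServer);
    }

    /** Returns the cached distribution, computing it the old way for unreported regions. */
    Map<String, Long> get(String regionName) {
        return cache.computeIfAbsent(regionName, fallback);
    }
}
```

The more often the chore calls update(), the fewer lookups reach the fallback, 
which is the trade-off the chore frequency setting would control.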





[jira] [Comment Edited] (HBASE-20548) Master fails to startup on large clusters, refreshing block distribution

2018-05-24 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488631#comment-16488631
 ] 

Thiruvel Thirumoolan edited comment on HBASE-20548 at 5/24/18 8:38 AM:
---

Uploaded a patch for master. The change was to use RegionInfo instead of 
HRegionInfo.

Uploaded a patch for branch-2.0 as well, in case it is needed. Since HBASE-20545 
is not in branch-2.0, the patch is slightly different.


was (Author: thiruvel):
Uploaded a patch for master. The change was to use RegionInfo instead of 
HRegionInfo.

> Master fails to startup on large clusters, refreshing block distribution
> 
>
> Key: HBASE-20548
> URL: https://issues.apache.org/jira/browse/HBASE-20548
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.4.4
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 2.0.1, 1.4.5
>
> Attachments: HBASE-20548.branch-1.4.001.patch, 
> HBASE-20548.branch-2.0.001.patch, HBASE-20548.master.001.patch
>
>
> On our large clusters, the master has failed to start up within the specified 
> time and aborted itself because it was initializing HDFS block distribution. 
> Enable table also takes time for larger tables for the same reason. My 
> proposal is to refresh HDFS block distribution at the end of master 
> initialization and not at retainAssignment()'s createCluster(). This would 
> address HBASE-16570's intention, but avoid the problems we ran into.
> cc [~aoxiang] [~tedyu]





[jira] [Updated] (HBASE-20548) Master fails to startup on large clusters, refreshing block distribution

2018-05-24 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-20548:
-
Fix Version/s: 2.0.1

> Master fails to startup on large clusters, refreshing block distribution
> 
>
> Key: HBASE-20548
> URL: https://issues.apache.org/jira/browse/HBASE-20548
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.4.4
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 2.0.1, 1.4.5
>
> Attachments: HBASE-20548.branch-1.4.001.patch, 
> HBASE-20548.branch-2.0.001.patch, HBASE-20548.master.001.patch
>
>
> On our large clusters, the master has failed to start up within the specified 
> time and aborted itself because it was initializing HDFS block distribution. 
> Enable table also takes time for larger tables for the same reason. My 
> proposal is to refresh HDFS block distribution at the end of master 
> initialization and not at retainAssignment()'s createCluster(). This would 
> address HBASE-16570's intention, but avoid the problems we ran into.
> cc [~aoxiang] [~tedyu]





[jira] [Updated] (HBASE-20548) Master fails to startup on large clusters, refreshing block distribution

2018-05-24 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-20548:
-
Attachment: HBASE-20548.branch-2.0.001.patch

> Master fails to startup on large clusters, refreshing block distribution
> 
>
> Key: HBASE-20548
> URL: https://issues.apache.org/jira/browse/HBASE-20548
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.4.4
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 2.0.1, 1.4.5
>
> Attachments: HBASE-20548.branch-1.4.001.patch, 
> HBASE-20548.branch-2.0.001.patch, HBASE-20548.master.001.patch
>
>
> On our large clusters, the master has failed to start up within the specified 
> time and aborted itself because it was initializing HDFS block distribution. 
> Enable table also takes time for larger tables for the same reason. My 
> proposal is to refresh HDFS block distribution at the end of master 
> initialization and not at retainAssignment()'s createCluster(). This would 
> address HBASE-16570's intention, but avoid the problems we ran into.
> cc [~aoxiang] [~tedyu]





[jira] [Commented] (HBASE-20548) Master fails to startup on large clusters, refreshing block distribution

2018-05-24 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488631#comment-16488631
 ] 

Thiruvel Thirumoolan commented on HBASE-20548:
--

Uploaded a patch for master. The change was to use RegionInfo instead of 
HRegionInfo.

> Master fails to startup on large clusters, refreshing block distribution
> 
>
> Key: HBASE-20548
> URL: https://issues.apache.org/jira/browse/HBASE-20548
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.4.4
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 1.4.5
>
> Attachments: HBASE-20548.branch-1.4.001.patch, 
> HBASE-20548.master.001.patch
>
>
> On our large clusters, the master has failed to start up within the specified 
> time and aborted itself because it was initializing HDFS block distribution. 
> Enable table also takes time for larger tables for the same reason. My 
> proposal is to refresh HDFS block distribution at the end of master 
> initialization and not at retainAssignment()'s createCluster(). This would 
> address HBASE-16570's intention, but avoid the problems we ran into.
> cc [~aoxiang] [~tedyu]





[jira] [Updated] (HBASE-20548) Master fails to startup on large clusters, refreshing block distribution

2018-05-24 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-20548:
-
Attachment: HBASE-20548.master.001.patch

> Master fails to startup on large clusters, refreshing block distribution
> 
>
> Key: HBASE-20548
> URL: https://issues.apache.org/jira/browse/HBASE-20548
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.4.4
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 1.4.5
>
> Attachments: HBASE-20548.branch-1.4.001.patch, 
> HBASE-20548.master.001.patch
>
>
> On our large clusters, the master has failed to start up within the specified 
> time and aborted itself because it was initializing HDFS block distribution. 
> Enable table also takes time for larger tables for the same reason. My 
> proposal is to refresh HDFS block distribution at the end of master 
> initialization and not at retainAssignment()'s createCluster(). This would 
> address HBASE-16570's intention, but avoid the problems we ran into.
> cc [~aoxiang] [~tedyu]





[jira] [Updated] (HBASE-20548) Master fails to startup on large clusters, refreshing block distribution

2018-05-22 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-20548:
-
Status: Patch Available  (was: Open)

> Master fails to startup on large clusters, refreshing block distribution
> 
>
> Key: HBASE-20548
> URL: https://issues.apache.org/jira/browse/HBASE-20548
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.4.4
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 1.4.5
>
> Attachments: HBASE-20548.branch-1.4.001.patch
>
>
> On our large clusters, the master has failed to start up within the specified 
> time and aborted itself because it was initializing HDFS block distribution. 
> Enable table also takes time for larger tables for the same reason. My 
> proposal is to refresh HDFS block distribution at the end of master 
> initialization and not at retainAssignment()'s createCluster(). This would 
> address HBASE-16570's intention, but avoid the problems we ran into.
> cc [~aoxiang] [~tedyu]





[jira] [Commented] (HBASE-20548) Master fails to startup on large clusters, refreshing block distribution

2018-05-22 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486573#comment-16486573
 ] 

Thiruvel Thirumoolan commented on HBASE-20548:
--

Uploaded a patch that addresses the issue. I added a postMasterInitialize 
method to LoadBalancer and call it at the end of master initialization, so 
nothing is blocked. Will kick off the precommit build.
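
The ordering this relies on can be shown with a small Java sketch. It is an 
illustration only: SketchBalancer, DeferredRefreshBalancer, and SketchMaster are 
invented names, and only the hook name postMasterInitialize comes from the 
patch description. The point is that the expensive refresh runs after the 
master is marked initialized rather than inside the assignment path.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, trimmed-down balancer contract for the example. */
interface SketchBalancer {
    void retainAssignment();      // on the startup critical path
    void postMasterInitialize();  // hook invoked once initialization is done
}

/** Records the order of events instead of doing real work. */
class DeferredRefreshBalancer implements SketchBalancer {
    final List<String> events = new ArrayList<>();

    @Override
    public void retainAssignment() {
        // No block distribution refresh here, so assignment stays fast.
        events.add("assign");
    }

    @Override
    public void postMasterInitialize() {
        // The expensive refresh happens after startup has completed.
        events.add("refresh-block-distribution");
    }
}

/** Hypothetical master startup sequence. */
class SketchMaster {
    static List<String> initialize(DeferredRefreshBalancer balancer) {
        balancer.retainAssignment();
        balancer.events.add("initialized");
        balancer.postMasterInitialize();
        return balancer.events;
    }
}
```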

> Master fails to startup on large clusters, refreshing block distribution
> 
>
> Key: HBASE-20548
> URL: https://issues.apache.org/jira/browse/HBASE-20548
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.4.4
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 1.4.5
>
> Attachments: HBASE-20548.branch-1.4.001.patch
>
>
> On our large clusters, the master has failed to start up within the specified 
> time and aborted itself because it was initializing HDFS block distribution. 
> Enable table also takes time for larger tables for the same reason. My 
> proposal is to refresh HDFS block distribution at the end of master 
> initialization and not at retainAssignment()'s createCluster(). This would 
> address HBASE-16570's intention, but avoid the problems we ran into.
> cc [~aoxiang] [~tedyu]





[jira] [Updated] (HBASE-20548) Master fails to startup on large clusters, refreshing block distribution

2018-05-22 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-20548:
-
Attachment: HBASE-20548.branch-1.4.001.patch

> Master fails to startup on large clusters, refreshing block distribution
> 
>
> Key: HBASE-20548
> URL: https://issues.apache.org/jira/browse/HBASE-20548
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.4.4
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 1.4.5
>
> Attachments: HBASE-20548.branch-1.4.001.patch
>
>
> On our large clusters, the master has failed to start up within the specified 
> time and aborted itself because it was initializing HDFS block distribution. 
> Enable table also takes time for larger tables for the same reason. My 
> proposal is to refresh HDFS block distribution at the end of master 
> initialization and not at retainAssignment()'s createCluster(). This would 
> address HBASE-16570's intention, but avoid the problems we ran into.
> cc [~aoxiang] [~tedyu]





[jira] [Commented] (HBASE-20545) Improve performance of BaseLoadBalancer.retainAssignment

2018-05-21 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482964#comment-16482964
 ] 

Thiruvel Thirumoolan commented on HBASE-20545:
--

Thanks [~apurtell],  [~yuzhih...@gmail.com] and [~aoxiang] for reviews.

> Improve performance of BaseLoadBalancer.retainAssignment
> 
>
> Key: HBASE-20545
> URL: https://issues.apache.org/jira/browse/HBASE-20545
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.4.4, 2.0.0
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.5
>
> Attachments: HBASE-20545.branch-1.4.001.patch, 
> HBASE-20545.branch-1.4.002.patch, HBASE-20545.branch-2.001.patch
>
>
> I was measuring perf at scale with a 1m region table and noticed some 
> improvements can be made to BaseLoadBalancer.retainAssignment().
> retainAssignment() spends a few mins to enable a 1m regions table and also 
> generates a lot of objects unnecessarily. This jira is to make the most 
> common case go faster with very minimal changes. A slightly modified version 
> of this patch takes about 5 seconds for a 1m region table ignoring the time 
> spent in createCluster(). I think locality can be refreshed during master 
> startup in different ways without taking time in retainAssignment, but will 
> follow up on that in subsequent jiras. Leaving it untouched here, but wanted 
> to call out the time taken without that method.





[jira] [Commented] (HBASE-20545) Improve performance of BaseLoadBalancer.retainAssignment

2018-05-11 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472831#comment-16472831
 ] 

Thiruvel Thirumoolan commented on HBASE-20545:
--

It doesn't look like the failure is related to my patch. I have resubmitted the 
002 patch to trigger the build again.

> Improve performance of BaseLoadBalancer.retainAssignment
> 
>
> Key: HBASE-20545
> URL: https://issues.apache.org/jira/browse/HBASE-20545
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.4.4, 2.0.0
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 3.0.0, 2.1.0, 1.5.0, 1.4.5
>
> Attachments: HBASE-20545.branch-1.4.001.patch, 
> HBASE-20545.branch-1.4.002.patch, HBASE-20545.branch-2.001.patch
>
>
> I was measuring perf at scale with a 1m region table and noticed some 
> improvements can be made to BaseLoadBalancer.retainAssignment().
> retainAssignment() spends a few mins to enable a 1m regions table and also 
> generates a lot of objects unnecessarily. This jira is to make the most 
> common case go faster with very minimal changes. A slightly modified version 
> of this patch takes about 5 seconds for a 1m region table ignoring the time 
> spent in createCluster(). I think locality can be refreshed during master 
> startup in different ways without taking time in retainAssignment, but will 
> follow up on that in subsequent jiras. Leaving it untouched here, but wanted 
> to call out the time taken without that method.





[jira] [Updated] (HBASE-20545) Improve performance of BaseLoadBalancer.retainAssignment

2018-05-11 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-20545:
-
Attachment: HBASE-20545.branch-1.4.002.patch

> Improve performance of BaseLoadBalancer.retainAssignment
> 
>
> Key: HBASE-20545
> URL: https://issues.apache.org/jira/browse/HBASE-20545
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.4.4, 2.0.0
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 3.0.0, 2.1.0, 1.5.0, 1.4.5
>
> Attachments: HBASE-20545.branch-1.4.001.patch, 
> HBASE-20545.branch-1.4.002.patch, HBASE-20545.branch-2.001.patch
>
>
> I was measuring perf at scale with a 1m region table and noticed some 
> improvements can be made to BaseLoadBalancer.retainAssignment().
> retainAssignment() spends a few mins to enable a 1m regions table and also 
> generates a lot of objects unnecessarily. This jira is to make the most 
> common case go faster with very minimal changes. A slightly modified version 
> of this patch takes about 5 seconds for a 1m region table ignoring the time 
> spent in createCluster(). I think locality can be refreshed during master 
> startup in different ways without taking time in retainAssignment, but will 
> follow up on that in subsequent jiras. Leaving it untouched here, but wanted 
> to call out the time taken without that method.





[jira] [Commented] (HBASE-20546) Improve perf of RegionLocationFinder.mapHostNameToServerName

2018-05-10 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16471173#comment-16471173
 ] 

Thiruvel Thirumoolan commented on HBASE-20546:
--

Thanks [~chia7712]. Good point.

It looks like ClusterStatusChore will be updating clusterStatus. Its intention 
is to make StochasticBalancer better by updating regionload. I am not sure why 
regionFinder's clusterstatus needs to be updated. I think we can have separate 
setClusterStatus and updateClusterStatus APIs, so ClusterStatusChore can use the 
latter. I am not sure whether updating clusterstatus, or re-initializing the 
host-to-server map I introduce in regionfinder, in the middle of a balance run 
is a good idea. What do you think?

> Improve perf of RegionLocationFinder.mapHostNameToServerName
> 
>
> Key: HBASE-20546
> URL: https://issues.apache.org/jira/browse/HBASE-20546
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.4.4, 2.0.0
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 2.0.1, 1.4.5
>
> Attachments: HBASE-20546.branch-1.4.001.patch
>
>
> RegionLocationFinder.getTopBlockLocations() is called multiple times during 
> balancer. While profiling on a large table balance, mapHostNameToServerName() 
> seems to take a lot of time. One of the maps is repeatedly created for each 
> iteration, while we can just initialize it once.
> Goes into both branch-1 and branch-2, although patches differ slightly.





[jira] [Updated] (HBASE-20546) Improve perf of RegionLocationFinder.mapHostNameToServerName

2018-05-10 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-20546:
-
Description: 
RegionLocationFinder.getTopBlockLocations() is called multiple times during 
balancer. While profiling on a large table balance, mapHostNameToServerName() 
seems to take a lot of time. One of the maps is repeatedly created for each 
iteration, while we can just initialize it once.

Goes into both branch-1 and branch-2, although patches differ slightly.

  was:
RegionLocationFinder.getTopBlockLocations() is called multiple times during 
balancer. While profiling on a large table balance, mapHostNameToServerName() 
seems to take a lot of time. One of the maps is repeatedly consumed for each 
iteration, while we can just initialize it once.

Goes into both branch-1 and branch-2, although patches differ slightly.


> Improve perf of RegionLocationFinder.mapHostNameToServerName
> 
>
> Key: HBASE-20546
> URL: https://issues.apache.org/jira/browse/HBASE-20546
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.4.4, 2.0.0
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 2.0.1, 1.4.5
>
> Attachments: HBASE-20546.branch-1.4.001.patch
>
>
> RegionLocationFinder.getTopBlockLocations() is called multiple times during 
> balancer. While profiling on a large table balance, mapHostNameToServerName() 
> seems to take a lot of time. One of the maps is repeatedly created for each 
> iteration, while we can just initialize it once.
> Goes into both branch-1 and branch-2, although patches differ slightly.





[jira] [Commented] (HBASE-20545) Improve performance of BaseLoadBalancer.retainAssignment

2018-05-10 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470829#comment-16470829
 ] 

Thiruvel Thirumoolan commented on HBASE-20545:
--

[~yuzhih...@gmail.com], Looks like the build you triggered passed. Since 
existing tests were sufficient, I didn't add any.

Can we also get this into branch-2?

> Improve performance of BaseLoadBalancer.retainAssignment
> 
>
> Key: HBASE-20545
> URL: https://issues.apache.org/jira/browse/HBASE-20545
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.4.4, 2.0.0
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 1.4.5
>
> Attachments: HBASE-20545.branch-1.4.001.patch, 
> HBASE-20545.branch-2.001.patch
>
>
> I was measuring perf at scale with a 1m region table and noticed some 
> improvements can be made to BaseLoadBalancer.retainAssignment().
> retainAssignment() spends a few mins to enable a 1m regions table and also 
> generates a lot of objects unnecessarily. This jira is to make the most 
> common case go faster with very minimal changes. A slightly modified version 
> of this patch takes about 5 seconds for a 1m region table ignoring the time 
> spent in createCluster(). I think locality can be refreshed during master 
> startup in different ways without taking time in retainAssignment, but will 
> follow up on that in subsequent jiras. Leaving it untouched here, but wanted 
> to call out the time taken without that method.





[jira] [Commented] (HBASE-20545) Improve performance of BaseLoadBalancer.retainAssignment

2018-05-09 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16469292#comment-16469292
 ] 

Thiruvel Thirumoolan commented on HBASE-20545:
--

Uploaded patch for master branch. The only change from branch-1 patch is 
RegionInfo instead of HRegionInfo.

> Improve performance of BaseLoadBalancer.retainAssignment
> 
>
> Key: HBASE-20545
> URL: https://issues.apache.org/jira/browse/HBASE-20545
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.4.4, 2.0.0
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 2.0.1, 1.4.5
>
> Attachments: HBASE-20545.branch-1.4.001.patch, 
> HBASE-20545.branch-2.001.patch
>
>
> I was measuring perf at scale with a 1m region table and noticed some 
> improvements can be made to BaseLoadBalancer.retainAssignment().
> retainAssignment() spends a few mins to enable a 1m regions table and also 
> generates a lot of objects unnecessarily. This jira is to make the most 
> common case go faster with very minimal changes. A slightly modified version 
> of this patch takes about 5 seconds for a 1m region table ignoring the time 
> spent in createCluster(). I think locality can be refreshed during master 
> startup in different ways without taking time in retainAssignment, but will 
> follow up on that in subsequent jiras. Leaving it untouched here, but wanted 
> to call out the time taken without that method.





[jira] [Updated] (HBASE-20545) Improve performance of BaseLoadBalancer.retainAssignment

2018-05-09 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-20545:
-
Attachment: HBASE-20545.branch-2.001.patch

> Improve performance of BaseLoadBalancer.retainAssignment
> 
>
> Key: HBASE-20545
> URL: https://issues.apache.org/jira/browse/HBASE-20545
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.4.4, 2.0.0
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 2.0.1, 1.4.5
>
> Attachments: HBASE-20545.branch-1.4.001.patch, 
> HBASE-20545.branch-2.001.patch
>
>
> I was measuring perf at scale with a 1m region table and noticed some 
> improvements can be made to BaseLoadBalancer.retainAssignment().
> retainAssignment() spends a few mins to enable a 1m regions table and also 
> generates a lot of objects unnecessarily. This jira is to make the most 
> common case go faster with very minimal changes. A slightly modified version 
> of this patch takes about 5 seconds for a 1m region table ignoring the time 
> spent in createCluster(). I think locality can be refreshed during master 
> startup in different ways without taking time in retainAssignment, but will 
> follow up on that in subsequent jiras. Leaving it untouched here, but wanted 
> to call out the time taken without that method.





[jira] [Commented] (HBASE-20545) Improve performance of BaseLoadBalancer.retainAssignment

2018-05-09 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16469222#comment-16469222
 ] 

Thiruvel Thirumoolan commented on HBASE-20545:
--

Linking HBASE-20548 for improvements to locality refresh during master startup.

> Improve performance of BaseLoadBalancer.retainAssignment
> 
>
> Key: HBASE-20545
> URL: https://issues.apache.org/jira/browse/HBASE-20545
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.4.4, 2.0.0
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 2.0.1, 1.4.5
>
> Attachments: HBASE-20545.branch-1.4.001.patch
>
>
> I was measuring perf at scale with a 1m region table and noticed some 
> improvements can be made to BaseLoadBalancer.retainAssignment().
> retainAssignment() spends a few mins to enable a 1m regions table and also 
> generates a lot of objects unnecessarily. This jira is to make the most 
> common case go faster with very minimal changes. A slightly modified version 
> of this patch takes about 5 seconds for a 1m region table ignoring the time 
> spent in createCluster(). I think locality can be refreshed during master 
> startup in different ways without taking time in retainAssignment, but will 
> follow up on that in subsequent jiras. Leaving it untouched here, but wanted 
> to call out the time taken without that method.





[jira] [Created] (HBASE-20548) Master fails to startup on large clusters, refreshing block distribution

2018-05-08 Thread Thiruvel Thirumoolan (JIRA)
Thiruvel Thirumoolan created HBASE-20548:


 Summary: Master fails to startup on large clusters, refreshing 
block distribution
 Key: HBASE-20548
 URL: https://issues.apache.org/jira/browse/HBASE-20548
 Project: HBase
  Issue Type: Improvement
Affects Versions: 1.4.4
Reporter: Thiruvel Thirumoolan
Assignee: Thiruvel Thirumoolan
 Fix For: 1.5.0, 1.4.5


On our large clusters, the master has failed to start up within the specified 
time and aborted itself because it was still initializing HDFS block 
distribution. Enabling a table also takes a long time for larger tables for the 
same reason. My proposal is to refresh HDFS block distribution at the end of 
master initialization and not in retainAssignment()'s createCluster(). This 
would address HBASE-16570's intention, but avoid the problems we ran into.

cc [~aoxiang] [~tedyu]





[jira] [Updated] (HBASE-20546) Improve perf of RegionLocationFinder.mapHostNameToServerName

2018-05-08 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-20546:
-
Status: Patch Available  (was: Open)

Will submit patch for branch-2 once the branch-1 patch gets in.

> Improve perf of RegionLocationFinder.mapHostNameToServerName
> 
>
> Key: HBASE-20546
> URL: https://issues.apache.org/jira/browse/HBASE-20546
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0, 1.4.4
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 2.0.1, 1.4.5
>
> Attachments: HBASE-20546.branch-1.4.001.patch
>
>
> RegionLocationFinder.getTopBlockLocations() is called multiple times during 
> balancer. While profiling on a large table balance, mapHostNameToServerName() 
> seems to take a lot of time. One of the maps is repeatedly consumed for each 
> iteration, while we can just initialize it once.
> Goes into both branch-1 and branch-2, although patches differ slightly.





[jira] [Updated] (HBASE-20546) Improve perf of RegionLocationFinder.mapHostNameToServerName

2018-05-08 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-20546:
-
Attachment: HBASE-20546.branch-1.4.001.patch

> Improve perf of RegionLocationFinder.mapHostNameToServerName
> 
>
> Key: HBASE-20546
> URL: https://issues.apache.org/jira/browse/HBASE-20546
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.4.4, 2.0.0
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 2.0.1, 1.4.5
>
> Attachments: HBASE-20546.branch-1.4.001.patch
>
>
> RegionLocationFinder.getTopBlockLocations() is called multiple times during 
> balancer. While profiling on a large table balance, mapHostNameToServerName() 
> seems to take a lot of time. One of the maps is repeatedly consumed for each 
> iteration, while we can just initialize it once.
> Goes into both branch-1 and branch-2, although patches differ slightly.





[jira] [Created] (HBASE-20546) Improve perf of RegionLocationFinder.mapHostNameToServerName

2018-05-08 Thread Thiruvel Thirumoolan (JIRA)
Thiruvel Thirumoolan created HBASE-20546:


 Summary: Improve perf of 
RegionLocationFinder.mapHostNameToServerName
 Key: HBASE-20546
 URL: https://issues.apache.org/jira/browse/HBASE-20546
 Project: HBase
  Issue Type: Improvement
Affects Versions: 2.0.0, 1.4.4
Reporter: Thiruvel Thirumoolan
Assignee: Thiruvel Thirumoolan
 Fix For: 1.5.0, 2.0.1, 1.4.5


RegionLocationFinder.getTopBlockLocations() is called multiple times during 
balancer. While profiling on a large table balance, mapHostNameToServerName() 
seems to take a lot of time. One of the maps is repeatedly consumed for each 
iteration, while we can just initialize it once.

Goes into both branch-1 and branch-2, although patches differ slightly.
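
To illustrate the idea of initializing the map once, here is a minimal 
sketch. It is not the attached patch: the class HostToServerIndex, its 
methods, and the "host,port,startcode" string form of a server name are 
assumptions made only for this example. The host-to-server index is rebuilt 
when the server list is set, and each lookup afterwards is a plain map read.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a balancer-side location finder; not HBase's
// RegionLocationFinder. Server names are plain "host,port,startcode" strings.
final class HostToServerIndex {
    // host name -> server names running on that host
    private final Map<String, List<String>> serversByHost = new HashMap<>();

    // Counts how often the index was rebuilt; kept only for illustration.
    int buildCount = 0;

    // Rebuild the index only when the set of live servers is (re)set.
    void setServers(List<String> serverNames) {
        serversByHost.clear();
        for (String sn : serverNames) {
            int comma = sn.indexOf(',');
            String host = comma < 0 ? sn : sn.substring(0, comma);
            serversByHost.computeIfAbsent(host, h -> new ArrayList<>()).add(sn);
        }
        buildCount++;
    }

    // Each lookup reads the prebuilt index instead of recreating the map.
    List<String> mapHostNamesToServerNames(List<String> hosts) {
        List<String> result = new ArrayList<>();
        for (String host : hosts) {
            result.addAll(serversByHost.getOrDefault(host, Collections.emptyList()));
        }
        return result;
    }
}
```

With this shape, the cost of building the index is paid once per server-list 
change rather than once per call made during a balancer run.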





[jira] [Updated] (HBASE-20545) Improve performance of BaseLoadBalancer.retainAssignment

2018-05-08 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-20545:
-
Status: Patch Available  (was: Open)

Once the branch-1 patch gets in, I will submit a patch for 2.x; it should be 
very similar.

> Improve performance of BaseLoadBalancer.retainAssignment
> 
>
> Key: HBASE-20545
> URL: https://issues.apache.org/jira/browse/HBASE-20545
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 2.0.0, 1.4.4
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 2.0.1, 1.4.5
>
> Attachments: HBASE-20545.branch-1.4.001.patch
>
>
> I was measuring perf at scale with a 1m region table and noticed some 
> improvements can be made to BaseLoadBalancer.retainAssignment().
> retainAssignment() spends a few mins to enable a 1m regions table and also 
> generates a lot of objects unnecessarily. This jira is to make the most 
> common case go faster with very minimal changes. A slightly modified version 
> of this patch takes about 5 seconds for a 1m region table ignoring the time 
> spent in createCluster(). I think locality can be refreshed during master 
> startup in different ways without taking time in retainAssignment, but will 
> follow up on that in subsequent jiras. Leaving it untouched here, but wanted 
> to call out the time taken without that method.





[jira] [Commented] (HBASE-20545) Improve performance of BaseLoadBalancer.retainAssignment

2018-05-08 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467984#comment-16467984
 ] 

Thiruvel Thirumoolan commented on HBASE-20545:
--

Uploaded a patch for 1.4; I will also upload one for branch 2.x. The approach 
is this: in most scenarios, the servers are the same when retainAssignment is 
called, but we unnecessarily create a Cluster object and populate it for every 
region. At scale, this takes time (array copies in Cluster.doAssignRegion) and 
also generates a lot of garbage. Since that is not the common case, I moved it 
out to a separate loop and only initialize the Cluster when random assignment 
is required. So in the worst case we still take that time, but for common 
scenarios it is much faster and produces less garbage. We can make the 
worst-case scenario faster as well, but that is for later.
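
The two-loop approach described above can be sketched roughly as follows. 
This is not the patch itself. The class RetainSketch, the plain string region 
and server names, and the expensiveSetups counter (standing in for building 
the Cluster object) are assumptions for illustration. The fallback is 
round-robin rather than random so the result is deterministic, and servers 
are matched by exact name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of "retain first, fall back only if needed".
final class RetainSketch {
    // Stands in for the costly Cluster construction; illustration only.
    static int expensiveSetups = 0;

    static Map<String, List<String>> retain(Map<String, String> regionToOldServer,
                                            List<String> liveServers) {
        Map<String, List<String>> plan = new LinkedHashMap<>();
        if (liveServers.isEmpty()) {
            return plan; // nothing to assign to
        }
        for (String server : liveServers) {
            plan.put(server, new ArrayList<>());
        }
        Set<String> live = new HashSet<>(liveServers);
        List<String> unplaced = new ArrayList<>();

        // Loop 1: the common case. Keep each region on its old server if that
        // server is still live; no heavyweight state is created here.
        for (Map.Entry<String, String> e : regionToOldServer.entrySet()) {
            String oldServer = e.getValue();
            if (oldServer != null && live.contains(oldServer)) {
                plan.get(oldServer).add(e.getKey());
            } else {
                unplaced.add(e.getKey());
            }
        }

        // Loop 2: only when some regions lost their server do we pay for the
        // expensive setup and place them elsewhere (round-robin here).
        if (!unplaced.isEmpty()) {
            expensiveSetups++;
            int next = 0;
            for (String region : unplaced) {
                plan.get(liveServers.get(next % liveServers.size())).add(region);
                next++;
            }
        }
        return plan;
    }
}
```

When every old server is still live, the second loop never runs, which is the 
case the comment above describes as the common one.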

> Improve performance of BaseLoadBalancer.retainAssignment
> 
>
> Key: HBASE-20545
> URL: https://issues.apache.org/jira/browse/HBASE-20545
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.4.4, 2.0.0
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 2.0.1, 1.4.5
>
> Attachments: HBASE-20545.branch-1.4.001.patch
>
>
> I was measuring perf at scale with a 1m region table and noticed some 
> improvements can be made to BaseLoadBalancer.retainAssignment().
> retainAssignment() spends a few mins to enable a 1m regions table and also 
> generates a lot of objects unnecessarily. This jira is to make the most 
> common case go faster with very minimal changes. A slightly modified version 
> of this patch takes about 5 seconds for a 1m region table ignoring the time 
> spent in createCluster(). I think locality can be refreshed during master 
> startup in different ways without taking time in retainAssignment, but will 
> follow up on that in subsequent jiras. Leaving it untouched here, but wanted 
> to call out the time taken without that method.





[jira] [Updated] (HBASE-20545) Improve performance of BaseLoadBalancer.retainAssignment

2018-05-08 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-20545:
-
Attachment: HBASE-20545.branch-1.4.001.patch

> Improve performance of BaseLoadBalancer.retainAssignment
> 
>
> Key: HBASE-20545
> URL: https://issues.apache.org/jira/browse/HBASE-20545
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.4.4, 2.0.0
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 2.0.1, 1.4.5
>
> Attachments: HBASE-20545.branch-1.4.001.patch
>
>
> I was measuring perf at scale with a 1m region table and noticed some 
> improvements can be made to BaseLoadBalancer.retainAssignment().
> retainAssignment() spends a few mins to enable a 1m regions table and also 
> generates a lot of objects unnecessarily. This jira is to make the most 
> common case go faster with very minimal changes. A slightly modified version 
> of this patch takes about 5 seconds for a 1m region table ignoring the time 
> spent in createCluster(). I think locality can be refreshed during master 
> startup in different ways without taking time in retainAssignment, but will 
> follow up on that in subsequent jiras. Leaving it untouched here, but wanted 
> to call out the time taken without that method.





[jira] [Created] (HBASE-20545) Improve performance of BaseLoadBalancer.retainAssignment

2018-05-08 Thread Thiruvel Thirumoolan (JIRA)
Thiruvel Thirumoolan created HBASE-20545:


 Summary: Improve performance of BaseLoadBalancer.retainAssignment
 Key: HBASE-20545
 URL: https://issues.apache.org/jira/browse/HBASE-20545
 Project: HBase
  Issue Type: Improvement
  Components: Balancer
Affects Versions: 2.0.0, 1.4.4
Reporter: Thiruvel Thirumoolan
Assignee: Thiruvel Thirumoolan
 Fix For: 1.5.0, 2.0.1, 1.4.5


I was measuring perf at scale with a 1m region table and noticed some 
improvements can be made to BaseLoadBalancer.retainAssignment().

retainAssignment() spends a few mins to enable a 1m regions table and also 
generates a lot of objects unnecessarily. This jira is to make the most common 
case go faster with very minimal changes. A slightly modified version of this 
patch takes about 5 seconds for a 1m region table ignoring the time spent in 
createCluster(). I think locality can be refreshed during master startup in 
different ways without taking time in retainAssignment, but will follow up on 
that in subsequent jiras. Leaving it untouched here, but wanted to call out the 
time taken without that method.





[jira] [Commented] (HBASE-20322) CME in StoreScanner causes region server crash

2018-04-09 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431034#comment-16431034
 ] 

Thiruvel Thirumoolan commented on HBASE-20322:
--

[~mdrob] - I raised HBASE-20373 to confirm that and fix it. Will get to it 
sometime this week or next.

> CME in StoreScanner causes region server crash
> --
>
> Key: HBASE-20322
> URL: https://issues.apache.org/jira/browse/HBASE-20322
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 1.3.3, 1.4.4
>
> Attachments: HBASE-20322.branch-1.3.001.patch, 
> HBASE-20322.branch-1.3.002-addendum.patch, HBASE-20322.branch-1.3.002.patch, 
> HBASE-20322.branch-1.4.001.patch
>
>
> RS crashed with ConcurrentModificationException on our 1.3 cluster, stack 
> trace below. [~toffer] and I checked and there is a race condition between 
> flush and scanner close. When StoreScanner.updateReaders() is updating the 
> scanners after a newly flushed file (in this trace below a region close 
> during a split), the client's scanner could be closing thus causing CME.
> It's rare, but since it crashes the region server, it needs to be fixed.
> FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server : Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579)
> at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255)
> at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217)
> at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420)
> at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398)
> at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
> at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566)
> at org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
> at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
> at java.util.ArrayList$Itr.next(ArrayList.java:851)
> at org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797)
> at org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825)
> at org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155)
> PS: ignore the line numbers in the above stack trace; the method calls 
> should help understand what's happening.
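
The failure mode in the trace (an ArrayList iterator detecting that the list 
was cleared underneath it) can be reproduced in a small sketch. This is not 
the HBASE-20322 patch. The class ScannerListSketch, its methods, and the use 
of a ReentrantLock are assumptions made only for this example. The first 
method triggers the exception deterministically on a single thread; the 
guarded methods show one common way to avoid it, by taking a snapshot under a 
lock shared with close().

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a scanner that holds a list of per-file scanners.
final class ScannerListSketch {
    private final List<String> scanners = new ArrayList<>();
    private final ReentrantLock lock = new ReentrantLock();

    ScannerListSketch(List<String> initial) {
        scanners.addAll(initial);
    }

    // Clears the list while an iterator over it is live. The clear() stands in
    // for a close happening during iteration. Returns true if the fail-fast
    // iterator threw ConcurrentModificationException.
    boolean unguardedClearDuringIteration() {
        try {
            for (String ignored : scanners) {
                scanners.clear();
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Guarded variant: swap the list contents under the lock and hand back a
    // snapshot size, so old entries can be closed without touching the live list.
    int updateReaders(List<String> fresh) {
        List<String> old;
        lock.lock();
        try {
            old = new ArrayList<>(scanners);
            scanners.clear();
            scanners.addAll(fresh);
        } finally {
            lock.unlock();
        }
        return old.size(); // number of old scanners that would be closed
    }

    // close() takes the same lock, so it cannot interleave with updateReaders().
    void close() {
        lock.lock();
        try {
            scanners.clear();
        } finally {
            lock.unlock();
        }
    }

    int size() {
        lock.lock();
        try {
            return scanners.size();
        } finally {
            lock.unlock();
        }
    }
}
```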





[jira] [Created] (HBASE-20373) Check and forward port HBASE-20322 (RS crash due to CME in StoreScanner)

2018-04-09 Thread Thiruvel Thirumoolan (JIRA)
Thiruvel Thirumoolan created HBASE-20373:


 Summary: Check and forward port HBASE-20322 (RS crash due to CME 
in StoreScanner)
 Key: HBASE-20373
 URL: https://issues.apache.org/jira/browse/HBASE-20373
 Project: HBase
  Issue Type: Bug
Reporter: Thiruvel Thirumoolan
Assignee: Thiruvel Thirumoolan


I think the same problem that causes HBASE-20322 in 1.x exists in branch-2 
also. This jira is to confirm that and fix it if required.





[jira] [Commented] (HBASE-20322) CME in StoreScanner causes region server crash

2018-04-03 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424475#comment-16424475
 ] 

Thiruvel Thirumoolan commented on HBASE-20322:
--

[~apurtell] - There is a small bug in the patch which causes 
TestAtomicOperation to fail. I see the bug at least once when I run 
TestAtomicOperation 25 times. Please let me know if you would like me to follow 
up on a separate Jira. I have attached the addendum.

> CME in StoreScanner causes region server crash
> --
>
> Key: HBASE-20322
> URL: https://issues.apache.org/jira/browse/HBASE-20322
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 1.3.3, 1.4.4
>
> Attachments: HBASE-20322.branch-1.3.001.patch, 
> HBASE-20322.branch-1.3.002-addendum.patch, HBASE-20322.branch-1.3.002.patch, 
> HBASE-20322.branch-1.4.001.patch
>
>
> RS crashed with ConcurrentModificationException on our 1.3 cluster, stack 
> trace below. [~toffer] and I checked and there is a race condition between 
> flush and scanner close. When StoreScanner.updateReaders() is updating the 
> scanners after a newly flushed file (in this trace below a region close 
> during a split), the client's scanner could be closing thus causing CME.
> It's rare, but since it crashes the region server, it needs to be fixed.
> FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server : Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579)
> at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255)
> at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217)
> at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420)
> at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398)
> at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
> at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566)
> at org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
> at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
> at java.util.ArrayList$Itr.next(ArrayList.java:851)
> at org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797)
> at org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825)
> at org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155)
> PS: ignore the line numbers in the above stack trace; the method calls 
> should help understand what's happening.





[jira] [Updated] (HBASE-20322) CME in StoreScanner causes region server crash

2018-04-03 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-20322:
-
Attachment: HBASE-20322.branch-1.3.002-addendum.patch

> CME in StoreScanner causes region server crash
> --
>
> Key: HBASE-20322
> URL: https://issues.apache.org/jira/browse/HBASE-20322
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 1.3.3, 1.4.4
>
> Attachments: HBASE-20322.branch-1.3.001.patch, 
> HBASE-20322.branch-1.3.002-addendum.patch, HBASE-20322.branch-1.3.002.patch, 
> HBASE-20322.branch-1.4.001.patch
>
>
> RS crashed with ConcurrentModificationException on our 1.3 cluster, stack 
> trace below. [~toffer] and I checked and there is a race condition between 
> flush and scanner close. When StoreScanner.updateReaders() is updating the 
> scanners after a newly flushed file (in this trace below a region close 
> during a split), the client's scanner could be closing thus causing CME.
> Its rare, but since it crashes the region server, needs to be fixed.
> FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server 
> : Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
> at java.util.ArrayList$Itr.next(ArrayList.java:851)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155)
> PS: Ignore the line numbers in the above stack trace; the method calls should 
> help understand what's happening.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HBASE-11288) Splittable Meta

2018-04-02 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan reassigned HBASE-11288:


Assignee: Francis Liu  (was: Thiruvel Thirumoolan)

> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Francis Liu
>Assignee: Francis Liu
>Priority: Major
>






[jira] [Assigned] (HBASE-11288) Splittable Meta

2018-04-02 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan reassigned HBASE-11288:


Assignee: Thiruvel Thirumoolan  (was: Francis Liu)

> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Francis Liu
>Assignee: Thiruvel Thirumoolan
>Priority: Major
>






[jira] [Commented] (HBASE-20322) CME in StoreScanner causes region server crash

2018-04-02 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422869#comment-16422869
 ] 

Thiruvel Thirumoolan commented on HBASE-20322:
--

Thanks [~apurtell] [~yuzhih...@gmail.com] for the reviews.

[~apurtell] - I responded to your question; please let me know if it looks OK.

I have re-uploaded the same patch so that precommit can run again; the last 
failure wasn't related to the patch.

> CME in StoreScanner causes region server crash
> --
>
> Key: HBASE-20322
> URL: https://issues.apache.org/jira/browse/HBASE-20322
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 1.3.3, 1.4.4
>
> Attachments: HBASE-20322.branch-1.3.001.patch, 
> HBASE-20322.branch-1.3.002.patch, HBASE-20322.branch-1.4.001.patch
>
>
> A region server crashed with a ConcurrentModificationException (CME) on our 
> 1.3 cluster; stack trace below. [~toffer] and I investigated and found a race 
> condition between flush and scanner close. While StoreScanner.updateReaders() 
> is updating the scanners after a newly flushed file (in the trace below, 
> during a region close for a split), the client's scanner could be closing at 
> the same time, causing the CME.
> It's rare, but since it crashes the region server, it needs to be fixed.
> FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server 
> : Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
> at java.util.ArrayList$Itr.next(ArrayList.java:851)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155)
> PS: Ignore the line numbers in the above stack trace; the method calls should 
> help understand what's happening.





[jira] [Updated] (HBASE-20322) CME in StoreScanner causes region server crash

2018-04-02 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-20322:
-
Attachment: HBASE-20322.branch-1.3.002.patch

> CME in StoreScanner causes region server crash
> --
>
> Key: HBASE-20322
> URL: https://issues.apache.org/jira/browse/HBASE-20322
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 1.3.3, 1.4.4
>
> Attachments: HBASE-20322.branch-1.3.001.patch, 
> HBASE-20322.branch-1.3.002.patch, HBASE-20322.branch-1.4.001.patch
>
>
> A region server crashed with a ConcurrentModificationException (CME) on our 
> 1.3 cluster; stack trace below. [~toffer] and I investigated and found a race 
> condition between flush and scanner close. While StoreScanner.updateReaders() 
> is updating the scanners after a newly flushed file (in the trace below, 
> during a region close for a split), the client's scanner could be closing at 
> the same time, causing the CME.
> It's rare, but since it crashes the region server, it needs to be fixed.
> FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server 
> : Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
> at java.util.ArrayList$Itr.next(ArrayList.java:851)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155)
> PS: Ignore the line numbers in the above stack trace; the method calls should 
> help understand what's happening.





[jira] [Updated] (HBASE-20322) CME in StoreScanner causes region server crash

2018-03-30 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-20322:
-
Status: Patch Available  (was: Open)

Uploaded patches for 1.3 and 1.4. Kicking off pre-commit builds. Thanks to 
[~toffer] for the internal reviews. Will check 2.x next week.

> CME in StoreScanner causes region server crash
> --
>
> Key: HBASE-20322
> URL: https://issues.apache.org/jira/browse/HBASE-20322
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 1.3.3, 1.4.4
>
> Attachments: HBASE-20322.branch-1.3.001.patch, 
> HBASE-20322.branch-1.4.001.patch
>
>
> A region server crashed with a ConcurrentModificationException (CME) on our 
> 1.3 cluster; stack trace below. [~toffer] and I investigated and found a race 
> condition between flush and scanner close. While StoreScanner.updateReaders() 
> is updating the scanners after a newly flushed file (in the trace below, 
> during a region close for a split), the client's scanner could be closing at 
> the same time, causing the CME.
> It's rare, but since it crashes the region server, it needs to be fixed.
> FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server 
> : Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
> at java.util.ArrayList$Itr.next(ArrayList.java:851)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155)
> PS: Ignore the line numbers in the above stack trace; the method calls should 
> help understand what's happening.





[jira] [Updated] (HBASE-20322) CME in StoreScanner causes region server crash

2018-03-30 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-20322:
-
Attachment: HBASE-20322.branch-1.4.001.patch

> CME in StoreScanner causes region server crash
> --
>
> Key: HBASE-20322
> URL: https://issues.apache.org/jira/browse/HBASE-20322
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 1.3.3, 1.4.4
>
> Attachments: HBASE-20322.branch-1.3.001.patch, 
> HBASE-20322.branch-1.4.001.patch
>
>
> A region server crashed with a ConcurrentModificationException (CME) on our 
> 1.3 cluster; stack trace below. [~toffer] and I investigated and found a race 
> condition between flush and scanner close. While StoreScanner.updateReaders() 
> is updating the scanners after a newly flushed file (in the trace below, 
> during a region close for a split), the client's scanner could be closing at 
> the same time, causing the CME.
> It's rare, but since it crashes the region server, it needs to be fixed.
> FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server 
> : Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
> at java.util.ArrayList$Itr.next(ArrayList.java:851)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155)
> PS: Ignore the line numbers in the above stack trace; the method calls should 
> help understand what's happening.





[jira] [Updated] (HBASE-20322) CME in StoreScanner causes region server crash

2018-03-30 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-20322:
-
Fix Version/s: 1.4.4
   1.5.0

> CME in StoreScanner causes region server crash
> --
>
> Key: HBASE-20322
> URL: https://issues.apache.org/jira/browse/HBASE-20322
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 1.3.3, 1.4.4
>
> Attachments: HBASE-20322.branch-1.3.001.patch
>
>
> A region server crashed with a ConcurrentModificationException (CME) on our 
> 1.3 cluster; stack trace below. [~toffer] and I investigated and found a race 
> condition between flush and scanner close. While StoreScanner.updateReaders() 
> is updating the scanners after a newly flushed file (in the trace below, 
> during a region close for a split), the client's scanner could be closing at 
> the same time, causing the CME.
> It's rare, but since it crashes the region server, it needs to be fixed.
> FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server 
> : Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
> at java.util.ArrayList$Itr.next(ArrayList.java:851)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155)
> PS: Ignore the line numbers in the above stack trace; the method calls should 
> help understand what's happening.





[jira] [Updated] (HBASE-20322) CME in StoreScanner causes region server crash

2018-03-30 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-20322:
-
Attachment: HBASE-20322.branch-1.3.001.patch

> CME in StoreScanner causes region server crash
> --
>
> Key: HBASE-20322
> URL: https://issues.apache.org/jira/browse/HBASE-20322
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.3.3
>
> Attachments: HBASE-20322.branch-1.3.001.patch
>
>
> A region server crashed with a ConcurrentModificationException (CME) on our 
> 1.3 cluster; stack trace below. [~toffer] and I investigated and found a race 
> condition between flush and scanner close. While StoreScanner.updateReaders() 
> is updating the scanners after a newly flushed file (in the trace below, 
> during a region close for a split), the client's scanner could be closing at 
> the same time, causing the CME.
> It's rare, but since it crashes the region server, it needs to be fixed.
> FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server 
> : Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
> at java.util.ArrayList$Itr.next(ArrayList.java:851)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155)
> PS: Ignore the line numbers in the above stack trace; the method calls should 
> help understand what's happening.





[jira] [Created] (HBASE-20322) CME in StoreScanner causes region server crash

2018-03-30 Thread Thiruvel Thirumoolan (JIRA)
Thiruvel Thirumoolan created HBASE-20322:


 Summary: CME in StoreScanner causes region server crash
 Key: HBASE-20322
 URL: https://issues.apache.org/jira/browse/HBASE-20322
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.3.2
Reporter: Thiruvel Thirumoolan


A region server crashed with a ConcurrentModificationException (CME) on our 1.3 
cluster; stack trace below. [~toffer] and I investigated and found a race 
condition between flush and scanner close. While StoreScanner.updateReaders() 
is updating the scanners after a newly flushed file (in the trace below, during 
a region close for a split), the client's scanner could be closing at the same 
time, causing the CME.

It's rare, but since it crashes the region server, it needs to be fixed.

FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server 
: Replay of WAL required. Forcing server shutdown
org.apache.hadoop.hbase.DroppedSnapshotException: region: 
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207)
at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501)
at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420)
at 
org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398)
at 
org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
at 
org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566)
at 
org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.ConcurrentModificationException
at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
at java.util.ArrayList$Itr.next(ArrayList.java:851)
at 
org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797)
at 
org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825)
at 
org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155)

PS: Ignore the line numbers in the above stack trace; the method calls should 
help understand what's happening.
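
For readers unfamiliar with this failure mode, here is a minimal, 
self-contained sketch of the kind of race described above. It is an 
illustration only, not HBase's StoreScanner and not the attached patch: the 
class ScannerRaceSketch and its methods (close, unsafeUpdate, guardedUpdate) 
are hypothetical names, and the competing close is simulated on a single 
thread through a callback so that the failure is deterministic. The guarded 
variant shows one common remedy (detach the list under a lock, then iterate a 
private copy); whether the actual fix takes this form is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; not HBase's StoreScanner and not the patch.
class ScannerRaceSketch {
    private final List<String> scanners = new ArrayList<>();
    private final ReentrantLock lock = new ReentrantLock();

    ScannerRaceSketch(List<String> initial) {
        scanners.addAll(initial);
    }

    // Stands in for the client closing its scanner: empties the shared list.
    void close() {
        lock.lock();
        try {
            scanners.clear();
        } finally {
            lock.unlock();
        }
    }

    // Unguarded update: iterates the live list, so a close() that lands
    // mid-iteration trips ArrayList's fail-fast check on the next next() call.
    int unsafeUpdate(Runnable interleaved) {
        int visited = 0;
        for (String ignored : scanners) {
            visited++;
            interleaved.run(); // simulates the other thread's close() arriving here
        }
        return visited;
    }

    // Guarded update: detach the list under the lock, then iterate the copy.
    int guardedUpdate(Runnable interleaved) {
        List<String> snapshot;
        lock.lock();
        try {
            snapshot = new ArrayList<>(scanners);
            scanners.clear();
        } finally {
            lock.unlock();
        }
        int visited = 0;
        for (String ignored : snapshot) {
            visited++;
            interleaved.run(); // a late close() now only sees the emptied list
        }
        return visited;
    }

    // True if the unguarded path fails with ConcurrentModificationException.
    static boolean unsafeThrowsCme() {
        ScannerRaceSketch s = new ScannerRaceSketch(Arrays.asList("a", "b", "c"));
        try {
            s.unsafeUpdate(s::close);
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Number of entries the guarded path visits despite the interleaved close().
    static int guardedVisits() {
        ScannerRaceSketch s = new ScannerRaceSketch(Arrays.asList("a", "b", "c"));
        return s.guardedUpdate(s::close);
    }

    public static void main(String[] args) {
        System.out.println("unguarded update throws CME: " + unsafeThrowsCme());
        System.out.println("guarded update visits: " + guardedVisits());
    }
}
```

In the unguarded path the exception surfaces from the iterator's next() call, 
which matches the ArrayList$Itr frames in the trace above. In the guarded path 
the iteration never touches the shared list, so a concurrent close cannot 
invalidate it.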





[jira] [Updated] (HBASE-20322) CME in StoreScanner causes region server crash

2018-03-30 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-20322:
-
Fix Version/s: 1.3.3

> CME in StoreScanner causes region server crash
> --
>
> Key: HBASE-20322
> URL: https://issues.apache.org/jira/browse/HBASE-20322
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.3.3
>
>
> A region server crashed with a ConcurrentModificationException (CME) on our 
> 1.3 cluster; stack trace below. [~toffer] and I investigated and found a race 
> condition between flush and scanner close. While StoreScanner.updateReaders() 
> is updating the scanners after a newly flushed file (in the trace below, 
> during a region close for a split), the client's scanner could be closing at 
> the same time, causing the CME.
> It's rare, but since it crashes the region server, it needs to be fixed.
> FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server 
> : Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
> at java.util.ArrayList$Itr.next(ArrayList.java:851)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155)
> PS: Ignore the line numbers in the above stack trace; the method calls should 
> help understand what's happening.





[jira] [Assigned] (HBASE-20322) CME in StoreScanner causes region server crash

2018-03-30 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan reassigned HBASE-20322:


Assignee: Thiruvel Thirumoolan

> CME in StoreScanner causes region server crash
> --
>
> Key: HBASE-20322
> URL: https://issues.apache.org/jira/browse/HBASE-20322
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.3.3
>
>
> A region server crashed with a ConcurrentModificationException (CME) on our 
> 1.3 cluster; stack trace below. [~toffer] and I investigated and found a race 
> condition between flush and scanner close. While StoreScanner.updateReaders() 
> is updating the scanners after a newly flushed file (in the trace below, 
> during a region close for a split), the client's scanner could be closing at 
> the same time, causing the CME.
> It's rare, but since it crashes the region server, it needs to be fixed.
> FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server 
> : Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
> at java.util.ArrayList$Itr.next(ArrayList.java:851)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155)
> PS: Ignore the line numbers in the above stack trace; the method calls should 
> help understand what's happening.





[jira] [Commented] (HBASE-15533) Add RSGroup Favored Balancer

2018-03-02 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16384007#comment-16384007
 ] 

Thiruvel Thirumoolan commented on HBASE-15533:
--

[~stack], we moved to 1.3 and that's keeping me busy. I will get back to this, 
hopefully in a month or two.

> Add RSGroup Favored Balancer
> 
>
> Key: HBASE-15533
> URL: https://issues.apache.org/jira/browse/HBASE-15533
> Project: HBase
>  Issue Type: Sub-task
>  Components: FavoredNodes
>Reporter: Francis Liu
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Attachments: HBASE-15533.master.001.patch, 
> HBASE-15533.master.002.patch, HBASE-15533.patch, HBASE-15533.rough.draft.patch
>
>
> HBASE-16942 added favored stochastic load balancer so we can pick and choose 
> nodes to assign based on the favored nodes and load/locality. The intention 
> of this jira is to add a group based load balancer on top of the favored 
> stochastic balancer. This will ensure splits/merges will only use favored 
> nodes from that group and will inherit from the parents appropriately.





[jira] [Commented] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region

2018-02-27 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379124#comment-16379124
 ] 

Thiruvel Thirumoolan commented on HBASE-20001:
--

Thanks [~chia7712] and [~yuzhih...@gmail.com] for the reviews.

> cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
> -
>
> Key: HBASE-20001
> URL: https://issues.apache.org/jira/browse/HBASE-20001
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.1.7
>Reporter: Francis Liu
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.3.2, 1.5.0, 1.2.7, 1.4.3
>
> Attachments: HBASE-20001.branch-1.2.001.patch, 
> HBASE-20001.branch-1.3.001.patch, HBASE-20001.branch-1.4.001.patch, 
> HBASE-20001.branch-1.4.002.patch, HBASE-20001.branch-1.4.003.patch, 
> HBASE-20001.branch-1.4.004.patch, HBASE-20001.branch-1.4.005.patch, 
> HBASE-20001.branch-1.4.006.patch
>
>
> In RegionStates.cleanIfNoMetaEntry()
> {{if (MetaTableAccessor.getRegion(server.getConnection(), 
> hri.getEncodedNameAsBytes()) == null) {}}
> {{regionOffline(hri);}}
> {{FSUtils.deleteRegionDir(server.getConfiguration(), hri);}}
>  }
> But api expects regionname
> {{public static Pair getRegion(Connection 
> connection, byte [] regionName)}}
> So we might end up cleaning good regions.
>  
> ADDENDUM:
> The scenario mentioned occurs when zkless assignment is used.  With zk-based 
> assignment without the patch what could occur is the daughter regions are 
> offlined and have no hdfs directory but have entries in meta. The daughter 
> meta entries will prolly be picked up by the client causing NSREs.





[jira] [Commented] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region

2018-02-26 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16378144#comment-16378144
 ] 

Thiruvel Thirumoolan commented on HBASE-20001:
--

Pre-commit results for the branch-1.3 patch:

TestEndToEndSplitTransaction#testMasterOpsWhileSplitting has been failing for a 
while; see the nightly build 
[https://builds.apache.org/job/HBase%20Nightly/job/branch-1.3/245/testReport/]

Pre-commit results for the branch-1.2 patch:

TestMultiTableSnapshotInputFormat.testScanOBBToOPP is flaky, as can be seen 
in one of the nightly builds 
[https://builds.apache.org/job/HBase%20Nightly/job/branch-1.2/243/] 

TestSplitTransactionOnCluster.testMasterRestartWhenSplittingIsPartial is flaky 
too, as can be seen here 
[https://builds.apache.org/job/HBase%20Nightly/job/branch-1.2/246/] 

> cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
> -
>
> Key: HBASE-20001
> URL: https://issues.apache.org/jira/browse/HBASE-20001
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.1.7
>Reporter: Francis Liu
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.3.2, 1.5.0, 1.2.7, 1.4.3
>
> Attachments: HBASE-20001.branch-1.2.001.patch, 
> HBASE-20001.branch-1.3.001.patch, HBASE-20001.branch-1.4.001.patch, 
> HBASE-20001.branch-1.4.002.patch, HBASE-20001.branch-1.4.003.patch, 
> HBASE-20001.branch-1.4.004.patch, HBASE-20001.branch-1.4.005.patch, 
> HBASE-20001.branch-1.4.006.patch
>
>
> In RegionStates.cleanIfNoMetaEntry()
> {{if (MetaTableAccessor.getRegion(server.getConnection(), 
> hri.getEncodedNameAsBytes()) == null) {}}
> {{regionOffline(hri);}}
> {{FSUtils.deleteRegionDir(server.getConfiguration(), hri);}}
>  }
> But api expects regionname
> {{public static Pair getRegion(Connection 
> connection, byte [] regionName)}}
> So we might end up cleaning good regions.
>  
> ADDENDUM:
> The scenario mentioned occurs when zkless assignment is used.  With zk-based 
> assignment without the patch what could occur is the daughter regions are 
> offlined and have no hdfs directory but have entries in meta. The daughter 
> meta entries will prolly be picked up by the client causing NSREs.





[jira] [Commented] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region

2018-02-26 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377944#comment-16377944
 ] 

Thiruvel Thirumoolan commented on HBASE-20001:
--

Uploaded patches for branch-1.3 and branch-1.2. The 1.4 patch didn't apply 
cleanly, but only minor changes were needed. The new tests passed locally; 
will wait for the pre-commit results.

> cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
> -
>
> Key: HBASE-20001
> URL: https://issues.apache.org/jira/browse/HBASE-20001
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.1.7
>Reporter: Francis Liu
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.3.2, 1.5.0, 1.2.7, 1.4.3
>
> Attachments: HBASE-20001.branch-1.2.001.patch, 
> HBASE-20001.branch-1.3.001.patch, HBASE-20001.branch-1.4.001.patch, 
> HBASE-20001.branch-1.4.002.patch, HBASE-20001.branch-1.4.003.patch, 
> HBASE-20001.branch-1.4.004.patch, HBASE-20001.branch-1.4.005.patch, 
> HBASE-20001.branch-1.4.006.patch
>
>
> In RegionStates.cleanIfNoMetaEntry()
> {{if (MetaTableAccessor.getRegion(server.getConnection(), 
> hri.getEncodedNameAsBytes()) == null) {}}
> {{regionOffline(hri);}}
> {{FSUtils.deleteRegionDir(server.getConfiguration(), hri);}}
>  }
> But api expects regionname
> {{public static Pair getRegion(Connection 
> connection, byte [] regionName)}}
> So we might end up cleaning good regions.
>  
> ADDENDUM:
> The scenario mentioned occurs when zkless assignment is used.  With zk-based 
> assignment without the patch what could occur is the daughter regions are 
> offlined and have no hdfs directory but have entries in meta. The daughter 
> meta entries will prolly be picked up by the client causing NSREs.





[jira] [Updated] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region

2018-02-26 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-20001:
-
Attachment: HBASE-20001.branch-1.2.001.patch

> cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
> -
>
> Key: HBASE-20001
> URL: https://issues.apache.org/jira/browse/HBASE-20001
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.1.7
>Reporter: Francis Liu
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.3.2, 1.5.0, 1.2.7, 1.4.3
>
> Attachments: HBASE-20001.branch-1.2.001.patch, 
> HBASE-20001.branch-1.3.001.patch, HBASE-20001.branch-1.4.001.patch, 
> HBASE-20001.branch-1.4.002.patch, HBASE-20001.branch-1.4.003.patch, 
> HBASE-20001.branch-1.4.004.patch, HBASE-20001.branch-1.4.005.patch, 
> HBASE-20001.branch-1.4.006.patch
>
>
> In RegionStates.cleanIfNoMetaEntry()
> {{if (MetaTableAccessor.getRegion(server.getConnection(), 
> hri.getEncodedNameAsBytes()) == null) {}}
> {{regionOffline(hri);}}
> {{FSUtils.deleteRegionDir(server.getConfiguration(), hri);}}
>  }
> But api expects regionname
> {{public static Pair getRegion(Connection 
> connection, byte [] regionName)}}
> So we might end up cleaning good regions.
>  
> ADDENDUM:
> The scenario mentioned occurs when zkless assignment is used.  With zk-based 
> assignment without the patch what could occur is the daughter regions are 
> offlined and have no hdfs directory but have entries in meta. The daughter 
> meta entries will prolly be picked up by the client causing NSREs.





[jira] [Updated] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region

2018-02-26 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-20001:
-
Attachment: (was: HBASE-20001.branch-1.2.001.patch)

> cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
> -
>
> Key: HBASE-20001
> URL: https://issues.apache.org/jira/browse/HBASE-20001
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.1.7
>Reporter: Francis Liu
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.3.2, 1.5.0, 1.2.7, 1.4.3
>
> Attachments: HBASE-20001.branch-1.3.001.patch, 
> HBASE-20001.branch-1.4.001.patch, HBASE-20001.branch-1.4.002.patch, 
> HBASE-20001.branch-1.4.003.patch, HBASE-20001.branch-1.4.004.patch, 
> HBASE-20001.branch-1.4.005.patch, HBASE-20001.branch-1.4.006.patch
>
>
> In RegionStates.cleanIfNoMetaEntry()
> {{if (MetaTableAccessor.getRegion(server.getConnection(), 
> hri.getEncodedNameAsBytes()) == null) {}}
> {{regionOffline(hri);}}
> {{FSUtils.deleteRegionDir(server.getConfiguration(), hri);}}
>  }
> But api expects regionname
> {{public static Pair getRegion(Connection 
> connection, byte [] regionName)}}
> So we might end up cleaning good regions.
>  
> ADDENDUM:
> The scenario mentioned occurs when zkless assignment is used.  With zk-based 
> assignment without the patch what could occur is the daughter regions are 
> offlined and have no hdfs directory but have entries in meta. The daughter 
> meta entries will prolly be picked up by the client causing NSREs.





[jira] [Updated] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region

2018-02-26 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-20001:
-
Attachment: HBASE-20001.branch-1.2.001.patch

> cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
> -
>
> Key: HBASE-20001
> URL: https://issues.apache.org/jira/browse/HBASE-20001
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.1.7
>Reporter: Francis Liu
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.3.2, 1.5.0, 1.2.7, 1.4.3
>
> Attachments: HBASE-20001.branch-1.2.001.patch, 
> HBASE-20001.branch-1.3.001.patch, HBASE-20001.branch-1.4.001.patch, 
> HBASE-20001.branch-1.4.002.patch, HBASE-20001.branch-1.4.003.patch, 
> HBASE-20001.branch-1.4.004.patch, HBASE-20001.branch-1.4.005.patch, 
> HBASE-20001.branch-1.4.006.patch
>
>
> In RegionStates.cleanIfNoMetaEntry()
> {{if (MetaTableAccessor.getRegion(server.getConnection(), 
> hri.getEncodedNameAsBytes()) == null) {}}
> {{regionOffline(hri);}}
> {{FSUtils.deleteRegionDir(server.getConfiguration(), hri);}}
>  }
> But api expects regionname
> {{public static Pair getRegion(Connection 
> connection, byte [] regionName)}}
> So we might end up cleaning good regions.
>  
> ADDENDUM:
> The scenario mentioned occurs when zkless assignment is used.  With zk-based 
> assignment without the patch what could occur is the daughter regions are 
> offlined and have no hdfs directory but have entries in meta. The daughter 
> meta entries will prolly be picked up by the client causing NSREs.





[jira] [Updated] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region

2018-02-26 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-20001:
-
Attachment: HBASE-20001.branch-1.3.001.patch

> cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
> -
>
> Key: HBASE-20001
> URL: https://issues.apache.org/jira/browse/HBASE-20001
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.1.7
>Reporter: Francis Liu
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.3.2, 1.5.0, 1.2.7, 1.4.3
>
> Attachments: HBASE-20001.branch-1.3.001.patch, 
> HBASE-20001.branch-1.4.001.patch, HBASE-20001.branch-1.4.002.patch, 
> HBASE-20001.branch-1.4.003.patch, HBASE-20001.branch-1.4.004.patch, 
> HBASE-20001.branch-1.4.005.patch, HBASE-20001.branch-1.4.006.patch
>
>
> In RegionStates.cleanIfNoMetaEntry()
> {{if (MetaTableAccessor.getRegion(server.getConnection(), 
> hri.getEncodedNameAsBytes()) == null) {}}
> {{regionOffline(hri);}}
> {{FSUtils.deleteRegionDir(server.getConfiguration(), hri);}}
>  }
> But api expects regionname
> {{public static Pair getRegion(Connection 
> connection, byte [] regionName)}}
> So we might end up cleaning good regions.
>  
> ADDENDUM:
> The scenario mentioned occurs when zkless assignment is used.  With zk-based 
> assignment without the patch what could occur is the daughter regions are 
> offlined and have no hdfs directory but have entries in meta. The daughter 
> meta entries will prolly be picked up by the client causing NSREs.





[jira] [Commented] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region

2018-02-26 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377889#comment-16377889
 ] 

Thiruvel Thirumoolan commented on HBASE-20001:
--

[~yuzhih...@gmail.com], [~chia7712] - The patch passed pre-commit. Let me know 
if the branch-1.4 patch can go in; I can then start working on the 1.3 and 1.2 
patches.

> cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
> -
>
> Key: HBASE-20001
> URL: https://issues.apache.org/jira/browse/HBASE-20001
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.1.7
>Reporter: Francis Liu
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.3.2, 1.5.0, 1.2.7, 1.4.3
>
> Attachments: HBASE-20001.branch-1.4.001.patch, 
> HBASE-20001.branch-1.4.002.patch, HBASE-20001.branch-1.4.003.patch, 
> HBASE-20001.branch-1.4.004.patch, HBASE-20001.branch-1.4.005.patch, 
> HBASE-20001.branch-1.4.006.patch
>
>
> In RegionStates.cleanIfNoMetaEntry()
> {{if (MetaTableAccessor.getRegion(server.getConnection(), 
> hri.getEncodedNameAsBytes()) == null) {}}
> {{regionOffline(hri);}}
> {{FSUtils.deleteRegionDir(server.getConfiguration(), hri);}}
>  }
> But api expects regionname
> {{public static Pair getRegion(Connection 
> connection, byte [] regionName)}}
> So we might end up cleaning good regions.
>  
> ADDENDUM:
> The scenario mentioned occurs when zkless assignment is used.  With zk-based 
> assignment without the patch what could occur is the daughter regions are 
> offlined and have no hdfs directory but have entries in meta. The daughter 
> meta entries will prolly be picked up by the client causing NSREs.





[jira] [Commented] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region

2018-02-26 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377456#comment-16377456
 ] 

Thiruvel Thirumoolan commented on HBASE-20001:
--

Thanks [~chia7712], the check is not required at the moment.

Updated patch with comments addressed.

> cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
> -
>
> Key: HBASE-20001
> URL: https://issues.apache.org/jira/browse/HBASE-20001
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.1.7
>Reporter: Francis Liu
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.3.2, 1.5.0, 1.2.7, 1.4.3
>
> Attachments: HBASE-20001.branch-1.4.001.patch, 
> HBASE-20001.branch-1.4.002.patch, HBASE-20001.branch-1.4.003.patch, 
> HBASE-20001.branch-1.4.004.patch, HBASE-20001.branch-1.4.005.patch, 
> HBASE-20001.branch-1.4.006.patch
>
>
> In RegionStates.cleanIfNoMetaEntry()
> {{if (MetaTableAccessor.getRegion(server.getConnection(), 
> hri.getEncodedNameAsBytes()) == null) {}}
> {{regionOffline(hri);}}
> {{FSUtils.deleteRegionDir(server.getConfiguration(), hri);}}
>  }
> But api expects regionname
> {{public static Pair getRegion(Connection 
> connection, byte [] regionName)}}
> So we might end up cleaning good regions.
>  
> ADDENDUM:
> The scenario mentioned occurs when zkless assignment is used.  With zk-based 
> assignment without the patch what could occur is the daughter regions are 
> offlined and have no hdfs directory but have entries in meta. The daughter 
> meta entries will prolly be picked up by the client causing NSREs.





[jira] [Updated] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region

2018-02-26 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-20001:
-
Attachment: HBASE-20001.branch-1.4.006.patch

> cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
> -
>
> Key: HBASE-20001
> URL: https://issues.apache.org/jira/browse/HBASE-20001
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.1.7
>Reporter: Francis Liu
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.3.2, 1.5.0, 1.2.7, 1.4.3
>
> Attachments: HBASE-20001.branch-1.4.001.patch, 
> HBASE-20001.branch-1.4.002.patch, HBASE-20001.branch-1.4.003.patch, 
> HBASE-20001.branch-1.4.004.patch, HBASE-20001.branch-1.4.005.patch, 
> HBASE-20001.branch-1.4.006.patch
>
>
> In RegionStates.cleanIfNoMetaEntry()
> {{if (MetaTableAccessor.getRegion(server.getConnection(), 
> hri.getEncodedNameAsBytes()) == null) {}}
> {{regionOffline(hri);}}
> {{FSUtils.deleteRegionDir(server.getConfiguration(), hri);}}
>  }
> But api expects regionname
> {{public static Pair getRegion(Connection 
> connection, byte [] regionName)}}
> So we might end up cleaning good regions.
>  
> ADDENDUM:
> The scenario mentioned occurs when zkless assignment is used.  With zk-based 
> assignment without the patch what could occur is the daughter regions are 
> offlined and have no hdfs directory but have entries in meta. The daughter 
> meta entries will prolly be picked up by the client causing NSREs.





[jira] [Commented] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region

2018-02-23 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374928#comment-16374928
 ] 

Thiruvel Thirumoolan commented on HBASE-20001:
--

[~yuzhih...@gmail.com], sure, I will wait for any other feedback and address 
it. I will rename the method to cleanFailedSplitMergeRegions() if that's OK.

[~chia7712],
{quote}Is there a chance that null server is passed to RegionStates?
{quote}
I remember a failure without the check; let me run the tests locally and check.
{quote}Why we always remove the daughter region directories when using the 
ZooKeeper based region assignments? It seems to me that it is another kind of 
data lose?
{quote}
We roll back on split and merge failures in zk-based assignment. So, we online 
the parents and clean up the failed splits/merges. This is demonstrated by the 
unit tests testMergeIsRolledBackOnMERGEFailure and 
testSplitIsRolledBackOnSPLITFailure. I also explained this in one of the 
comments above; let me know what you think.
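
To make the rollback described above concrete, here is a rough sketch of the intended end state after a failed split: the parent goes back online and the half-created daughters are offlined. This is not the RegionStates code. The class SplitRollbackSketch, its State enum, and rollbackFailedSplit() are hypothetical names for illustration, and the real cleanup also removes the daughters' HDFS directories, which is not modelled here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in HBase.
public class SplitRollbackSketch {

    /** Simplified region states; real HBase has more. */
    public enum State { OPEN, SPLITTING, SPLITTING_NEW, OFFLINE }

    /**
     * Returns the region states after rolling back a failed split:
     * the parent is put back online, daughters still in SPLITTING_NEW
     * are offlined, and every other region is left unchanged.
     */
    public static Map<String, State> rollbackFailedSplit(Map<String, State> states,
                                                         String parent) {
        Map<String, State> result = new LinkedHashMap<>();
        for (Map.Entry<String, State> entry : states.entrySet()) {
            if (entry.getKey().equals(parent)) {
                result.put(entry.getKey(), State.OPEN);
            } else if (entry.getValue() == State.SPLITTING_NEW) {
                result.put(entry.getKey(), State.OFFLINE);
            } else {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, State> states = new LinkedHashMap<>();
        states.put("parent", State.SPLITTING);
        states.put("daughterA", State.SPLITTING_NEW);
        states.put("daughterB", State.SPLITTING_NEW);
        states.put("unrelated", State.OPEN);
        System.out.println(rollbackFailedSplit(states, "parent"));
    }
}
```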

Thanks!

> cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
> -
>
> Key: HBASE-20001
> URL: https://issues.apache.org/jira/browse/HBASE-20001
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.1.7
>Reporter: Francis Liu
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.3.2, 1.5.0, 1.2.7, 1.4.3
>
> Attachments: HBASE-20001.branch-1.4.001.patch, 
> HBASE-20001.branch-1.4.002.patch, HBASE-20001.branch-1.4.003.patch, 
> HBASE-20001.branch-1.4.004.patch, HBASE-20001.branch-1.4.005.patch
>
>
> In RegionStates.cleanIfNoMetaEntry()
> {{if (MetaTableAccessor.getRegion(server.getConnection(), 
> hri.getEncodedNameAsBytes()) == null) {}}
> {{regionOffline(hri);}}
> {{FSUtils.deleteRegionDir(server.getConfiguration(), hri);}}
> }
> But api expects regionname
> {{public static Pair getRegion(Connection 
> connection, byte [] regionName)}}
> So we might end up cleaning good regions.
>  





[jira] [Commented] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region

2018-02-23 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374069#comment-16374069
 ] 

Thiruvel Thirumoolan commented on HBASE-20001:
--

Thanks [~yuzhih...@gmail.com]

Patch HBASE-20001.branch-1.4.005.patch passed precommit tests (failure 
unrelated) and is ready for review.

I can post patches for other branches if the 1.4 one looks ok.

> cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
> -
>
> Key: HBASE-20001
> URL: https://issues.apache.org/jira/browse/HBASE-20001
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.1.7
>Reporter: Francis Liu
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.3.2, 1.5.0, 1.2.7, 1.4.3
>
> Attachments: HBASE-20001.branch-1.4.001.patch, 
> HBASE-20001.branch-1.4.002.patch, HBASE-20001.branch-1.4.003.patch, 
> HBASE-20001.branch-1.4.004.patch, HBASE-20001.branch-1.4.005.patch
>
>
> In RegionStates.cleanIfNoMetaEntry()
> {{if (MetaTableAccessor.getRegion(server.getConnection(), 
> hri.getEncodedNameAsBytes()) == null) {}}
> {{regionOffline(hri);}}
> {{FSUtils.deleteRegionDir(server.getConfiguration(), hri);}}
> }
> But api expects regionname
> {{public static Pair getRegion(Connection 
> connection, byte [] regionName)}}
> So we might end up cleaning good regions.
>  





[jira] [Updated] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region

2018-02-22 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-20001:
-
Attachment: HBASE-20001.branch-1.4.005.patch

> cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
> -
>
> Key: HBASE-20001
> URL: https://issues.apache.org/jira/browse/HBASE-20001
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.1.7
>Reporter: Francis Liu
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.3.2, 1.5.0, 1.2.7, 1.4.3
>
> Attachments: HBASE-20001.branch-1.4.001.patch, 
> HBASE-20001.branch-1.4.002.patch, HBASE-20001.branch-1.4.003.patch, 
> HBASE-20001.branch-1.4.004.patch, HBASE-20001.branch-1.4.005.patch
>
>
> In RegionStates.cleanIfNoMetaEntry()
> {{if (MetaTableAccessor.getRegion(server.getConnection(), 
> hri.getEncodedNameAsBytes()) == null) {}}
> {{regionOffline(hri);}}
> {{FSUtils.deleteRegionDir(server.getConfiguration(), hri);}}
> }
> But api expects regionname
> {{public static Pair getRegion(Connection 
> connection, byte [] regionName)}}
> So we might end up cleaning good regions.
>  





[jira] [Updated] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region

2018-02-22 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-20001:
-
Attachment: HBASE-20001.branch-1.4.004.patch

> cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
> -
>
> Key: HBASE-20001
> URL: https://issues.apache.org/jira/browse/HBASE-20001
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.1.7
>Reporter: Francis Liu
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.3.2, 1.5.0, 1.2.7, 1.4.3
>
> Attachments: HBASE-20001.branch-1.4.001.patch, 
> HBASE-20001.branch-1.4.002.patch, HBASE-20001.branch-1.4.003.patch, 
> HBASE-20001.branch-1.4.004.patch
>
>
> In RegionStates.cleanIfNoMetaEntry()
> {{if (MetaTableAccessor.getRegion(server.getConnection(), 
> hri.getEncodedNameAsBytes()) == null) {}}
> {{regionOffline(hri);}}
> {{FSUtils.deleteRegionDir(server.getConfiguration(), hri);}}
> }
> But api expects regionname
> {{public static Pair getRegion(Connection 
> connection, byte [] regionName)}}
> So we might end up cleaning good regions.
>  





[jira] [Commented] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region

2018-02-22 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16373621#comment-16373621
 ] 

Thiruvel Thirumoolan commented on HBASE-20001:
--

Uploaded HBASE-20001.branch-1.4.003.patch to address the issues we ([~toffer] 
and I) found. This addresses two issues:
 # The regionName bug, which caused the data loss issue for us.
 # ZK split/merge rollback on failure, with unit tests.

testRSSplitEphemeralsDisappearButDaughtersAreOnlinedAfterShutdownHandling - 
This test failed and caused subsequent tests to fail. It was failing during 
deletion of the test table (finally clause) because, with the regionName fix, 
the daughters were left in transition (SPLITTING_NEW). Without the regionName 
fix, the daughters were offlined and their HDFS dirs removed, so the test 
passed, which is wrong.

[~toffer] pointed out that the test was waiting for the daughters to be online, 
but in zk-based assignment we roll back rather than forward, so we should be 
waiting for the parent. The test was still passing all these checks because 
there were not enough barriers. So we fixed the test to comply with the 
zk-based behavior. We also introduced a similar test for merge in zk mode. I 
will raise a separate Jira for re-introducing the zkless tests and will add 
the appropriate zkless tests in a follow-up.

Once we fixed the test, we realized the failed daughters were in transition and 
not offlined. We fixed that as well in RegionStates.java as part of this Jira.

Please let us know what you think. Thanks!

> cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
> -
>
> Key: HBASE-20001
> URL: https://issues.apache.org/jira/browse/HBASE-20001
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.1.7
>Reporter: Francis Liu
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.3.2, 1.5.0, 1.2.7, 1.4.3
>
> Attachments: HBASE-20001.branch-1.4.001.patch, 
> HBASE-20001.branch-1.4.002.patch, HBASE-20001.branch-1.4.003.patch
>
>
> In RegionStates.cleanIfNoMetaEntry()
> {{if (MetaTableAccessor.getRegion(server.getConnection(), 
> hri.getEncodedNameAsBytes()) == null) {}}
> {{regionOffline(hri);}}
> {{FSUtils.deleteRegionDir(server.getConfiguration(), hri);}}
> }
> But api expects regionname
> {{public static Pair getRegion(Connection 
> connection, byte [] regionName)}}
> So we might end up cleaning good regions.
>  





[jira] [Updated] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region

2018-02-22 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-20001:
-
Attachment: HBASE-20001.branch-1.4.003.patch

> cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
> -
>
> Key: HBASE-20001
> URL: https://issues.apache.org/jira/browse/HBASE-20001
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.1.7
>Reporter: Francis Liu
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.3.2, 1.5.0, 1.2.7, 1.4.3
>
> Attachments: HBASE-20001.branch-1.4.001.patch, 
> HBASE-20001.branch-1.4.002.patch, HBASE-20001.branch-1.4.003.patch
>
>
> In RegionStates.cleanIfNoMetaEntry()
> {{if (MetaTableAccessor.getRegion(server.getConnection(), 
> hri.getEncodedNameAsBytes()) == null) {}}
> {{regionOffline(hri);}}
> {{FSUtils.deleteRegionDir(server.getConfiguration(), hri);}}
> }
> But api expects regionname
> {{public static Pair getRegion(Connection 
> connection, byte [] regionName)}}
> So we might end up cleaning good regions.
>  





[jira] [Commented] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region

2018-02-21 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371084#comment-16371084
 ] 

Thiruvel Thirumoolan commented on HBASE-20001:
--

We found some more issues. Will upload another patch tomorrow; it would be 
better to explain the issues along with the patch.

> cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
> -
>
> Key: HBASE-20001
> URL: https://issues.apache.org/jira/browse/HBASE-20001
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.1.7
>Reporter: Francis Liu
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.3.2, 1.5.0, 1.2.7, 1.4.3
>
> Attachments: HBASE-20001.branch-1.4.001.patch, 
> HBASE-20001.branch-1.4.002.patch
>
>
> In RegionStates.cleanIfNoMetaEntry()
> {{if (MetaTableAccessor.getRegion(server.getConnection(), 
> hri.getEncodedNameAsBytes()) == null) {}}
> {{regionOffline(hri);}}
> {{FSUtils.deleteRegionDir(server.getConfiguration(), hri);}}
> }
> But api expects regionname
> {{public static Pair getRegion(Connection 
> connection, byte [] regionName)}}
> So we might end up cleaning good regions.
>  





[jira] [Commented] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region

2018-02-16 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16368013#comment-16368013
 ] 

Thiruvel Thirumoolan commented on HBASE-20001:
--

[~chia7712] - Yes. We lost data on our test clusters. Daughters were in 
SPLITTING_NEW state and, since there was no meta entry for the encoded name 
(because of this bug), the daughter region directories on HDFS were removed. 
Catalog Janitor didn't find split references and removed the parent too, 
causing hbck to complain of a region hole.

> cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
> -
>
> Key: HBASE-20001
> URL: https://issues.apache.org/jira/browse/HBASE-20001
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.1.7
>Reporter: Francis Liu
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.3.2, 1.5.0, 1.4.3
>
> Attachments: HBASE-20001.branch-1.4.001.patch
>
>
> In RegionStates.cleanIfNoMetaEntry()
> {{if (MetaTableAccessor.getRegion(server.getConnection(), 
> hri.getEncodedNameAsBytes()) == null) {}}
> {{regionOffline(hri);}}
> {{FSUtils.deleteRegionDir(server.getConfiguration(), hri);}}
> }
> But the API expects the region name:
> {{public static Pair getRegion(Connection 
> connection, byte [] regionName)}}
> So we might end up cleaning good regions.
>  





[jira] [Commented] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region

2018-02-15 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366534#comment-16366534
 ] 

Thiruvel Thirumoolan commented on HBASE-20001:
--

[~chia7712] - I have added a comment on the reviewboard; can we please continue 
the conversation there? Thanks for your feedback.

Some tests in TestSplitTransactionOnCluster also failed. I am guessing the 
failures are mostly test-related; I will check them out.

> cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
> -
>
> Key: HBASE-20001
> URL: https://issues.apache.org/jira/browse/HBASE-20001
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.1.7
>Reporter: Francis Liu
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.3.2, 1.5.0, 1.4.3
>
> Attachments: HBASE-20001.branch-1.4.001.patch
>
>
> In RegionStates.cleanIfNoMetaEntry()
> {{if (MetaTableAccessor.getRegion(server.getConnection(), 
> hri.getEncodedNameAsBytes()) == null) {}}
> {{regionOffline(hri);}}
> {{FSUtils.deleteRegionDir(server.getConfiguration(), hri);}}
> }
> But the API expects the region name:
> {{public static Pair getRegion(Connection 
> connection, byte [] regionName)}}
> So we might end up cleaning good regions.
>  





[jira] [Commented] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region

2018-02-15 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366158#comment-16366158
 ] 

Thiruvel Thirumoolan commented on HBASE-20001:
--

[~chia7712] When we were investigating this problem internally, it was hard to 
trace the regions that were cleaned up; we had to look up the namenode audit 
logs.

> cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
> -
>
> Key: HBASE-20001
> URL: https://issues.apache.org/jira/browse/HBASE-20001
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.1.7
>Reporter: Francis Liu
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.3.2, 1.5.0, 1.4.2
>
> Attachments: HBASE-20001.branch-1.4.001.patch
>
>
> In RegionStates.cleanIfNoMetaEntry()
> {{if (MetaTableAccessor.getRegion(server.getConnection(), 
> hri.getEncodedNameAsBytes()) == null) {}}
> {{regionOffline(hri);}}
> {{FSUtils.deleteRegionDir(server.getConfiguration(), hri);}}
> }
> But the API expects the region name:
> {{public static Pair getRegion(Connection 
> connection, byte [] regionName)}}
> So we might end up cleaning good regions.
>  





[jira] [Updated] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region

2018-02-15 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-20001:
-
Fix Version/s: 1.4.2
   1.5.0
   1.3.2

> cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
> -
>
> Key: HBASE-20001
> URL: https://issues.apache.org/jira/browse/HBASE-20001
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.1.7
>Reporter: Francis Liu
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.3.2, 1.5.0, 1.4.2
>
> Attachments: HBASE-20001.branch-1.4.001.patch
>
>
> In RegionStates.cleanIfNoMetaEntry()
> {{if (MetaTableAccessor.getRegion(server.getConnection(), 
> hri.getEncodedNameAsBytes()) == null) {}}
> {{regionOffline(hri);}}
> {{FSUtils.deleteRegionDir(server.getConfiguration(), hri);}}
> }
> But the API expects the region name:
> {{public static Pair getRegion(Connection 
> connection, byte [] regionName)}}
> So we might end up cleaning good regions.
>  





[jira] [Updated] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region

2018-02-15 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-20001:
-
Status: Patch Available  (was: Open)

> cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
> -
>
> Key: HBASE-20001
> URL: https://issues.apache.org/jira/browse/HBASE-20001
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.1.7, 1.4.0, 1.3.0, 1.2.0
>Reporter: Francis Liu
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.3.2, 1.5.0, 1.4.2
>
> Attachments: HBASE-20001.branch-1.4.001.patch
>
>
> In RegionStates.cleanIfNoMetaEntry()
> {{if (MetaTableAccessor.getRegion(server.getConnection(), 
> hri.getEncodedNameAsBytes()) == null) {}}
> {{regionOffline(hri);}}
> {{FSUtils.deleteRegionDir(server.getConfiguration(), hri);}}
> }
> But the API expects the region name:
> {{public static Pair getRegion(Connection 
> connection, byte [] regionName)}}
> So we might end up cleaning good regions.
>  





[jira] [Updated] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region

2018-02-15 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-20001:
-
Attachment: HBASE-20001.branch-1.4.001.patch

> cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
> -
>
> Key: HBASE-20001
> URL: https://issues.apache.org/jira/browse/HBASE-20001
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.1.7
>Reporter: Francis Liu
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Attachments: HBASE-20001.branch-1.4.001.patch
>
>
> In RegionStates.cleanIfNoMetaEntry()
> {{if (MetaTableAccessor.getRegion(server.getConnection(), 
> hri.getEncodedNameAsBytes()) == null) {}}
> {{regionOffline(hri);}}
> {{FSUtils.deleteRegionDir(server.getConfiguration(), hri);}}
> }
> But the API expects the region name:
> {{public static Pair getRegion(Connection 
> connection, byte [] regionName)}}
> So we might end up cleaning good regions.
>  





[jira] [Commented] (HBASE-19996) Some nonce procs might not be cleaned up (follow up HBASE-19756)

2018-02-15 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365306#comment-16365306
 ] 

Thiruvel Thirumoolan commented on HBASE-19996:
--

[~yuzhih...@gmail.com] - Uploaded a patch for the master branch (a test case). 
Let me know if this is OK, or whether you would like me to push that in 
another jira.

> Some nonce procs might not be cleaned up (follow up HBASE-19756)
> 
>
> Key: HBASE-19996
> URL: https://issues.apache.org/jira/browse/HBASE-19996
> Project: HBase
>  Issue Type: Bug
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.3.2, 1.5.0, 1.4.2
>
> Attachments: HBASE-19996.branch-1.4.001.patch, 
> HBASE-19996.branch-1.4.001.patch, HBASE-19996.master.001.patch
>
>
> Follow up to HBASE-19756, which dealt with NPEs during proc cleanup. 
> Unfortunately, the patch for branch-1 might also fail to remove some valid 
> procs. The branch-2 patch doesn't have this problem. This fixes the branch-1 
> bug and also adds another test to branch-2. Thanks to [~toffer] for flagging 
> this internally.




