[jira] [Commented] (HBASE-22356) API to get hdfs block distribution from regionservers
[ https://issues.apache.org/jira/browse/HBASE-22356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970624#comment-16970624 ]

Thiruvel Thirumoolan commented on HBASE-22356:
----------------------------------------------

[~binlijin] - I updated the patch on Review Board so you can quickly see the differences, and I also submitted a PR against the master branch on GitHub. Thanks for your time on the reviews.

> API to get hdfs block distribution from regionservers
> -----------------------------------------------------
>
>                 Key: HBASE-22356
>                 URL: https://issues.apache.org/jira/browse/HBASE-22356
>             Project: HBase
>          Issue Type: Sub-task
>          Components: API, Balancer, regionserver
>            Reporter: Thiruvel Thirumoolan
>            Assignee: Thiruvel Thirumoolan
>            Priority: Major
>              Labels: balancer
>             Fix For: 3.0.0, 2.3.0, 1.6.0
>
>         Attachments: HBASE-22356.master.001.patch,
> HBASE-22356.master.002.patch, HBASE-22356.master.003.patch
>
>
> A RegionServer API has to be added which will return HDFSBlockDistribution
> for all the regions it hosts. RS already has this info cached and updated
> when flush/compaction happens. Master can query and get instead of hitting
> the namenode and caching. The larger the cluster becomes, the more costly it
> becomes to get this information and more stale the cached information becomes.
> This jira is only to add the API to regionserver.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HBASE-22356) API to get hdfs block distribution from regionservers
[ https://issues.apache.org/jira/browse/HBASE-22356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-22356: - Attachment: HBASE-22356.master.003.patch > API to get hdfs block distribution from regionservers > - > > Key: HBASE-22356 > URL: https://issues.apache.org/jira/browse/HBASE-22356 > Project: HBase > Issue Type: Sub-task > Components: API, Balancer, regionserver >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Labels: balancer > Fix For: 3.0.0, 2.3.0, 1.6.0 > > Attachments: HBASE-22356.master.001.patch, > HBASE-22356.master.002.patch, HBASE-22356.master.003.patch > > > A RegionServer API has to be added which will return HDFSBlockDistribution > for all the regions it hosts. RS already has this info cached and updated > when flush/compaction happens. Master can query and get instead of hitting > the namenode and caching. The larger the cluster becomes, the more costly it > becomes to get this information and more stale the cached information becomes. > This jira is only to add the API to regionserver. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23219) Re-enable ZKLess tests for branch-1 (Revert HBASE-14622)
[ https://issues.apache.org/jira/browse/HBASE-23219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16965139#comment-16965139 ] Thiruvel Thirumoolan commented on HBASE-23219: -- Thanks [~apurtell] > Re-enable ZKLess tests for branch-1 (Revert HBASE-14622) > > > Key: HBASE-23219 > URL: https://issues.apache.org/jira/browse/HBASE-23219 > Project: HBase > Issue Type: Task > Components: test >Affects Versions: 1.3.6 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Trivial > Fix For: 1.6.0, 1.4.12, 1.3.7 > > Attachments: HBASE-23219.branch-1.001.patch, > HBASE-23219.branch-1.3.001.patch, HBASE-23219.branch-1.4.001.patch > > > Since we are using zkless in our production setup, we would like to enable > these tests back in apache on branch-1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-23219) Re-enable ZKLess tests for branch-1 (Revert HBASE-14622)
[ https://issues.apache.org/jira/browse/HBASE-23219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-23219: - Status: Patch Available (was: Open) > Re-enable ZKLess tests for branch-1 (Revert HBASE-14622) > > > Key: HBASE-23219 > URL: https://issues.apache.org/jira/browse/HBASE-23219 > Project: HBase > Issue Type: Task > Components: test >Affects Versions: 1.3.6 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Trivial > Fix For: 1.4.12, 1.3.7 > > Attachments: HBASE-23219.branch-1.001.patch, > HBASE-23219.branch-1.3.001.patch, HBASE-23219.branch-1.4.001.patch > > > Since we are using zkless in our production setup, we would like to enable > these tests back in apache on branch-1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-23219) Re-enable ZKLess tests for branch-1 (Revert HBASE-14622)
[ https://issues.apache.org/jira/browse/HBASE-23219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-23219: - Fix Version/s: (was: 1.5.1) (was: 1.6.0) Status: Open (was: Patch Available) > Re-enable ZKLess tests for branch-1 (Revert HBASE-14622) > > > Key: HBASE-23219 > URL: https://issues.apache.org/jira/browse/HBASE-23219 > Project: HBase > Issue Type: Task > Components: test >Affects Versions: 1.3.6 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Trivial > Fix For: 1.4.12, 1.3.7 > > Attachments: HBASE-23219.branch-1.001.patch, > HBASE-23219.branch-1.3.001.patch, HBASE-23219.branch-1.4.001.patch > > > Since we are using zkless in our production setup, we would like to enable > these tests back in apache on branch-1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-23219) Re-enable ZKLess tests for branch-1 (Revert HBASE-14622)
[ https://issues.apache.org/jira/browse/HBASE-23219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-23219: - Attachment: HBASE-23219.branch-1.4.001.patch > Re-enable ZKLess tests for branch-1 (Revert HBASE-14622) > > > Key: HBASE-23219 > URL: https://issues.apache.org/jira/browse/HBASE-23219 > Project: HBase > Issue Type: Task > Components: test >Affects Versions: 1.3.6 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Trivial > Fix For: 1.6.0, 1.4.12, 1.3.7, 1.5.1 > > Attachments: HBASE-23219.branch-1.001.patch, > HBASE-23219.branch-1.3.001.patch, HBASE-23219.branch-1.4.001.patch > > > Since we are using zkless in our production setup, we would like to enable > these tests back in apache on branch-1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-23219) Re-enable ZKLess tests for branch-1 (Revert HBASE-14622)
[ https://issues.apache.org/jira/browse/HBASE-23219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-23219: - Attachment: HBASE-23219.branch-1.3.001.patch > Re-enable ZKLess tests for branch-1 (Revert HBASE-14622) > > > Key: HBASE-23219 > URL: https://issues.apache.org/jira/browse/HBASE-23219 > Project: HBase > Issue Type: Task > Components: test >Affects Versions: 1.3.6 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Trivial > Fix For: 1.6.0, 1.4.12, 1.3.7, 1.5.1 > > Attachments: HBASE-23219.branch-1.001.patch, > HBASE-23219.branch-1.3.001.patch > > > Since we are using zkless in our production setup, we would like to enable > these tests back in apache on branch-1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23219) Re-enable ZKLess tests for branch-1 (Revert HBASE-14622)
Thiruvel Thirumoolan created HBASE-23219: Summary: Re-enable ZKLess tests for branch-1 (Revert HBASE-14622) Key: HBASE-23219 URL: https://issues.apache.org/jira/browse/HBASE-23219 Project: HBase Issue Type: Task Components: test Affects Versions: 1.3.6 Reporter: Thiruvel Thirumoolan Assignee: Thiruvel Thirumoolan Fix For: 1.4.12, 1.3.7 Since we are using zkless in our production setup, we would like to enable these tests back in apache on branch-1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-23219) Re-enable ZKLess tests for branch-1 (Revert HBASE-14622)
[ https://issues.apache.org/jira/browse/HBASE-23219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-23219: - Status: Patch Available (was: Open) > Re-enable ZKLess tests for branch-1 (Revert HBASE-14622) > > > Key: HBASE-23219 > URL: https://issues.apache.org/jira/browse/HBASE-23219 > Project: HBase > Issue Type: Task > Components: test >Affects Versions: 1.3.6 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Trivial > Fix For: 1.4.12, 1.3.7 > > Attachments: HBASE-23219.branch-1.001.patch > > > Since we are using zkless in our production setup, we would like to enable > these tests back in apache on branch-1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-23219) Re-enable ZKLess tests for branch-1 (Revert HBASE-14622)
[ https://issues.apache.org/jira/browse/HBASE-23219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-23219: - Fix Version/s: 1.5.1 1.6.0 > Re-enable ZKLess tests for branch-1 (Revert HBASE-14622) > > > Key: HBASE-23219 > URL: https://issues.apache.org/jira/browse/HBASE-23219 > Project: HBase > Issue Type: Task > Components: test >Affects Versions: 1.3.6 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Trivial > Fix For: 1.6.0, 1.4.12, 1.3.7, 1.5.1 > > Attachments: HBASE-23219.branch-1.001.patch > > > Since we are using zkless in our production setup, we would like to enable > these tests back in apache on branch-1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-23219) Re-enable ZKLess tests for branch-1 (Revert HBASE-14622)
[ https://issues.apache.org/jira/browse/HBASE-23219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-23219: - Attachment: HBASE-23219.branch-1.001.patch > Re-enable ZKLess tests for branch-1 (Revert HBASE-14622) > > > Key: HBASE-23219 > URL: https://issues.apache.org/jira/browse/HBASE-23219 > Project: HBase > Issue Type: Task > Components: test >Affects Versions: 1.3.6 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Trivial > Fix For: 1.4.12, 1.3.7 > > Attachments: HBASE-23219.branch-1.001.patch > > > Since we are using zkless in our production setup, we would like to enable > these tests back in apache on branch-1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-22356) API to get hdfs block distribution from regionservers
[ https://issues.apache.org/jira/browse/HBASE-22356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-22356: - Attachment: HBASE-22356.master.002.patch > API to get hdfs block distribution from regionservers > - > > Key: HBASE-22356 > URL: https://issues.apache.org/jira/browse/HBASE-22356 > Project: HBase > Issue Type: Sub-task > Components: API, Balancer, regionserver >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Labels: balancer > Fix For: 3.0.0, 2.2.1, 1.5.1 > > Attachments: HBASE-22356.master.001.patch, > HBASE-22356.master.002.patch > > > A RegionServer API has to be added which will return HDFSBlockDistribution > for all the regions it hosts. RS already has this info cached and updated > when flush/compaction happens. Master can query and get instead of hitting > the namenode and caching. The larger the cluster becomes, the more costly it > becomes to get this information and more stale the cached information becomes. > This jira is only to add the API to regionserver. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-22356) API to get hdfs block distribution from regionservers
[ https://issues.apache.org/jira/browse/HBASE-22356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-22356: - Status: Patch Available (was: Open) Submitting precommit for master. Once patch is good for master, will work on branch-1 patch. > API to get hdfs block distribution from regionservers > - > > Key: HBASE-22356 > URL: https://issues.apache.org/jira/browse/HBASE-22356 > Project: HBase > Issue Type: Sub-task > Components: API, Balancer, regionserver >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 3.0.0, 2.2.1, 1.5.1 > > Attachments: HBASE-22356.master.001.patch > > > A RegionServer API has to be added which will return HDFSBlockDistribution > for all the regions it hosts. RS already has this info cached and updated > when flush/compaction happens. Master can query and get instead of hitting > the namenode and caching. The larger the cluster becomes, the more costly it > becomes to get this information and more stale the cached information becomes. > This jira is only to add the API to regionserver. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-22356) API to get hdfs block distribution from regionservers
[ https://issues.apache.org/jira/browse/HBASE-22356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-22356: - Attachment: HBASE-22356.master.001.patch > API to get hdfs block distribution from regionservers > - > > Key: HBASE-22356 > URL: https://issues.apache.org/jira/browse/HBASE-22356 > Project: HBase > Issue Type: Sub-task > Components: API, Balancer, regionserver >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 3.0.0, 2.2.1, 1.5.1 > > Attachments: HBASE-22356.master.001.patch > > > A RegionServer API has to be added which will return HDFSBlockDistribution > for all the regions it hosts. RS already has this info cached and updated > when flush/compaction happens. Master can query and get instead of hitting > the namenode and caching. The larger the cluster becomes, the more costly it > becomes to get this information and more stale the cached information becomes. > This jira is only to add the API to regionserver. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-22356) API to get hdfs block distribution from regionservers
Thiruvel Thirumoolan created HBASE-22356:
-----------------------------------------

             Summary: API to get hdfs block distribution from regionservers
                 Key: HBASE-22356
                 URL: https://issues.apache.org/jira/browse/HBASE-22356
             Project: HBase
          Issue Type: Sub-task
          Components: API, Balancer, regionserver
            Reporter: Thiruvel Thirumoolan
            Assignee: Thiruvel Thirumoolan
             Fix For: 3.0.0, 2.2.1, 1.5.1


A RegionServer API has to be added which will return the HDFSBlockDistribution for all the regions it hosts. The RS already has this info cached, and it is updated when a flush or compaction happens. The Master can query the RegionServers for it instead of hitting the NameNode and caching the result. The larger the cluster becomes, the more costly it is to get this information and the more stale the cached information becomes.

This jira is only to add the API to the RegionServer.
[jira] [Commented] (HBASE-15533) Add RSGroup Favored Balancer
[ https://issues.apache.org/jira/browse/HBASE-15533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820830#comment-16820830 ]

Thiruvel Thirumoolan commented on HBASE-15533:
----------------------------------------------

[~zodvik], thanks for the interest. Most of this patch should work as is. Since the majority of these tests depend on the FavoredStochastic unit tests, let me get HBASE-18349 in before continuing to work on this. I think I have a patch that fixes all but one unit test, so I will post a partial or complete patch on HBASE-18349 and then resume work on this.

> Add RSGroup Favored Balancer
> ----------------------------
>
>                 Key: HBASE-15533
>                 URL: https://issues.apache.org/jira/browse/HBASE-15533
>             Project: HBase
>          Issue Type: Sub-task
>          Components: FavoredNodes
>            Reporter: Francis Liu
>            Assignee: Thiruvel Thirumoolan
>            Priority: Major
>         Attachments: HBASE-15533.master.001.patch,
> HBASE-15533.master.002.patch, HBASE-15533.patch, HBASE-15533.rough.draft.patch
>
>
> HBASE-16942 added favored stochastic load balancer so we can pick and choose
> nodes to assign based on the favored nodes and load/locality. The intention
> of this jira is to add a group based load balancer on top of the favored
> stochastic balancer. This will ensure splits/merges will only use favored
> nodes from that group and will inherit from the parents appropriately.
[jira] [Commented] (HBASE-20546) Improve perf of RegionLocationFinder.mapHostNameToServerName
[ https://issues.apache.org/jira/browse/HBASE-20546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815682#comment-16815682 ]

Thiruvel Thirumoolan commented on HBASE-20546:
----------------------------------------------

Thanks Andy. I think I had a draft patch for this a while back that I shelved. I will resume work on it.

> Improve perf of RegionLocationFinder.mapHostNameToServerName
> ------------------------------------------------------------
>
>                 Key: HBASE-20546
>                 URL: https://issues.apache.org/jira/browse/HBASE-20546
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer
>    Affects Versions: 1.4.4, 2.0.0
>            Reporter: Thiruvel Thirumoolan
>            Assignee: Thiruvel Thirumoolan
>            Priority: Major
>         Attachments: HBASE-20546.branch-1.4.001.patch
>
>
> RegionLocationFinder.getTopBlockLocations() is called multiple times during
> a balancer run. While profiling a large table balance, mapHostNameToServerName()
> seems to take a lot of time. One of the maps is re-created on every
> iteration, although it could be initialized just once.
> Goes into both branch-1 and branch-2, although the patches differ slightly.
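The optimization described in HBASE-20546 (build the lookup map once rather than on every call) can be illustrated with a minimal Java sketch. This is not the HBase code or the attached patch: the class `HostLookupSketch`, its `setServers` method, and the use of plain strings of the form `host,port,startcode` in place of `ServerName` are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the finder; not the HBase RegionLocationFinder. */
final class HostLookupSketch {
    // hostname -> server names ("host,port,startcode") running on that host
    private final Map<String, List<String>> serversByHost = new HashMap<>();
    // Counts index rebuilds, only so the example can show the map is built once.
    int rebuilds = 0;

    /** Rebuild the index only when the set of live servers changes. */
    void setServers(List<String> serverNames) {
        serversByHost.clear();
        for (String sn : serverNames) {
            int comma = sn.indexOf(',');
            String host = comma < 0 ? sn : sn.substring(0, comma);
            serversByHost.computeIfAbsent(host, h -> new ArrayList<>()).add(sn);
        }
        rebuilds++;
    }

    /** Map block-location hostnames to server names using the prebuilt index. */
    List<String> mapHostNameToServerName(List<String> hosts) {
        List<String> result = new ArrayList<>();
        for (String host : hosts) {
            List<String> servers = serversByHost.get(host);
            if (servers != null) {
                result.addAll(servers); // unknown hosts are simply skipped
            }
        }
        return result;
    }
}
```

With this shape, repeated lookups during one balancer run reuse the same index, and the cost of building it is paid only when cluster membership changes.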
[jira] [Commented] (HBASE-20643) Getting HDFSBlockDist in Master by querying RegionServers
[ https://issues.apache.org/jira/browse/HBASE-20643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815680#comment-16815680 ]

Thiruvel Thirumoolan commented on HBASE-20643:
----------------------------------------------

We rolled out part of this patch internally and it's been good, so let me split this into two sub-tasks and get it in. It's definitely needed for large clusters.

> Getting HDFSBlockDist in Master by querying RegionServers
> ---------------------------------------------------------
>
>                 Key: HBASE-20643
>                 URL: https://issues.apache.org/jira/browse/HBASE-20643
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer
>            Reporter: Thiruvel Thirumoolan
>            Assignee: Thiruvel Thirumoolan
>            Priority: Major
>             Fix For: 1.5.0, 2.3.0
>
>
> Region locality information is needed by the balancer to generate region
> plans. Computing HDFSBlockDistribution is expensive on larger clusters and
> adds load to the NameNode. This also needs to be recomputed on a master
> restart. The proposal is to get the HDFSBlockDistribution from the
> RegionServers instead of computing it in Master. RS already has this
> information and we could just reuse it by querying it. RS already passes
> dataLocality info via RegionLoad today.
>
> Proposed Implementation: This is a high-level overview.
> # A RegionServer API has to be added which will return HDFSBlockDistribution
> for all the regions it hosts. RS already has this info. Since ClusterStatus
> has already become bulky and we don’t need updated locality so fast, it’s
> better to have another API rather than add this to RegionLoad and pass it
> along with RSReport.
> # Master will have a Chore to query all RegionServers and will cache the
> HDFSBlockDistribution for those regions. This is easy and quick. Admins can
> tune the frequency based on size of the cluster. On a ~90 nodes cluster with
> 500k regions and a prototype implementation and no load, it took about 5
> seconds to get all HDFSBlockDistribution from RS.
> # The cache will be an extension of RegionLocationFinder (subclass), if
> needed to keep the implementation simple. Probably will get clear with
> implementation.
> # Balancer will use the new cache to get all HDFSBlockDistribution. If there
> is a new region and Chore didn’t get the block distribution from RS during
> its previous run, then it will be computed by RegionLocationFinder the same
> way it has been done now. If the Chore runs more frequently like every hour,
> then this recomputation will be drastically reduced.
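The caching scheme proposed in HBASE-20643 (a periodic chore fills a cache from RegionServer reports, and the balancer falls back to computing the distribution itself on a miss) could be sketched roughly as follows. This is an illustration under stated assumptions, not the patch: `BlockDistCacheSketch`, `refresh`, and `get` are hypothetical names, and a `Map<String, Long>` of host to bytes stands in for `HDFSBlockDistribution`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical master-side cache; a simplified stand-in, not HBase code. */
final class BlockDistCacheSketch {
    // region name -> (host -> bytes of the region's data stored on that host)
    private final Map<String, Map<String, Long>> cache = new ConcurrentHashMap<>();
    // Expensive path (in the proposal: the existing NameNode-based computation).
    private final Function<String, Map<String, Long>> fallback;
    // Counts fallback computations, for illustration only.
    int fallbackCalls = 0;

    BlockDistCacheSketch(Function<String, Map<String, Long>> fallback) {
        this.fallback = fallback;
    }

    /** Called by the periodic chore with one RegionServer's report. */
    void refresh(Map<String, Map<String, Long>> reportFromRegionServer) {
        cache.putAll(reportFromRegionServer);
    }

    /** Used by the balancer: cached value if present, otherwise compute and remember it. */
    Map<String, Long> get(String region) {
        Map<String, Long> cached = cache.get(region);
        if (cached != null) {
            return cached;
        }
        fallbackCalls++;
        Map<String, Long> computed = fallback.apply(region);
        if (computed != null) {
            cache.put(region, computed);
        }
        return computed;
    }
}
```

The more often the chore calls `refresh`, the fewer regions reach the fallback path, which is the trade-off the proposal leaves to admins to tune.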
[jira] [Commented] (HBASE-21883) Enhancements to Major Compaction tool from HBASE-19528
[ https://issues.apache.org/jira/browse/HBASE-21883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808270#comment-16808270 ] Thiruvel Thirumoolan commented on HBASE-21883: -- [~apurtell] I have a master patch up, pls let me know if you have any feedback, thanks! > Enhancements to Major Compaction tool from HBASE-19528 > -- > > Key: HBASE-21883 > URL: https://issues.apache.org/jira/browse/HBASE-21883 > Project: HBase > Issue Type: Improvement > Components: Client, Compaction, tooling >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Minor > Fix For: 3.0.0, 2.3.0, 1.5.1 > > Attachments: HBASE-21883.branch-1.001.patch, > HBASE-21883.branch-1.002.patch, HBASE-21883.master.001.patch, > HBASE-21883.master.002.patch > > > I would like to add new compaction tools based on [~churromorales]'s tool at > HBASE-19528. > We internally have tools that pick and compact regions based on multiple > criteria. Since Rahul already has a version in community, we would like to > build on top of it instead of pushing yet another tool. > With this jira, I would like to add a tool which looks at regions beyond TTL > and compacts them in a rsgroup. We have time series data and those regions > will become dead after a while, so we compact those regions to save disk > space. We also merge those empty regions to reduce load, but that tool comes > later. > Will prep a patch for 2.x once 1.5 gets in. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21883) Enhancements to Major Compaction tool from HBASE-19528
[ https://issues.apache.org/jira/browse/HBASE-21883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803390#comment-16803390 ] Thiruvel Thirumoolan commented on HBASE-21883: -- [~stack] - Apologies, updated release notes. > Enhancements to Major Compaction tool from HBASE-19528 > -- > > Key: HBASE-21883 > URL: https://issues.apache.org/jira/browse/HBASE-21883 > Project: HBase > Issue Type: Improvement > Components: Client, Compaction, tooling >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Minor > Fix For: 3.0.0, 2.3.0, 1.5.1 > > Attachments: HBASE-21883.branch-1.001.patch, > HBASE-21883.branch-1.002.patch, HBASE-21883.master.001.patch, > HBASE-21883.master.002.patch > > > I would like to add new compaction tools based on [~churromorales]'s tool at > HBASE-19528. > We internally have tools that pick and compact regions based on multiple > criteria. Since Rahul already has a version in community, we would like to > build on top of it instead of pushing yet another tool. > With this jira, I would like to add a tool which looks at regions beyond TTL > and compacts them in a rsgroup. We have time series data and those regions > will become dead after a while, so we compact those regions to save disk > space. We also merge those empty regions to reduce load, but that tool comes > later. > Will prep a patch for 2.x once 1.5 gets in. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21883) Enhancements to Major Compaction tool from HBASE-19528
[ https://issues.apache.org/jira/browse/HBASE-21883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvel Thirumoolan updated HBASE-21883:
-----------------------------------------
    Release Note:
The MajorCompactorTTL tool compacts all regions in a table that have been TTLed out. This saves space on DFS and is useful for tables that hold time-series-like data. It is typically scheduled to run frequently (say, via cron) to clean up old data on an ongoing basis.

The RSGroupMajorCompactionTTL tool is similar to MajorCompactorTTL but runs at the region server group level. If multiple tables in an rsgroup hold time-series-like data, a single command cleans them all up. As tables are added to or removed from the rsgroup, that single command continues to cover all of them.

> Enhancements to Major Compaction tool from HBASE-19528
> ------------------------------------------------------
>
>                 Key: HBASE-21883
>                 URL: https://issues.apache.org/jira/browse/HBASE-21883
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, Compaction, tooling
>            Reporter: Thiruvel Thirumoolan
>            Assignee: Thiruvel Thirumoolan
>            Priority: Minor
>             Fix For: 3.0.0, 2.3.0, 1.5.1
>
>         Attachments: HBASE-21883.branch-1.001.patch,
> HBASE-21883.branch-1.002.patch, HBASE-21883.master.001.patch,
> HBASE-21883.master.002.patch
>
>
> I would like to add new compaction tools based on [~churromorales]'s tool at
> HBASE-19528.
> We internally have tools that pick and compact regions based on multiple
> criteria. Since Rahul already has a version in community, we would like to
> build on top of it instead of pushing yet another tool.
> With this jira, I would like to add a tool which looks at regions beyond TTL
> and compacts them in a rsgroup. We have time series data and those regions
> will become dead after a while, so we compact those regions to save disk
> space. We also merge those empty regions to reduce load, but that tool comes
> later.
> Will prep a patch for 2.x once 1.5 gets in.
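As a rough illustration of the selection step such a TTL-based tool needs, the Java sketch below picks regions whose newest data is older than the TTL. It is only an assumption about how "regions beyond TTL" might be identified; the actual criteria used by MajorCompactorTTL are not stated in this thread. The class `TtlCompactionSketch` and both method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical selection logic; not the MajorCompactorTTL implementation. */
final class TtlCompactionSketch {
    /** True when even the newest data is older than the TTL (assumed criterion). */
    static boolean isPastTtl(long newestTimestampMs, long ttlMs, long nowMs) {
        return nowMs - newestTimestampMs > ttlMs;
    }

    /** Returns the names of regions that qualify, sorted for stable output. */
    static List<String> regionsToCompact(Map<String, Long> newestTimestampByRegion,
                                         long ttlMs, long nowMs) {
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, Long> e : newestTimestampByRegion.entrySet()) {
            if (isPastTtl(e.getValue(), ttlMs, nowMs)) {
                candidates.add(e.getKey());
            }
        }
        Collections.sort(candidates);
        return candidates;
    }
}
```

An rsgroup-level variant would run the same selection over every table in the group, which is why a single scheduled command can cover tables as they come and go.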
[jira] [Updated] (HBASE-21883) Enhancements to Major Compaction tool from HBASE-19528
[ https://issues.apache.org/jira/browse/HBASE-21883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-21883: - Attachment: HBASE-21883.master.002.patch > Enhancements to Major Compaction tool from HBASE-19528 > -- > > Key: HBASE-21883 > URL: https://issues.apache.org/jira/browse/HBASE-21883 > Project: HBase > Issue Type: Improvement > Components: Client, Compaction, tooling >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Minor > Fix For: 3.0.0, 2.3.0, 1.5.1 > > Attachments: HBASE-21883.branch-1.001.patch, > HBASE-21883.branch-1.002.patch, HBASE-21883.master.001.patch, > HBASE-21883.master.002.patch > > > I would like to add new compaction tools based on [~churromorales]'s tool at > HBASE-19528. > We internally have tools that pick and compact regions based on multiple > criteria. Since Rahul already has a version in community, we would like to > build on top of it instead of pushing yet another tool. > With this jira, I would like to add a tool which looks at regions beyond TTL > and compacts them in a rsgroup. We have time series data and those regions > will become dead after a while, so we compact those regions to save disk > space. We also merge those empty regions to reduce load, but that tool comes > later. > Will prep a patch for 2.x once 1.5 gets in. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21883) Enhancements to Major Compaction tool from HBASE-19528
[ https://issues.apache.org/jira/browse/HBASE-21883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-21883: - Status: Patch Available (was: Open) Kicking off pre-commit build for master branch. > Enhancements to Major Compaction tool from HBASE-19528 > -- > > Key: HBASE-21883 > URL: https://issues.apache.org/jira/browse/HBASE-21883 > Project: HBase > Issue Type: Improvement > Components: Client, Compaction, tooling >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Minor > Fix For: 3.0.0, 2.3.0, 1.5.1 > > Attachments: HBASE-21883.branch-1.001.patch, > HBASE-21883.branch-1.002.patch, HBASE-21883.master.001.patch > > > I would like to add new compaction tools based on [~churromorales]'s tool at > HBASE-19528. > We internally have tools that pick and compact regions based on multiple > criteria. Since Rahul already has a version in community, we would like to > build on top of it instead of pushing yet another tool. > With this jira, I would like to add a tool which looks at regions beyond TTL > and compacts them in a rsgroup. We have time series data and those regions > will become dead after a while, so we compact those regions to save disk > space. We also merge those empty regions to reduce load, but that tool comes > later. > Will prep a patch for 2.x once 1.5 gets in. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21883) Enhancements to Major Compaction tool from HBASE-19528
[ https://issues.apache.org/jira/browse/HBASE-21883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-21883: - Attachment: HBASE-21883.master.001.patch > Enhancements to Major Compaction tool from HBASE-19528 > -- > > Key: HBASE-21883 > URL: https://issues.apache.org/jira/browse/HBASE-21883 > Project: HBase > Issue Type: Improvement > Components: Client, Compaction, tooling >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Minor > Fix For: 3.0.0, 2.3.0, 1.5.1 > > Attachments: HBASE-21883.branch-1.001.patch, > HBASE-21883.branch-1.002.patch, HBASE-21883.master.001.patch > > > I would like to add new compaction tools based on [~churromorales]'s tool at > HBASE-19528. > We internally have tools that pick and compact regions based on multiple > criteria. Since Rahul already has a version in community, we would like to > build on top of it instead of pushing yet another tool. > With this jira, I would like to add a tool which looks at regions beyond TTL > and compacts them in a rsgroup. We have time series data and those regions > will become dead after a while, so we compact those regions to save disk > space. We also merge those empty regions to reduce load, but that tool comes > later. > Will prep a patch for 2.x once 1.5 gets in. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21903) Backport major compaction tool HBASE-19528 from to 1.4 and 1.3
[ https://issues.apache.org/jira/browse/HBASE-21903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-21903: - Attachment: HBASE-21903-branch-1.3-addendum.patch > Backport major compaction tool HBASE-19528 from to 1.4 and 1.3 > -- > > Key: HBASE-21903 > URL: https://issues.apache.org/jira/browse/HBASE-21903 > Project: HBase > Issue Type: Task > Components: Client, Compaction, tooling >Affects Versions: 1.3.3, 1.4.9 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 1.4.10, 1.3.4 > > Attachments: HBASE-21903-branch-1.3-addendum.patch > > > Our internal deployments are based on branch-1.3. We will be using the major > compaction tool HBASE-19528 from [~churromorales] and the enhancements on top > of it HBASE-21883 on our 1.3 clusters. I would like to backport HBASE-19528 > to 1.3 and hence 1.4 as well. Since its a standalone tool without any other > dependency or code changes, I believe that should be ok. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21903) Backport major compaction tool HBASE-19528 from to 1.4 and 1.3
[ https://issues.apache.org/jira/browse/HBASE-21903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768656#comment-16768656 ]

Thiruvel Thirumoolan commented on HBASE-21903:
----------------------------------------------

For 1.3, we have to cherry-pick https://github.com/apache/hbase/commit/673e8092506cd7c2c09093a3d085dc197ff14f53 and then apply [^HBASE-21903-branch-1.3-addendum.patch] on top of it. The addendum changes only the unit test framework.

For 1.4, the patch applies as is, so a plain cherry-pick is enough.

> Backport major compaction tool HBASE-19528 from to 1.4 and 1.3
> --------------------------------------------------------------
>
>                 Key: HBASE-21903
>                 URL: https://issues.apache.org/jira/browse/HBASE-21903
>             Project: HBase
>          Issue Type: Task
>          Components: Client, Compaction, tooling
>    Affects Versions: 1.3.3, 1.4.9
>            Reporter: Thiruvel Thirumoolan
>            Assignee: Thiruvel Thirumoolan
>            Priority: Major
>             Fix For: 1.4.10, 1.3.4
>
>         Attachments: HBASE-21903-branch-1.3-addendum.patch
>
>
> Our internal deployments are based on branch-1.3. We will be using the major
> compaction tool HBASE-19528 from [~churromorales] and the enhancements on top
> of it HBASE-21883 on our 1.3 clusters. I would like to backport HBASE-19528
> to 1.3 and hence 1.4 as well. Since its a standalone tool without any other
> dependency or code changes, I believe that should be ok.
[jira] [Created] (HBASE-21903) Backport major compaction tool HBASE-19528 from to 1.4 and 1.3
Thiruvel Thirumoolan created HBASE-21903: Summary: Backport major compaction tool HBASE-19528 from to 1.4 and 1.3 Key: HBASE-21903 URL: https://issues.apache.org/jira/browse/HBASE-21903 Project: HBase Issue Type: Task Components: Client, Compaction, tooling Affects Versions: 1.4.9, 1.3.3 Reporter: Thiruvel Thirumoolan Assignee: Thiruvel Thirumoolan Fix For: 1.4.10, 1.3.4 Our internal deployments are based on branch-1.3. We will be using the major compaction tool HBASE-19528 from [~churromorales], and the enhancements on top of it from HBASE-21883, on our 1.3 clusters. I would like to backport HBASE-19528 to 1.3 and hence 1.4 as well. Since it's a standalone tool without any other dependency or code changes, I believe that should be ok. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21883) Enhancements to Major Compaction tool from HBASE-19528
[ https://issues.apache.org/jira/browse/HBASE-21883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768606#comment-16768606 ] Thiruvel Thirumoolan commented on HBASE-21883: -- Unit test failures are unrelated to the patch. > Enhancements to Major Compaction tool from HBASE-19528 > -- > > Key: HBASE-21883 > URL: https://issues.apache.org/jira/browse/HBASE-21883 > Project: HBase > Issue Type: Improvement > Components: Client, Compaction, tooling >Affects Versions: 1.5.0 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Minor > Fix For: 1.5.1 > > Attachments: HBASE-21883.branch-1.001.patch, > HBASE-21883.branch-1.002.patch > > > I would like to add new compaction tools based on [~churromorales]'s tool at > HBASE-19528. > We internally have tools that pick and compact regions based on multiple > criteria. Since Rahul already has a version in community, we would like to > build on top of it instead of pushing yet another tool. > With this jira, I would like to add a tool which looks at regions beyond TTL > and compacts them in a rsgroup. We have time series data and those regions > will become dead after a while, so we compact those regions to save disk > space. We also merge those empty regions to reduce load, but that tool comes > later. > Will prep a patch for 2.x once 1.5 gets in. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21883) Enhancements to Major Compaction tool from HBASE-19528
[ https://issues.apache.org/jira/browse/HBASE-21883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-21883: - Attachment: HBASE-21883.branch-1.002.patch > Enhancements to Major Compaction tool from HBASE-19528 > -- > > Key: HBASE-21883 > URL: https://issues.apache.org/jira/browse/HBASE-21883 > Project: HBase > Issue Type: Improvement > Components: Client, Compaction, tooling >Affects Versions: 1.5.0 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Minor > Fix For: 1.5.1 > > Attachments: HBASE-21883.branch-1.001.patch, > HBASE-21883.branch-1.002.patch > > > I would like to add new compaction tools based on [~churromorales]'s tool at > HBASE-19528. > We internally have tools that pick and compact regions based on multiple > criteria. Since Rahul already has a version in community, we would like to > build on top of it instead of pushing yet another tool. > With this jira, I would like to add a tool which looks at regions beyond TTL > and compacts them in a rsgroup. We have time series data and those regions > will become dead after a while, so we compact those regions to save disk > space. We also merge those empty regions to reduce load, but that tool comes > later. > Will prep a patch for 2.x once 1.5 gets in. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21883) Enhancements to Major Compaction tool from HBASE-19528
[ https://issues.apache.org/jira/browse/HBASE-21883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-21883: - Status: Patch Available (was: Open) Kicking off precommit build. > Enhancements to Major Compaction tool from HBASE-19528 > -- > > Key: HBASE-21883 > URL: https://issues.apache.org/jira/browse/HBASE-21883 > Project: HBase > Issue Type: Improvement > Components: Client, Compaction, tooling >Affects Versions: 1.5.0 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Minor > Fix For: 1.5.1 > > Attachments: HBASE-21883.branch-1.001.patch > > > I would like to add new compaction tools based on [~churromorales]'s tool at > HBASE-19528. > We internally have tools that pick and compact regions based on multiple > criteria. Since Rahul already has a version in community, we would like to > build on top of it instead of pushing yet another tool. > With this jira, I would like to add a tool which looks at regions beyond TTL > and compacts them in a rsgroup. We have time series data and those regions > will become dead after a while, so we compact those regions to save disk > space. We also merge those empty regions to reduce load, but that tool comes > later. > Will prep a patch for 2.x once 1.5 gets in. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21883) Enhancements to Major Compaction tool from HBASE-19528
[ https://issues.apache.org/jira/browse/HBASE-21883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-21883: - Attachment: HBASE-21883.branch-1.001.patch > Enhancements to Major Compaction tool from HBASE-19528 > -- > > Key: HBASE-21883 > URL: https://issues.apache.org/jira/browse/HBASE-21883 > Project: HBase > Issue Type: Improvement > Components: Client, Compaction, tooling >Affects Versions: 1.5.0 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Minor > Fix For: 1.5.1 > > Attachments: HBASE-21883.branch-1.001.patch > > > I would like to add new compaction tools based on [~churromorales]'s tool at > HBASE-19528. > We internally have tools that pick and compact regions based on multiple > criteria. Since Rahul already has a version in community, we would like to > build on top of it instead of pushing yet another tool. > With this jira, I would like to add a tool which looks at regions beyond TTL > and compacts them in a rsgroup. We have time series data and those regions > will become dead after a while, so we compact those regions to save disk > space. We also merge those empty regions to reduce load, but that tool comes > later. > Will prep a patch for 2.x once 1.5 gets in. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21883) Enhancements to Major Compaction tool from HBASE-19528
[ https://issues.apache.org/jira/browse/HBASE-21883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-21883: - Description: I would like to add new compaction tools based on [~churromorales]'s tool at HBASE-19528. We internally have tools that pick and compact regions based on multiple criteria. Since Rahul already has a version in community, we would like to build on top of it instead of pushing yet another tool. With this jira, I would like to add a tool which looks at regions beyond TTL and compacts them in a rsgroup. We have time series data and those regions will become dead after a while, so we compact those regions to save disk space. We also merge those empty regions to reduce load, but that tool comes later. Will prep a patch for 2.x once 1.5 gets in.

was: I would like to add new compaction tools based on [~churromorales]'s tool at HBASE-19528. We internally have tools that pick and compact regions based on multiple criteria. Since Rahul already has a version in community, we would like to build on top of it instead of pushing yet another tool. With this jira, I would like to add a tool which looks at regions beyond TTL and compacts them in a rsgroup. We have time series data and those regions will become dead after a while, so we compact those regions to save disk space. We also merge those empty regions to reduce load, but that tool comes later.

> Enhancements to Major Compaction tool from HBASE-19528 > -- > > Key: HBASE-21883 > URL: https://issues.apache.org/jira/browse/HBASE-21883 > Project: HBase > Issue Type: Improvement > Components: Client, Compaction, tooling >Affects Versions: 1.5.0 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Minor > Fix For: 1.5.1 > > > I would like to add new compaction tools based on [~churromorales]'s tool at > HBASE-19528. > We internally have tools that pick and compact regions based on multiple > criteria. Since Rahul already has a version in community, we would like to > build on top of it instead of pushing yet another tool. > With this jira, I would like to add a tool which looks at regions beyond TTL > and compacts them in a rsgroup. We have time series data and those regions > will become dead after a while, so we compact those regions to save disk > space. We also merge those empty regions to reduce load, but that tool comes > later. > Will prep a patch for 2.x once 1.5 gets in. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21883) Enhancements to Major Compaction tool from HBASE-19528
Thiruvel Thirumoolan created HBASE-21883: Summary: Enhancements to Major Compaction tool from HBASE-19528 Key: HBASE-21883 URL: https://issues.apache.org/jira/browse/HBASE-21883 Project: HBase Issue Type: Improvement Components: Client, Compaction, tooling Affects Versions: 1.5.0 Reporter: Thiruvel Thirumoolan Assignee: Thiruvel Thirumoolan Fix For: 1.5.1 I would like to add new compaction tools based on [~churromorales]'s tool at HBASE-19528. We internally have tools that pick and compact regions based on multiple criteria. Since Rahul already has a version in community, we would like to build on top of it instead of pushing yet another tool. With this jira, I would like to add a tool which looks at regions beyond TTL and compacts them in a rsgroup. We have time series data and those regions will become dead after a while, so we compact those regions to save disk space. We also merge those empty regions to reduce load, but that tool comes later. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20546) Improve perf of RegionLocationFinder.mapHostNameToServerName
[ https://issues.apache.org/jira/browse/HBASE-20546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505127#comment-16505127 ] Thiruvel Thirumoolan commented on HBASE-20546: -- I can move this map to ServerManager so that it's always up to date. Would that be ok? Since it's called from the picker, the cost is significant. It came up when I profiled the balancer on our cluster setup. > Improve perf of RegionLocationFinder.mapHostNameToServerName > > > Key: HBASE-20546 > URL: https://issues.apache.org/jira/browse/HBASE-20546 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.4.4, 2.0.0 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 1.5.0, 2.0.1, 1.4.6 > > Attachments: HBASE-20546.branch-1.4.001.patch > > > RegionLocationFinder.getTopBlockLocations() is called multiple times during > balancer. While profiling on a large table balance, mapHostNameToServerName() > seem to take a lot of time. One of the maps is repeatedly created for each > iteration, while we can just initialize it once. > Goes into both branch-1 and branch-2, although patches differ slightly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20643) Getting HDFSBlockDist in Master by querying RegionServers
Thiruvel Thirumoolan created HBASE-20643:

Summary: Getting HDFSBlockDist in Master by querying RegionServers
Key: HBASE-20643
URL: https://issues.apache.org/jira/browse/HBASE-20643
Project: HBase
Issue Type: Improvement
Components: Balancer
Reporter: Thiruvel Thirumoolan
Assignee: Thiruvel Thirumoolan
Fix For: 2.1.0, 1.5.0, 1.4.5

Region locality information is needed by the balancer to generate region plans. Computing HDFSBlockDistribution is expensive on larger clusters and adds load to the NameNode. This also needs to be recomputed on a master restart. The proposal is to get the HDFSBlockDistribution from the RegionServers instead of computing it in Master. RS already has this information and we could just reuse it by querying it. RS already passes dataLocality info via RegionLoad today.

Proposed Implementation: This is a high-level overview.

# A RegionServer API has to be added which will return HDFSBlockDistribution for all the regions it hosts. RS already has this info. Since ClusterStatus has already become bulky and we don’t need updated locality so fast, it’s better to have another API rather than add this to RegionLoad and pass it along with RSReport.
# Master will have a Chore to query all RegionServers and will cache the HDFSBlockDistribution for those regions. This is easy and quick. Admins can tune the frequency based on size of the cluster. On a ~90 nodes cluster with 500k regions and a prototype implementation and no load, it took about 5 seconds to get all HDFSBlockDistribution from RS.
# The cache will be an extension of RegionLocationFinder (subclass), if needed to keep the implementation simple. Probably will get clear with implementation.
# Balancer will use the new cache to get all HDFSBlockDistribution. If there is a new region and Chore didn’t get the block distribution from RS during its previous run, then it will be computed by RegionLocationFinder the same way it has been done now. If the Chore runs more frequently like every hour, then this recomputation will be drastically reduced.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
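The cache-with-fallback idea in the proposal above could look roughly like the following sketch. It is not taken from any HBASE-20643 patch: the class name, the method names, and the use of a plain host-to-bytes map as a stand-in for HDFSBlockDistribution are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical sketch, not HBase code. A region's block distribution is
 * modelled as a map of host name to bytes stored on that host.
 */
final class LocalityCacheSketch {
  // region name -> (host -> bytes on that host)
  private final Map<String, Map<String, Long>> cache = new ConcurrentHashMap<>();
  // Stand-in for the existing, expensive computation against the NameNode.
  private final Function<String, Map<String, Long>> computeFromNameNode;

  LocalityCacheSketch(Function<String, Map<String, Long>> computeFromNameNode) {
    this.computeFromNameNode = computeFromNameNode;
  }

  /** Called by a periodic chore with whatever the region servers reported. */
  void refresh(Map<String, Map<String, Long>> reportedByRegionServers) {
    cache.putAll(reportedByRegionServers);
  }

  /** Used by the balancer; falls back to computing only for unseen regions. */
  Map<String, Long> getBlockDistribution(String regionName) {
    return cache.computeIfAbsent(regionName, computeFromNameNode);
  }
}
```

With this shape, the expensive path runs only for regions that appeared after the last chore run, which is the reduction in recomputation that the last step of the proposal describes.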
[jira] [Comment Edited] (HBASE-20548) Master fails to startup on large clusters, refreshing block distribution
[ https://issues.apache.org/jira/browse/HBASE-20548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488631#comment-16488631 ] Thiruvel Thirumoolan edited comment on HBASE-20548 at 5/24/18 8:38 AM: --- Uploaded patch for master. Change was RegionInfo instead of HRegionInfo. Uploaded patch for branch-2.0 as well, if needed. Since HBASE-20545 is not in branch-2.0, patch is slightly different. was (Author: thiruvel): Uploaded patch for master. Change was RegionInfo instead of HRegionInfo. > Master fails to startup on large clusters, refreshing block distribution > > > Key: HBASE-20548 > URL: https://issues.apache.org/jira/browse/HBASE-20548 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.4.4 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 1.5.0, 2.0.1, 1.4.5 > > Attachments: HBASE-20548.branch-1.4.001.patch, > HBASE-20548.branch-2.0.001.patch, HBASE-20548.master.001.patch > > > On our large clusters with, master has failed to startup within specified > time and aborted itself since it was initializing HDFS block distribution. > Enable table also takes time for larger tables for the same reason. My > proposal is to refresh HDFS block distribution at the end of master > initialization and not at retainAssignment()'s createCluster(). This would > address HBASE-16570's intention, but avoid the problems we ran into. > cc [~aoxiang] [~tedyu] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20548) Master fails to startup on large clusters, refreshing block distribution
[ https://issues.apache.org/jira/browse/HBASE-20548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-20548: - Fix Version/s: 2.0.1 > Master fails to startup on large clusters, refreshing block distribution > > > Key: HBASE-20548 > URL: https://issues.apache.org/jira/browse/HBASE-20548 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.4.4 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 1.5.0, 2.0.1, 1.4.5 > > Attachments: HBASE-20548.branch-1.4.001.patch, > HBASE-20548.branch-2.0.001.patch, HBASE-20548.master.001.patch > > > On our large clusters with, master has failed to startup within specified > time and aborted itself since it was initializing HDFS block distribution. > Enable table also takes time for larger tables for the same reason. My > proposal is to refresh HDFS block distribution at the end of master > initialization and not at retainAssignment()'s createCluster(). This would > address HBASE-16570's intention, but avoid the problems we ran into. > cc [~aoxiang] [~tedyu] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20548) Master fails to startup on large clusters, refreshing block distribution
[ https://issues.apache.org/jira/browse/HBASE-20548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-20548: - Attachment: HBASE-20548.branch-2.0.001.patch > Master fails to startup on large clusters, refreshing block distribution > > > Key: HBASE-20548 > URL: https://issues.apache.org/jira/browse/HBASE-20548 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.4.4 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 1.5.0, 2.0.1, 1.4.5 > > Attachments: HBASE-20548.branch-1.4.001.patch, > HBASE-20548.branch-2.0.001.patch, HBASE-20548.master.001.patch > > > On our large clusters with, master has failed to startup within specified > time and aborted itself since it was initializing HDFS block distribution. > Enable table also takes time for larger tables for the same reason. My > proposal is to refresh HDFS block distribution at the end of master > initialization and not at retainAssignment()'s createCluster(). This would > address HBASE-16570's intention, but avoid the problems we ran into. > cc [~aoxiang] [~tedyu] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20548) Master fails to startup on large clusters, refreshing block distribution
[ https://issues.apache.org/jira/browse/HBASE-20548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488631#comment-16488631 ] Thiruvel Thirumoolan commented on HBASE-20548: -- Uploaded patch for master. Change was RegionInfo instead of HRegionInfo. > Master fails to startup on large clusters, refreshing block distribution > > > Key: HBASE-20548 > URL: https://issues.apache.org/jira/browse/HBASE-20548 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.4.4 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 1.5.0, 1.4.5 > > Attachments: HBASE-20548.branch-1.4.001.patch, > HBASE-20548.master.001.patch > > > On our large clusters with, master has failed to startup within specified > time and aborted itself since it was initializing HDFS block distribution. > Enable table also takes time for larger tables for the same reason. My > proposal is to refresh HDFS block distribution at the end of master > initialization and not at retainAssignment()'s createCluster(). This would > address HBASE-16570's intention, but avoid the problems we ran into. > cc [~aoxiang] [~tedyu] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20548) Master fails to startup on large clusters, refreshing block distribution
[ https://issues.apache.org/jira/browse/HBASE-20548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-20548: - Attachment: HBASE-20548.master.001.patch > Master fails to startup on large clusters, refreshing block distribution > > > Key: HBASE-20548 > URL: https://issues.apache.org/jira/browse/HBASE-20548 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.4.4 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 1.5.0, 1.4.5 > > Attachments: HBASE-20548.branch-1.4.001.patch, > HBASE-20548.master.001.patch > > > On our large clusters with, master has failed to startup within specified > time and aborted itself since it was initializing HDFS block distribution. > Enable table also takes time for larger tables for the same reason. My > proposal is to refresh HDFS block distribution at the end of master > initialization and not at retainAssignment()'s createCluster(). This would > address HBASE-16570's intention, but avoid the problems we ran into. > cc [~aoxiang] [~tedyu] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20548) Master fails to startup on large clusters, refreshing block distribution
[ https://issues.apache.org/jira/browse/HBASE-20548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-20548: - Status: Patch Available (was: Open) > Master fails to startup on large clusters, refreshing block distribution > > > Key: HBASE-20548 > URL: https://issues.apache.org/jira/browse/HBASE-20548 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.4.4 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 1.5.0, 1.4.5 > > Attachments: HBASE-20548.branch-1.4.001.patch > > > On our large clusters with, master has failed to startup within specified > time and aborted itself since it was initializing HDFS block distribution. > Enable table also takes time for larger tables for the same reason. My > proposal is to refresh HDFS block distribution at the end of master > initialization and not at retainAssignment()'s createCluster(). This would > address HBASE-16570's intention, but avoid the problems we ran into. > cc [~aoxiang] [~tedyu] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20548) Master fails to startup on large clusters, refreshing block distribution
[ https://issues.apache.org/jira/browse/HBASE-20548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486573#comment-16486573 ] Thiruvel Thirumoolan commented on HBASE-20548: -- Uploaded a patch that addresses the issue. I added a postMasterInitialize method to LoadBalancer and called that at the end of master initialization, so nothing is blocked. Will kickoff precommit build. > Master fails to startup on large clusters, refreshing block distribution > > > Key: HBASE-20548 > URL: https://issues.apache.org/jira/browse/HBASE-20548 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.4.4 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 1.5.0, 1.4.5 > > Attachments: HBASE-20548.branch-1.4.001.patch > > > On our large clusters with, master has failed to startup within specified > time and aborted itself since it was initializing HDFS block distribution. > Enable table also takes time for larger tables for the same reason. My > proposal is to refresh HDFS block distribution at the end of master > initialization and not at retainAssignment()'s createCluster(). This would > address HBASE-16570's intention, but avoid the problems we ran into. > cc [~aoxiang] [~tedyu] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20548) Master fails to startup on large clusters, refreshing block distribution
[ https://issues.apache.org/jira/browse/HBASE-20548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-20548: - Attachment: HBASE-20548.branch-1.4.001.patch > Master fails to startup on large clusters, refreshing block distribution > > > Key: HBASE-20548 > URL: https://issues.apache.org/jira/browse/HBASE-20548 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.4.4 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 1.5.0, 1.4.5 > > Attachments: HBASE-20548.branch-1.4.001.patch > > > On our large clusters with, master has failed to startup within specified > time and aborted itself since it was initializing HDFS block distribution. > Enable table also takes time for larger tables for the same reason. My > proposal is to refresh HDFS block distribution at the end of master > initialization and not at retainAssignment()'s createCluster(). This would > address HBASE-16570's intention, but avoid the problems we ran into. > cc [~aoxiang] [~tedyu] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20545) Improve performance of BaseLoadBalancer.retainAssignment
[ https://issues.apache.org/jira/browse/HBASE-20545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482964#comment-16482964 ] Thiruvel Thirumoolan commented on HBASE-20545: -- Thanks [~apurtell], [~yuzhih...@gmail.com] and [~aoxiang] for reviews. > Improve performance of BaseLoadBalancer.retainAssignment > > > Key: HBASE-20545 > URL: https://issues.apache.org/jira/browse/HBASE-20545 > Project: HBase > Issue Type: Improvement > Components: Balancer >Affects Versions: 1.4.4, 2.0.0 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.5 > > Attachments: HBASE-20545.branch-1.4.001.patch, > HBASE-20545.branch-1.4.002.patch, HBASE-20545.branch-2.001.patch > > > I was measuring perf at scale with a 1m region table and noticed some > improvements can be made to BaseLoadBalancer.retainAssignment(). > retainAssignment() spends a few mins to enable a 1m regions table and also > generates a lot of objects unnecessarily. This jira is to make the most > common case go faster with very minimal changes. A slightly modified version > of this patch takes about 5 seconds for a 1m region table ignoring the time > spent in createCluster(). I think locality can be refreshed during master > startup in different ways without taking time in retainAssignment, but will > follow up on that in subsequent jiras. Leaving it untouched here, but wanted > to call out the time taken without that method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20545) Improve performance of BaseLoadBalancer.retainAssignment
[ https://issues.apache.org/jira/browse/HBASE-20545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472831#comment-16472831 ] Thiruvel Thirumoolan commented on HBASE-20545: -- It doesn't look like the failure is related to my patch; I have resubmitted the 002 patch to trigger the build again. > Improve performance of BaseLoadBalancer.retainAssignment > > > Key: HBASE-20545 > URL: https://issues.apache.org/jira/browse/HBASE-20545 > Project: HBase > Issue Type: Improvement > Components: Balancer >Affects Versions: 1.4.4, 2.0.0 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 3.0.0, 2.1.0, 1.5.0, 1.4.5 > > Attachments: HBASE-20545.branch-1.4.001.patch, > HBASE-20545.branch-1.4.002.patch, HBASE-20545.branch-2.001.patch > > > I was measuring perf at scale with a 1m region table and noticed some > improvements can be made to BaseLoadBalancer.retainAssignment(). > retainAssignment() spends a few mins to enable a 1m regions table and also > generates a lot of objects unnecessarily. This jira is to make the most > common case go faster with very minimal changes. A slightly modified version > of this patch takes about 5 seconds for a 1m region table ignoring the time > spent in createCluster(). I think locality can be refreshed during master > startup in different ways without taking time in retainAssignment, but will > follow up on that in subsequent jiras. Leaving it untouched here, but wanted > to call out the time taken without that method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20545) Improve performance of BaseLoadBalancer.retainAssignment
[ https://issues.apache.org/jira/browse/HBASE-20545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-20545: - Attachment: HBASE-20545.branch-1.4.002.patch > Improve performance of BaseLoadBalancer.retainAssignment > > > Key: HBASE-20545 > URL: https://issues.apache.org/jira/browse/HBASE-20545 > Project: HBase > Issue Type: Improvement > Components: Balancer >Affects Versions: 1.4.4, 2.0.0 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 3.0.0, 2.1.0, 1.5.0, 1.4.5 > > Attachments: HBASE-20545.branch-1.4.001.patch, > HBASE-20545.branch-1.4.002.patch, HBASE-20545.branch-2.001.patch > > > I was measuring perf at scale with a 1m region table and noticed some > improvements can be made to BaseLoadBalancer.retainAssignment(). > retainAssignment() spends a few mins to enable a 1m regions table and also > generates a lot of objects unnecessarily. This jira is to make the most > common case go faster with very minimal changes. A slightly modified version > of this patch takes about 5 seconds for a 1m region table ignoring the time > spent in createCluster(). I think locality can be refreshed during master > startup in different ways without taking time in retainAssignment, but will > follow up on that in subsequent jiras. Leaving it untouched here, but wanted > to call out the time taken without that method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20546) Improve perf of RegionLocationFinder.mapHostNameToServerName
[ https://issues.apache.org/jira/browse/HBASE-20546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16471173#comment-16471173 ] Thiruvel Thirumoolan commented on HBASE-20546: -- Thanks [~chia7712]. Good point. It looks like ClusterStatusChore will be updating clusterStatus. Its intention is to make StochasticBalancer better by updating regionload. Not sure why regionFinder's clusterstatus needs to be updated. I think we can have a setClusterStatus and updateClusterStatus API, so ClusterStatusChore can use the latter. I am not sure if updating clusterstatus or re-initializing the hostserver map I introduce in regionfinder in the middle of balance is a good idea. What do you think? > Improve perf of RegionLocationFinder.mapHostNameToServerName > > > Key: HBASE-20546 > URL: https://issues.apache.org/jira/browse/HBASE-20546 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.4.4, 2.0.0 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 1.5.0, 2.0.1, 1.4.5 > > Attachments: HBASE-20546.branch-1.4.001.patch > > > RegionLocationFinder.getTopBlockLocations() is called multiple times during > balancer. While profiling on a large table balance, mapHostNameToServerName() > seem to take a lot of time. One of the maps is repeatedly created for each > iteration, while we can just initialize it once. > Goes into both branch-1 and branch-2, although patches differ slightly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20546) Improve perf of RegionLocationFinder.mapHostNameToServerName
[ https://issues.apache.org/jira/browse/HBASE-20546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-20546: - Description: RegionLocationFinder.getTopBlockLocations() is called multiple times during balancer. While profiling on a large table balance, mapHostNameToServerName() seem to take a lot of time. One of the maps is repeatedly created for each iteration, while we can just initialize it once. Goes into both branch-1 and branch-2, although patches differ slightly. was: RegionLocationFinder.getTopBlockLocations() is called multiple times during balancer. While profiling on a large table balance, mapHostNameToServerName() seem to take a lot of time. One of the maps is repeatedly consumed for each iteration, while we can just initialize it once. Goes into both branch-1 and branch-2, although patches differ slightly. > Improve perf of RegionLocationFinder.mapHostNameToServerName > > > Key: HBASE-20546 > URL: https://issues.apache.org/jira/browse/HBASE-20546 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.4.4, 2.0.0 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 1.5.0, 2.0.1, 1.4.5 > > Attachments: HBASE-20546.branch-1.4.001.patch > > > RegionLocationFinder.getTopBlockLocations() is called multiple times during > balancer. While profiling on a large table balance, mapHostNameToServerName() > seem to take a lot of time. One of the maps is repeatedly created for each > iteration, while we can just initialize it once. > Goes into both branch-1 and branch-2, although patches differ slightly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
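The "initialize it once" idea from the description above can be illustrated with a small sketch. This is not the attached patch: the class and method names are hypothetical, and server names are assumed to be plain strings of the form "host,port,startcode".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not HBase code: build the host index once, reuse it per lookup. */
final class HostToServerIndexSketch {
  // host name -> server names on that host; built once in the constructor
  private final Map<String, List<String>> serversByHost = new HashMap<>();

  HostToServerIndexSketch(List<String> serverNames) {
    for (String server : serverNames) {
      // Assumed format "host,port,startcode"; a name without a comma is treated as a bare host.
      int comma = server.indexOf(',');
      String host = comma < 0 ? server : server.substring(0, comma);
      serversByHost.computeIfAbsent(host, h -> new ArrayList<>()).add(server);
    }
  }

  /** Per-call work is now a map lookup per host instead of rebuilding the index. */
  List<String> mapHostNamesToServerNames(List<String> hosts) {
    List<String> result = new ArrayList<>();
    for (String host : hosts) {
      List<String> servers = serversByHost.get(host);
      if (servers != null) {
        result.addAll(servers);
      }
    }
    return result;
  }
}
```

The trade-off raised in the comments on this issue still applies: an index built once has to be rebuilt or updated when the set of live servers changes.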
[jira] [Commented] (HBASE-20545) Improve performance of BaseLoadBalancer.retainAssignment
[ https://issues.apache.org/jira/browse/HBASE-20545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470829#comment-16470829 ] Thiruvel Thirumoolan commented on HBASE-20545: -- [~yuzhih...@gmail.com], Looks like the build you triggered passed. Since existing tests were sufficient, I didn't add any. Can we also get this into branch-2? > Improve performance of BaseLoadBalancer.retainAssignment > > > Key: HBASE-20545 > URL: https://issues.apache.org/jira/browse/HBASE-20545 > Project: HBase > Issue Type: Improvement > Components: Balancer >Affects Versions: 1.4.4, 2.0.0 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 3.0.0, 1.5.0, 1.4.5 > > Attachments: HBASE-20545.branch-1.4.001.patch, > HBASE-20545.branch-2.001.patch > > > I was measuring perf at scale with a 1m region table and noticed some > improvements can be made to BaseLoadBalancer.retainAssignment(). > retainAssignment() spends a few mins to enable a 1m regions table and also > generates a lot of objects unnecessarily. This jira is to make the most > common case go faster with very minimal changes. A slightly modified version > of this patch takes about 5 seconds for a 1m region table ignoring the time > spent in createCluster(). I think locality can be refreshed during master > startup in different ways without taking time in retainAssignment, but will > follow up on that in subsequent jiras. Leaving it untouched here, but wanted > to call out the time taken without that method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20545) Improve performance of BaseLoadBalancer.retainAssignment
[ https://issues.apache.org/jira/browse/HBASE-20545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16469292#comment-16469292 ] Thiruvel Thirumoolan commented on HBASE-20545: -- Uploaded patch for master branch. The only change from branch-1 patch is RegionInfo instead of HRegionInfo. > Improve performance of BaseLoadBalancer.retainAssignment > > > Key: HBASE-20545 > URL: https://issues.apache.org/jira/browse/HBASE-20545 > Project: HBase > Issue Type: Improvement > Components: Balancer >Affects Versions: 1.4.4, 2.0.0 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 1.5.0, 2.0.1, 1.4.5 > > Attachments: HBASE-20545.branch-1.4.001.patch, > HBASE-20545.branch-2.001.patch > > > I was measuring perf at scale with a 1m region table and noticed some > improvements can be made to BaseLoadBalancer.retainAssignment(). > retainAssignment() spends a few mins to enable a 1m regions table and also > generates a lot of objects unnecessarily. This jira is to make the most > common case go faster with very minimal changes. A slightly modified version > of this patch takes about 5 seconds for a 1m region table ignoring the time > spent in createCluster(). I think locality can be refreshed during master > startup in different ways without taking time in retainAssignment, but will > follow up on that in subsequent jiras. Leaving it untouched here, but wanted > to call out the time taken without that method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20545) Improve performance of BaseLoadBalancer.retainAssignment
[ https://issues.apache.org/jira/browse/HBASE-20545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-20545: - Attachment: HBASE-20545.branch-2.001.patch > Improve performance of BaseLoadBalancer.retainAssignment > > > Key: HBASE-20545 > URL: https://issues.apache.org/jira/browse/HBASE-20545 > Project: HBase > Issue Type: Improvement > Components: Balancer >Affects Versions: 1.4.4, 2.0.0 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 1.5.0, 2.0.1, 1.4.5 > > Attachments: HBASE-20545.branch-1.4.001.patch, > HBASE-20545.branch-2.001.patch > > > I was measuring perf at scale with a 1m region table and noticed some > improvements can be made to BaseLoadBalancer.retainAssignment(). > retainAssignment() spends a few mins to enable a 1m regions table and also > generates a lot of objects unnecessarily. This jira is to make the most > common case go faster with very minimal changes. A slightly modified version > of this patch takes about 5 seconds for a 1m region table ignoring the time > spent in createCluster(). I think locality can be refreshed during master > startup in different ways without taking time in retainAssignment, but will > follow up on that in subsequent jiras. Leaving it untouched here, but wanted > to call out the time taken without that method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20545) Improve performance of BaseLoadBalancer.retainAssignment
[ https://issues.apache.org/jira/browse/HBASE-20545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16469222#comment-16469222 ] Thiruvel Thirumoolan commented on HBASE-20545: -- Linking HBASE-20548 for improvements to locality refresh during master startup. > Improve performance of BaseLoadBalancer.retainAssignment > > > Key: HBASE-20545 > URL: https://issues.apache.org/jira/browse/HBASE-20545 > Project: HBase > Issue Type: Improvement > Components: Balancer >Affects Versions: 1.4.4, 2.0.0 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 1.5.0, 2.0.1, 1.4.5 > > Attachments: HBASE-20545.branch-1.4.001.patch > > > I was measuring perf at scale with a 1m region table and noticed some > improvements can be made to BaseLoadBalancer.retainAssignment(). > retainAssignment() spends a few mins to enable a 1m regions table and also > generates a lot of objects unnecessarily. This jira is to make the most > common case go faster with very minimal changes. A slightly modified version > of this patch takes about 5 seconds for a 1m region table ignoring the time > spent in createCluster(). I think locality can be refreshed during master > startup in different ways without taking time in retainAssignment, but will > follow up on that in subsequent jiras. Leaving it untouched here, but wanted > to call out the time taken without that method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20548) Master fails to startup on large clusters, refreshing block distribution
Thiruvel Thirumoolan created HBASE-20548: Summary: Master fails to startup on large clusters, refreshing block distribution Key: HBASE-20548 URL: https://issues.apache.org/jira/browse/HBASE-20548 Project: HBase Issue Type: Improvement Affects Versions: 1.4.4 Reporter: Thiruvel Thirumoolan Assignee: Thiruvel Thirumoolan Fix For: 1.5.0, 1.4.5 On our large clusters, the master has failed to start up within the specified time and aborted itself because it was still initializing the HDFS block distribution. Enabling a table also takes a long time for larger tables for the same reason. My proposal is to refresh the HDFS block distribution at the end of master initialization rather than in retainAssignment()'s createCluster(). This would address HBASE-16570's intention, but avoid the problems we ran into. cc [~aoxiang] [~tedyu] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20546) Improve perf of RegionLocationFinder.mapHostNameToServerName
[ https://issues.apache.org/jira/browse/HBASE-20546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-20546: - Status: Patch Available (was: Open) Will submit patch for branch-2 once the branch-1 patch gets in. > Improve perf of RegionLocationFinder.mapHostNameToServerName > > > Key: HBASE-20546 > URL: https://issues.apache.org/jira/browse/HBASE-20546 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.0.0, 1.4.4 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 1.5.0, 2.0.1, 1.4.5 > > Attachments: HBASE-20546.branch-1.4.001.patch > > > RegionLocationFinder.getTopBlockLocations() is called multiple times during > balancer. While profiling on a large table balance, mapHostNameToServerName() > seem to take a lot of time. One of the maps is repeatedly consumed for each > iteration, while we can just initialize it once. > Goes into both branch-1 and branch-2, although patches differ slightly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20546) Improve perf of RegionLocationFinder.mapHostNameToServerName
[ https://issues.apache.org/jira/browse/HBASE-20546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-20546: - Attachment: HBASE-20546.branch-1.4.001.patch > Improve perf of RegionLocationFinder.mapHostNameToServerName > > > Key: HBASE-20546 > URL: https://issues.apache.org/jira/browse/HBASE-20546 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.4.4, 2.0.0 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 1.5.0, 2.0.1, 1.4.5 > > Attachments: HBASE-20546.branch-1.4.001.patch > > > RegionLocationFinder.getTopBlockLocations() is called multiple times during > balancer. While profiling on a large table balance, mapHostNameToServerName() > seem to take a lot of time. One of the maps is repeatedly consumed for each > iteration, while we can just initialize it once. > Goes into both branch-1 and branch-2, although patches differ slightly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20546) Improve perf of RegionLocationFinder.mapHostNameToServerName
Thiruvel Thirumoolan created HBASE-20546: Summary: Improve perf of RegionLocationFinder.mapHostNameToServerName Key: HBASE-20546 URL: https://issues.apache.org/jira/browse/HBASE-20546 Project: HBase Issue Type: Improvement Affects Versions: 2.0.0, 1.4.4 Reporter: Thiruvel Thirumoolan Assignee: Thiruvel Thirumoolan Fix For: 1.5.0, 2.0.1, 1.4.5 RegionLocationFinder.getTopBlockLocations() is called multiple times during balancer. While profiling on a large table balance, mapHostNameToServerName() seem to take a lot of time. One of the maps is repeatedly consumed for each iteration, while we can just initialize it once. Goes into both branch-1 and branch-2, although patches differ slightly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20545) Improve performance of BaseLoadBalancer.retainAssignment
[ https://issues.apache.org/jira/browse/HBASE-20545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-20545: - Status: Patch Available (was: Open) Once branch-1 patch gets in, I will submit patch for 2.x, should be very similar. > Improve performance of BaseLoadBalancer.retainAssignment > > > Key: HBASE-20545 > URL: https://issues.apache.org/jira/browse/HBASE-20545 > Project: HBase > Issue Type: Improvement > Components: Balancer >Affects Versions: 2.0.0, 1.4.4 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 1.5.0, 2.0.1, 1.4.5 > > Attachments: HBASE-20545.branch-1.4.001.patch > > > I was measuring perf at scale with a 1m region table and noticed some > improvements can be made to BaseLoadBalancer.retainAssignment(). > retainAssignment() spends a few mins to enable a 1m regions table and also > generates a lot of objects unnecessarily. This jira is to make the most > common case go faster with very minimal changes. A slightly modified version > of this patch takes about 5 seconds for a 1m region table ignoring the time > spent in createCluster(). I think locality can be refreshed during master > startup in different ways without taking time in retainAssignment, but will > follow up on that in subsequent jiras. Leaving it untouched here, but wanted > to call out the time taken without that method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20545) Improve performance of BaseLoadBalancer.retainAssignment
[ https://issues.apache.org/jira/browse/HBASE-20545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467984#comment-16467984 ] Thiruvel Thirumoolan commented on HBASE-20545: -- Uploaded patch for 1.4, will also upload one for branch 2.x. The approach: in most scenarios, the servers are the same when retainAssignment is called, but we unnecessarily create a Cluster object and populate it for every region. At scale, this takes time (array copies in Cluster.doAssignRegion) and also generates a lot of garbage. Since that is not the common case, I moved it out to a separate loop and only initialize the Cluster object when random assignment is required. So in the worst case we still take that time, but for common scenarios it is much faster and produces less garbage. We can make the worst-case scenario faster too, but that is for later. > Improve performance of BaseLoadBalancer.retainAssignment > > > Key: HBASE-20545 > URL: https://issues.apache.org/jira/browse/HBASE-20545 > Project: HBase > Issue Type: Improvement > Components: Balancer >Affects Versions: 1.4.4, 2.0.0 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 1.5.0, 2.0.1, 1.4.5 > > Attachments: HBASE-20545.branch-1.4.001.patch > > > I was measuring perf at scale with a 1m region table and noticed some > improvements can be made to BaseLoadBalancer.retainAssignment(). > retainAssignment() spends a few mins to enable a 1m regions table and also > generates a lot of objects unnecessarily. This jira is to make the most > common case go faster with very minimal changes. A slightly modified version > of this patch takes about 5 seconds for a 1m region table ignoring the time > spent in createCluster(). I think locality can be refreshed during master > startup in different ways without taking time in retainAssignment, but will > follow up on that in subsequent jiras. Leaving it untouched here, but wanted > to call out the time taken without that method. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
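The two-loop approach described in the comment above might look roughly like the sketch below. It is a simplified illustration, not the patch itself. The class `RetainAssignmentSketch` and the string-based region and server names are hypothetical. It also assumes a region is retained only when its old server name appears unchanged in the live list, and a plain random pick stands in for the random-assignment path that the comment says needs the Cluster object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

/** Hypothetical two-pass retain logic; regions and servers are plain strings. */
final class RetainAssignmentSketch {

    public static Map<String, List<String>> retainAssignment(
            Map<String, String> regionToOldServer, List<String> liveServers, Random random) {
        Map<String, List<String>> assignments = new LinkedHashMap<>();
        if (liveServers.isEmpty()) {
            return assignments; // nothing to assign to
        }
        for (String server : liveServers) {
            assignments.put(server, new ArrayList<>());
        }
        Set<String> live = new HashSet<>(liveServers);
        List<String> needRandom = new ArrayList<>();

        // Pass 1 (common case): the old server is still live, so keep the region there.
        // No expensive per-region cluster state is touched in this loop.
        for (Map.Entry<String, String> entry : regionToOldServer.entrySet()) {
            String oldServer = entry.getValue();
            if (oldServer != null && live.contains(oldServer)) {
                assignments.get(oldServer).add(entry.getKey());
            } else {
                needRandom.add(entry.getKey());
            }
        }

        // Pass 2 (rare case): only regions without a usable old server reach this loop.
        // Any costly state needed for placement would be built here, and only if needed.
        for (String region : needRandom) {
            String target = liveServers.get(random.nextInt(liveServers.size()));
            assignments.get(target).add(region);
        }
        return assignments;
    }

    public static void main(String[] args) {
        Map<String, String> previous = new LinkedHashMap<>();
        previous.put("region-1", "rs-a");
        previous.put("region-2", "rs-b");
        previous.put("region-3", "rs-gone");
        System.out.println(retainAssignment(previous, Arrays.asList("rs-a", "rs-b"), new Random()));
    }
}
```

When every old server is still live, the second loop does no work at all, which matches the common case the comment is optimizing for.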
[jira] [Updated] (HBASE-20545) Improve performance of BaseLoadBalancer.retainAssignment
[ https://issues.apache.org/jira/browse/HBASE-20545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-20545: - Attachment: HBASE-20545.branch-1.4.001.patch > Improve performance of BaseLoadBalancer.retainAssignment > > > Key: HBASE-20545 > URL: https://issues.apache.org/jira/browse/HBASE-20545 > Project: HBase > Issue Type: Improvement > Components: Balancer >Affects Versions: 1.4.4, 2.0.0 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 1.5.0, 2.0.1, 1.4.5 > > Attachments: HBASE-20545.branch-1.4.001.patch > > > I was measuring perf at scale with a 1m region table and noticed some > improvements can be made to BaseLoadBalancer.retainAssignment(). > retainAssignment() spends a few mins to enable a 1m regions table and also > generates a lot of objects unnecessarily. This jira is to make the most > common case go faster with very minimal changes. A slightly modified version > of this patch takes about 5 seconds for a 1m region table ignoring the time > spent in createCluster(). I think locality can be refreshed during master > startup in different ways without taking time in retainAssignment, but will > follow up on that in subsequent jiras. Leaving it untouched here, but wanted > to call out the time taken without that method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20545) Improve performance of BaseLoadBalancer.retainAssignment
Thiruvel Thirumoolan created HBASE-20545: Summary: Improve performance of BaseLoadBalancer.retainAssignment Key: HBASE-20545 URL: https://issues.apache.org/jira/browse/HBASE-20545 Project: HBase Issue Type: Improvement Components: Balancer Affects Versions: 2.0.0, 1.4.4 Reporter: Thiruvel Thirumoolan Assignee: Thiruvel Thirumoolan Fix For: 1.5.0, 2.0.1, 1.4.5 I was measuring perf at scale with a 1m region table and noticed some improvements can be made to BaseLoadBalancer.retainAssignment(). retainAssignment() spends a few mins to enable a 1m regions table and also generates a lot of objects unnecessarily. This jira is to make the most common case go faster with very minimal changes. A slightly modified version of this patch takes about 5 seconds for a 1m region table ignoring the time spent in createCluster(). I think locality can be refreshed during master startup in different ways without taking time in retainAssignment, but will follow up on that in subsequent jiras. Leaving it untouched here, but wanted to call out the time taken without that method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20322) CME in StoreScanner causes region server crash
[ https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431034#comment-16431034 ] Thiruvel Thirumoolan commented on HBASE-20322: -- [~mdrob] - I raised HBASE-20373 to confirm that and fix it. Will get to it sometime this week or next. > CME in StoreScanner causes region server crash > -- > > Key: HBASE-20322 > URL: https://issues.apache.org/jira/browse/HBASE-20322 > Project: HBase > Issue Type: Bug >Affects Versions: 1.3.2 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 1.5.0, 1.3.3, 1.4.4 > > Attachments: HBASE-20322.branch-1.3.001.patch, > HBASE-20322.branch-1.3.002-addendum.patch, HBASE-20322.branch-1.3.002.patch, > HBASE-20322.branch-1.4.001.patch > > > RS crashed with ConcurrentModificationException on our 1.3 cluster, stack > trace below. [~toffer] and I checked and there is a race condition between > flush and scanner close. When StoreScanner.updateReaders() is updating the > scanners after a newly flushed file (in this trace below a region close > during a split), the client's scanner could be closing thus causing CME. > Its rare, but since it crashes the region server, needs to be fixed. > FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server > : Replay of WAL required. 
Forcing server shutdown > org.apache.hadoop.hbase.DroppedSnapshotException: region: > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207) > at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501) > at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420) > at > org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398) > at > org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278) > at > org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566) > at > org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82) > at > org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.util.ConcurrentModificationException > at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901) > at java.util.ArrayList$Itr.next(ArrayList.java:851) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825) > at > org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155) > PS: ignore the line no in the above stack trace, method calls should help > understand whats happening. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
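The failure mode in the trace above (an ArrayList changed while another code path iterates over it) and one common way to guard against it can be sketched as follows. This is an illustration under assumptions, not the fix in the attached patches. The class `ScannerListSketch` is hypothetical, strings stand in for scanners, and the single shared lock is one possible guard rather than the one HBase uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical scanner holder; strings stand in for the real scanner objects. */
final class ScannerListSketch {
    private final List<String> scanners = new ArrayList<>();
    private final ReentrantLock lock = new ReentrantLock();

    public ScannerListSketch(List<String> initial) {
        scanners.addAll(initial);
    }

    /** Shows the fail-fast behaviour: a structural change during iteration throws CME. */
    public static boolean unguardedIterationFails() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        try {
            for (String s : list) {
                list.remove(s); // same effect as another thread mutating the list mid-iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Flush path: replaces the scanners while holding the lock. */
    public void updateReaders(List<String> newScanners) {
        lock.lock();
        try {
            scanners.clear();
            scanners.addAll(newScanners);
        } finally {
            lock.unlock();
        }
    }

    /** Close path: takes the same lock, then works on a private snapshot. */
    public int close() {
        List<String> snapshot;
        lock.lock();
        try {
            snapshot = new ArrayList<>(scanners);
            scanners.clear();
        } finally {
            lock.unlock();
        }
        return snapshot.size(); // the real code would close each scanner in the snapshot
    }

    public static void main(String[] args) {
        System.out.println("unguarded iteration throws CME: " + unguardedIterationFails());
        ScannerListSketch holder = new ScannerListSketch(Arrays.asList("memstore", "file-1"));
        holder.updateReaders(Arrays.asList("file-1", "file-2", "file-3"));
        System.out.println("scanners closed: " + holder.close());
    }
}
```

Because both paths take the same lock and the close path iterates only over its own snapshot, neither path can see the list change underneath its iterator.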
[jira] [Created] (HBASE-20373) Check and forward port HBASE-20322 (RS crash due to CME in StoreScanner)
Thiruvel Thirumoolan created HBASE-20373: Summary: Check and forward port HBASE-20322 (RS crash due to CME in StoreScanner) Key: HBASE-20373 URL: https://issues.apache.org/jira/browse/HBASE-20373 Project: HBase Issue Type: Bug Reporter: Thiruvel Thirumoolan Assignee: Thiruvel Thirumoolan I think the same problem that causes HBASE-20322 in 1.x exists in branch-2 also. This jira is to confirm that and fix it if required. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20322) CME in StoreScanner causes region server crash
[ https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424475#comment-16424475 ] Thiruvel Thirumoolan commented on HBASE-20322: -- [~apurtell] - There is a small bug in the patch which causes TestAtomicOperation to fail. I see the failure at least once when I run TestAtomicOperation 25 times. Please let me know if you would like me to follow up in a separate Jira. I have attached the addendum here. > CME in StoreScanner causes region server crash > -- > > Key: HBASE-20322 > URL: https://issues.apache.org/jira/browse/HBASE-20322 > Project: HBase > Issue Type: Bug >Affects Versions: 1.3.2 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 1.5.0, 1.3.3, 1.4.4 > > Attachments: HBASE-20322.branch-1.3.001.patch, > HBASE-20322.branch-1.3.002-addendum.patch, HBASE-20322.branch-1.3.002.patch, > HBASE-20322.branch-1.4.001.patch > > > RS crashed with ConcurrentModificationException on our 1.3 cluster, stack > trace below. [~toffer] and I checked and there is a race condition between > flush and scanner close. When StoreScanner.updateReaders() is updating the > scanners after a newly flushed file (in this trace below a region close > during a split), the client's scanner could be closing thus causing CME. > Its rare, but since it crashes the region server, needs to be fixed. > FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server > : Replay of WAL required. 
Forcing server shutdown > org.apache.hadoop.hbase.DroppedSnapshotException: region: > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207) > at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501) > at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420) > at > org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398) > at > org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278) > at > org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566) > at > org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82) > at > org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.util.ConcurrentModificationException > at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901) > at java.util.ArrayList$Itr.next(ArrayList.java:851) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825) > at > org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155) > PS: ignore the line no in the above stack trace, method calls should help > understand whats happening. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20322) CME in StoreScanner causes region server crash
[ https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-20322: - Attachment: HBASE-20322.branch-1.3.002-addendum.patch > CME in StoreScanner causes region server crash > -- > > Key: HBASE-20322 > URL: https://issues.apache.org/jira/browse/HBASE-20322 > Project: HBase > Issue Type: Bug >Affects Versions: 1.3.2 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 1.5.0, 1.3.3, 1.4.4 > > Attachments: HBASE-20322.branch-1.3.001.patch, > HBASE-20322.branch-1.3.002-addendum.patch, HBASE-20322.branch-1.3.002.patch, > HBASE-20322.branch-1.4.001.patch > > > RS crashed with ConcurrentModificationException on our 1.3 cluster, stack > trace below. [~toffer] and I checked and there is a race condition between > flush and scanner close. When StoreScanner.updateReaders() is updating the > scanners after a newly flushed file (in this trace below a region close > during a split), the client's scanner could be closing thus causing CME. > Its rare, but since it crashes the region server, needs to be fixed. > FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server > : Replay of WAL required. 
Forcing server shutdown > org.apache.hadoop.hbase.DroppedSnapshotException: region: > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207) > at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501) > at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420) > at > org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398) > at > org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278) > at > org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566) > at > org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82) > at > org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.util.ConcurrentModificationException > at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901) > at java.util.ArrayList$Itr.next(ArrayList.java:851) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825) > at > org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155) > PS: ignore the line no in the above stack trace, method calls should help > understand whats happening. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HBASE-11288) Splittable Meta
[ https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan reassigned HBASE-11288: Assignee: Francis Liu (was: Thiruvel Thirumoolan) > Splittable Meta > --- > > Key: HBASE-11288 > URL: https://issues.apache.org/jira/browse/HBASE-11288 > Project: HBase > Issue Type: Sub-task >Reporter: Francis Liu >Assignee: Francis Liu >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HBASE-11288) Splittable Meta
[ https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan reassigned HBASE-11288: Assignee: Thiruvel Thirumoolan (was: Francis Liu) > Splittable Meta > --- > > Key: HBASE-11288 > URL: https://issues.apache.org/jira/browse/HBASE-11288 > Project: HBase > Issue Type: Sub-task >Reporter: Francis Liu >Assignee: Thiruvel Thirumoolan >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20322) CME in StoreScanner causes region server crash
[ https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422869#comment-16422869 ] Thiruvel Thirumoolan commented on HBASE-20322: -- Thanks [~apurtell] [~yuzhih...@gmail.com] for the reviews. [~apurtell] - I responded to your question, please let me know if it looks OK. I have re-uploaded the same patch for precommit to run; the last failure wasn't related to the patch. > CME in StoreScanner causes region server crash > -- > > Key: HBASE-20322 > URL: https://issues.apache.org/jira/browse/HBASE-20322 > Project: HBase > Issue Type: Bug >Affects Versions: 1.3.2 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 1.5.0, 1.3.3, 1.4.4 > > Attachments: HBASE-20322.branch-1.3.001.patch, > HBASE-20322.branch-1.3.002.patch, HBASE-20322.branch-1.4.001.patch > > > RS crashed with ConcurrentModificationException on our 1.3 cluster, stack > trace below. [~toffer] and I checked and there is a race condition between > flush and scanner close. When StoreScanner.updateReaders() is updating the > scanners after a newly flushed file (in this trace below a region close > during a split), the client's scanner could be closing thus causing CME. > Its rare, but since it crashes the region server, needs to be fixed. > FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server > : Replay of WAL required. 
Forcing server shutdown > org.apache.hadoop.hbase.DroppedSnapshotException: region: > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207) > at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501) > at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420) > at > org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398) > at > org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278) > at > org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566) > at > org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82) > at > org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.util.ConcurrentModificationException > at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901) > at java.util.ArrayList$Itr.next(ArrayList.java:851) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825) > at > org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155) > PS: ignore the line no in the above stack trace, method calls should help > understand whats happening. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20322) CME in StoreScanner causes region server crash
[ https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvel Thirumoolan updated HBASE-20322:
-----------------------------------------
    Attachment: HBASE-20322.branch-1.3.002.patch
[jira] [Updated] (HBASE-20322) CME in StoreScanner causes region server crash
[ https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvel Thirumoolan updated HBASE-20322:
-----------------------------------------
    Status: Patch Available (was: Open)

Uploaded patches for 1.3 and 1.4. Kicking off pre-commit builds. Thanks to [~toffer] for the internal reviews. Will check 2.x next week.
[jira] [Updated] (HBASE-20322) CME in StoreScanner causes region server crash
[ https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvel Thirumoolan updated HBASE-20322:
-----------------------------------------
    Attachment: HBASE-20322.branch-1.4.001.patch
[jira] [Updated] (HBASE-20322) CME in StoreScanner causes region server crash
[ https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvel Thirumoolan updated HBASE-20322:
-----------------------------------------
    Fix Version/s: 1.4.4
                   1.5.0
[jira] [Updated] (HBASE-20322) CME in StoreScanner causes region server crash
[ https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvel Thirumoolan updated HBASE-20322:
-----------------------------------------
    Attachment: HBASE-20322.branch-1.3.001.patch
[jira] [Created] (HBASE-20322) CME in StoreScanner causes region server crash
Thiruvel Thirumoolan created HBASE-20322:
-----------------------------------------

    Summary: CME in StoreScanner causes region server crash
    Key: HBASE-20322
    URL: https://issues.apache.org/jira/browse/HBASE-20322
    Project: HBase
    Issue Type: Bug
    Affects Versions: 1.3.2
    Reporter: Thiruvel Thirumoolan

RS crashed with ConcurrentModificationException on our 1.3 cluster, stack trace below. [~toffer] and I checked and there is a race condition between flush and scanner close. When StoreScanner.updateReaders() is updating the scanners after a newly flushed file (in the trace below, a region close during a split), the client's scanner could be closing, thus causing the CME.

It's rare, but since it crashes the region server, it needs to be fixed.

FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server : Replay of WAL required. Forcing server shutdown
org.apache.hadoop.hbase.DroppedSnapshotException: region:
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207)
    at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501)
    at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420)
    at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398)
    at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
    at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566)
    at org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
    at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.ConcurrentModificationException
    at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
    at java.util.ArrayList$Itr.next(ArrayList.java:851)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825)
    at org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155)

PS: ignore the line numbers in the above stack trace; the method calls should help in understanding what's happening.
[jira] [Updated] (HBASE-20322) CME in StoreScanner causes region server crash
[ https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvel Thirumoolan updated HBASE-20322:
-----------------------------------------
    Fix Version/s: 1.3.3
[jira] [Assigned] (HBASE-20322) CME in StoreScanner causes region server crash
[ https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvel Thirumoolan reassigned HBASE-20322:
--------------------------------------------
    Assignee: Thiruvel Thirumoolan
[jira] [Commented] (HBASE-15533) Add RSGroup Favored Balancer
[ https://issues.apache.org/jira/browse/HBASE-15533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16384007#comment-16384007 ]

Thiruvel Thirumoolan commented on HBASE-15533:
----------------------------------------------

[~stack], We moved to 1.3 and that's keeping me busy. Will get back to this, hopefully in a month or two.

> Add RSGroup Favored Balancer
> ----------------------------
>
> Key: HBASE-15533
> URL: https://issues.apache.org/jira/browse/HBASE-15533
> Project: HBase
> Issue Type: Sub-task
> Components: FavoredNodes
> Reporter: Francis Liu
> Assignee: Thiruvel Thirumoolan
> Priority: Major
> Attachments: HBASE-15533.master.001.patch, HBASE-15533.master.002.patch, HBASE-15533.patch, HBASE-15533.rough.draft.patch
>
>
> HBASE-16942 added favored stochastic load balancer so we can pick and choose nodes to assign based on the favored nodes and load/locality. The intention of this jira is to add a group based load balancer on top of the favored stochastic balancer. This will ensure splits/merges will only use favored nodes from that group and will inherit from the parents appropriately.
[jira] [Commented] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
[ https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379124#comment-16379124 ]

Thiruvel Thirumoolan commented on HBASE-20001:
----------------------------------------------

Thanks [~chia7712] and [~yuzhih...@gmail.com] for the reviews.

> cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
> -------------------------------------------------------------------------
>
> Key: HBASE-20001
> URL: https://issues.apache.org/jira/browse/HBASE-20001
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.1.7
> Reporter: Francis Liu
> Assignee: Thiruvel Thirumoolan
> Priority: Major
> Fix For: 1.3.2, 1.5.0, 1.2.7, 1.4.3
>
> Attachments: HBASE-20001.branch-1.2.001.patch, HBASE-20001.branch-1.3.001.patch, HBASE-20001.branch-1.4.001.patch, HBASE-20001.branch-1.4.002.patch, HBASE-20001.branch-1.4.003.patch, HBASE-20001.branch-1.4.004.patch, HBASE-20001.branch-1.4.005.patch, HBASE-20001.branch-1.4.006.patch
>
>
> In RegionStates.cleanIfNoMetaEntry()
> {{if (MetaTableAccessor.getRegion(server.getConnection(), hri.getEncodedNameAsBytes()) == null) {}}
> {{regionOffline(hri);}}
> {{FSUtils.deleteRegionDir(server.getConfiguration(), hri);}}
> }
> But the API expects the full region name:
> {{public static Pair<HRegionInfo, ServerName> getRegion(Connection connection, byte [] regionName)}}
> So we might end up cleaning good regions.
>
> ADDENDUM:
> The scenario mentioned occurs when zkless assignment is used. With zk-based assignment, what could occur without the patch is that the daughter regions are offlined and have no hdfs directory but still have entries in meta. The daughter meta entries will probably be picked up by the client, causing NSREs.
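To make the key mismatch described in the issue concrete, here is a minimal, self-contained sketch. It does not call HBase: `MetaLookupSketch`, `register`, `getRegion` and `wouldClean` are hypothetical names, a `HashMap` stands in for hbase:meta, and the region name and encoded name are made-up values in the usual `table,startKey,regionId.encodedName.` shape. The only point it illustrates is that a lookup keyed by the full region name returns nothing when handed the encoded name, so a "no meta entry" check fires for a region that is actually present.

```java
import java.util.HashMap;
import java.util.Map;

class MetaLookupSketch {
    // Hypothetical stand-in for hbase:meta: full region name -> hosting server.
    private final Map<String, String> meta = new HashMap<>();

    void register(String regionName, String server) {
        meta.put(regionName, server);
    }

    // Like the API quoted in the issue, this lookup expects the FULL region name.
    String getRegion(String regionName) {
        return meta.get(regionName);
    }

    // Mirrors the shape of the buggy check: "no meta entry" means "clean the region up".
    boolean wouldClean(String lookupKey) {
        return getRegion(lookupKey) == null;
    }

    // Made-up example values (assumptions, not taken from a real cluster).
    static final String ENCODED_NAME = "0123456789abcdef0123456789abcdef";
    static final String REGION_NAME = "t1,,1519000000000." + ENCODED_NAME + ".";

    static MetaLookupSketch withOneRegion() {
        MetaLookupSketch sketch = new MetaLookupSketch();
        sketch.register(REGION_NAME, "rs1.example.com,16020,1519000000000");
        return sketch;
    }

    public static void main(String[] args) {
        MetaLookupSketch sketch = withOneRegion();
        // Encoded name is the wrong key: the healthy region looks missing.
        System.out.println("lookup by encoded name cleans region: " + sketch.wouldClean(ENCODED_NAME));
        // Full region name is the right key: the region is found and left alone.
        System.out.println("lookup by region name cleans region: " + sketch.wouldClean(REGION_NAME));
    }
}
```

With the encoded name as the key, the check reports the region as absent and the cleanup path would run; with the full region name it does not.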
[jira] [Commented] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
[ https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16378144#comment-16378144 ]

Thiruvel Thirumoolan commented on HBASE-20001:
----------------------------------------------

Pre commit results for branch-1.3 patch:

TestEndToEndSplitTransaction#testMasterOpsWhileSplitting has been failing for a while, see nightly build [https://builds.apache.org/job/HBase%20Nightly/job/branch-1.3/245/testReport/]

Pre commit results for branch-1.2 patch:

TestMultiTableSnapshotInputFormat.testScanOBBToOPP - is flaky, as can be seen in one of the nightly builds [https://builds.apache.org/job/HBase%20Nightly/job/branch-1.2/243/]

TestSplitTransactionOnCluster.testMasterRestartWhenSplittingIsPartial is flaky too, as can be seen here [https://builds.apache.org/job/HBase%20Nightly/job/branch-1.2/246/]
[jira] [Commented] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
[ https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377944#comment-16377944 ]

Thiruvel Thirumoolan commented on HBASE-20001:
----------------------------------------------

Uploaded 1.3 and 1.2 branch patches, the 1.4 patch didn't apply cleanly, but there were only minor changes. The new tests passed locally, will wait for pre-commit result.
[jira] [Updated] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
[ https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-20001: Attachment: HBASE-20001.branch-1.2.001.patch
[jira] [Updated] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
[ https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-20001: Attachment: (was: HBASE-20001.branch-1.2.001.patch)
[jira] [Updated] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
[ https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-20001: Attachment: HBASE-20001.branch-1.2.001.patch
[jira] [Updated] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
[ https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-20001: Attachment: HBASE-20001.branch-1.3.001.patch
[jira] [Commented] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
[ https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377889#comment-16377889 ] Thiruvel Thirumoolan commented on HBASE-20001: [~yuzhih...@gmail.com], [~chia7712] - The patch passed pre-commit. Let me know if the branch-1.4 patch can go in; I can then start working on the 1.3 and 1.2 patches.
[jira] [Commented] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
[ https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377456#comment-16377456 ] Thiruvel Thirumoolan commented on HBASE-20001: Thanks [~chia7712], the check is not required at the moment. Updated the patch with the review comments addressed.
[jira] [Updated] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
[ https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-20001: Attachment: HBASE-20001.branch-1.4.006.patch
[jira] [Commented] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
[ https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374928#comment-16374928 ]

Thiruvel Thirumoolan commented on HBASE-20001:

[~yuzhih...@gmail.com], sure, I will wait for any other feedback and address it. I will rename the method to cleanFailedSplitMergeRegions() if that's OK.

[~chia7712],
{quote}Is there a chance that null server is passed to RegionStates? {quote}
I remember a failure without the check; let me run the tests locally and check.
{quote}Why we always remove the daughter region directories when using the ZooKeeper based region assignments? It seems to me that it is another kind of data lose? {quote}
We roll back on split and merge failures in zk-based assignment: we online the parents and clean up the failed splits/merges. This is demonstrated by the unit tests testMergeIsRolledBackOnMERGEFailure and testSplitIsRolledBackOnSPLITFailure. I also explained this in one of the comments above; let me know what you think. Thanks!
[jira] [Commented] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
[ https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374069#comment-16374069 ] Thiruvel Thirumoolan commented on HBASE-20001: Thanks [~yuzhih...@gmail.com]. Patch HBASE-20001.branch-1.4.005.patch passed the precommit tests (the failure is unrelated) and is ready for review. I can post patches for the other branches if the 1.4 one looks OK.
[jira] [Updated] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
[ https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-20001: Attachment: HBASE-20001.branch-1.4.005.patch
[jira] [Updated] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
[ https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-20001: Attachment: HBASE-20001.branch-1.4.004.patch
[jira] [Commented] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
[ https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16373621#comment-16373621 ]

Thiruvel Thirumoolan commented on HBASE-20001:

Uploaded HBASE-20001.branch-1.4.003.patch to address the issues we ([~toffer] and I) found. The patch addresses two issues:

1. The regionName lookup bug, which caused the data loss issue for us.
2. ZK split/merge rollback on failure, plus unit tests.

testRSSplitEphemeralsDisappearButDaughtersAreOnlinedAfterShutdownHandling failed and caused subsequent tests to fail. It was failing while deleting the test table (in the finally clause) because, with the regionName fix, the daughters remained in transition (SPLITTING_NEW). Without the regionName fix, the daughters were offlined and their HDFS directories removed, so the test passed, which is wrong.

[~toffer] pointed out that the test was waiting for the daughters to come online, but in zk-based assignment we roll back rather than roll forward, so the test should be waiting for the parent. The test was still passing all these checks because there were not enough barriers. We fixed the test to comply with the zk-based behavior and also introduced a similar test for merge in zk mode. I will raise a separate Jira for re-introducing the zkless tests and will add the appropriate zkless tests in a follow-up.

Once we fixed the test, we realized the failed daughters were in transition and not offlined. We fixed that as well in RegionStates.java as part of this Jira.

Please let us know what you think. Thanks!
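The rollback behavior described in this comment can be illustrated with a small, self-contained sketch. It is not HBase code: the class `SplitRollbackSketch`, its `State` enum, and its methods are hypothetical stand-ins that only model the rule stated above, namely that in zk-based assignment a failed split brings the parent back online and leaves the daughters offlined, so a test should wait on the parent.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of region states during a split; not the HBase RegionStates class.
public class SplitRollbackSketch {
    enum State { OPEN, SPLITTING, SPLITTING_NEW, OFFLINE }

    // Region name -> current state (illustrative in-memory bookkeeping only).
    static final Map<String, State> STATES = new LinkedHashMap<>();

    // A split starts: the parent is splitting, the daughters are new and in transition.
    static void startSplit(String parent, String daughterA, String daughterB) {
        STATES.put(parent, State.SPLITTING);
        STATES.put(daughterA, State.SPLITTING_NEW);
        STATES.put(daughterB, State.SPLITTING_NEW);
    }

    // The split failed: roll back by onlining the parent and offlining the daughters.
    static void rollBackFailedSplit(String parent, String daughterA, String daughterB) {
        STATES.put(parent, State.OPEN);
        STATES.put(daughterA, State.OFFLINE);
        STATES.put(daughterB, State.OFFLINE);
    }

    // The condition a test should wait for after a failed split in this model.
    static boolean splitRolledBack(String parent, String daughterA, String daughterB) {
        return STATES.get(parent) == State.OPEN
            && STATES.get(daughterA) == State.OFFLINE
            && STATES.get(daughterB) == State.OFFLINE;
    }

    public static void main(String[] args) {
        startSplit("parent", "daughterA", "daughterB");
        System.out.println("rolled back before failure handling: "
            + splitRolledBack("parent", "daughterA", "daughterB"));
        rollBackFailedSplit("parent", "daughterA", "daughterB");
        System.out.println("rolled back after failure handling: "
            + splitRolledBack("parent", "daughterA", "daughterB"));
    }
}
```

In this model, a check that waits for the daughters to reach `OPEN` would never succeed after a rollback, which mirrors why the test had to wait on the parent instead.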
[jira] [Updated] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
[ https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-20001: Attachment: HBASE-20001.branch-1.4.003.patch
[jira] [Commented] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
[ https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371084#comment-16371084 ] Thiruvel Thirumoolan commented on HBASE-20001: We found some more issues. I will upload another patch tomorrow; it would be better to explain the issues along with the patch.
[jira] [Commented] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
[ https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16368013#comment-16368013 ] Thiruvel Thirumoolan commented on HBASE-20001: [~chia7712] - Yes. We lost data on our test clusters. The daughters were in SPLITTING_NEW state, and since the meta lookup by encoded name found no entry (because of this bug), the daughter region directories on HDFS were removed. Catalog Janitor didn't find any split references and removed the parent too, causing hbck to complain of a region hole.
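The failure mode in this comment comes down to looking up a map with the wrong kind of key. The sketch below is a minimal, self-contained illustration and not HBase code: `MetaLookupSketch`, its `META` map, and its `REGION_DIRS` set are hypothetical stand-ins for hbase:meta and the region directories on HDFS, and the name layout `table,startKey,regionId.encodedName.` is assumed for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the lookup bug; none of these names are HBase APIs.
public class MetaLookupSketch {
    // Stand-in for hbase:meta, keyed by the FULL region name.
    static final Map<String, String> META = new HashMap<>();
    // Stand-in for region directories on HDFS, keyed by the ENCODED name.
    static final Set<String> REGION_DIRS = new HashSet<>();

    // Builds a full region name in the assumed layout "table,startKey,regionId.encodedName.".
    static String regionName(String table, String startKey, long regionId, String encodedName) {
        return table + "," + startKey + "," + regionId + "." + encodedName + ".";
    }

    // Deletes the region directory when the lookup key has no meta entry.
    // Returns true if the directory was cleaned.
    static boolean cleanIfNoMetaEntry(String lookupKey, String encodedName) {
        if (META.get(lookupKey) == null) {
            REGION_DIRS.remove(encodedName);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String encoded = "0123456789abcdef0123456789abcdef";
        String full = regionName("t1", "row-500", 1519000000000L, encoded);
        META.put(full, "rs1.example.com");
        REGION_DIRS.add(encoded);

        // Correct key: the meta entry is found, nothing is cleaned.
        System.out.println("cleaned with full name: " + cleanIfNoMetaEntry(full, encoded));
        // Buggy key: the encoded name never matches a meta row, so a good region is cleaned.
        System.out.println("cleaned with encoded name: " + cleanIfNoMetaEntry(encoded, encoded));
        System.out.println("directory still present: " + REGION_DIRS.contains(encoded));
    }
}
```

Because the encoded name can never equal a full region name in this model, the buggy lookup always reports "no meta entry", which is the same shape as the data loss reported above.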
[jira] [Commented] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
[ https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366534#comment-16366534 ] Thiruvel Thirumoolan commented on HBASE-20001: [~chia7712] - I have added a comment on the reviewboard; can we please continue the conversation there? Thanks for your feedback. Some tests in TestSplitTransactionOnCluster also failed. I am guessing the failures are mostly test-related and will check them out.
[jira] [Commented] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
[ https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366158#comment-16366158 ] Thiruvel Thirumoolan commented on HBASE-20001: [~chia7712] When we were investigating this problem internally, it was hard to trace the regions that had been cleaned up; we had to look up the namenode audit logs.
[jira] [Updated] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
[ https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-20001: Fix Version/s: 1.4.2, 1.5.0, 1.3.2
[jira] [Updated] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
[ https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-20001: Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-20001) cleanIfNoMetaEntry() uses encoded instead of region name to lookup region
[ https://issues.apache.org/jira/browse/HBASE-20001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-20001: Attachment: HBASE-20001.branch-1.4.001.patch
[jira] [Commented] (HBASE-19996) Some nonce procs might not be cleaned up (follow up HBASE-19756)
[ https://issues.apache.org/jira/browse/HBASE-19996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365306#comment-16365306 ]

Thiruvel Thirumoolan commented on HBASE-19996:

[~yuzhih...@gmail.com] - Uploaded a patch for the master branch containing the test case. Let me know if this is OK, or whether you would like me to push that in another Jira.

> Some nonce procs might not be cleaned up (follow up HBASE-19756)
>
> Key: HBASE-19996
> URL: https://issues.apache.org/jira/browse/HBASE-19996
> Project: HBase
> Issue Type: Bug
> Reporter: Thiruvel Thirumoolan
> Assignee: Thiruvel Thirumoolan
> Priority: Major
> Fix For: 1.3.2, 1.5.0, 1.4.2
> Attachments: HBASE-19996.branch-1.4.001.patch, HBASE-19996.branch-1.4.001.patch, HBASE-19996.master.001.patch
>
> Follow up to HBASE-19756, which dealt with NPEs during proc cleanup. Unfortunately, the patch for branch-1 might also fail to remove some valid procs. The branch-2 patch doesn't have this problem. This fixes the branch-1 bug and also adds another test to branch-2. Thanks to [~toffer] for flagging this internally.