[jira] [Created] (HBASE-26856) BufferedDataBlockEncoder.OnheapDecodedCell value can get corrupted
Mohammad Arshad created HBASE-26856:
---
Summary: BufferedDataBlockEncoder.OnheapDecodedCell value can get corrupted
Key: HBASE-26856
URL: https://issues.apache.org/jira/browse/HBASE-26856
Project: HBase
Issue Type: Bug
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad

In our production cluster we observed that a cell value was modified after a successful scanner read. On analysis we found that OnheapDecodedCell is not created properly: it is constructed with the complete underlying array of valAndTagsBuffer.

{code:java}
return new OnheapDecodedCell(Bytes.copy(keyBuffer, 0, this.keyLength),
    currentKey.getRowLength(), currentKey.getFamilyOffset(), currentKey.getFamilyLength(),
    currentKey.getQualifierOffset(), currentKey.getQualifierLength(),
    currentKey.getTimestamp(), currentKey.getTypeByte(), valAndTagsBuffer.array(),
    valAndTagsBuffer.arrayOffset() + vOffset, this.valueLength, memstoreTS, tagsArray,
    tOffset, this.tagsLength);
{code}

Here we pass valAndTagsBuffer.array() for value extraction, so the cell shares the buffer's backing array. If that array is altered anywhere afterwards, the cell value changes with it.

--
This message was sent by Atlassian Jira (v8.20.1#820001)
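The aliasing described above can be reproduced outside HBase. What follows is only a minimal sketch, not the HBase code: CellView, shared() and copied() are hypothetical names invented for illustration, and copying the value bytes is one possible remedy rather than the fix the issue settled on.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SharedBufferSketch {
  /** Hypothetical stand-in for a decoded cell: an array reference plus offset and length. */
  static final class CellView {
    final byte[] array;
    final int offset;
    final int length;

    CellView(byte[] array, int offset, int length) {
      this.array = array;
      this.offset = offset;
      this.length = length;
    }

    String value() {
      return new String(array, offset, length, StandardCharsets.UTF_8);
    }
  }

  /** Mirrors the reported pattern: the view points into the buffer's backing array. */
  static CellView shared(ByteBuffer buf, int vOffset, int vLength) {
    return new CellView(buf.array(), buf.arrayOffset() + vOffset, vLength);
  }

  /** Assumed alternative: the view owns a private copy of the value bytes. */
  static CellView copied(ByteBuffer buf, int vOffset, int vLength) {
    int from = buf.arrayOffset() + vOffset;
    return new CellView(Arrays.copyOfRange(buf.array(), from, from + vLength), 0, vLength);
  }

  public static void main(String[] args) {
    ByteBuffer valAndTags = ByteBuffer.wrap("value-1".getBytes(StandardCharsets.UTF_8));
    CellView sharedCell = shared(valAndTags, 0, 7);
    CellView copiedCell = copied(valAndTags, 0, 7);

    // The decoder moves on and reuses the same buffer for the next value.
    valAndTags.clear();
    valAndTags.put("value-2".getBytes(StandardCharsets.UTF_8));

    System.out.println(sharedCell.value()); // prints "value-2": the returned cell changed
    System.out.println(copiedCell.value()); // prints "value-1": the copy is unaffected
  }
}
```

The copy costs one allocation per cell, which is the trade-off against handing out a view that a later buffer reuse can overwrite.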
[jira] [Created] (HBASE-25571) Compilation in branch-2 after HBASE-25364
Mohammad Arshad created HBASE-25571:
---
Summary: Compilation in branch-2 after HBASE-25364
Key: HBASE-25571
URL: https://issues.apache.org/jira/browse/HBASE-25571
Project: HBase
Issue Type: Bug
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad

{code:java}
[ERROR] COMPILATION ERROR :
[INFO] -
[ERROR] /D:/code/apache/forked/hbaseBranch2/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFile.java:[737,12] cannot find symbol
  symbol:   method getCell(byte[],byte[],byte[])
  location: class org.apache.hadoop.hbase.io.hfile.TestHFile
[ERROR] /D:/code/apache/forked/hbaseBranch2/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFile.java:[738,13] cannot find symbol
  symbol:   method getCell(byte[],byte[],byte[])
  location: class org.apache.hadoop.hbase.io.hfile.TestHFile
[ERROR] /D:/code/apache/forked/hbaseBranch2/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFile.java:[744,12] cannot find symbol
  symbol:   method getCell(byte[],byte[],byte[])
  location: class org.apache.hadoop.hbase.io.hfile.TestHFile
[ERROR] /D:/code/apache/forked/hbaseBranch2/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFile.java:[745,13] cannot find symbol
  symbol:   method getCell(byte[],byte[],byte[])
  location: class org.apache.hadoop.hbase.io.hfile.TestHFile
[INFO] 4 errors
{code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25503) HBase code download is failing on windows with invalid path error
Mohammad Arshad created HBASE-25503:
---
Summary: HBase code download is failing on windows with invalid path error
Key: HBASE-25503
URL: https://issues.apache.org/jira/browse/HBASE-25503
Project: HBase
Issue Type: Bug
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad

The git pull command fails with "error: invalid path".

{noformat}
Host1 MINGW64 /d/hbase (master)
$ git pull
error: invalid path 'dev-support/design-docs/HBASE-18070-ROOT_hbase:meta_Region_Replicas.pdf'
Updating 2e96a5b2d3..dfefff7e59
{noformat}

The problem occurs only on Windows machines. I tried Windows 7 and Windows 10; both have the problem. I searched the net and there seems to be no easy workaround.

The cause is that the file HBASE-18070-ROOT_hbase:meta_Region_Replicas.pdf has a colon in its name. To fix the problem we should remove the colon (:) from the file name.
[jira] [Created] (HBASE-25492) Create table with rsgroup
Mohammad Arshad created HBASE-25492:
---
Summary: Create table with rsgroup
Key: HBASE-25492
URL: https://issues.apache.org/jira/browse/HBASE-25492
Project: HBase
Issue Type: Improvement
Components: rsgroup
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad
Fix For: 2.4.1

Currently we need to create a table and then move it to the desired RSGroup, which costs the HMaster two rounds of region assignment. For a table with many regions this cost is huge.

We have a use case where the user wants to create a table directly in an rsgroup. HBASE-22695 already implemented this feature in the master branch, but it was not ported to branch-2 because the master and branch-2 implementations are different.

This JIRA aims to implement the feature in branch-2. Unlike the master branch, the rsgroup information from the TableDescriptor is used only while creating the table, to keep the changes to a minimum.
[jira] [Resolved] (HBASE-24212) HMaster UI hangs when rsgorup is enabled and meta-region is not available
[ https://issues.apache.org/jira/browse/HBASE-24212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mohammad Arshad resolved HBASE-24212.
---
Resolution: Not A Problem

* Moving rsgroup code to server is not an available option.
* After HBASE-22738 & HBASE-24760 meta would not be offline in the mentioned scenario.

> HMaster UI hangs when rsgorup is enabled and meta-region is not available
> ---
>
> Key: HBASE-24212
> URL: https://issues.apache.org/jira/browse/HBASE-24212
> Project: HBase
> Issue Type: Bug
> Components: rsgroup
> Affects Versions: 2.2.4
> Reporter: Mohammad Arshad
> Assignee: Mohammad Arshad
> Priority: Major
>
> HMaster UI hangs when rsgroup is enabled and meta-region is not available.
> Steps to reproduce:
> # Cluster: 1 Master, 3 RS
> # Create rsgroup r1 and r2
> # Move rs1 to r1 and rs2 to r2 then all the regions are online on rs3
> # Stop rs3
> # Now access URL hmaster:Host:infoPort/master-status The page will not open.
[jira] [Resolved] (HBASE-24676) Meta region assignment is blocked when all RS in meta table group are restarted.
[ https://issues.apache.org/jira/browse/HBASE-24676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mohammad Arshad resolved HBASE-24676.
---
Resolution: Not A Problem

After HBASE-22738 & HBASE-24760, if there is no live server in the current group, the table's regions are moved to the default group or to another group where live servers exist. So meta will be assigned as long as there is any live server in the cluster.

> Meta region assignment is blocked when all RS in meta table group are restarted.
> ---
>
> Key: HBASE-24676
> URL: https://issues.apache.org/jira/browse/HBASE-24676
> Project: HBase
> Issue Type: Bug
> Components: rsgroup
> Affects Versions: 2.2.3
> Reporter: Mohammad Arshad
> Assignee: Mohammad Arshad
> Priority: Major
>
> This issue happened in a test cluster. The issue does not reproduce easily.
> But we can reproduce it with debug points in code.
> Steps to reproduce:
> # Install a HBase cluster with three RS(rs1,rs2 and rs3) and one Master
> # Create two rsgroups r1 and r2 and move rs1 to r1 and rs2 to r2
> {code}
> add_rsgroup 'r1';add_rsgroup 'r2';move_servers_rsgroup 'r1',['rs1Host:16020'];move_servers_rsgroup 'r2',['rs2Host:16020']
> {code}
> # Create a table t1
> {code}create 't1','f1','f2';put't1','r1','f1:c1','v1'{code}
> # Start debugging master, put debug point in while loop of
> {code}org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl.ServerEventsListenerThread#run{code}
> method.
> # Stop rs3
> # When debug flow comes, wait around 30 seconds to let the meta be offline
> and then let the debug flow execute. By now meta will be offline as rs3 is
> stopped. HMaster UI will hang as meta is offline.
> # Now start rs3, after start meta should be online and Master UI should open.
> # No, still master UI hangs, then you have reproduced the issue.
[jira] [Reopened] (HBASE-24025) Improve performance of move_servers_rsgroup and move_tables_rsgroup by using async region move API
[ https://issues.apache.org/jira/browse/HBASE-24025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mohammad Arshad reopened HBASE-24025:
---

> Improve performance of move_servers_rsgroup and move_tables_rsgroup by using async region move API
> ---
>
> Key: HBASE-24025
> URL: https://issues.apache.org/jira/browse/HBASE-24025
> Project: HBase
> Issue Type: Improvement
> Components: rsgroup
> Reporter: Mohammad Arshad
> Assignee: Mohammad Arshad
> Priority: Major
> Fix For: 3.0.0-alpha-1
>
> Currently move_servers_rsgroup and move_tables_rsgroup commands and APIs are
> taking lot of time.
> In my test environment, to move a server with 100 regions it takes around 137
> seconds.
> Similarly it takes around same time to move a table with 100 regions to other
> group.
> The time taken in rsgroup meta update is negligible. Almost all the time is
> taken in region movement. This is happening because region is moved serially
> using getAssignmentManager().move(region) API
> API getAssignmentManager().moveAsync(regionplan) can be used to move the
> regions in parallel to improve the performance of region group move servers
> and tables commands and APIs
[jira] [Created] (HBASE-25069) Display region name instead of encoded region name for holes in HBCK report page.
Mohammad Arshad created HBASE-25069:
---
Summary: Display region name instead of encoded region name for holes in HBCK report page.
Key: HBASE-25069
URL: https://issues.apache.org/jira/browse/HBASE-25069
Project: HBase
Issue Type: Improvement
Components: hbck
Affects Versions: 2.3.1, 3.0.0-alpha-1
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad
Attachments: image-2020-09-19-11-39-01-755.png

In the HMaster UI, the HBCK report displays only encoded region names for holes. An encoded region name on its own gives no information such as which table the region belongs to or what its start key is. I think it is better to display the region name instead of the encoded region name.

!image-2020-09-19-11-39-01-755.png!
[jira] [Created] (HBASE-25009) Hbck chore logs wrong message when loading regions from FS
Mohammad Arshad created HBASE-25009:
---
Summary: Hbck chore logs wrong message when loading regions from FS
Key: HBASE-25009
URL: https://issues.apache.org/jira/browse/HBASE-25009
Project: HBase
Issue Type: Bug
Affects Versions: 2.3.1, 3.0.0-alpha-1
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad

{code:java}
LOG.info("Loaded {} regions from {} regionservers' reports and found {} orphan regions",
    numRegions, rsReports.size(), orphanRegionsOnFS.size());
{code}

In the above log message, orphanRegionsOnFS.size() should be replaced with orphanRegionsOnRS.size(), as the regions are loaded from RS, not from FS.
[jira] [Created] (HBASE-24995) MetaFixer fails to fix overlaps when multiple tables have overlaps
Mohammad Arshad created HBASE-24995:
---
Summary: MetaFixer fails to fix overlaps when multiple tables have overlaps
Key: HBASE-24995
URL: https://issues.apache.org/jira/browse/HBASE-24995
Project: HBase
Issue Type: Bug
Components: hbck2
Affects Versions: 2.2.3, 3.0.0-alpha-1
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad

MetaFixer fails to fix overlaps when multiple tables have overlaps.

*Steps to reproduce from UT.*
# Create tables t1 and t2 with split keys ["bbb", "ccc", "ddd", "eee"]
# Create an extra region in both t1 and t2 with start key "bbb" and end key "ddd"
# Run the catalog janitor. It will report 4 overlaps in total, 2 from each table.
# Run MetaFixer and wait for the merges to finish.
# Run the catalog janitor again and verify the report. There should not be any overlap.
# Overlaps still exist. Reproduced!

*Analysis.*
* When I run the same scenario for just one table t1, the overlaps are fixed successfully.
* The problem seems to be in MetaFixer#calculateMerges.
* I think merges should be calculated within a table. A merge across tables has no meaning.
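The suggestion in the analysis, calculating merges within a table, could look roughly like the sketch below. It is not MetaFixer#calculateMerges: the Overlap class, the string region names and mergesPerTable() are hypothetical, and lumping every overlapping region of a table into one merge set is a simplification made only to keep the example short.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

class OverlapsByTableSketch {
  /** Hypothetical overlap record: two overlapping regions of the same table. */
  static final class Overlap {
    final String table;
    final String regionA;
    final String regionB;

    Overlap(String table, String regionA, String regionB) {
      this.table = table;
      this.regionA = regionA;
      this.regionB = regionB;
    }
  }

  /** Builds one merge set per table, so regions of different tables never share a set. */
  static Map<String, SortedSet<String>> mergesPerTable(List<Overlap> overlaps) {
    Map<String, SortedSet<String>> merges = new TreeMap<>();
    for (Overlap o : overlaps) {
      SortedSet<String> regions = merges.computeIfAbsent(o.table, t -> new TreeSet<>());
      regions.add(o.regionA);
      regions.add(o.regionB);
    }
    return merges;
  }

  public static void main(String[] args) {
    // The reported scenario: an extra region [bbb,ddd) in both t1 and t2.
    List<Overlap> overlaps = Arrays.asList(
        new Overlap("t1", "[bbb,ccc)", "[bbb,ddd)"),
        new Overlap("t1", "[bbb,ddd)", "[ccc,ddd)"),
        new Overlap("t2", "[bbb,ccc)", "[bbb,ddd)"),
        new Overlap("t2", "[bbb,ddd)", "[ccc,ddd)"));
    System.out.println(mergesPerTable(overlaps));
  }
}
```

Keying the result by table name is the whole point of the sketch: the four reported overlaps become two independent merge sets of three regions each.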
RE: Fixing Catalog Janitor and HBCK chore reported inconsistencies automatically
> In hbase1? No not in hbase1, Inconsistency occurred in hbase 2.2.3. > One suggestion is that you move to 2.3.1 rather than 2.2.3. It has the > benefit of experience running the 2.2 line and has even more fixes for > anomalies applied Good point. Thanks for suggesting, will surely consider upgrade to 2.3.x Thanks -Arshad -Original Message- From: Stack [mailto:st...@duboce.net] Sent: Monday, August 24, 2020 5:01 AM To: HBase Dev List Cc: Hbase-User Subject: Re: Fixing Catalog Janitor and HBCK chore reported inconsistencies automatically On Thu, Aug 20, 2020 at 11:48 PM Mohammad arshad wrote: > > > Are you regularly experiencing consistency issues? If so, what > > version > of hbase are you running? > We have experienced some inconsistencies like unknown servers, double > assignment or RIT (HBASE-24885). In hbase1? > In future we may witness some other inconsistencies, so we should make > HBCK2 mature enough to handle them. Off course we will learn from the > new bug and fix them in code rather than depending on HBCK2. > > Agree and if you run into issues, talk out loud here because we want > to hear about them and fix them as the come up. > Currently we are upgrading from 1.x to 2.2.3 version. Just a thought > process to ease HBase operator job as per some operator feedback, we > can discuss and come up with some ideas to handle this. > > Appreciated. One suggestion is that you move to 2.3.1 rather than 2.2.3. It has the benefit of experience running the 2.2 line and has even more fixes for anomalies applied. Also, the master-side of the hbck2 invocations has had a bunch of improvement made so when they run, they are more thorough. 
Yours, S > Thanks > -Arshad > > -Original Message- > From: Stack [mailto:st...@duboce.net] > Sent: Tuesday, August 18, 2020 9:20 AM > To: Hbase-User > Cc: dev@hbase.apache.org > Subject: Re: Fixing Catalog Janitor and HBCK chore reported > inconsistencies automatically > > On Thu, Aug 6, 2020 at 10:10 PM Mohammad arshad < > mohammad.ars...@huawei.com> > wrote: > > > Hello HBase Folks > > > > Currently Catalog Janitor (CJ) and HBCK chore reported > > inconsistencies to be fixed by manually by executing HBCK2 commands. > > HBCK2 requires high HBase skills. It is bit difficult for > > maintenance personals to figure out which command, when and in which > > order to be executed. > > > > > True. > > Is there any effort going on in community to automate fixing these > > inconsistencies? I also would like to contribute there. > > > > I was thinking, maybe we can expose CJ and HBCK chore reported > > inconsistencies through a new master API and then provide option to > > fix these inconsistencies. Basically adding two new commands in > > HBCK2 -listInconsistencies list CJ and HBCK chore reported > > inconsistencies -fixInconsistencies fix CJ and HBCK chore reported > > inconsistencies (Not sure if possible to fix all inconsistencies, > > need to analyze all inconsistencies case by case, but some are very > > straight forward for example holes and overlap) > > > > > So, a 'god' command that will fix any issue found? > > That is tough. You've seen the philosophy section on hbck2, of how it > makes no claims to being so capable [1]? > > We are trying to get to a place where hbck2 is increasingly less necessary. > The general idea is that inconsistencies are caused by bugs or oversight. > As time goes by, we've been plugging the holes. Upgrading hbase gains > you the fixes making the need for hbck2 less. > > But as you state above, when there is an issue, it can be hard for the > operator to figure how to make fixes. 
We've been trying to improve > this state with documentation in the UI up on the 'HBCK Report' page > and elsewhere but there is room for improvement. > > We've also been trying to aggregate on the hbck2 side so that commands > become increasingly 'macro', fixing a whole category of problem types > rather than an affliction at a time. This should make the tool easier > to use. The 'fixMeta' command is a good example here as it fixes any > holes or overlaps found in hbase:meta (This is probably ripe for > conversion into an auto-repair run on occasion by the Master). Another > way in which we've been trying to make improvement is by obsoleting > commands in hbck2 as we fix the root cause that required the hbck2 > command option to be needed in the first place. > > CJ and the HBCK Chore can report on inconsistencies found. It is > another thing altogether having them go ahead and repair any issues > found mostly because we are not yet confident the repair won't cause > more damage than it fixes. > > > > Any thoughts/inputs highly appreciated. > > > > > Are you regularly experiencing consistency issues? If so, what version > of hbase are you running? > > Thanks, > S > > > 1. > > https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2 > #philosophy > > > > Regards > > -Arshad > > > > > > >
[jira] [Created] (HBASE-24940) runCatalogJanitor() API should return -1 to indicate already running status
Mohammad Arshad created HBASE-24940:
---
Summary: runCatalogJanitor() API should return -1 to indicate already running status
Key: HBASE-24940
URL: https://issues.apache.org/jira/browse/HBASE-24940
Project: HBase
Issue Type: Improvement
Affects Versions: 2.2.3
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad

The runCleanerChore() API returns false if the hbck chore is already running. This is quite helpful in many cases.

The runCatalogJanitor() API does not indicate whether a scan is already running, and it does nothing in that case. I think we should return -1 to indicate the already-running status.
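The proposed contract can be illustrated with a small guard around the scan. This is a sketch of the idea only, not the HBase implementation: JanitorScanSketch and runScan() are hypothetical names, and the scan itself is passed in as a supplier returning whatever count a real scan would report.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.IntSupplier;

class JanitorScanSketch {
  private final AtomicBoolean running = new AtomicBoolean(false);

  /**
   * Runs the scan unless one is already in progress.
   * Returns the scan's own result, or -1 if another scan is running.
   */
  int runScan(IntSupplier scan) {
    if (!running.compareAndSet(false, true)) {
      return -1; // already running: tell the caller instead of silently doing nothing
    }
    try {
      return scan.getAsInt();
    } finally {
      running.set(false);
    }
  }

  public static void main(String[] args) {
    JanitorScanSketch janitor = new JanitorScanSketch();
    int[] nested = new int[1];
    // The nested call happens while the outer scan is still running.
    int outer = janitor.runScan(() -> {
      nested[0] = janitor.runScan(() -> 7);
      return 3;
    });
    System.out.println(outer + " " + nested[0]); // prints "3 -1"
  }
}
```

A negative sentinel works here because a real scan result is a non-negative count, so -1 cannot be confused with "scan ran and cleaned nothing".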
RE: Fixing Catalog Janitor and HBCK chore reported inconsistencies automatically
Thank you sir for your very informative response.

> So, a 'god' command that will fix any issue found?

Yes, the idea was to make -fixInconsistencies a 'god' command, fixing all inconsistencies reported by both CJ and the hbck chore. But as you rightly pointed out, we are not yet confident the repair won't cause more damage than it fixes. On second thought, automating fixes at this point may not be appropriate, as the efficacy of all the fixes is not yet proven.

> Are you regularly experiencing consistency issues? If so, what version of
> hbase are you running?

We have experienced some inconsistencies like unknown servers, double assignment or RIT (HBASE-24885). In future we may witness other inconsistencies, so we should make HBCK2 mature enough to handle them. Of course we will learn from each new bug and fix it in code rather than depending on HBCK2.

Currently we are upgrading from 1.x to version 2.2.3. This was just a thought on easing the HBase operator's job, based on some operator feedback. We can discuss and come up with some ideas to handle this.

Thanks
-Arshad

-----Original Message-----
From: Stack [mailto:st...@duboce.net]
Sent: Tuesday, August 18, 2020 9:20 AM
To: Hbase-User
Cc: dev@hbase.apache.org
Subject: Re: Fixing Catalog Janitor and HBCK chore reported inconsistencies automatically

On Thu, Aug 6, 2020 at 10:10 PM Mohammad arshad wrote:

> Hello HBase Folks
>
> Currently Catalog Janitor (CJ) and HBCK chore reported inconsistencies
> to be fixed by manually by executing HBCK2 commands.
> HBCK2 requires high HBase skills. It is bit difficult for maintenance
> personals to figure out which command, when and in which order to be
> executed.

True.

> Is there any effort going on in community to automate fixing these
> inconsistencies? I also would like to contribute there.
>
> I was thinking, maybe we can expose CJ and HBCK chore reported
> inconsistencies through a new master API and then provide option to
> fix these inconsistencies. Basically adding two new commands in HBCK2
> -listInconsistencies list CJ and HBCK chore reported inconsistencies
> -fixInconsistencies fix CJ and HBCK chore reported inconsistencies
> (Not sure if possible to fix all inconsistencies, need to analyze all
> inconsistencies case by case, but some are very straight forward for
> example holes and overlap)

So, a 'god' command that will fix any issue found?

That is tough. You've seen the philosophy section on hbck2, of how it makes no claims to being so capable [1]?

We are trying to get to a place where hbck2 is increasingly less necessary. The general idea is that inconsistencies are caused by bugs or oversight. As time goes by, we've been plugging the holes. Upgrading hbase gains you the fixes making the need for hbck2 less.

But as you state above, when there is an issue, it can be hard for the operator to figure how to make fixes. We've been trying to improve this state with documentation in the UI up on the 'HBCK Report' page and elsewhere but there is room for improvement.

We've also been trying to aggregate on the hbck2 side so that commands become increasingly 'macro', fixing a whole category of problem types rather than an affliction at a time. This should make the tool easier to use. The 'fixMeta' command is a good example here as it fixes any holes or overlaps found in hbase:meta (This is probably ripe for conversion into an auto-repair run on occasion by the Master). Another way in which we've been trying to make improvement is by obsoleting commands in hbck2 as we fix the root cause that required the hbck2 command option to be needed in the first place.

CJ and the HBCK Chore can report on inconsistencies found. It is another thing altogether having them go ahead and repair any issues found mostly because we are not yet confident the repair won't cause more damage than it fixes.

> Any thoughts/inputs highly appreciated.

Are you regularly experiencing consistency issues? If so, what version of hbase are you running?

Thanks,
S

1. https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2#philosophy

> Regards
> -Arshad
Fixing Catalog Janitor and HBCK chore reported inconsistencies automatically
Hello HBase Folks

Currently, inconsistencies reported by the Catalog Janitor (CJ) and the HBCK chore have to be fixed manually by executing HBCK2 commands. HBCK2 requires a high level of HBase skill. It is a bit difficult for maintenance personnel to figure out which commands to execute, when, and in which order.

Is there any effort going on in the community to automate fixing these inconsistencies? I would also like to contribute there.

I was thinking, maybe we can expose the inconsistencies reported by CJ and the HBCK chore through a new master API and then provide an option to fix them. Basically this means adding two new commands in HBCK2:

-listInconsistencies list CJ and HBCK chore reported inconsistencies
-fixInconsistencies fix CJ and HBCK chore reported inconsistencies

(Not sure if it is possible to fix all inconsistencies. We need to analyze them case by case, but some are very straightforward, for example holes and overlaps.)

Any thoughts/inputs highly appreciated.

Regards
-Arshad
[jira] [Created] (HBASE-24675) On Master restart all servers are assigned to default rsgroup.
Mohammad Arshad created HBASE-24675:
---
Summary: On Master restart all servers are assigned to default rsgroup.
Key: HBASE-24675
URL: https://issues.apache.org/jira/browse/HBASE-24675
Project: HBase
Issue Type: Bug
Components: rsgroup
Affects Versions: 2.2.3
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad

Steps to reproduce:
# Install an HBase cluster with three RS (rs1, rs2 and rs3) and one Master
# Create two rsgroups r1 and r2 and move rs1 to r1 and rs2 to r2
{code:java}
add_rsgroup 'r1';add_rsgroup 'r2';move_servers_rsgroup 'r1',['host1:16020'];move_servers_rsgroup 'r2',['host2:16020']
{code}
# Restart Master
# Run list_rsgroups from the hbase shell. All region servers are assigned to the default rsgroup.
[jira] [Created] (HBASE-24676) Meta region assignment is blocked when all RS in meta table group are restarted.
Mohammad Arshad created HBASE-24676:
---
Summary: Meta region assignment is blocked when all RS in meta table group are restarted.
Key: HBASE-24676
URL: https://issues.apache.org/jira/browse/HBASE-24676
Project: HBase
Issue Type: Bug
Components: rsgroup
Affects Versions: 2.2.3
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad

This issue happened in a test cluster. It does not reproduce easily, but we can reproduce it with debug points in the code.

Steps to reproduce:
# Install an HBase cluster with three RS (rs1, rs2 and rs3) and one Master
# Create two rsgroups r1 and r2 and move rs1 to r1 and rs2 to r2
{code}
add_rsgroup 'r1';add_rsgroup 'r2';move_servers_rsgroup 'r1',['rs1Host:16020'];move_servers_rsgroup 'r2',['rs2Host:16020']
{code}
# Create a table t1
{code}create 't1','f1','f2';put't1','r1','f1:c1','v1'{code}
# Start debugging the master and put a debug point in the while loop of the {code}org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl.ServerEventsListenerThread#run{code} method.
# Stop rs3
# When the debug point is hit, wait around 30 seconds to let meta go offline and then let the debug flow continue. By now meta will be offline as rs3 is stopped. The HMaster UI will hang as meta is offline.
# Now start rs3. After the start, meta should be online and the Master UI should open.
# If the Master UI still hangs, you have reproduced the issue.
[jira] [Resolved] (HBASE-24211) Create table is slow in large cluster when AccessController is enabled.
[ https://issues.apache.org/jira/browse/HBASE-24211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mohammad Arshad resolved HBASE-24211.
---
Fix Version/s: 2.4.0
Resolution: Fixed

> Create table is slow in large cluster when AccessController is enabled.
> ---
>
> Key: HBASE-24211
> URL: https://issues.apache.org/jira/browse/HBASE-24211
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.3.6, master, 2.2.4
> Reporter: Mohammad Arshad
> Assignee: Mohammad Arshad
> Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 1.7.0, 2.4.0
>
> *Problem:*
> In a large HBase 1.3.x performance test cluster (100 RS, 60k tables, 600k
> regions) a simple table creation takes around 150 seconds. The time taken
> varies but is always long.
> *Analysis:*
> 1. When HBase creates a table, it calls AssignmentManager#assign(final
> ServerName destination, final List regions)
> AssignmentManager#assign calls asyncSetOfflineInZooKeeper(state, cb,
> destination), and waits in the loop below for 2 minutes.
> {code:java}
> if (useZKForAssignment) {
>   // Wait until all unassigned nodes have been put up and watchers set.
>   int total = states.size();
>   for (int oldCounter = 0; !server.isStopped();) {
>     int count = counter.get();
>     if (oldCounter != count) {
>       LOG.debug(destination.toString() + " unassigned znodes=" + count +
>         " of total=" + total + "; oldCounter=" + oldCounter);
>       oldCounter = count;
>     }
>     if (count >= total) break;
>     Thread.sleep(5);
>   }
> }
> {code}
> 2. asyncSetOfflineInZooKeeper creates a znode under
> /hbase/region-in-transition/ and calls exists to ensure that the znode is
> created. This is a simple operation and should not take much time. So where
> is the time being spent?
> 3. The ZooKeeper client API processes watcher notifications and async API
> responses through a queue, one by one.
> If there is a delay in any watcher/response processing by the client, in
> this case HBase, all other response processing is delayed. It then appears
> as if the API call has taken more time.
> The same thing happens in this issue.
> Watcher processing for znode creation under /hbase/acl took most of the time
> and delayed the processing of the /hbase/region-in-transition/region znode
> creation. This is why the wait in the loop was so long.
> 4. Watcher processing for znode creation under hbase/acl/ calls
> ZKPermissionWatcher#nodeChildrenChanged, which internally calls
> ZKUtil.getChildDataAndWatchForNewChildren
> *which calls ZooKeeper's getData API, in this use case, 60k times which
> takes most of the time.*
> *Solutions:*
> Move the getChildDataAndWatchForNewChildren call into the async code block in
> ZKPermissionWatcher#nodeChildrenChanged.
[jira] [Created] (HBASE-24308) Move hbase-rsgroup code into hbase-server code
Mohammad Arshad created HBASE-24308:
---
Summary: Move hbase-rsgroup code into hbase-server code
Key: HBASE-24308
URL: https://issues.apache.org/jira/browse/HBASE-24308
Project: HBase
Issue Type: Bug
Components: rsgroup
Affects Versions: 2.2.3
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad

Keeping the rsgroup code in a separate module is causing many problems. HBASE-22740 and HBASE-24212 are blocked because of this.

In the master branch the hbase-rsgroup code has already been moved into hbase-server. It is better to be in sync with the master branch so that fixes can be applied on branch-2 easily.

This jira moves the hbase-rsgroup code into hbase-server as it is. It does not make changes to protobuf etc.
[jira] [Resolved] (HBASE-24011) HMaster does not restart when rsgroup is enabled and /hbase/WALs is moved
[ https://issues.apache.org/jira/browse/HBASE-24011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mohammad Arshad resolved HBASE-24011.
---
Assignee: Mohammad Arshad
Resolution: Won't Fix

> HMaster does not restart when rsgroup is enabled and /hbase/WALs is moved
> ---
>
> Key: HBASE-24011
> URL: https://issues.apache.org/jira/browse/HBASE-24011
> Project: HBase
> Issue Type: Bug
> Components: rsgroup
> Affects Versions: 2.2.3
> Reporter: Mohammad Arshad
> Assignee: Mohammad Arshad
> Priority: Critical
>
> HMaster does not restart when rsgroup is enabled and /hbase/WALs is moved
> HMaster restarts properly if rsgroup is not enabled even if /hbase/WALs is
> moved.
> Steps to reproduce:
> # start the cluster
> # create a table do some put, delete
> # kill all the region servers and master
> # move WALs directory for backup (-mv /hbase/WALs /hbase/WALs2)
> # start the cluster
> # Master start fails, initialization keep failing
> {code:java}
> 2020-03-18 11:42:55,369 ERROR
> [ActiveMasterInitializationMonitor-1584511075369] master.HMaster: Master
> failed to complete initialization after 90ms. Please consider submitting
> a bug report including a thread dump of this process.
> {code}
[jira] [Created] (HBASE-24212) HMaster UI hangs when rsgorup is enabled and meta-region is not available
Mohammad Arshad created HBASE-24212:
---
Summary: HMaster UI hangs when rsgorup is enabled and meta-region is not available
Key: HBASE-24212
URL: https://issues.apache.org/jira/browse/HBASE-24212
Project: HBase
Issue Type: Bug
Components: rsgroup
Affects Versions: 2.2.4, master
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad

The HMaster UI hangs when rsgroup is enabled and the meta region is not available.

Steps to reproduce:
# Cluster: 1 Master, 3 RS
# Create rsgroups r1 and r2
# Move rs1 to r1 and rs2 to r2, so that all the regions are online on rs3
# Stop rs3
# Now access URL hmaster:Host:infoPort/master-status The page will not open.

I think that when the meta region is not available, we should take the rsgroup information from ZooKeeper and proceed.
[jira] [Created] (HBASE-24211) Create table is slow in large cluster when AccessController is enabled.
Mohammad Arshad created HBASE-24211:
---
Summary: Create table is slow in large cluster when AccessController is enabled.
Key: HBASE-24211
URL: https://issues.apache.org/jira/browse/HBASE-24211
Project: HBase
Issue Type: Bug
Affects Versions: 2.2.4, 1.3.6, master
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad

*Problem:*
In a large HBase 1.3.x performance-test cluster (100 RS, 60k tables, 600k regions), a simple table creation takes around 150 seconds. The time varies from run to run but is consistently long.

*Analysis:*
1. When HBase creates a table, it calls AssignmentManager#assign(final ServerName destination, final List regions). AssignmentManager#assign calls asyncSetOfflineInZooKeeper(state, cb, destination) and then waits in the loop below for about 2 minutes.
{code:java}
if (useZKForAssignment) {
  // Wait until all unassigned nodes have been put up and watchers set.
  int total = states.size();
  for (int oldCounter = 0; !server.isStopped();) {
    int count = counter.get();
    if (oldCounter != count) {
      LOG.debug(destination.toString() + " unassigned znodes=" + count + " of total=" + total + "; oldCounter=" + oldCounter);
      oldCounter = count;
    }
    if (count >= total) break;
    Thread.sleep(5);
  }
}
{code}
2. asyncSetOfflineInZooKeeper creates a znode under /hbase/region-in-transition/ and calls exists to confirm that the znode was created. This is a simple operation and should not take much time, so where is the time spent?
3. The ZooKeeper client API processes watcher notifications and async API responses one by one through a queue. If the client (here, HBase) is slow to process any watcher event or response, all responses queued behind it are delayed, and the API call appears to have taken longer than it did. That is what happens in this issue: watcher processing for znode creation under /hbase/acl took most of the time and delayed processing of the /hbase/region-in-transition/region znode creation. This is why the wait in the loop was so long.
4. Watcher processing for znode creation under /hbase/acl calls ZKPermissionWatcher#nodeChildrenChanged, which internally calls ZKUtil.getChildDataAndWatchForNewChildren, *which calls ZooKeeper's getData API 60k times in this use case and accounts for most of the time.*

*Solution:*
Move the getChildDataAndWatchForNewChildren call into the async code block in ZKPermissionWatcher#nodeChildrenChanged.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
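The effect described in this report, where one slow watcher callback holds up every response queued behind it on the client's event thread, can be reproduced without ZooKeeper. The following is only a minimal sketch under stated assumptions: a single-threaded executor stands in for the ZooKeeper client event thread, a 300 ms sleep stands in for the 60k getData calls, and all names (EventThreadSketch, "acl-refresh", "rit-response") are hypothetical and not part of HBase or ZooKeeper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class EventThreadSketch {

  /**
   * Queues a slow watcher callback followed by a quick response callback on a
   * single "event thread" and returns the order in which they finished.
   */
  static List<String> run(boolean offloadSlowWork) {
    // Stand-in for the client's single event thread (assumption for this sketch).
    ExecutorService eventThread = Executors.newSingleThreadExecutor();
    // Separate thread that the slow work can be handed to.
    ExecutorService worker = Executors.newSingleThreadExecutor();
    List<String> finished = new CopyOnWriteArrayList<>();
    CountDownLatch allDone = new CountDownLatch(2);

    // Simulates the expensive ACL refresh triggered by the watcher.
    Runnable slowAclRefresh = () -> {
      sleepQuietly(300);
      finished.add("acl-refresh");
      allDone.countDown();
    };

    // Event 1: watcher notification for a znode created under /hbase/acl.
    eventThread.execute(() -> {
      if (offloadSlowWork) {
        worker.execute(slowAclRefresh); // event thread is free again immediately
      } else {
        slowAclRefresh.run();           // event thread is blocked for the whole refresh
      }
    });

    // Event 2: async response for the region-in-transition znode.
    eventThread.execute(() -> {
      finished.add("rit-response");
      allDone.countDown();
    });

    try {
      allDone.await(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      eventThread.shutdown();
      worker.shutdown();
    }
    return new ArrayList<>(finished);
  }

  private static void sleepQuietly(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) {
    System.out.println("inline:    " + run(false));
    System.out.println("offloaded: " + run(true));
  }
}
```

When the refresh runs inline, the response callback cannot be recorded until the refresh has finished. When the refresh is handed to the worker, the response is normally recorded first, which is the behaviour the proposed fix aims for.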
[jira] [Created] (HBASE-24025) Improve performance of move_servers_rsgroup and move_tables_rsgroup by using async region move API
Mohammad Arshad created HBASE-24025:
---
Summary: Improve performance of move_servers_rsgroup and move_tables_rsgroup by using async region move API
Key: HBASE-24025
URL: https://issues.apache.org/jira/browse/HBASE-24025
Project: HBase
Issue Type: Improvement
Components: rsgroup
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad

Currently the move_servers_rsgroup and move_tables_rsgroup commands and APIs take a lot of time. In my test environment, moving a server with 100 regions takes around 137 seconds. Moving a table with 100 regions to another group takes about the same time.

The time spent on the rsgroup meta update is negligible. Almost all of the time is spent on region movement, because the regions are moved serially using the getAssignmentManager().move(region) API.

The getAssignmentManager().moveAsync(regionplan) API can be used to move the regions in parallel and so improve the performance of the rsgroup move servers and move tables commands and APIs.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
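The difference between moving regions one at a time and starting all moves before waiting on any of them can be sketched in plain Java. This is an illustration only, not the HBase implementation: RegionMoveSketch, fakeMove and sampleRegions are hypothetical names, and a 10 ms sleep stands in for a real region move.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

final class RegionMoveSketch {

  /** Moves regions one at a time; total time is the sum of the individual moves. */
  static List<String> moveSerially(List<String> regions, Function<String, String> move) {
    List<String> moved = new ArrayList<>();
    for (String region : regions) {
      moved.add(move.apply(region));
    }
    return moved;
  }

  /** Starts every move first, then waits for all of them to complete. */
  static List<String> moveInParallel(List<String> regions, Function<String, String> move,
      int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<CompletableFuture<String>> futures = new ArrayList<>();
      for (String region : regions) {
        futures.add(CompletableFuture.supplyAsync(() -> move.apply(region), pool));
      }
      List<String> moved = new ArrayList<>();
      for (CompletableFuture<String> future : futures) {
        moved.add(future.join()); // results are collected in submission order
      }
      return moved;
    } finally {
      pool.shutdown();
    }
  }

  /** Hypothetical stand-in for a region move: sleeps briefly and returns the region name. */
  static String fakeMove(String region) {
    try {
      Thread.sleep(10);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return region;
  }

  /** Builds n made-up region names. */
  static List<String> sampleRegions(int n) {
    List<String> regions = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      regions.add("region-" + i);
    }
    return regions;
  }

  public static void main(String[] args) {
    List<String> regions = sampleRegions(100);
    long t0 = System.nanoTime();
    moveSerially(regions, RegionMoveSketch::fakeMove);
    long t1 = System.nanoTime();
    moveInParallel(regions, RegionMoveSketch::fakeMove, 10);
    long t2 = System.nanoTime();
    System.out.printf("serial: %d ms, parallel: %d ms%n",
        (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
  }
}
```

Both variants move the same set of regions. The parallel one overlaps the waiting, so its wall-clock time depends on the slowest batch rather than on the sum of all moves.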
[jira] [Created] (HBASE-24019) Correct exception messages for table null and namespace unavailable.
Mohammad Arshad created HBASE-24019:
---
Summary: Correct exception messages for table null and namespace unavailable.
Key: HBASE-24019
URL: https://issues.apache.org/jira/browse/HBASE-24019
Project: HBase
Issue Type: Bug
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad

The exception messages in the following two scenarios should be corrected.

1. Change the message to "The list of tables cannot be null." in the code below.
{code:java}
@Override
public void moveTables(Set tables, String targetGroup) throws IOException {
  if (tables == null) {
    throw new ConstraintException("The list of servers cannot be null.");
  }
{code}
2. Change the message to "Region server group "+group+" does not exist" in the code below.
{code:java}
public void preCreateNamespace(ObserverContext ctx, NamespaceDescriptor ns) throws IOException {
  String group = ns.getConfigurationValue(RSGroupInfo.NAMESPACE_DESC_PROP_GROUP);
  if(group != null && groupAdminServer.getRSGroupInfo(group) == null) {
    throw new ConstraintException("Region server group "+group+" does not exit");
  }
}
{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
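For reference, the two checks with the requested wording might look like the sketch below. It is self-contained rather than a patch: the nested ConstraintException is a stand-in for the HBase class of the same name, a plain set of group names replaces the lookup through groupAdminServer, and the helper method names are hypothetical.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.Set;

final class RsGroupMessageSketch {

  /** Stand-in for HBase's ConstraintException so the sketch compiles on its own. */
  static final class ConstraintException extends IOException {
    ConstraintException(String message) {
      super(message);
    }
  }

  /** Null check with the message naming tables instead of servers. */
  static void checkTables(Set<String> tables) throws ConstraintException {
    if (tables == null) {
      throw new ConstraintException("The list of tables cannot be null.");
    }
  }

  /** Group lookup with "exist" spelled correctly; knownGroups replaces the real lookup. */
  static void checkGroupExists(String group, Set<String> knownGroups)
      throws ConstraintException {
    if (group != null && !knownGroups.contains(group)) {
      throw new ConstraintException("Region server group " + group + " does not exist");
    }
  }

  /** Returns the message raised for a null table set, or null if nothing was thrown. */
  static String messageForNullTables() {
    try {
      checkTables(null);
      return null;
    } catch (ConstraintException e) {
      return e.getMessage();
    }
  }

  /** Returns the message raised for the given group when only "default" is known. */
  static String messageForGroup(String group) {
    try {
      checkGroupExists(group, Collections.singleton("default"));
      return null;
    } catch (ConstraintException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(messageForNullTables());
    System.out.println(messageForGroup("g1"));
  }
}
```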
[jira] [Created] (HBASE-24011) HMaster does not restart when rsgroup is enabled and /hbase/WALs is moved
Mohammad Arshad created HBASE-24011:
---
Summary: HMaster does not restart when rsgroup is enabled and /hbase/WALs is moved
Key: HBASE-24011
URL: https://issues.apache.org/jira/browse/HBASE-24011
Project: HBase
Issue Type: Bug
Components: rsgroup
Affects Versions: 2.2.3
Reporter: Mohammad Arshad

HMaster does not restart when rsgroup is enabled and /hbase/WALs is moved. HMaster restarts properly if rsgroup is not enabled, even if /hbase/WALs is moved.

Steps to reproduce:
# start the cluster
# create a table and do some puts and deletes
# kill all the region servers and the master
# move the WALs directory for backup (-mv /hbase/WALs /hbase/WALs2)
# start the cluster
# Master start fails; initialization keeps failing
{code:java}
2020-03-18 11:42:55,369 ERROR [ActiveMasterInitializationMonitor-1584511075369] master.HMaster: Master failed to complete initialization after 90ms. Please consider submitting a bug report including a thread dump of this process.
{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-23884) list_rsgroups didn't get the correct result
[ https://issues.apache.org/jira/browse/HBASE-23884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mohammad Arshad resolved HBASE-23884.
-
Resolution: Invalid

Issue is invalid. Closing it now. Please feel free to reopen if you disagree.

> list_rsgroups didn't get the correct result
> ---
>
> Key: HBASE-23884
> URL: https://issues.apache.org/jira/browse/HBASE-23884
> Project: HBase
> Issue Type: Bug
> Components: rsgroup, shell
> Affects Versions: 2.2.3
> Reporter: Bo Cui
> Assignee: Mohammad Arshad
> Priority: Minor
> Attachments: image-2020-02-23-09-53-49-594.png
>
>
> if my_group does not exist, list_rsgroups will get 0 row(s),
> but i think list_rsgroups should throw NOTEXISTEXCEPTION
> !image-2020-02-23-09-53-49-594.png|width=449,height=143!

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-23905) move_namespaces_rsgroup is not moving namespace into desired rsgroup
[ https://issues.apache.org/jira/browse/HBASE-23905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mohammad Arshad resolved HBASE-23905.
-
Resolution: Invalid

Issue is invalid. Closing it now. Please feel free to reopen if you disagree.

> move_namespaces_rsgroup is not moving namespace into desired rsgroup
>
>
> Key: HBASE-23905
> URL: https://issues.apache.org/jira/browse/HBASE-23905
> Project: HBase
> Issue Type: Bug
> Components: rsgroup
> Affects Versions: 2.2.3
> Reporter: Saurav Mehta
> Assignee: Mohammad Arshad
> Priority: Major
>
> When creating a namespace and specifying a rs group in hbase.rsgroup.name,
> the namespace gets associated with the mentioned rs group. However, the
> namespace does not move to another rs group later when using
> "move_namespaces_rsgroup".
>
> Steps to reproduce the issue:
> # create a rs group 'r1' and add a region server to it.
> # *create_namespace 'namespace',\{METHOD => 'set', 'hbase.rsgroup.name' => 'r1'}*
> # *describe_namespace 'namespace'*
> # *move_namespaces_rsgroup 'default',['namespace']*
> # *describe_namespace 'namespace'*
> Before the namespace is moved into another rs group, it shows rsgroup r1,
> but even after step 4 it still shows the same description of the namespace.
> This bug does not allow me to remove a rs group because it keeps telling me
> "r1 cannot be deleted as namespace 'namespace' is still associated with the
> rs group 'r1' ".

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-22655) tags info is missing in Get and Scan Result Cells
[ https://issues.apache.org/jira/browse/HBASE-22655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mohammad Arshad resolved HBASE-22655.
-
Resolution: Invalid

Issue is invalid

> tags info is missing in Get and Scan Result Cells
> -
>
> Key: HBASE-22655
> URL: https://issues.apache.org/jira/browse/HBASE-22655
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.1.5, 1.3.5
> Reporter: Mohammad Arshad
> Assignee: Mohammad Arshad
> Priority: Major
> Fix For: 1.3.6
>
>
> Tags info is missing in Get and Scan Result Cells.
> This is because the Region Server does not put the tags array into the
> CellProtos.Cell when sending back the Result.
> {code}
> org.apache.hadoop.hbase.protobuf.ProtobufUtil.toCell(Cell)
> {code}
> There should be some option to get Cells with tags in them.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-22655) tags info is missing in Get and Scan Result Cells
Mohammad Arshad created HBASE-22655:
---
Summary: tags info is missing in Get and Scan Result Cells
Key: HBASE-22655
URL: https://issues.apache.org/jira/browse/HBASE-22655
Project: HBase
Issue Type: Bug
Affects Versions: 1.3.5, 2.1.5
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad
Fix For: 1.3.6

Tags info is missing in Get and Scan Result Cells. This is because the Region Server does not put the tags array into the CellProtos.Cell when sending back the Result.
{code}
org.apache.hadoop.hbase.protobuf.ProtobufUtil.toCell(Cell)
{code}
There should be some option to get Cells with tags in them.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-19423) Replication entries are not filtered correctly when replication scope is set through WAL Co-processor
Mohammad Arshad created HBASE-19423:
---
Summary: Replication entries are not filtered correctly when replication scope is set through WAL Co-processor
Key: HBASE-19423
URL: https://issues.apache.org/jira/browse/HBASE-19423
Project: HBase
Issue Type: Bug
Reporter: Mohammad Arshad
Fix For: 2.0.0, 1.4.0, 1.3.2

The replication scope set in a WALObserver is reset in Replication.scopeWALEdits(). Because of this, a custom WALObserver implementation cannot be used as a replication filter.

Suppose a WALObserver implementation has logic to filter all entries from family f2:
{code}
// Filter all family f2 rows
public static class ReplicationFilterWALCoprocessor extends BaseWALObserver {
  @Override
  public boolean preWALWrite(ObserverContext<? extends WALCoprocessorEnvironment> ctx,
      HRegionInfo info, WALKey logKey, WALEdit logEdit) throws IOException {
    ArrayList<Cell> cells = logEdit.getCells();
    for (Cell cell : cells) {
      byte[] fam = CellUtil.cloneFamily(cell);
      if ("f2".equals(Bytes.toString(fam))) {
        NavigableMap<byte[], Integer> scopes = logKey.getScopes();
        if (scopes == null) {
          logKey.setScopes(new TreeMap<byte[], Integer>(Bytes.BYTES_COMPARATOR));
        }
        logKey.getScopes().put(fam, HConstants.REPLICATION_SCOPE_LOCAL);
      }
    }
    return false;
  }
}
{code}
This logic cannot work, because {{org.apache.hadoop.hbase.replication.regionserver.Replication.scopeWALEdits()}} recreates and repopulates the scopes.

*SOLUTION:*
In Replication.scopeWALEdits(), create the scopes map only if the WALKey does not already have one.
{code}
NavigableMap<byte[], Integer> scopes = logKey.getScopes();
if (scopes == null) {
  scopes = new TreeMap<byte[], Integer>(Bytes.BYTES_COMPARATOR);
}
{code}

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
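The effect of the proposed change can be shown with a stripped-down model. This is a sketch under stated assumptions, not the HBase code: ScopeSketch and its Key class are hypothetical stand-ins for Replication and WALKey, family names are plain strings rather than byte arrays, and the model assumes the populating step only fills in families that are not yet present in the map.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class ScopeSketch {
  static final int SCOPE_LOCAL = 0;  // same value as HConstants.REPLICATION_SCOPE_LOCAL
  static final int SCOPE_GLOBAL = 1; // same value as HConstants.REPLICATION_SCOPE_GLOBAL

  /** Hypothetical, minimal stand-in for WALKey: it only carries the scopes map. */
  static final class Key {
    NavigableMap<String, Integer> scopes;
  }

  /** Behaviour described in the report: always start from a fresh map. */
  static void scopeRecreating(Key key, List<String> families, Map<String, Integer> tableScopes) {
    NavigableMap<String, Integer> scopes = new TreeMap<>();
    fill(scopes, families, tableScopes);
    key.scopes = scopes;
  }

  /** Proposed behaviour: reuse the map already on the key, if there is one. */
  static void scopePreserving(Key key, List<String> families, Map<String, Integer> tableScopes) {
    NavigableMap<String, Integer> scopes = key.scopes;
    if (scopes == null) {
      scopes = new TreeMap<>();
    }
    fill(scopes, families, tableScopes);
    key.scopes = scopes;
  }

  /** Adds the table-level scope for every family that has no entry yet. */
  private static void fill(NavigableMap<String, Integer> scopes, List<String> families,
      Map<String, Integer> tableScopes) {
    for (String family : families) {
      if (!scopes.containsKey(family)) {
        scopes.put(family, tableScopes.getOrDefault(family, SCOPE_LOCAL));
      }
    }
  }

  /** Marks f2 as LOCAL the way the coprocessor would, then returns f2's final scope. */
  static int f2ScopeAfter(boolean preserve) {
    Key key = new Key();
    key.scopes = new TreeMap<>();
    key.scopes.put("f2", SCOPE_LOCAL); // set by the WAL coprocessor

    Map<String, Integer> tableScopes = new HashMap<>();
    tableScopes.put("f1", SCOPE_GLOBAL);
    tableScopes.put("f2", SCOPE_GLOBAL);
    List<String> families = Arrays.asList("f1", "f2");

    if (preserve) {
      scopePreserving(key, families, tableScopes);
    } else {
      scopeRecreating(key, families, tableScopes);
    }
    return key.scopes.get("f2");
  }

  public static void main(String[] args) {
    System.out.println("recreating map, f2 scope: " + f2ScopeAfter(false));
    System.out.println("preserving map, f2 scope: " + f2ScopeAfter(true));
  }
}
```

In this model the recreating variant puts f2 back to the table-level GLOBAL scope, while the preserving variant keeps the LOCAL scope chosen by the coprocessor, so entries for f2 would stay filtered.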