[jira] [Created] (HBASE-26856) BufferedDataBlockEncoder.OnheapDecodedCell value can get corrupted

2022-03-16 Thread Mohammad Arshad (Jira)
Mohammad Arshad created HBASE-26856:
---

 Summary: BufferedDataBlockEncoder.OnheapDecodedCell value can get 
corrupted
 Key: HBASE-26856
 URL: https://issues.apache.org/jira/browse/HBASE-26856
 Project: HBase
  Issue Type: Bug
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad


In our production cluster we observed that a cell's value was modified after a
successful scanner read. Analysis showed that OnheapDecodedCell is not
constructed safely.

We create the OnheapDecodedCell with the complete underlying array of valAndTagsBuffer.

{code:java}
 return new OnheapDecodedCell(Bytes.copy(keyBuffer, 0, this.keyLength),
   currentKey.getRowLength(), currentKey.getFamilyOffset(), currentKey.getFamilyLength(),
   currentKey.getQualifierOffset(), currentKey.getQualifierLength(),
   currentKey.getTimestamp(), currentKey.getTypeByte(), valAndTagsBuffer.array(),
   valAndTagsBuffer.arrayOffset() + vOffset, this.valueLength, memstoreTS, tagsArray,
   tOffset, this.tagsLength);
{code}

Here we pass valAndTagsBuffer.array() as the value array, so the cell shares
the buffer's backing array instead of owning a copy of the value bytes.

Any later write to that buffer therefore changes the value the cell returns.
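
The aliasing described above can be shown with plain java.nio, outside HBase. The following is a minimal sketch, not the HBase code and not necessarily the fix applied for this issue: the class SharedArraySketch and the helper copyValue are hypothetical names, and copying the value bytes is only one possible remedy.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SharedArraySketch {
  /** Returns a private copy of {@code length} bytes of the buffer's backing array. */
  static byte[] copyValue(ByteBuffer buf, int offset, int length) {
    int from = buf.arrayOffset() + offset;
    return Arrays.copyOfRange(buf.array(), from, from + length);
  }

  public static void main(String[] args) {
    ByteBuffer valAndTags = ByteBuffer.wrap("value-1".getBytes(StandardCharsets.UTF_8));
    byte[] shared = valAndTags.array();           // the same array the buffer writes into
    byte[] copied = copyValue(valAndTags, 0, 7);  // an independent snapshot

    // Simulate the buffer being reused after the "cell" was handed out.
    valAndTags.put(0, (byte) 'V');

    System.out.println(new String(shared, StandardCharsets.UTF_8)); // prints "Value-1"
    System.out.println(new String(copied, StandardCharsets.UTF_8)); // prints "value-1"
  }
}
```

The reference obtained from array() observes the later write, while the copied bytes do not. The cost of the copy is one extra allocation per cell.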



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HBASE-25571) Compilation in branch-2 after HBASE-25364

2021-02-12 Thread Mohammad Arshad (Jira)
Mohammad Arshad created HBASE-25571:
---

 Summary: Compilation in branch-2 after  HBASE-25364
 Key: HBASE-25571
 URL: https://issues.apache.org/jira/browse/HBASE-25571
 Project: HBase
  Issue Type: Bug
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad


{code:java}
[ERROR] COMPILATION ERROR :
[INFO] -
[ERROR] /D:/code/apache/forked/hbaseBranch2/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFile.java:[737,12] cannot find symbol
  symbol:   method getCell(byte[],byte[],byte[])
  location: class org.apache.hadoop.hbase.io.hfile.TestHFile
[ERROR] /D:/code/apache/forked/hbaseBranch2/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFile.java:[738,13] cannot find symbol
  symbol:   method getCell(byte[],byte[],byte[])
  location: class org.apache.hadoop.hbase.io.hfile.TestHFile
[ERROR] /D:/code/apache/forked/hbaseBranch2/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFile.java:[744,12] cannot find symbol
  symbol:   method getCell(byte[],byte[],byte[])
  location: class org.apache.hadoop.hbase.io.hfile.TestHFile
[ERROR] /D:/code/apache/forked/hbaseBranch2/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFile.java:[745,13] cannot find symbol
  symbol:   method getCell(byte[],byte[],byte[])
  location: class org.apache.hadoop.hbase.io.hfile.TestHFile
[INFO] 4 errors

{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25503) HBase code download is failing on windows with invalid path error

2021-01-13 Thread Mohammad Arshad (Jira)
Mohammad Arshad created HBASE-25503:
---

 Summary: HBase code download is failing on windows with invalid 
path error
 Key: HBASE-25503
 URL: https://issues.apache.org/jira/browse/HBASE-25503
 Project: HBase
  Issue Type: Bug
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad


git pull command is failing with "error: invalid path"
{noformat}
Host1 MINGW64 /d/hbase (master)
$ git pull
error: invalid path 'dev-support/design-docs/HBASE-18070-ROOT_hbase:meta_Region_Replicas.pdf'
Updating 2e96a5b2d3..dfefff7e59
{noformat}
This problem occurs only on Windows machines. I tried Windows 7 and Windows 10;
both have the problem. I searched online and found no easy workaround.

The problem is that the file name HBASE-18070-ROOT_hbase:meta_Region_Replicas.pdf
contains a colon, which Windows does not allow in file names.

To fix the problem we should remove the colon from the file name.
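
A check like the following could flag such names before they are committed. It is only a sketch: the class WindowsFileNameCheck and its method are hypothetical, and it covers only the characters Windows reserves in file names, not reserved device names or trailing dots and spaces.

```java
import java.util.ArrayList;
import java.util.List;

class WindowsFileNameCheck {
  // Characters that Windows does not allow in a file name.
  private static final String RESERVED = "<>:\"/\\|?*";

  /** Returns the distinct reserved characters found in the given file name. */
  static List<Character> reservedCharsIn(String fileName) {
    List<Character> found = new ArrayList<>();
    for (char c : fileName.toCharArray()) {
      if (RESERVED.indexOf(c) >= 0 && !found.contains(c)) {
        found.add(c);
      }
    }
    return found;
  }

  public static void main(String[] args) {
    String name = "HBASE-18070-ROOT_hbase:meta_Region_Replicas.pdf";
    System.out.println(name + " -> " + reservedCharsIn(name));
  }
}
```

For the file in this report the check returns a list containing only the colon; a name without reserved characters yields an empty list.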





[jira] [Created] (HBASE-25492) Create table with rsgroup

2021-01-10 Thread Mohammad Arshad (Jira)
Mohammad Arshad created HBASE-25492:
---

 Summary: Create table with rsgroup
 Key: HBASE-25492
 URL: https://issues.apache.org/jira/browse/HBASE-25492
 Project: HBase
  Issue Type: Improvement
  Components: rsgroup
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad
 Fix For: 2.4.1


Currently we need to create a table and then move it to the desired RSGroup, which
costs the HMaster two rounds of region assignment; for a table with many regions
this overhead is large.

We have a use case where users want to create a table directly in an rsgroup.
HBASE-22695 already implemented this feature in the master branch, but it was not
ported to branch-2 because the master and branch-2 implementations are different.
This JIRA aims to implement this feature in branch-2.

Unlike the master branch, the rsgroup information from the TableDescriptor is used
only while creating the table, to keep the changes minimal.






[jira] [Resolved] (HBASE-24212) HMaster UI hangs when rsgroup is enabled and meta-region is not available

2021-01-07 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad resolved HBASE-24212.
-
Resolution: Not A Problem

* Moving the rsgroup code into hbase-server is not an available option.
* After HBASE-22738 and HBASE-24760, meta would not go offline in the
scenario described.


> HMaster UI hangs when rsgroup is enabled and meta-region is not available
> -
>
> Key: HBASE-24212
> URL: https://issues.apache.org/jira/browse/HBASE-24212
> Project: HBase
>  Issue Type: Bug
>  Components: rsgroup
>Affects Versions: 2.2.4
>    Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Major
>
> HMaster UI hangs when rsgroup is enabled and meta-region is not available.
> Steps to reproduce:
>  # Cluster: 1 Master, 3 RS
>  # Create rsgroup r1 and r2
>  # Move rs1 to r1 and rs2 to r2, so that all the regions are online on rs3
>  # Stop rs3
>  # Now access the URL hmaster:Host:infoPort/master-status. The page will not open.





[jira] [Resolved] (HBASE-24676) Meta region assignment is blocked when all RS in meta table group are restarted.

2021-01-07 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad resolved HBASE-24676.
-
Resolution: Not A Problem

After HBASE-22738 and HBASE-24760, if there is no live server in the current group,
the table's regions are moved to the default group or to another group where live
servers exist, so meta will be assigned as long as any server in the cluster is live.

> Meta region assignment is blocked when all RS in meta table group are 
> restarted.
> 
>
> Key: HBASE-24676
> URL: https://issues.apache.org/jira/browse/HBASE-24676
> Project: HBase
>  Issue Type: Bug
>  Components: rsgroup
>Affects Versions: 2.2.3
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Major
>
> This issue happened in a test cluster. It does not reproduce easily,
> but it can be reproduced with debug points in the code.
> Steps to reproduce:
> # Install an HBase cluster with three RS (rs1, rs2 and rs3) and one Master
> # Create two rsgroups r1 and r2 and move rs1 to r1 and rs2 to r2
> {code}
> add_rsgroup 'r1';add_rsgroup 'r2';move_servers_rsgroup 'r1',['rs1Host:16020'];move_servers_rsgroup 'r2',['rs2Host:16020']
> {code}
> # Create a table t1
> {code}create 't1','f1','f2';put 't1','r1','f1:c1','v1'{code}
> # Start debugging the master and put a debug point in the while loop of the
> {code}org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl.ServerEventsListenerThread#run{code}
>  method.
> # Stop rs3
> # When the debugger stops at the debug point, wait around 30 seconds to let
> meta go offline, and then resume execution. By now meta is offline because rs3
> is stopped, and the HMaster UI hangs.
> # Now start rs3. After it starts, meta should come online and the Master UI should open.
> # If the Master UI still hangs, you have reproduced the issue.





[jira] [Reopened] (HBASE-24025) Improve performance of move_servers_rsgroup and move_tables_rsgroup by using async region move API

2020-10-09 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad reopened HBASE-24025:
-

> Improve performance of move_servers_rsgroup and move_tables_rsgroup by using 
> async region move API
> --
>
> Key: HBASE-24025
> URL: https://issues.apache.org/jira/browse/HBASE-24025
> Project: HBase
>  Issue Type: Improvement
>  Components: rsgroup
>    Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> Currently the move_servers_rsgroup and move_tables_rsgroup commands and APIs
> take a lot of time.
> In my test environment, moving a server with 100 regions takes around 137
> seconds.
> Moving a table with 100 regions to another group takes about the same
> time.
> The time taken by the rsgroup meta update is negligible. Almost all the time is
> spent in region movement, because regions are moved serially
> using the getAssignmentManager().move(region) API.
> The getAssignmentManager().moveAsync(regionplan) API can be used to move the
> regions in parallel, improving the performance of the rsgroup move-servers
> and move-tables commands and APIs.
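
The difference between moving regions one at a time and submitting all moves before waiting can be sketched with the JDK alone. This is an illustration under assumptions, not the HBase implementation: moveRegion is a hypothetical stand-in that only sleeps, and the pool size of 4 is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ParallelMoveSketch {
  /** Hypothetical stand-in for moving one region; it only sleeps briefly. */
  static String moveRegion(String region) {
    try {
      Thread.sleep(20);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return region + " moved";
  }

  /** Submits every move first, then waits for all of them to finish. */
  static List<String> moveAll(List<String> regions, ExecutorService pool) {
    List<CompletableFuture<String>> pending = new ArrayList<>();
    for (String region : regions) {
      pending.add(CompletableFuture.supplyAsync(() -> moveRegion(region), pool));
    }
    List<String> results = new ArrayList<>();
    for (CompletableFuture<String> future : pending) {
      results.add(future.join());
    }
    return results;
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      System.out.println(moveAll(List.of("region-1", "region-2", "region-3"), pool));
    } finally {
      pool.shutdown();
    }
  }
}
```

With a serial loop the total time is roughly the sum of the individual moves; when all moves are submitted up front, it approaches the time of the slowest move, bounded by the pool size.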





[jira] [Created] (HBASE-25069) Display region name instead of encoded region name for holes in HBCK report page.

2020-09-18 Thread Mohammad Arshad (Jira)
Mohammad Arshad created HBASE-25069:
---

 Summary:  Display region name instead of encoded region name for 
holes in HBCK report page.
 Key: HBASE-25069
 URL: https://issues.apache.org/jira/browse/HBASE-25069
 Project: HBase
  Issue Type: Improvement
  Components: hbck
Affects Versions: 2.3.1, 3.0.0-alpha-1
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad
 Attachments: image-2020-09-19-11-39-01-755.png

In the HMaster UI, the HBCK report displays only encoded region names for holes.

An encoded region name alone gives no information, such as which table the region
belongs to or what its start key is.

I think it is better to display the region name instead of the encoded region name.

!image-2020-09-19-11-39-01-755.png!





[jira] [Created] (HBASE-25009) Hbck chore logs wrong message when loading regions from FS

2020-09-10 Thread Mohammad Arshad (Jira)
Mohammad Arshad created HBASE-25009:
---

 Summary: Hbck chore logs wrong message when loading regions from FS
 Key: HBASE-25009
 URL: https://issues.apache.org/jira/browse/HBASE-25009
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.3.1, 3.0.0-alpha-1
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad


{code:java}
LOG.info("Loaded {} regions from {} regionservers' reports and found {} orphan regions",
  numRegions, rsReports.size(), orphanRegionsOnFS.size());
{code}
In the above log message, orphanRegionsOnFS.size() should be replaced with
orphanRegionsOnRS.size(), as the regions are loaded from the RS, not from the FS.





[jira] [Created] (HBASE-24995) MetaFixer fails to fix overlaps when multiple tables have overlaps

2020-09-07 Thread Mohammad Arshad (Jira)
Mohammad Arshad created HBASE-24995:
---

 Summary: MetaFixer fails to fix overlaps when multiple tables have 
overlaps
 Key: HBASE-24995
 URL: https://issues.apache.org/jira/browse/HBASE-24995
 Project: HBase
  Issue Type: Bug
  Components: hbck2
Affects Versions: 2.2.3, 3.0.0-alpha-1
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad


MetaFixer fails to fix overlaps when multiple tables have overlaps.
*Steps to reproduce from UT.*
# Create tables t1 and t2 with split keys ["bbb", "ccc", "ddd", "eee"]
# Create an extra region in both t1 and t2 with start key "bbb" and end key "ddd"
# Run the catalog janitor. It will report 4 overlaps in total, 2 from each table.
# Run MetaFixer and wait for the merges to finish.
# Run the catalog janitor again and verify the report; there should not be any
overlap.
# The overlaps still exist: the issue is reproduced.

*Analysis.*
* When I run the same scenario for just one table, t1, the overlaps are fixed
successfully.
* The problem appears to be in MetaFixer#calculateMerges.
* I think merges should be calculated per table; merging regions across tables
is not meaningful.
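
The idea of calculating merges per table can be sketched as follows. This is not the MetaFixer code: the Overlap class and regionsToMergeByTable are hypothetical, regions are represented by plain strings, and all overlapping regions of a table are treated as a single merge group for simplicity.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class PerTableMergeSketch {
  /** Hypothetical stand-in for one overlap reported by the catalog janitor. */
  static final class Overlap {
    final String table;
    final String first;
    final String second;

    Overlap(String table, String first, String second) {
      this.table = table;
      this.first = first;
      this.second = second;
    }
  }

  /** Collects, separately for each table, the regions that take part in an overlap. */
  static Map<String, TreeSet<String>> regionsToMergeByTable(List<Overlap> overlaps) {
    Map<String, TreeSet<String>> byTable = new LinkedHashMap<>();
    for (Overlap o : overlaps) {
      TreeSet<String> regions = byTable.computeIfAbsent(o.table, t -> new TreeSet<>());
      regions.add(o.first);
      regions.add(o.second);
    }
    return byTable;
  }

  public static void main(String[] args) {
    // The four overlaps from the reproduction steps: two per table.
    List<Overlap> overlaps = List.of(
        new Overlap("t1", "[bbb,ccc)", "[bbb,ddd)"),
        new Overlap("t1", "[bbb,ddd)", "[ccc,ddd)"),
        new Overlap("t2", "[bbb,ccc)", "[bbb,ddd)"),
        new Overlap("t2", "[bbb,ddd)", "[ccc,ddd)"));
    regionsToMergeByTable(overlaps)
        .forEach((table, regions) -> System.out.println(table + " -> " + regions));
  }
}
```

Keying the grouping on the table name guarantees that a merge group never mixes regions of t1 and t2, even when their key ranges are identical.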






RE: Fixing Catalog Janitor and HBCK chore reported inconsistencies automatically

2020-08-24 Thread Mohammad arshad
> In hbase1?
No, not in hbase1. The inconsistencies occurred in hbase 2.2.3.
> One suggestion is that you move to 2.3.1 rather than 2.2.3. It has the 
> benefit of experience running the 2.2 line and has even more fixes for 
> anomalies applied
Good point. Thanks for the suggestion; we will surely consider upgrading to 2.3.x.

Thanks 
-Arshad
-Original Message-
From: Stack [mailto:st...@duboce.net] 
Sent: Monday, August 24, 2020 5:01 AM
To: HBase Dev List 
Cc: Hbase-User 
Subject: Re: Fixing Catalog Janitor and HBCK chore reported inconsistencies 
automatically

On Thu, Aug 20, 2020 at 11:48 PM Mohammad arshad 
wrote:

> 
> > Are you regularly experiencing consistency issues? If so, what 
> > version
> of hbase are you running?
> We have experienced some inconsistencies like unknown servers, double 
> assignment or RIT (HBASE-24885).


In hbase1?



> In future we may witness some other inconsistencies, so we should make
> HBCK2 mature enough to handle them. Off course we will learn from the 
> new bug and fix them in code rather than depending on HBCK2.
>
Agree and if you run into issues, talk out loud here because we want to
hear about them and fix them as they come up.


> Currently we are upgrading from 1.x to 2.2.3 version. Just a thought 
> process to ease HBase operator job as per some operator feedback, we 
> can discuss and come up with some ideas to handle this.
>
>
Appreciated.

One suggestion is that you move to 2.3.1 rather than 2.2.3. It has the benefit 
of experience running the 2.2 line and has even more fixes for anomalies 
applied. Also, the master-side of the hbck2 invocations has had a bunch of 
improvement made so when they run, they are more thorough.

Yours,
S





> Thanks
> -Arshad
>
> -Original Message-
> From: Stack [mailto:st...@duboce.net]
> Sent: Tuesday, August 18, 2020 9:20 AM
> To: Hbase-User 
> Cc: dev@hbase.apache.org
> Subject: Re: Fixing Catalog Janitor and HBCK chore reported 
> inconsistencies automatically
>
> On Thu, Aug 6, 2020 at 10:10 PM Mohammad arshad < 
> mohammad.ars...@huawei.com>
> wrote:
>
> > Hello HBase Folks
> >
> > Currently Catalog Janitor (CJ) and HBCK chore reported 
> > inconsistencies to be fixed by manually by executing HBCK2 commands.
> > HBCK2 requires high HBase skills. It is bit difficult for 
> > maintenance personals to figure out which command, when and in which 
> > order to be executed.
> >
> >
> True.
>
> Is there any effort going on in community to automate fixing these
> > inconsistencies?  I also would like to contribute there.
> >
> > I was thinking, maybe we can expose CJ and HBCK chore reported 
> > inconsistencies through a new master API and then provide option to 
> > fix these inconsistencies. Basically adding two new commands in 
> > HBCK2 -listInconsistencies list CJ and HBCK chore reported 
> > inconsistencies -fixInconsistencies  fix CJ and HBCK chore reported 
> > inconsistencies (Not sure if possible to fix all inconsistencies, 
> > need to analyze all inconsistencies case by case, but some are very 
> > straight forward for example holes and overlap)
> >
> >
> So, a 'god' command that will fix any issue found?
>
> That is tough. You've seen the philosophy section on hbck2, of how it 
> makes no claims to being so capable [1]?
>
> We are trying to get to a place where hbck2 is increasingly less necessary.
> The general idea is that inconsistencies are caused by bugs or oversight.
> As time goes by, we've been plugging the holes. Upgrading hbase gains 
> you the fixes making the need for hbck2 less.
>
> But as you state above, when there is an issue, it can be hard for the 
> operator to figure how to make fixes. We've been trying to improve 
> this state with documentation in the UI up on the 'HBCK Report' page 
> and elsewhere but there is room for improvement.
>
> We've also been trying to aggregate on the hbck2 side so that commands 
> become increasingly 'macro', fixing a whole category of problem types 
> rather than an affliction at a time. This should make the tool easier 
> to use. The 'fixMeta' command is a good example here as it fixes any 
> holes or overlaps found in hbase:meta (This is probably ripe for 
> conversion into an auto-repair run on occasion by the Master). Another 
> way in which we've been trying to make improvement is by obsoleting 
> commands in hbck2 as we fix the root cause that required the hbck2 
> command option to be needed in the first place.
>
> CJ and the HBCK Chore can report on inconsistencies found. It is 
> another thing altogether having them go ahead and repair any issues 
> found mostly because we are not yet confident the repair won't cause 
> more damage than it fixes.
>
>
> > Any thoughts/inputs highly appreciated.
> >
> >
> Are you regularly experiencing consistency issues? If so, what version 
> of hbase are you running?
>
> Thanks,
> S
>
>
> 1.
>
> https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2
> #philosophy
>
>
> > Regards
> > -Arshad
> >
> >
> >
>


[jira] [Created] (HBASE-24940) runCatalogJanitor() API should return -1 to indicate already running status

2020-08-23 Thread Mohammad Arshad (Jira)
Mohammad Arshad created HBASE-24940:
---

 Summary: runCatalogJanitor() API should return -1 to indicate 
already running status
 Key: HBASE-24940
 URL: https://issues.apache.org/jira/browse/HBASE-24940
 Project: HBase
  Issue Type: Improvement
Affects Versions: 2.2.3
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad


The runCleanerChore() API returns false if the hbck chore is already running. This is
quite helpful in many cases.

The runCatalogJanitor() API does not indicate whether a scan is already running,
and it does nothing in that case.

I think we should return -1 to indicate the already-running status.
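
The proposed contract can be sketched with an atomic flag. This is a minimal illustration, not the HBase implementation: JanitorGuardSketch and runOnce are hypothetical names, and the scan is passed in as an IntSupplier that returns the scan's result count.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.IntSupplier;

class JanitorGuardSketch {
  private final AtomicBoolean running = new AtomicBoolean(false);

  /** Runs the scan unless one is in progress; returns -1 when a scan is already running. */
  int runOnce(IntSupplier scan) {
    if (!running.compareAndSet(false, true)) {
      return -1;
    }
    try {
      return scan.getAsInt();
    } finally {
      running.set(false);
    }
  }

  public static void main(String[] args) {
    JanitorGuardSketch janitor = new JanitorGuardSketch();
    // A request made while the outer scan is still running observes -1.
    int outer = janitor.runOnce(() -> janitor.runOnce(() -> 7) == -1 ? 3 : 0);
    System.out.println(outer);                    // prints 3
    System.out.println(janitor.runOnce(() -> 5)); // prints 5
  }
}
```

Returning -1 keeps the existing int return type while giving callers a value that cannot be confused with a real count, which is never negative.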





RE: Fixing Catalog Janitor and HBCK chore reported inconsistencies automatically

2020-08-20 Thread Mohammad arshad
Thank you, sir, for your very informative response.

> So, a 'god' command that will fix any issue found?
Yes, the idea was to make -fixInconsistencies a god command that fixes all
inconsistencies reported by both the CJ and the hbck chore. But as you rightly
pointed out, we are not yet confident the repair won't cause more damage than it
fixes. On second thought, automating the fixes at this point may not be
appropriate, as the efficacy of all the fixes is not yet proven.

> Are you regularly experiencing consistency issues? If so, what version of 
> hbase are you running?
We have experienced some inconsistencies such as unknown servers, double
assignment or RIT (HBASE-24885). In future we may see other
inconsistencies, so we should make HBCK2 mature enough to handle them. Of
course, we will learn from each new bug and fix it in code rather than
depending on HBCK2.

We are currently upgrading from 1.x to 2.2.3. This is just a thought on how to
ease the HBase operator's job, based on some operator feedback; we can discuss
and come up with some ideas to handle this.

Thanks
-Arshad

-Original Message-
From: Stack [mailto:st...@duboce.net] 
Sent: Tuesday, August 18, 2020 9:20 AM
To: Hbase-User 
Cc: dev@hbase.apache.org
Subject: Re: Fixing Catalog Janitor and HBCK chore reported inconsistencies 
automatically

On Thu, Aug 6, 2020 at 10:10 PM Mohammad arshad 
wrote:

> Hello HBase Folks
>
> Currently Catalog Janitor (CJ) and HBCK chore reported inconsistencies 
> to be fixed by manually by executing HBCK2 commands.
> HBCK2 requires high HBase skills. It is bit difficult for maintenance 
> personals to figure out which command, when and in which order to be 
> executed.
>
>
True.

Is there any effort going on in community to automate fixing these
> inconsistencies?  I also would like to contribute there.
>
> I was thinking, maybe we can expose CJ and HBCK chore reported 
> inconsistencies through a new master API and then provide option to 
> fix these inconsistencies. Basically adding two new commands in HBCK2 
> -listInconsistencies list CJ and HBCK chore reported inconsistencies 
> -fixInconsistencies  fix CJ and HBCK chore reported inconsistencies 
> (Not sure if possible to fix all inconsistencies, need to analyze all 
> inconsistencies case by case, but some are very straight forward for 
> example holes and overlap)
>
>
So, a 'god' command that will fix any issue found?

That is tough. You've seen the philosophy section on hbck2, of how it makes no 
claims to being so capable [1]?

We are trying to get to a place where hbck2 is increasingly less necessary.
The general idea is that inconsistencies are caused by bugs or oversight.
As time goes by, we've been plugging the holes. Upgrading hbase gains you the 
fixes making the need for hbck2 less.

But as you state above, when there is an issue, it can be hard for the operator 
to figure how to make fixes. We've been trying to improve this state with 
documentation in the UI up on the 'HBCK Report' page and elsewhere but there is 
room for improvement.

We've also been trying to aggregate on the hbck2 side so that commands become 
increasingly 'macro', fixing a whole category of problem types rather than an 
affliction at a time. This should make the tool easier to use. The 'fixMeta' 
command is a good example here as it fixes any holes or overlaps found in 
hbase:meta (This is probably ripe for conversion into an auto-repair run on 
occasion by the Master). Another way in which we've been trying to make 
improvement is by obsoleting commands in hbck2 as we fix the root cause that 
required the hbck2 command option to be needed in the first place.

CJ and the HBCK Chore can report on inconsistencies found. It is another thing 
altogether having them go ahead and repair any issues found mostly because we 
are not yet confident the repair won't cause more damage than it fixes.


> Any thoughts/inputs highly appreciated.
>
>
Are you regularly experiencing consistency issues? If so, what version of hbase 
are you running?

Thanks,
S


1.
https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2#philosophy


> Regards
> -Arshad
>
>
>


Fixing Catalog Janitor and HBCK chore reported inconsistencies automatically

2020-08-06 Thread Mohammad arshad
Hello HBase Folks

Currently, inconsistencies reported by the Catalog Janitor (CJ) and the HBCK chore
have to be fixed manually by executing HBCK2 commands.
HBCK2 requires strong HBase skills. It is a bit difficult for maintenance personnel
to figure out which command to execute, when, and in which order.

Is there any effort going on in the community to automate fixing these
inconsistencies? I would also like to contribute there.

I was thinking that maybe we can expose the inconsistencies reported by the CJ and
the HBCK chore through a new master API and then provide an option to fix them.
Basically, this means adding two new commands to HBCK2:
-listInconsistencies  list the inconsistencies reported by the CJ and the HBCK chore
-fixInconsistencies  fix the inconsistencies reported by the CJ and the HBCK chore
(I am not sure whether it is possible to fix all inconsistencies; each needs to be
analyzed case by case, but some are very straightforward, for example holes and overlaps)

Any thoughts/inputs highly appreciated.

Regards
-Arshad




[jira] [Created] (HBASE-24675) On Master restart all servers are assigned to default rsgroup.

2020-07-03 Thread Mohammad Arshad (Jira)
Mohammad Arshad created HBASE-24675:
---

 Summary: On Master restart all servers are assigned to default 
rsgroup.
 Key: HBASE-24675
 URL: https://issues.apache.org/jira/browse/HBASE-24675
 Project: HBase
  Issue Type: Bug
  Components: rsgroup
Affects Versions: 2.2.3
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad


Steps to reproduce:
# Install an HBase cluster with three RS (rs1, rs2 and rs3) and one Master
# Create two rsgroups r1 and r2 and move rs1 to r1 and rs2 to r2
{code:java}
add_rsgroup 'r1';add_rsgroup 'r2';move_servers_rsgroup 'r1',['host1:16020'];move_servers_rsgroup 'r2',['host2:16020']
{code}
# Restart Master
# Run list_rsgroups from the hbase shell; all region servers are assigned to the
default rsgroup.






[jira] [Created] (HBASE-24676) Meta region assignment is blocked when all RS in meta table group are restarted.

2020-07-03 Thread Mohammad Arshad (Jira)
Mohammad Arshad created HBASE-24676:
---

 Summary: Meta region assignment is blocked when all RS in meta 
table group are restarted.
 Key: HBASE-24676
 URL: https://issues.apache.org/jira/browse/HBASE-24676
 Project: HBase
  Issue Type: Bug
  Components: rsgroup
Affects Versions: 2.2.3
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad


This issue happened in a test cluster. It does not reproduce easily, but
it can be reproduced with debug points in the code.
Steps to reproduce:

# Install an HBase cluster with three RS (rs1, rs2 and rs3) and one Master
# Create two rsgroups r1 and r2 and move rs1 to r1 and rs2 to r2
{code}
add_rsgroup 'r1';add_rsgroup 'r2';move_servers_rsgroup 'r1',['rs1Host:16020'];move_servers_rsgroup 'r2',['rs2Host:16020']
{code}
# Create a table t1
{code}create 't1','f1','f2';put 't1','r1','f1:c1','v1'{code}
# Start debugging the master and put a debug point in the while loop of the
{code}org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl.ServerEventsListenerThread#run{code}
 method.
# Stop rs3
# When the debugger stops at the debug point, wait around 30 seconds to let meta
go offline, and then resume execution. By now meta is offline because rs3 is
stopped, and the HMaster UI hangs.
# Now start rs3. After it starts, meta should come online and the Master UI should open.
# If the Master UI still hangs, you have reproduced the issue.





[jira] [Resolved] (HBASE-24211) Create table is slow in large cluster when AccessController is enabled.

2020-05-11 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad resolved HBASE-24211.
-
Fix Version/s: 2.4.0
   Resolution: Fixed

> Create table is slow in large cluster when AccessController is enabled.
> ---
>
> Key: HBASE-24211
> URL: https://issues.apache.org/jira/browse/HBASE-24211
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.6, master, 2.2.4
>        Reporter: Mohammad Arshad
>    Assignee: Mohammad Arshad
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 1.7.0, 2.4.0
>
>
> *Problem:*
> In a large HBase 1.3.x performance-test cluster (100 RS, 60k tables, 600k
> regions), a simple table creation takes around 150 seconds. The time taken
> varies, but it is always long.
> *Analysis:*
> 1. When HBase creates a table, it calls AssignmentManager#assign(final
> ServerName destination, final List regions).
>  AssignmentManager#assign calls asyncSetOfflineInZooKeeper(state, cb,
> destination) and then waits in the loop below for about 2 minutes.
> {code:java}
>  if (useZKForAssignment) {
>    // Wait until all unassigned nodes have been put up and watchers set.
>    int total = states.size();
>    for (int oldCounter = 0; !server.isStopped();) {
>      int count = counter.get();
>      if (oldCounter != count) {
>        LOG.debug(destination.toString() + " unassigned znodes=" + count +
>          " of total=" + total + "; oldCounter=" + oldCounter);
>        oldCounter = count;
>      }
>      if (count >= total) break;
>      Thread.sleep(5);
>    }
>  }
> {code}
> 2. asyncSetOfflineInZooKeeper creates a znode under
> /hbase/region-in-transition/ and calls exists to ensure that the znode is created.
> This is a simple operation and should not take much time. So where is the time
> going?
> 3. The ZooKeeper client API processes watcher notifications and async API
> responses through a single queue, one by one.
>  If the client (here, HBase) is slow to process any watcher or response,
> all later responses are delayed, and it then appears as if the API call
> itself took more time.
>  That is what happens in this issue.
> Watcher processing for znode creation under /hbase/acl took most of the time
> and delayed processing of the /hbase/region-in-transition/region znode creation.
> This is why the wait in the loop was so long.
> 4. Watcher processing for znode creation under /hbase/acl/ calls
> ZKPermissionWatcher#nodeChildrenChanged, which internally calls
> ZKUtil.getChildDataAndWatchForNewChildren,
>  *which calls ZooKeeper's getData API 60k times in this use case; this
> takes most of the time.*
> *Solutions:*
>  Move the getChildDataAndWatchForNewChildren call into the async code block in
> ZKPermissionWatcher#nodeChildrenChanged.
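
The effect described in the analysis, one slow callback on a single event queue delaying everything queued behind it, can be reproduced with JDK executors alone. This sketch does not use the ZooKeeper client and is not the patch for this issue: EventThreadSketch, the "acl" and "rit" labels, and the 200 ms delay are illustrative assumptions.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class EventThreadSketch {
  private static void pause(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /**
   * Queues a slow "acl" handler and then a fast "rit" handler on one event thread
   * and returns the order in which the two handlers finished.
   */
  static List<String> dispatch(boolean offloadSlowHandler) {
    ExecutorService eventThread = Executors.newSingleThreadExecutor();
    ExecutorService worker = Executors.newSingleThreadExecutor();
    List<String> finished = new CopyOnWriteArrayList<>();
    CountDownLatch done = new CountDownLatch(2);

    Runnable slowAclHandler = () -> {
      pause(200); // stands in for many blocking reads inside the callback
      finished.add("acl");
      done.countDown();
    };
    Runnable fastRitHandler = () -> {
      finished.add("rit");
      done.countDown();
    };

    if (offloadSlowHandler) {
      // The event thread only hands the slow work to a worker and returns at once.
      eventThread.execute(() -> worker.execute(slowAclHandler));
    } else {
      // The slow work runs on the event thread and blocks the queue behind it.
      eventThread.execute(slowAclHandler);
    }
    eventThread.execute(fastRitHandler);

    try {
      done.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      eventThread.shutdown();
      worker.shutdown();
    }
    return finished;
  }

  public static void main(String[] args) {
    System.out.println("inline:    " + dispatch(false));
    System.out.println("offloaded: " + dispatch(true));
  }
}
```

When the slow handler runs inline, the fast handler cannot start until it finishes; when it is offloaded, the fast handler normally completes first because the event thread is free again immediately.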
>  





[jira] [Created] (HBASE-24308) Move hbase-rsgroup code into hbase-server code

2020-05-03 Thread Mohammad Arshad (Jira)
Mohammad Arshad created HBASE-24308:
---

 Summary: Move hbase-rsgroup code into hbase-server code
 Key: HBASE-24308
 URL: https://issues.apache.org/jira/browse/HBASE-24308
 Project: HBase
  Issue Type: Bug
  Components: rsgroup
Affects Versions: 2.2.3
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad


Keeping the rsgroup code in a separate module is causing many problems. HBASE-22740
and HBASE-24212 are blocked because of this.

In the master branch the hbase-rsgroup code has already been moved into hbase-server.
It is better to stay in sync with the master branch so that fixes can be applied to
branch-2 easily.

This JIRA moves the hbase-rsgroup code into hbase-server as is; it does not
change the protobuf definitions etc.







[jira] [Resolved] (HBASE-24011) HMaster does not restart when rsgroup is enabled and /hbase/WALs is moved

2020-04-21 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad resolved HBASE-24011.
-
  Assignee: Mohammad Arshad
Resolution: Won't Fix

> HMaster does not restart when rsgroup is enabled and /hbase/WALs is moved
> -
>
> Key: HBASE-24011
> URL: https://issues.apache.org/jira/browse/HBASE-24011
> Project: HBase
>  Issue Type: Bug
>  Components: rsgroup
>Affects Versions: 2.2.3
>    Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Critical
>
> HMaster does not restart when rsgroup is enabled and /hbase/WALs is moved.
> HMaster restarts properly if rsgroup is not enabled, even if /hbase/WALs is
> moved.
> Steps to reproduce:
>  # Start the cluster
>  # Create a table and do some puts and deletes
>  # Kill all the region servers and the master
>  # Move the WALs directory for backup (-mv /hbase/WALs /hbase/WALs2)
>  # Start the cluster
>  # Master start fails; initialization keeps failing
> {code:java}
> 2020-03-18 11:42:55,369 ERROR 
> [ActiveMasterInitializationMonitor-1584511075369] master.HMaster: Master 
> failed to complete initialization after 90ms. Please consider submitting 
> a bug report including a thread dump of this process.
> {code}





[jira] [Created] (HBASE-24212) HMaster UI hangs when rsgroup is enabled and meta-region is not available

2020-04-17 Thread Mohammad Arshad (Jira)
Mohammad Arshad created HBASE-24212:
---

 Summary: HMaster UI hangs when rsgroup is enabled and meta-region 
is not available
 Key: HBASE-24212
 URL: https://issues.apache.org/jira/browse/HBASE-24212
 Project: HBase
  Issue Type: Bug
  Components: rsgroup
Affects Versions: 2.2.4, master
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad


HMaster UI hangs when rsgroup is enabled and meta-region is not available.

Steps to reproduce:
 # Cluster: 1 Master, 3 RS
 # Create rsgroup r1 and r2
 # Move rs1 to r1 and rs2 to r2, so that all the regions are online on rs3
 # Stop rs3
 # Now access the URL hmasterHost:infoPort/master-status. The page will not open.

I think that when the meta region is not available, we should take the rsgroup 
information from ZooKeeper and proceed.
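
The proposed fallback could look roughly like the sketch below. This is plain Java, not HBase code: RsGroupFallbackSketch, loadGroups and the two suppliers are hypothetical names, with one supplier standing in for the normal lookup that needs the meta region and the other for a ZooKeeper-backed copy. The timeout value is also an assumption.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

class RsGroupFallbackSketch {

  /**
   * Tries the primary (meta-dependent) lookup for at most timeoutMillis and
   * falls back to the secondary (ZooKeeper-like) lookup if it hangs or fails.
   */
  static List<String> loadGroups(Supplier<List<String>> primary,
      Supplier<List<String>> fallback, long timeoutMillis) {
    // Daemon thread, so a hung primary lookup cannot keep the JVM alive.
    ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r, "rsgroup-lookup");
      t.setDaemon(true);
      return t;
    });
    Callable<List<String>> task = primary::get;
    Future<List<String>> result = pool.submit(task);
    try {
      return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException | ExecutionException e) {
      // Primary source is unavailable: answer from the fallback instead of hanging.
      return fallback.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return fallback.get();
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    Supplier<List<String>> hungLookup = () -> {
      try {
        Thread.sleep(5000); // simulates a lookup blocked on an offline meta region
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      return Collections.singletonList("from-table");
    };
    Supplier<List<String>> zkLookup = () -> Collections.singletonList("from-zk");
    System.out.println(loadGroups(hungLookup, zkLookup, 100));
  }
}
```

With such a fallback the status page would at least render, possibly with slightly stale rsgroup data.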





[jira] [Created] (HBASE-24211) Create table is slow in large cluster when AccessController is enabled.

2020-04-17 Thread Mohammad Arshad (Jira)
Mohammad Arshad created HBASE-24211:
---

 Summary: Create table is slow in large cluster when 
AccessController is enabled.
 Key: HBASE-24211
 URL: https://issues.apache.org/jira/browse/HBASE-24211
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.2.4, 1.3.6, master
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad


*Problem:*

In a large HBase 1.3.x performance-test cluster (100 RS, 60k tables, 600k 
regions), a simple table creation takes around 150 seconds. The time taken 
varies, but it is always long.

*Analysis:*

1. When HBase creates a table, it calls AssignmentManager#assign(final 
ServerName destination, final List<HRegionInfo> regions).
 In AssignmentManager#assign, it calls asyncSetOfflineInZooKeeper(state, cb, 
destination) and then waits in the loop below for about 2 minutes.
{code:java}
if (useZKForAssignment) {
  // Wait until all unassigned nodes have been put up and watchers set.
  int total = states.size();
  for (int oldCounter = 0; !server.isStopped();) {
    int count = counter.get();
    if (oldCounter != count) {
      LOG.debug(destination.toString() + " unassigned znodes=" + count +
        " of total=" + total + "; oldCounter=" + oldCounter);
      oldCounter = count;
    }
    if (count >= total) break;
    Thread.sleep(5);
  }
}
{code}
2. asyncSetOfflineInZooKeeper creates a znode under 
/hbase/region-in-transition/ and calls exists to ensure that the znode is created. 
This is a simple operation and should not take much time. So where is the time 
spent?

3. The ZooKeeper client API processes watcher notifications and async API responses 
through a single queue, one by one.
 If the client (in this case HBase) is slow in processing any watcher or response, 
all other response processing is delayed. It then appears as if the API 
call itself has taken more time.
 That is what happens in this issue.

Watcher processing for znode creation under /hbase/acl took most of the time 
and delayed the processing of the /hbase/region-in-transition/region znode creation. This 
is why the wait in the loop was so long.

4. Watcher processing for znode creation under /hbase/acl calls 
ZKPermissionWatcher#nodeChildrenChanged, which internally calls 
ZKUtil.getChildDataAndWatchForNewChildren,
 *which calls ZooKeeper's getData API 60k times in this use case and takes 
most of the time.*

*Solution:*
 Move the getChildDataAndWatchForNewChildren call into the async code block in 
ZKPermissionWatcher#nodeChildrenChanged.
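
To illustrate the analysis and the proposed fix, here is a minimal, self-contained sketch. It does not use the ZooKeeper or HBase APIs: a single-threaded executor stands in for the ZooKeeper client event thread, slowAclRefresh stands in for the 60k getData calls, and all class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class EventThreadOffloadSketch {

  /** Stand-in for the expensive refresh of all children under /hbase/acl. */
  static void slowAclRefresh(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /**
   * Delivers two events on one "event thread": first an ACL children-changed
   * event, then a region-in-transition callback. Returns how long (in ms) the
   * second callback had to wait before it was processed.
   */
  static long ritCallbackDelayMillis(boolean offloadAclRefresh, long aclRefreshMillis) {
    ExecutorService eventThread = Executors.newSingleThreadExecutor();
    ExecutorService aclWorker = Executors.newSingleThreadExecutor();
    CountDownLatch ritProcessed = new CountDownLatch(1);
    long start = System.nanoTime();
    try {
      eventThread.execute(() -> {
        if (offloadAclRefresh) {
          // Proposed fix: hand the heavy work to another thread and return at once.
          aclWorker.execute(() -> slowAclRefresh(aclRefreshMillis));
        } else {
          // Current behavior: the heavy work blocks the event thread.
          slowAclRefresh(aclRefreshMillis);
        }
      });
      eventThread.execute(ritProcessed::countDown);
      ritProcessed.await();
      return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting", e);
    } finally {
      eventThread.shutdown();
      aclWorker.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("inline:    " + ritCallbackDelayMillis(false, 300) + " ms");
    System.out.println("offloaded: " + ritCallbackDelayMillis(true, 300) + " ms");
  }
}
```

In the inline case the second callback waits for the whole refresh; in the offloaded case it is processed almost immediately, which is the effect the fix aims for.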

 





[jira] [Created] (HBASE-24025) Improve performance of move_servers_rsgroup and move_tables_rsgroup by using async region move API

2020-03-20 Thread Mohammad Arshad (Jira)
Mohammad Arshad created HBASE-24025:
---

 Summary: Improve performance of move_servers_rsgroup and 
move_tables_rsgroup by using async region move API
 Key: HBASE-24025
 URL: https://issues.apache.org/jira/browse/HBASE-24025
 Project: HBase
  Issue Type: Improvement
  Components: rsgroup
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad


Currently the move_servers_rsgroup and move_tables_rsgroup commands and APIs 
take a lot of time.
In my test environment, moving a server with 100 regions takes around 137 
seconds.
Similarly, moving a table with 100 regions to another group takes about the 
same time.

The time taken by the rsgroup meta update is negligible. Almost all the time is 
spent in region movement. This happens because the regions are moved serially 
using the getAssignmentManager().move(region) API.

The getAssignmentManager().moveAsync(regionplan) API can be used to move the 
regions in parallel and improve the performance of the rsgroup move servers and 
tables commands and APIs.
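
The difference between the serial and the parallel approach can be sketched as follows. This is plain Java and only an illustration: moveRegion is a hypothetical stand-in that sleeps instead of calling the AssignmentManager, and the thread pool size is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ParallelRegionMoveSketch {

  /** Stand-in for moving one region; just sleeps for the given time. */
  static void moveRegion(String region, long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Moves regions one after another, like a blocking move(region) call per region. */
  static long moveSerially(List<String> regions, long perMoveMillis) {
    long start = System.nanoTime();
    for (String region : regions) {
      moveRegion(region, perMoveMillis);
    }
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
  }

  /** Submits all moves first and only then waits for them, like an async move API. */
  static long moveInParallel(List<String> regions, long perMoveMillis, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    long start = System.nanoTime();
    try {
      List<CompletableFuture<Void>> futures = new ArrayList<>();
      for (String region : regions) {
        futures.add(CompletableFuture.runAsync(() -> moveRegion(region, perMoveMillis), pool));
      }
      CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
      return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    List<String> regions = Arrays.asList("r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8");
    System.out.println("serial:   " + moveSerially(regions, 50) + " ms");
    System.out.println("parallel: " + moveInParallel(regions, 50, 8) + " ms");
  }
}
```

The serial time grows with the number of regions, while the parallel time stays close to the time of a single move as long as enough threads are available.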






[jira] [Created] (HBASE-24019) Correct exception messages for table null and namespace unavailable.

2020-03-19 Thread Mohammad Arshad (Jira)
Mohammad Arshad created HBASE-24019:
---

 Summary: Correct exception messages for table null and namespace 
unavailable.
 Key: HBASE-24019
 URL: https://issues.apache.org/jira/browse/HBASE-24019
 Project: HBase
  Issue Type: Bug
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad


The exception messages for the following two scenarios should be corrected.

1. Change the message to "The list of tables cannot be null." in the code below.

{code:java}
@Override
public void moveTables(Set<TableName> tables, String targetGroup) throws IOException {
  if (tables == null) {
    // Wrong noun in the message: this method receives tables, not servers
    throw new ConstraintException("The list of servers cannot be null.");
  }
{code}
2. Change the message to "Region server group "+group+" does not exist" in 
the code below.

{code:java}
public void preCreateNamespace(ObserverContext<MasterCoprocessorEnvironment> ctx,
    NamespaceDescriptor ns) throws IOException {
  String group = ns.getConfigurationValue(RSGroupInfo.NAMESPACE_DESC_PROP_GROUP);
  if (group != null && groupAdminServer.getRSGroupInfo(group) == null) {
    // Typo in the message: "exit" should be "exist"
    throw new ConstraintException("Region server group " + group + " does not exit");
  }
}
{code}
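
For reference, the two corrected checks could look like the standalone sketch below. ExceptionMessageSketch, checkTables and checkGroupExists are hypothetical names, and the nested ConstraintException is only a stand-in so that the sketch compiles without HBase on the classpath.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.Set;

class ExceptionMessageSketch {

  /** Stand-in for HBase's ConstraintException. */
  static class ConstraintException extends IOException {
    ConstraintException(String message) {
      super(message);
    }
  }

  /** Null check with the corrected message for the tables argument. */
  static void checkTables(Set<String> tables) throws IOException {
    if (tables == null) {
      throw new ConstraintException("The list of tables cannot be null.");
    }
  }

  /** Existence check with the corrected spelling ("exist" instead of "exit"). */
  static void checkGroupExists(String group, Set<String> existingGroups) throws IOException {
    if (group != null && !existingGroups.contains(group)) {
      throw new ConstraintException("Region server group " + group + " does not exist");
    }
  }

  public static void main(String[] args) {
    try {
      checkTables(null);
    } catch (IOException e) {
      System.out.println(e.getMessage());
    }
    try {
      checkGroupExists("r1", Collections.<String>emptySet());
    } catch (IOException e) {
      System.out.println(e.getMessage());
    }
  }
}
```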


 





[jira] [Created] (HBASE-24011) HMaster does not restart when rsgroup is enabled and /hbase/WALs is moved

2020-03-18 Thread Mohammad Arshad (Jira)
Mohammad Arshad created HBASE-24011:
---

 Summary: HMaster does not restart when rsgroup is enabled and 
/hbase/WALs is moved
 Key: HBASE-24011
 URL: https://issues.apache.org/jira/browse/HBASE-24011
 Project: HBase
  Issue Type: Bug
  Components: rsgroup
Affects Versions: 2.2.3
Reporter: Mohammad Arshad


HMaster does not restart when rsgroup is enabled and /hbase/WALs is moved.

HMaster restarts properly if rsgroup is not enabled, even if /hbase/WALs is 
moved.

Steps to reproduce:
 # Start the cluster.
 # Create a table and do some puts and deletes.
 # Kill all the region servers and the master.
 # Move the WALs directory for backup (-mv /hbase/WALs /hbase/WALs2).
 # Start the cluster.
 # Master start fails; initialization keeps failing.
{code:java}
2020-03-18 11:42:55,369 ERROR [ActiveMasterInitializationMonitor-1584511075369] 
master.HMaster: Master failed to complete initialization after 90ms. Please 
consider submitting a bug report including a thread dump of this process.
{code}


[jira] [Resolved] (HBASE-23884) list_rsgroups didn't get the correct result

2020-03-16 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad resolved HBASE-23884.
-
Resolution: Invalid

Issue is invalid. Closing it now. Please feel free to reopen if you disagree.

> list_rsgroups didn't get the correct result
> ---
>
> Key: HBASE-23884
> URL: https://issues.apache.org/jira/browse/HBASE-23884
> Project: HBase
>  Issue Type: Bug
>  Components: rsgroup, shell
>Affects Versions: 2.2.3
>Reporter: Bo Cui
>Assignee: Mohammad Arshad
>Priority: Minor
> Attachments: image-2020-02-23-09-53-49-594.png
>
>
> If my_group does not exist, list_rsgroups returns 0 row(s),
>  but I think list_rsgroups should throw a NOTEXISTEXCEPTION.
> !image-2020-02-23-09-53-49-594.png|width=449,height=143!





[jira] [Resolved] (HBASE-23905) move_namespaces_rsgroup is not moving namespace into desired rsgroup

2020-03-16 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad resolved HBASE-23905.
-
Resolution: Invalid

Issue is invalid. Closing it now. Please feel free to reopen if you disagree.

> move_namespaces_rsgroup is not moving namespace into desired rsgroup
> 
>
> Key: HBASE-23905
> URL: https://issues.apache.org/jira/browse/HBASE-23905
> Project: HBase
>  Issue Type: Bug
>  Components: rsgroup
>Affects Versions: 2.2.3
>Reporter: Saurav Mehta
>Assignee: Mohammad Arshad
>Priority: Major
>
> When creating a namespace and specifying an rs group in hbase.rsgroup.name, 
> the namespace gets associated with that rs group. However, the 
> namespace does not move to another rs group later when using 
> "move_namespaces_rsgroup".
>  
> Steps to reproduce the issue:
>  # Create an rs group 'r1' and add a region server to it.
>  # *create_namespace 'namespace',{METHOD => 'set', 'hbase.rsgroup.name' => 
> 'r1'}*
>  # *describe_namespace 'namespace'*
>  # *move_namespaces_rsgroup 'default',['namespace']*
>  # *describe_namespace 'namespace'*
> Before the namespace is moved into another rs group, the description shows rsgroup r1, 
> but even after step 4 it still shows the same description of the namespace. 
> This bug prevents me from removing the rs group, because the removal keeps reporting 
> "r1 cannot be deleted as namespace 'namespace' is still associated with the 
> rs group 'r1' ".


[jira] [Resolved] (HBASE-22655) tags info is missing in Get and Scan Result Cells

2019-07-04 Thread Mohammad Arshad (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad resolved HBASE-22655.
-
Resolution: Invalid

Issue is invalid

> tags info is missing in Get and Scan Result Cells
> -
>
> Key: HBASE-22655
> URL: https://issues.apache.org/jira/browse/HBASE-22655
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.5, 1.3.5
>        Reporter: Mohammad Arshad
>    Assignee: Mohammad Arshad
>Priority: Major
> Fix For: 1.3.6
>
>
> Tags info is missing in Get and Scan Result Cells. 
> This is because the Region Server, while sending back the Result, does not 
> put the tags array into the CellProtos.Cell.
> {code}
> org.apache.hadoop.hbase.protobuf.ProtobufUtil.toCell(Cell)
> {code}
> There should be some option to get Cells with their tags. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-22655) tags info is missing in Get and Scan Result Cells

2019-07-03 Thread Mohammad Arshad (JIRA)
Mohammad Arshad created HBASE-22655:
---

 Summary: tags info is missing in Get and Scan Result Cells
 Key: HBASE-22655
 URL: https://issues.apache.org/jira/browse/HBASE-22655
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.3.5, 2.1.5
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad
 Fix For: 1.3.6


Tags info is missing in Get and Scan Result Cells. 
This is because the Region Server, while sending back the Result, does not 
put the tags array into the CellProtos.Cell.
{code}
org.apache.hadoop.hbase.protobuf.ProtobufUtil.toCell(Cell)
{code}
There should be some option to get Cells with their tags. 







[jira] [Created] (HBASE-19423) Replication entries are not filtered correctly when replication scope is set through WAL Co-processor

2017-12-04 Thread Mohammad Arshad (JIRA)
Mohammad Arshad created HBASE-19423:
---

 Summary: Replication entries are not filtered correctly when 
replication scope is set through WAL Co-processor
 Key: HBASE-19423
 URL: https://issues.apache.org/jira/browse/HBASE-19423
 Project: HBase
  Issue Type: Bug
Reporter: Mohammad Arshad
 Fix For: 2.0.0, 1.4.0, 1.3.2


The replication scope set in a WALObserver is reset in 
Replication.scopeWALEdits(). 
Because of this problem, a custom implementation of WALObserver cannot be used as 
a replication filter.
Suppose a WALObserver implementation has logic to filter all entries from family 
f2:
{code}
// Filter all family f2 rows
public static class ReplicationFilterWALCoprocessor extends BaseWALObserver {
  @Override
  public boolean preWALWrite(ObserverContext<? extends WALCoprocessorEnvironment> ctx,
      HRegionInfo info, WALKey logKey, WALEdit logEdit) throws IOException {
    ArrayList<Cell> cells = logEdit.getCells();
    for (Cell cell : cells) {
      byte[] fam = CellUtil.cloneFamily(cell);
      if ("f2".equals(Bytes.toString(fam))) {
        NavigableMap<byte[], Integer> scopes = logKey.getScopes();
        if (scopes == null) {
          logKey.setScopes(new TreeMap<byte[], Integer>(Bytes.BYTES_COMPARATOR));
        }
        // Mark family f2 as local so that its edits are not replicated
        logKey.getScopes().put(fam, HConstants.REPLICATION_SCOPE_LOCAL);
      }
    }
    return false;
  }
}
{code}
This logic cannot work, because 
{{org.apache.hadoop.hbase.replication.regionserver.Replication.scopeWALEdits()}}
 recreates and populates the scopes.
*SOLUTION:*
In Replication.scopeWALEdits(), create the scopes map only if the WALKey does not 
already have one.
{code}
NavigableMap<byte[], Integer> scopes = logKey.getScopes();
if (scopes == null) {
  scopes = new TreeMap<byte[], Integer>(Bytes.BYTES_COMPARATOR);
}
{code}
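
The effect of the change can be shown with a small standalone sketch. It does not use the HBase classes: WalKeyStub is a hypothetical stand-in for WALKey, family names are plain Strings for simplicity, and the scope constants are chosen to mirror HConstants. Keeping an entry that the coprocessor already set (putIfAbsent) is my assumption about how the populate step should behave.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class ScopePreservationSketch {
  static final int SCOPE_LOCAL = 0;  // mirrors HConstants.REPLICATION_SCOPE_LOCAL
  static final int SCOPE_GLOBAL = 1; // mirrors HConstants.REPLICATION_SCOPE_GLOBAL

  /** Hypothetical stand-in for WALKey; only holds the scopes map. */
  static class WalKeyStub {
    NavigableMap<String, Integer> scopes;
  }

  /** Current behavior: always starts from a new map, dropping coprocessor entries. */
  static void scopeRecreating(WalKeyStub key, Map<String, Integer> tableScopes) {
    NavigableMap<String, Integer> scopes = new TreeMap<>();
    for (Map.Entry<String, Integer> e : tableScopes.entrySet()) {
      if (e.getValue() != SCOPE_LOCAL) {
        scopes.put(e.getKey(), e.getValue());
      }
    }
    key.scopes = scopes;
  }

  /** Proposed behavior: reuses the existing map and keeps entries already present. */
  static void scopePreserving(WalKeyStub key, Map<String, Integer> tableScopes) {
    NavigableMap<String, Integer> scopes = key.scopes;
    if (scopes == null) {
      scopes = new TreeMap<>();
    }
    for (Map.Entry<String, Integer> e : tableScopes.entrySet()) {
      if (e.getValue() != SCOPE_LOCAL) {
        scopes.putIfAbsent(e.getKey(), e.getValue());
      }
    }
    key.scopes = scopes;
  }

  /** Builds a key on which a coprocessor has already marked f2 as local. */
  static WalKeyStub keyWithF2Local() {
    WalKeyStub key = new WalKeyStub();
    key.scopes = new TreeMap<>();
    key.scopes.put("f2", SCOPE_LOCAL);
    return key;
  }

  public static void main(String[] args) {
    Map<String, Integer> tableScopes = new HashMap<>();
    tableScopes.put("f1", SCOPE_GLOBAL);
    tableScopes.put("f2", SCOPE_GLOBAL);

    WalKeyStub recreated = keyWithF2Local();
    scopeRecreating(recreated, tableScopes);
    System.out.println("recreating: " + recreated.scopes);

    WalKeyStub preserved = keyWithF2Local();
    scopePreserving(preserved, tableScopes);
    System.out.println("preserving: " + preserved.scopes);
  }
}
```

With the recreating variant the coprocessor's local scope for f2 is lost, so f2 edits would still be replicated; with the preserving variant it survives.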

 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)