Re: Hbase Meetup -- Bangalore on 19th Jan
Unfortunately, we had to postpone the above event, as we could not get enough traction to reach the minimum number of submissions. You can continue to submit; once we have enough submissions, we will organise the event. Will keep you posted. --- Mallikarjun On Sun, Dec 17, 2023 at 7:10 PM Mallikarjun wrote: > We are organising hbase-meetup in Bangalore, India on 19th Jan 2024. Reach > out to me if you want to attend. I will arrange for entry passes. > > CALL FOR SPEAKERS is open > https://sessionize.com/apache-hbase-meetup-bangalore/ > > Date: 19th Jan 2024 > Venue: Flipkart Office, Bangalore. > > --- > Mallikarjun >
Hbase Meetup -- Bangalore on 19th Jan
We are organising an HBase meetup in Bangalore, India on 19th Jan 2024. Reach out to me if you want to attend; I will arrange for entry passes.

CALL FOR SPEAKERS is open: https://sessionize.com/apache-hbase-meetup-bangalore/

Date: 19th Jan 2024
Venue: Flipkart Office, Bangalore.

---
Mallikarjun
Re: [NOTICE] Going to cut 3.0.0-alpha-4
It would be good to include these 2 features:

Parallel Backups: https://issues.apache.org/jira/browse/HBASE-26034
RSGroup Support for Backups: https://issues.apache.org/jira/browse/HBASE-26322

CC: Bryan

---
Mallikarjun

On Sat, May 20, 2023 at 7:13 PM 张铎(Duo Zhang) wrote: > HBASE-27109 has been merged, which is the last feature I want to land in > the 3.0.0 release. > > The road map is to make 3.0.0-alpha-4, which is the last alpha release for > 3.0.0, and then cut branch-3, focus on stabilizing and make two beta > releases, and finally cut branch-3.0 and make the final 3.0 release. > > Shout if you still have other things that want to land into 3.0.0 release, > as once we cut branch-3, we will hold on the committing of big features so > we can soon make the branch stable enough, and prepare the final 3.0.0 > release. > > Thanks. >
Re: [DISCUSS] Kubernetes Orchestration for ZK, HDFS, and HBase
Hi Nick, I agree with your thought that there is an increasing reliance on Kubernetes, more so for complex workloads like hbase deployments, because of the unavailability of reliable automation frameworks outside of k8s. But I have a slightly different view of how to achieve it. When I was exploring the possibilities, such as Kustomize, Helm, or an operator, I found it can get pretty complex to write extensible deployment manifests (for different kinds of deployments) with tools like Kustomize or Helm. Here is our attempt to containerise hbase with an operator --> https://github.com/flipkart-incubator/hbase-k8s-operator --- Mallikarjun On Mon, Mar 13, 2023 at 3:58 PM Nick Dimiduk wrote: > Heya team, > > Over here at $dayjob, we have an increasing reliance on Kubernetes for > both development and production workloads. Our tools are maturing and > we're hoping that they might be of interest to the wider community. > I'd like to see if there's community interest in receiving some/any of > them as a contribution. I think we'll also need a plan from ASF Infra > that makes kubernetes available to us as a project. > We have implemented a basic stack of tools for orchestrating ZK + HDFS > + HBase on Kubernetes. We use this for running a small local dev > cluster via MiniKube/KIND ; for ITBLL on smallish distributed clusters > in a public cloud ; and in production for running clusters of ~100 > Data Nodes/Region Servers in a public cloud. There was an earlier > discussion about using our donation of test hardware for running more > thorough tests in our CI, but one of the limiting factors is full > cluster deployment. I hope that the community might be interested in > receiving this tooling as a foundation for more rigorous correctness > and maybe even performance tests in the open. Furthermore, perhaps the > wider community has interest in an Apache licensed cluster > orchestration tool for other uses.
> > Now for some details: The implementation is built on Kustomize, so > it's fundamentally transparent resource specification with yaml > patches for composability; this is in contrast to a solution using > templates with defined capabilities and interfaces. There is no > operator ; it's all coordinated via init/bootstrap containers, shell > scripts, shared volumes for state, &c. For now. > Such a donation will amount to a code drop, which will have its > challenges. I'm motivated via internal processes to carve it into > smaller pieces, and I think that will benefit community review as > well. Perhaps this approach could be used to make the contribution via > a feature branch. > > Is there community interest in adding such a capability to our > maintained responsibilities? I'd hope that we have several volunteers > to work with me through the contribution process, and who are > reasonably confident that they'll be able to help maintain such a > capability going forward. We'll also need someone who can work with > Infra to get us access to Kubernetes cluster(s), via whatever means. > > What do you think? > > Thanks, > Nick & the HBase team at Apple
Re: Branching for 2.6 code line (branch-2.6)
On hbase-backup, we have been using it in production for more than 1 year. I can vouch for it being stable enough to be in a release version, so that more people can use it and polish it further. On Sun, Oct 16, 2022, 11:25 PM Andrew Purtell wrote: > My understanding is some folks evaluating and polishing TLS for their > production are also considering hbase-backup in the same way, which is why > I linked them together. If that is incorrect then they both are still worth > considering in my opinion but would have a more tenuous link. > > Where we are with hbase-backup is it should probably be ported to where > more people would be inclined to evaluate it, in order for it to make more > progress. A new minor releasing line would fit. On the other hand if it is > too unpolished then the experience would be poor. > > > > On Oct 16, 2022, at 5:35 AM, 张铎 wrote: > > > > I believe the second one is still ongoing? > > > > Andrew Purtell 于2022年10月14日周五 05:37写道: > >> > >> We will begin releasing activity for the 2.6 code line and as a > >> prerequisite to that we shall need to make a new branch branch-2.6 from > >> branch-2. > >> > >> Before we do that let's make sure all commits for the key features of > 2.6 > >> are settled in branch-2 before the branching point. Those key features > are: > >> - mTLS RPC > >> - hbase-backup backport > >> > >> -- > >> Best regards, > >> Andrew >
Blog Posts on "How we solved for Hbase MultiTenancy"
Multi Tenancy was needed because of the diverse use cases at our org, so we took hbase and made customisations on top of 2.1 to get better isolation guarantees. I have written up those customisations in the following two-part article.

Part I --> https://blog.flipkart.tech/hbase-multi-tenancy-part-i-37cad340c0fa
Part II --> https://blog.flipkart.tech/hbase-multi-tenancy-part-ii-79488c19b03d
Talk: https://www.youtube.com/watch?v=ttGI9Ma7Xos&t=26s&ab_channel=LifeAtFlipkart

Some of those customisations can be contributed back upstream with modifications. Happy to hear thoughts.

---
Mallikarjun
[jira] [Created] (HBASE-27238) Backport Backup/Restore to 2.x
Mallikarjun created HBASE-27238: --- Summary: Backport Backup/Restore to 2.x Key: HBASE-27238 URL: https://issues.apache.org/jira/browse/HBASE-27238 Project: HBase Issue Type: New Feature Components: backport, backup&restore Reporter: Mallikarjun Assignee: Mallikarjun -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [DISCUSS] Time for 3.0.0-alpha-3
Thanks Duo I have responded to the comments. On Fri, Jun 17, 2022, 6:07 PM 张铎(Duo Zhang) wrote: > I posted some comments on the issues. PTAL. > > Thanks. > > Mallikarjun 于2022年6月17日周五 17:03写道: > > > These 2 patches would be a good addition. They still need to be reviewed. > > > > HBASE-26322 <https://issues.apache.org/jira/browse/HBASE-26322>: > > https://github.com/apache/hbase/pull/4544 > > HBASE-26034 <https://issues.apache.org/jira/browse/HBASE-26034>: > > https://github.com/apache/hbase/pull/4545 > > > > --- > > Mallikarjun > > > > > > On Thu, Jun 16, 2022 at 7:38 PM 张铎(Duo Zhang) > > wrote: > > > > > And also bump this one, feel free to reply here if you want to land > > > something in the final 3.0.0 release. > > > > > > Thanks. > > > > > > Andrew Purtell 于2022年6月13日周一 02:04写道: > > > > > > > +1, thanks for your efforts to push this forward. > > > > > > > > On Sun, Jun 12, 2022 at 7:55 AM 张铎(Duo Zhang) > > > > > wrote: > > > > > > > > > Will roll 3.0.0-alpha-3 in the next week. > > > > > > > > > > And 3.0.0-alpha-4 will be our last alpha release, after which we > will > > > cut > > > > > branch-3 and enter the feature freeze state. > > > > > > > > > > If you want something to be released in 3.0.0 please shout, and > let's > > > see > > > > > whether we can land it before the alpha4 release. For me, I would > > like > > > to > > > > > finish HBASE-27109 and HBASE-27110 before alpha4. > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > > -- > > > > Best regards, > > > > Andrew > > > > > > > > Unrest, ignorance distilled, nihilistic imbeciles - > > > > It's what we’ve earned > > > > Welcome, apocalypse, what’s taken you so long? > > > > Bring us the fitting end that we’ve been counting on > > > >- A23, Welcome, Apocalypse > > > > > > > > > >
Re: [DISCUSS] Time for 3.0.0-alpha-3
These 2 patches would be a good addition. They still need to be reviewed. HBASE-26322 <https://issues.apache.org/jira/browse/HBASE-26322>: https://github.com/apache/hbase/pull/4544 HBASE-26034 <https://issues.apache.org/jira/browse/HBASE-26034>: https://github.com/apache/hbase/pull/4545 --- Mallikarjun On Thu, Jun 16, 2022 at 7:38 PM 张铎(Duo Zhang) wrote: > And also bump this one, feel free to reply here if you want to land > something in the final 3.0.0 release. > > Thanks. > > Andrew Purtell 于2022年6月13日周一 02:04写道: > > > +1, thanks for your efforts to push this forward. > > > > On Sun, Jun 12, 2022 at 7:55 AM 张铎(Duo Zhang) > > wrote: > > > > > Will roll 3.0.0-alpha-3 in the next week. > > > > > > And 3.0.0-alpha-4 will be our last alpha release, after which we will > cut > > > branch-3 and enter the feature freeze state. > > > > > > If you want something to be released in 3.0.0 please shout, and let's > see > > > whether we can land it before the alpha4 release. For me, I would like > to > > > finish HBASE-27109 and HBASE-27110 before alpha4. > > > > > > Thanks. > > > > > > > > > -- > > Best regards, > > Andrew > > > > Unrest, ignorance distilled, nihilistic imbeciles - > > It's what we’ve earned > > Welcome, apocalypse, what’s taken you so long? > > Bring us the fitting end that we’ve been counting on > >- A23, Welcome, Apocalypse > > >
Re: [VOTE] First release candidate for hbase 3.0.0-alpha-2 is available for download
Thanks. Got the point. --- Mallikarjun On Tue, Dec 14, 2021 at 7:43 PM 张铎(Duo Zhang) wrote: > We could still add new big features before the last alpha-4 release. > > After alpha-4 is out, we will cut a branch-3 and then start to stabilize > it. For branch-3, we should treat it like branch-2. And after beta1 and > beta2, we will cut branch-3.0, to make the final 3.0.0 release. > > Thanks. > > Mallikarjun 于2021年12月14日周二 10:14写道: > > > Not sure if we can add a new feature at this time of the release cycle. > > > > Since native backup/restore will be part of hbase release for the first > > time, these 2 features would be worthwhile to be considered to be part of > > the release. > > > > 1. Add rsgroup support for Backup > > Details: https://issues.apache.org/jira/browse/HBASE-26322 > > Patch: https://github.com/apache/hbase/pull/3726 > > > > 2. Add support to take parallel backups > > Details: https://issues.apache.org/jira/browse/HBASE-26034 > > Patch: https://github.com/apache/hbase/pull/3766 > > > > > > > > --- > > Mallikarjun > > > > > > On Tue, Dec 14, 2021 at 5:45 AM 张铎(Duo Zhang) > > wrote: > > > > > Thanks Josh! > > > > > > Will make a new RC1 soon. > > > > > > Josh Elser 于2021年12月14日周二 04:57写道: > > > > > > > -1 (binding) > > > > > > > > Log4j2 CVE mitigation is ineffective due an incorrect `export` in > > > > bin/hbase-config.sh. Appears that HBASE-26557 tried to add the > > > > mitigation to HBASE_OPTS but added spaces around either side of the > > > > equals sign, e.g. `export HBASE_OPTS = ".."`, which is invalid > syntax. 
> > > > > > > > > > > > > > > > $ ./bin/start-hbase.sh > > > > > > > > > > /Users/jelser/hbase300alpha2rc0/hbase300/hbase-3.0.0-alpha-2/bin/hbase-config.sh: > > > > > > > > line 167: export: `=': not a valid identifier > > > > > > > > > > /Users/jelser/hbase300alpha2rc0/hbase300/hbase-3.0.0-alpha-2/bin/hbase-config.sh: > > > > > > > > line 167: export: ` -Dlog4j2.formatMsgNoLookups=true': not a valid > > > > identifier > > > > > > > > > > /Users/jelser/hbase300alpha2rc0/hbase300/hbase-3.0.0-alpha-2/bin/hbase-config.sh: > > > > > > > > line 167: export: `=': not a valid identifier > > > > > > > > > > /Users/jelser/hbase300alpha2rc0/hbase300/hbase-3.0.0-alpha-2/bin/hbase-config.sh: > > > > > > > > line 167: export: ` -Dlog4j2.formatMsgNoLookups=true': not a valid > > > > identifier > > > > > > > > > > > > More naively, and just in plain bash: > > > > > > > > bash-5.1$ export FOO = "$FOO bar" > > > > bash: export: `=': not a valid identifier > > > > bash: export: ` bar': not a valid identifier > > > > bash-5.1$ echo $FOO > > > > > > > > > > > > I'll post a PR to fix after sending this. > > > > > > > > The good: > > > > * xsums and sigs were OK > > > > * Was able to run most unit tests locally > > > > * Was able to launch using bin tarball > > > > * Everything else looks great so far > > > > > > > > - Josh > > > > > > > > On 12/11/21 11:34 AM, Duo Zhang wrote: > > > > > Please vote on this Apache hbase release candidate, > > > > > hbase-3.0.0-alpha-2RC0 > > > > > > > > > > The VOTE will remain open for at least 72 hours. > > > > > > > > > > [ ] +1 Release this package as Apache hbase 3.0.0-alpha-2 > > > > > [ ] -1 Do not release this package because ... 
> > > > > > > > > > The tag to be voted on is 3.0.0-alpha-2RC0: > > > > > > > > > >https://github.com/apache/hbase/tree/3.0.0-alpha-2RC0 > > > > > > > > > > This tag currently points to git reference > > > > > > > > > >8bca21b47d7c809a0940aea8ed12dd4d2af12432 > > > > > > > > > > The release files, including signatures, digests, as well as > > CHANGES.md > > > > > and RELEASENOTES.md included in this RC can be found at: > > > > > > > > > >https://dist.apache.org/repos/dist/dev/hbase/3.0.0-alpha-2RC0/ > > > >
Re: [VOTE] First release candidate for hbase 3.0.0-alpha-2 is available for download
I am not sure if we can add a new feature at this point in the release cycle. But since native backup/restore will be part of an hbase release for the first time, these 2 features would be worth considering for the release. 1. Add rsgroup support for Backup Details: https://issues.apache.org/jira/browse/HBASE-26322 Patch: https://github.com/apache/hbase/pull/3726 2. Add support to take parallel backups Details: https://issues.apache.org/jira/browse/HBASE-26034 Patch: https://github.com/apache/hbase/pull/3766 --- Mallikarjun On Tue, Dec 14, 2021 at 5:45 AM 张铎(Duo Zhang) wrote: > Thanks Josh! > > Will make a new RC1 soon. > > Josh Elser 于2021年12月14日周二 04:57写道: > > > -1 (binding) > > > > Log4j2 CVE mitigation is ineffective due an incorrect `export` in > > bin/hbase-config.sh. Appears that HBASE-26557 tried to add the > > mitigation to HBASE_OPTS but added spaces around either side of the > > equals sign, e.g. `export HBASE_OPTS = ".."`, which is invalid syntax. > > > > > > > > $ ./bin/start-hbase.sh > > > /Users/jelser/hbase300alpha2rc0/hbase300/hbase-3.0.0-alpha-2/bin/hbase-config.sh: > > > > line 167: export: `=': not a valid identifier > > > /Users/jelser/hbase300alpha2rc0/hbase300/hbase-3.0.0-alpha-2/bin/hbase-config.sh: > > > > line 167: export: ` -Dlog4j2.formatMsgNoLookups=true': not a valid > > identifier > > > /Users/jelser/hbase300alpha2rc0/hbase300/hbase-3.0.0-alpha-2/bin/hbase-config.sh: > > > > line 167: export: `=': not a valid identifier > > > /Users/jelser/hbase300alpha2rc0/hbase300/hbase-3.0.0-alpha-2/bin/hbase-config.sh: > > > > line 167: export: ` -Dlog4j2.formatMsgNoLookups=true': not a valid > > identifier > > > > > > More naively, and just in plain bash: > > > > bash-5.1$ export FOO = "$FOO bar" > > bash: export: `=': not a valid identifier > > bash: export: ` bar': not a valid identifier > > bash-5.1$ echo $FOO > > > > > > I'll post a PR to fix after sending this.
> > > > The good: > > * xsums and sigs were OK > > * Was able to run most unit tests locally > > * Was able to launch using bin tarball > > * Everything else looks great so far > > > > - Josh > > > > On 12/11/21 11:34 AM, Duo Zhang wrote: > > > Please vote on this Apache hbase release candidate, > > > hbase-3.0.0-alpha-2RC0 > > > > > > The VOTE will remain open for at least 72 hours. > > > > > > [ ] +1 Release this package as Apache hbase 3.0.0-alpha-2 > > > [ ] -1 Do not release this package because ... > > > > > > The tag to be voted on is 3.0.0-alpha-2RC0: > > > > > >https://github.com/apache/hbase/tree/3.0.0-alpha-2RC0 > > > > > > This tag currently points to git reference > > > > > >8bca21b47d7c809a0940aea8ed12dd4d2af12432 > > > > > > The release files, including signatures, digests, as well as CHANGES.md > > > and RELEASENOTES.md included in this RC can be found at: > > > > > >https://dist.apache.org/repos/dist/dev/hbase/3.0.0-alpha-2RC0/ > > > > > > Maven artifacts are available in a staging repository at: > > > > > > > > https://repository.apache.org/content/repositories/orgapachehbase-1472/ > > > > > > Artifacts were signed with the 9AD2AE49 key which can be found in: > > > > > >https://downloads.apache.org/hbase/KEYS > > > > > > 3.0.0-alpha-2 is the second alpha release for our 3.0.0 major release > > line. > > > HBase 3.0.0 includes the following big feature/changes: > > >Synchronous Replication > > >OpenTelemetry Tracing > > >Distributed MOB Compaction > > >Backup and Restore > > >Move RSGroup balancer to core > > >Reimplement sync client on async client > > >CPEPs on shaded proto > > >Move the logging framework from log4j to log4j2 > > > > > > 3.0.0-alpha-2 contains a critical security fix for addressing the > log4j2 > > > CVE-2021-44228. All users who already use 3.0.0-alpha-1 should upgrade > > > to 3.0.0-alpha-2 ASAP. > > > > > > Notice that this is not a production ready release. 
It is used to let > our > > > users try and test the new major release, to get feedback before the > > final > > > GA release is out. > > > So please do NOT use it in production. Just try it and report back > > > everything you find unusual. > > > > > > And this time we will not include CHANGES.md and RELEASENOTE.md > > > in our source code, you can find it on the download site. For getting > > these > > > two files for old releases, please go to > > > > > >https://archive.apache.org/dist/hbase/ > > > > > > To learn more about Apache hbase, please see > > > > > >http://hbase.apache.org/ > > > > > > Thanks, > > > Your HBase Release Manager > > > > > >
Re: Need code review for a couple of Backup/Restore related Pull Requests
Bump. --- Mallikarjun On Tue, Nov 2, 2021 at 4:46 PM Mallikarjun wrote: > Hi, > > I have a couple of pull requests on Backup/Restore. Appreciate if someone > could have a look at them. > > https://github.com/apache/hbase/pull/3766 > https://github.com/apache/hbase/pull/3726 > > I have put down the context in respective Jiras. Please ask if any clarity > is needed. > > --- > Mallikarjun >
Need code review for a couple of Backup/Restore related Pull Requests
Hi,

I have a couple of pull requests on Backup/Restore. I would appreciate it if someone could have a look at them.

https://github.com/apache/hbase/pull/3766
https://github.com/apache/hbase/pull/3726

I have put down the context in the respective Jiras. Please ask if any clarification is needed.

---
Mallikarjun
[jira] [Created] (HBASE-26346) Design support for rsgroup data isolation
Mallikarjun created HBASE-26346: --- Summary: Design support for rsgroup data isolation Key: HBASE-26346 URL: https://issues.apache.org/jira/browse/HBASE-26346 Project: HBase Issue Type: New Feature Components: rsgroup Reporter: Mallikarjun Assignee: Mallikarjun TODO -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-26343) Extend RSGroup to support data isolation to achieve true multitenancy in Hbase
Mallikarjun created HBASE-26343: --- Summary: Extend RSGroup to support data isolation to achieve true multitenancy in Hbase Key: HBASE-26343 URL: https://issues.apache.org/jira/browse/HBASE-26343 Project: HBase Issue Type: Umbrella Components: rsgroup Reporter: Mallikarjun RSGroups currently provide isolation only at the serving layer, not at the data layer. There is a need to provide data isolation between rsgroups to achieve true multitenancy in hbase, allowing individual rsgroups to be scaled independently as needed. Some of the aspects to be covered in this umbrella project are # Provide data isolation between different RSGroups # Add balancer support to understand this construct during various balancing activities # Extend support to various ancillary services such as export snapshot, cluster replication, etc -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-26322) Add rsgroup support for Backup
Mallikarjun created HBASE-26322: --- Summary: Add rsgroup support for Backup Key: HBASE-26322 URL: https://issues.apache.org/jira/browse/HBASE-26322 Project: HBase Issue Type: Improvement Components: backup&restore Affects Versions: 3.0.0-alpha-1 Reporter: Mallikarjun Assignee: Mallikarjun Fix For: 3.0.0-alpha-2 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-26279) Merger of backup:system table with hbase:meta table
Mallikarjun created HBASE-26279: --- Summary: Merger of backup:system table with hbase:meta table Key: HBASE-26279 URL: https://issues.apache.org/jira/browse/HBASE-26279 Project: HBase Issue Type: Improvement Components: backup&restore Reporter: Mallikarjun Assignee: Mallikarjun To Be filled -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Stuck Serial replication -- Need suggestions on recovery
Thanks for answering the queries. --- Mallikarjun On Wed, Aug 18, 2021 at 9:32 AM 张铎(Duo Zhang) wrote: > In hbase, the mvcc write number and wal sequence id are the same > thing, so when we just want to bump the mvcc number, we will not write > an actual WAL but the sequence id will be increased. > > And I think it will be good to have a MR job to replicate WAL serially. > > Mallikarjun 于2021年8月18日周三 上午10:50写道: > > > > Inline Reply > > > > On Wed, Aug 18, 2021 at 8:06 AM 张铎(Duo Zhang) > wrote: > > > > > Mallikarjun 于2021年8月18日周三 上午10:19写道: > > > > > > > > Thanks for the response @Duo > > > > > > > > Inline reply > > > > > > > > On Wed, Aug 18, 2021 at 7:37 AM 张铎(Duo Zhang) > > > > wrote: > > > > > > > > > This is the isRangeFinished method > > > > > > > > > > private boolean isRangeFinished(long endBarrier, String > > > > > encodedRegionName) throws IOException { > > > > > long pushedSeqId; > > > > > try { > > > > > pushedSeqId = storage.getLastSequenceId(encodedRegionName, > > > peerId); > > > > > } catch (ReplicationException e) { > > > > > throw new IOException( > > > > > "Failed to get pushed sequence id for " + > encodedRegionName + > > > > > ", peer " + peerId, e); > > > > > } > > > > > // endBarrier is the open sequence number. When opening a > region, > > > > > the open sequence number will > > > > > // be set to the old max sequence id plus one, so here we need > to > > > > > minus one. > > > > > return pushedSeqId >= endBarrier - 1; > > > > > } > > > > > > > > > > So for this region > > > > > > > > > > rs-9 24c765b42253f96b550831d83e99cc9e 18775105 18762209 [17776286, > +++ > > > > > 18762210, 18775053, 18775079, 18775104, -- 18775119] > > > > > > > > > > We have already finished the first range [17776286, 18762210), but > > > > > then we jump directly to range [18775053, 18775079), so the problem > > > > > here is where is the [18762210, 18775053)... > > > > > > > > > > > > > These sequence ID's are present in WAL's which are not cleaned up. 
> (in > > > > OLDWALs) > > > > > > > > Related Question: Is it allowed to have gaps in sequence IDs in WAL's > > > for a > > > > single region? > > > > Example: for Region: *24c765b42253f96b550831d83e99cc9e*, if > sequence ID > > > > *18775105* is present, can I expect *18775106 *is mandatory to be > > > present? > > > > or there can be gaps. > > > Inside a range it is allowed to have gaps, but when reopening a > > > region, we need to make sure there are no gaps otherwise the > > > replication will be stuck. > > > > > > > Just curious to know, what are some scenarios which can lead to gaps? > From > > my small number of experiments It was consecutive in nature, I did not > find > > such gap scenarios. > > > > > > > > > > > > > > > > > > > > > > And on the fix, you can clear the range information for the given > > > > > regions in meta table, and then restart the clusters, I think the > > > > > replication could continue. > > > > > > > > > > > > > > If you mean removing some barriers so that replication is unblocked. > > > > Doesn't it lead to *out of order events *replicated end up in > > > > corrupting data? > > > Yes, the replication will be out of order. But this is the easier way > > > to recover the replication. > > > If you still want to obtain the order, then you need to find out the > > > root cause of my question, where is the WAL for the missing ranges. Is > > > it because we have already replicated the data but do not mark the > > > range as finished, or we just lose the WAL data for the range? > > > > > > > Current scenario occurred in Active Passive cluster setup and replication > > is stuck on the Passive side. So I won't be able to answer following > > question > > > > we have already replicated the data > > > > > > But the following comment can help me find data from
Re: Stuck Serial replication -- Need suggestions on recovery
Inline Reply On Wed, Aug 18, 2021 at 8:06 AM 张铎(Duo Zhang) wrote: > Mallikarjun 于2021年8月18日周三 上午10:19写道: > > > > Thanks for the response @Duo > > > > Inline reply > > > > On Wed, Aug 18, 2021 at 7:37 AM 张铎(Duo Zhang) > wrote: > > > > > This is the isRangeFinished method > > > > > > private boolean isRangeFinished(long endBarrier, String > > > encodedRegionName) throws IOException { > > > long pushedSeqId; > > > try { > > > pushedSeqId = storage.getLastSequenceId(encodedRegionName, > peerId); > > > } catch (ReplicationException e) { > > > throw new IOException( > > > "Failed to get pushed sequence id for " + encodedRegionName + > > > ", peer " + peerId, e); > > > } > > > // endBarrier is the open sequence number. When opening a region, > > > the open sequence number will > > > // be set to the old max sequence id plus one, so here we need to > > > minus one. > > > return pushedSeqId >= endBarrier - 1; > > > } > > > > > > So for this region > > > > > > rs-9 24c765b42253f96b550831d83e99cc9e 18775105 18762209 [17776286, +++ > > > 18762210, 18775053, 18775079, 18775104, -- 18775119] > > > > > > We have already finished the first range [17776286, 18762210), but > > > then we jump directly to range [18775053, 18775079), so the problem > > > here is where is the [18762210, 18775053)... > > > > > > > These sequence ID's are present in WAL's which are not cleaned up. (in > > OLDWALs) > > > > Related Question: Is it allowed to have gaps in sequence IDs in WAL's > for a > > single region? > > Example: for Region: *24c765b42253f96b550831d83e99cc9e*, if sequence ID > > *18775105* is present, can I expect *18775106 *is mandatory to be > present? > > or there can be gaps. > Inside a range it is allowed to have gaps, but when reopening a > region, we need to make sure there are no gaps otherwise the > replication will be stuck. > Just curious to know, what are some scenarios which can lead to gaps? 
From my small number of experiments, it was consecutive in nature; I did not find such gap scenarios. > > > > > > > > > > And on the fix, you can clear the range information for the given > > > regions in meta table, and then restart the clusters, I think the > > > replication could continue. > > > > > > > > If you mean removing some barriers so that replication is unblocked. > > Doesn't it lead to *out of order events *replicated end up in > > corrupting data? > Yes, the replication will be out of order. But this is the easier way > to recover the replication. > If you still want to obtain the order, then you need to find out the > root cause of my question, where is the WAL for the missing ranges. Is > it because we have already replicated the data but do not mark the > range as finished, or we just lose the WAL data for the range? > The current scenario occurred in an Active-Passive cluster setup, and replication is stuck on the Passive side, so I won't be able to answer the following question: we have already replicated the data But the following comment can help me check, from the oldWALs, whether the last sequence id is present before region movement. but when reopening a region, we need to make sure there are no gaps > Alternatively, do you think it is a good idea to write a job similar to *RecoveredReplicationSource* which ensures serial replication to the other cluster, outside of the hbase cluster?
> > > > Col 1: Region server on which region is trying to replicate > > > > Col 2: Region trying to replicate but stuck > > > > Col 3: SequenceID which is being replicated and stuck because > previous > > > > range is not finished > > > > Col 4: Checkpoint in zk until which sequence id is already > replicated to > > > > peer > > > > Col 5: Replication barriers for that region. This is a list of open > > > > sequence IDs on region movement. (+++ means where *checkpoint* > belongs,
Re: Stuck Serial replication -- Need suggestions on recovery
Thanks for the response @Duo Inline reply On Wed, Aug 18, 2021 at 7:37 AM 张铎(Duo Zhang) wrote: > This is the isRangeFinished method > > private boolean isRangeFinished(long endBarrier, String > encodedRegionName) throws IOException { > long pushedSeqId; > try { > pushedSeqId = storage.getLastSequenceId(encodedRegionName, peerId); > } catch (ReplicationException e) { > throw new IOException( > "Failed to get pushed sequence id for " + encodedRegionName + > ", peer " + peerId, e); > } > // endBarrier is the open sequence number. When opening a region, > the open sequence number will > // be set to the old max sequence id plus one, so here we need to > minus one. > return pushedSeqId >= endBarrier - 1; > } > > So for this region > > rs-9 24c765b42253f96b550831d83e99cc9e 18775105 18762209 [17776286, +++ > 18762210, 18775053, 18775079, 18775104, -- 18775119] > > We have already finished the first range [17776286, 18762210), but > then we jump directly to range [18775053, 18775079), so the problem > here is where is the [18762210, 18775053)... > These sequence IDs are present in WALs which have not been cleaned up (in OLDWALs). Related Question: Is it allowed to have gaps in sequence IDs in WALs for a single region? Example: for Region: *24c765b42253f96b550831d83e99cc9e*, if sequence ID *18775105* is present, can I expect that *18775106* must also be present, or can there be gaps? > > And on the fix, you can clear the range information for the given > regions in meta table, and then restart the clusters, I think the > replication could continue. > > If you mean removing some barriers so that replication is unblocked, doesn't it lead to *out of order events* being replicated, ending up corrupting data? > Mallikarjun 于2021年8月17日周二 下午3:04写道: > > > > I have got into the following scenario. I won't go into details of how I > > got here, since I am not able to reliably reproduce this scenario thus > far.
> > (Typically happens when some rs goes down because of hardware issues) > > > > Let me explain to you the following details. > > Col 1: Region server on which region is trying to replicate > > Col 2: Region trying to replicate but stuck > > Col 3: SequenceID which is being replicated and stuck because previous > > range is not finished > > Col 4: Checkpoint in zk until which sequence id is already replicated to > > peer > > Col 5: Replication barriers for that region. This is a list of open > > sequence IDs on region movement. (+++ means where *checkpoint* belongs, > --- > > is where *to replicate seqid* belongs) > > > > There are in total 53 regions and 10 regionservers > > > > RegionServer Region Trying to replicate sequenceID Replicated until > Current > > Barriers > > rs-9 24c765b42253f96b550831d83e99cc9e 18775105 18762209 [17776286, +++ > > 18762210, 18775053, 18775079, 18775104, -- 18775119] > > rs-5 b4144bfe75c5826710ec54849741b038 189154192 189091221 [184183678, +++ > > 189117430, 189154191, -- 189154327] > > rs-8 deb6fee3380e7b9db9826cb5f27f8a59 189099509 189036510 [180662218, +++ > > 189062798, 189099508, -- 189099587] > > rs-8 3338fd34ae7ba06a7eccd89048fa83ce 189078951 189077722 [184170310, +++ > > 189078876, 189078950, -- 189104780, 189141509, 189141545, 189141595] > > rs-6 1af22c68b9212971ab2570e14b7b0dc2 183301002 183265047 [180239864, +++ > > 183265048, 183270357, 183277363, 183300886, 183301001, -- 183301062] > > rs-10 1af22c68b9212971ab2570e14b7b0dc2 183301063 183265047 [180239864, > +++ > > 183265048, 183270357, 183277363, 183300886, 183301001, 183301062 --] > > rs-6 4b9e98c7eca7a24c74136de1aa8aeab0 189027036 189022619 [189022618, +++ > > 189027035, 189085155, 189085241, 189085290] > > rs-4 e45ba292df95edbdf884e2ec50cf5f16 189099081 189062191 [184126535, +++ > > 189098947, 189099080, -- 189099226] > > rs-4 83e65729dcad644738a0a3cee994e2df 189012454 189012365 [184103269, +++ > > 189012453, -- 189012538, 189074967, 189075016, 189075294, 189075349] 
> > rs-10 83e65729dcad644738a0a3cee994e2df 189012539 189012365 [184103269, > +++ > > 189012453, 189012538, -- 189074967, 189075016, 189075294, 189075349] > > rs-3 11fca95de4878782af53371a25cf44d0 189121426 189058129 [180684344, +++ > > 189084916, 189121283, 189121425, -- 189121602] > > rs-3 b9db001578e127740d7e0e186e4fbab6 189145458 189081436 [184175242, +++ > > 189083026, 189145417, 189145457, -- 189145562, 189145723, 189145781] > > rs-2 262ca9ff7b878f32c451fac3eb430a88 189128535 189065879 [184159187, +++ > > 189091684, 189128534, -- 189128708] >
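[Editor's note] To make the quoted check concrete, here is a minimal, self-contained sketch of the range-finished test discussed in this thread, plugged with the rs-9 numbers above. The class name is hypothetical; the real logic lives inside HBase's serial replication code, not in a standalone class like this.

```java
// Standalone sketch of the serial-replication range check quoted above.
// Numbers come from the rs-9 row in this thread.
public class RangeCheckSketch {
    // endBarrier is the open sequence number of the next range, so the last
    // sequence id belonging to the current range is endBarrier - 1.
    static boolean isRangeFinished(long pushedSeqId, long endBarrier) {
        return pushedSeqId >= endBarrier - 1;
    }

    public static void main(String[] args) {
        long pushedSeqId = 18762209L; // checkpoint in zk (Col 4)
        long[] barriers = {17776286L, 18762210L, 18775053L, 18775079L, 18775104L, 18775119L};
        // First range [17776286, 18762210) is finished: 18762209 >= 18762210 - 1.
        System.out.println(isRangeFinished(pushedSeqId, barriers[1])); // true
        // Second range [18762210, 18775053) is NOT finished, which is what
        // blocks sequence id 18775105 (Col 3) from replicating.
        System.out.println(isRangeFinished(pushedSeqId, barriers[2])); // false
    }
}
```

This is why clearing barrier ranges only helps if the missing [18762210, 18775053) edits are genuinely accounted for; the check itself is a simple comparison against the checkpoint.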
Stuck Serial replication -- Need suggestions on recovery
I have got into the following scenario. I won't go into details of how I got here, since I am not able to reliably reproduce this scenario thus far. (Typically happens when some rs goes down because of hardware issues) Let me explain to you the following details. Col 1: Region server on which region is trying to replicate Col 2: Region trying to replicate but stuck Col 3: SequenceID which is being replicated and stuck because previous range is not finished Col 4: Checkpoint in zk until which sequence id is already replicated to peer Col 5: Replication barriers for that region. This is a list of open sequence IDs on region movement. (+++ means where *checkpoint* belongs, --- is where *to replicate seqid* belongs) There are in total 53 regions and 10 regionservers RegionServer Region Trying to replicate sequenceID Replicated until Current Barriers rs-9 24c765b42253f96b550831d83e99cc9e 18775105 18762209 [17776286, +++ 18762210, 18775053, 18775079, 18775104, -- 18775119] rs-5 b4144bfe75c5826710ec54849741b038 189154192 189091221 [184183678, +++ 189117430, 189154191, -- 189154327] rs-8 deb6fee3380e7b9db9826cb5f27f8a59 189099509 189036510 [180662218, +++ 189062798, 189099508, -- 189099587] rs-8 3338fd34ae7ba06a7eccd89048fa83ce 189078951 189077722 [184170310, +++ 189078876, 189078950, -- 189104780, 189141509, 189141545, 189141595] rs-6 1af22c68b9212971ab2570e14b7b0dc2 183301002 183265047 [180239864, +++ 183265048, 183270357, 183277363, 183300886, 183301001, -- 183301062] rs-10 1af22c68b9212971ab2570e14b7b0dc2 183301063 183265047 [180239864, +++ 183265048, 183270357, 183277363, 183300886, 183301001, 183301062 --] rs-6 4b9e98c7eca7a24c74136de1aa8aeab0 189027036 189022619 [189022618, +++ 189027035, 189085155, 189085241, 189085290] rs-4 e45ba292df95edbdf884e2ec50cf5f16 189099081 189062191 [184126535, +++ 189098947, 189099080, -- 189099226] rs-4 83e65729dcad644738a0a3cee994e2df 189012454 189012365 [184103269, +++ 189012453, -- 189012538, 189074967, 189075016, 189075294, 
189075349] rs-10 83e65729dcad644738a0a3cee994e2df 189012539 189012365 [184103269, +++ 189012453, 189012538, -- 189074967, 189075016, 189075294, 189075349] rs-3 11fca95de4878782af53371a25cf44d0 189121426 189058129 [180684344, +++ 189084916, 189121283, 189121425, -- 189121602] rs-3 b9db001578e127740d7e0e186e4fbab6 189145458 189081436 [184175242, +++ 189083026, 189145417, 189145457, -- 189145562, 189145723, 189145781] rs-2 262ca9ff7b878f32c451fac3eb430a88 189128535 189065879 [184159187, +++ 189091684, 189128534, -- 189128708] rs-2 03a1eb906a344944aad727dbb8210cfc 172392082 172390331 [167737983, +++ 172392081, -- 172400093, 172446121, 172446172] rs-10 ae2726c7b4eeec3f93336d71e80145a4 189027430 189026939 [184119428, +++ 189027429, -- 189053118, 189089933, 189089995, 189090059] rs-10 770ba4f4568fff803e6df340b2ffe486 189034144 189032879 [184127026, +++ 189034143, --189048295, 189059834, 189096413, 189096513, 189096548, 189096606] rs-1 5846f4ce8acdd5aabf325c847d18c729 18793501 18780639 [18778549, +++ 18784783, 18793471, 18793484, 18793500 --] rs-1 5846f4ce8acdd5aabf325c847d18c729 18793472 18780639 [18778549, +++ 18784783, 18793471, --- 18793484, 18793500] rs-1 fabd3ea591d5f20a86a26f8767d34f63 189028498 189024357 [184116531, +++ 189025318, 189028497, --- 189051176, 189087488, 189087737, 189087850] rs-1 335d855c5005343719ea73bcb7dcb269 189064849 189037338 [184130122, +++ 189064848, --- 189101485, 189101698, 189101774] My question is, how do I recover from here? Any suggestions. Only thought is that I have to replay by writing some MR jobs / some scripts to read and replay selectively and update checkpoints. --- Mallikarjun
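[Editor's note] The +++ and -- positions in the table above can be derived mechanically from the barrier list. A small sketch (hypothetical helper, not HBase code) that locates which barrier range a given sequence id falls into:

```java
import java.util.Arrays;

// Illustrates the table's +++/-- markers: given the sorted barrier list
// (open sequence numbers) for a region, find the index i of the range
// [barriers[i], barriers[i+1]) that a sequence id falls into.
public class BarrierLocator {
    static int rangeIndex(long[] barriers, long seqId) {
        int pos = Arrays.binarySearch(barriers, seqId);
        if (pos >= 0) {
            return pos; // seqId is exactly an open sequence number: it starts range pos
        }
        int insertionPoint = -pos - 1; // index of first barrier greater than seqId
        return Math.max(0, insertionPoint - 1);
    }

    public static void main(String[] args) {
        // Barriers of region 24c765b42253f96b550831d83e99cc9e (rs-9 row).
        long[] barriers = {17776286L, 18762210L, 18775053L, 18775079L, 18775104L, 18775119L};
        // Checkpoint (Col 4) 18762209 sits in range 0 -> the +++ marker.
        System.out.println(rangeIndex(barriers, 18762209L)); // 0
        // Stuck sequence id (Col 3) 18775105 sits in range 4 -> the -- marker.
        System.out.println(rangeIndex(barriers, 18775105L)); // 4
        // Serial replication cannot ship range 4 until ranges 0..3 are finished.
    }
}
```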
[jira] [Created] (HBASE-26203) Minor cleanups to reduce checkstyle warnings on backup code
Mallikarjun created HBASE-26203: --- Summary: Minor cleanups to reduce checkstyle warnings on backup code Key: HBASE-26203 URL: https://issues.apache.org/jira/browse/HBASE-26203 Project: HBase Issue Type: Improvement Components: backup&restore Affects Versions: 3.0.0-alpha-2 Reporter: Mallikarjun Assignee: Mallikarjun Fix For: 3.0.0-alpha-2 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-26202) Review deprecated WALProcedureStore usage in Backup
Mallikarjun created HBASE-26202: --- Summary: Review deprecated WALProcedureStore usage in Backup Key: HBASE-26202 URL: https://issues.apache.org/jira/browse/HBASE-26202 Project: HBase Issue Type: Improvement Components: backup&restore Affects Versions: 3.0.0-alpha-2 Reporter: Mallikarjun Assignee: Mallikarjun Fix For: 3.0.0-alpha-2 `WALProcedureStore` stands deprecated. Review its usage in Backup/Restore -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [DISCUSS] Hbase Backup design changes
Thanks Duo. --- Mallikarjun On Sun, Jul 25, 2021 at 7:32 PM 张铎(Duo Zhang) wrote: > Replied on jira. Please give more details about what you are doing in the > PR... > > Thanks. > > Mallikarjun 于2021年7月25日周日 上午10:48写道: > > > > Can someone review this pull request? > > https://github.com/apache/hbase/pull/3359 > > > > This change changes meta information for backup, if not part of hbase > > 3.0.0. It might have a lot of additional work to be put into executing > the > > above mentioned plan. > > > > --- > > Mallikarjun > > > > > > On Thu, Feb 11, 2021 at 5:36 PM Mallikarjun > > wrote: > > > > > Slight modification to previous version --> https://ibb.co/Nttx3J1 > > > > > > --- > > > Mallikarjun > > > > > > > > > On Thu, Feb 11, 2021 at 8:12 AM Mallikarjun > > > wrote: > > > > > >> Inline Reply > > >> > > >> On Wed, Feb 3, 2021 at 6:44 AM Sean Busbey wrote: > > >> > > >>> Hi Mallikarjun, > > >>> > > >>> Those goals sound worthwhile. > > >>> > > >>> Do you have a flow chart similar to the one you posted for the > current > > >>> system but for the proposed solution? > > >>> > > >> > > >> This is what I am thinking --> https://ibb.co/KmH6Cwv > > >> > > >> > > >>> > > >>> How much will we need to change our existing test coverage to > accommodate > > >>> the proposed solution? > > >>> > > >> > > >> Of the 38 tests, it looks like we might have to change a couple only. > > >> Will have to add more tests to cover parallel backup scenarios. > > >> > > >> > > >>> > > >>> How much will we need to update the existing reference guide section? > > >>> > > >>> > > >> Probably nothing. Interface as such will not change. > > >> > > >> > > >>> > > >>> On Sun, Jan 31, 2021, 04:59 Mallikarjun > > >>> wrote: > > >>> > > >>> > Bringing up this thread. > > >>> > > > >>> > On Mon, Jan 25, 2021, 3:38 PM Viraj Jasani > wrote: > > >>> > > > >>> > > Thanks, the image is visible now. 
> > >>> > > > > >>> > > > Since I wanted to open this for discussion, did not consider > > >>> placing it > > >>> > > in > > >>> > > *hbase/dev_support/design-docs*. > > >>> > > > > >>> > > Definitely, only after we come to concrete conclusion with the > > >>> reviewer, > > >>> > we > > >>> > > should open up a PR. Until then this thread is anyways up for > > >>> discussion. > > >>> > > > > >>> > > > > >>> > > On Mon, 25 Jan 2021 at 1:58 PM, Mallikarjun < > > >>> mallik.v.ar...@gmail.com> > > >>> > > wrote: > > >>> > > > > >>> > > > Hope this link works --> https://ibb.co/hYjRpgP > > >>> > > > > > >>> > > > Inline reply > > >>> > > > On Mon, Jan 25, 2021 at 1:16 PM Viraj Jasani < > vjas...@apache.org> > > >>> > wrote: > > >>> > > > > > >>> > > > > Hi, > > >>> > > > > > > >>> > > > > Still not available :) > > >>> > > > > The attachments don’t work on mailing lists. You can try > > >>> uploading > > >>> > the > > >>> > > > > attachment on some public hosting site and provide the url > to the > > >>> > same > > >>> > > > > here. > > >>> > > > > > > >>> > > > > Since I am not aware of the contents, I cannot confirm right > > >>> away but > > >>> > > if > > >>> > > > > the reviewer feels we should have the attachment on our > github > > >>> repo: > > >>> > > > > hbase/dev-support/design-docs , good to upload the content > there > > >>> > later. > > >>> > > > For > > >>
Re: [DISCUSS] Hbase Backup design changes
Can someone review this pull request? https://github.com/apache/hbase/pull/3359 This change modifies the backup meta information, so if it is not part of hbase 3.0.0, it might take a lot of additional work to execute the above mentioned plan. --- Mallikarjun On Thu, Feb 11, 2021 at 5:36 PM Mallikarjun wrote: > Slight modification to previous version --> https://ibb.co/Nttx3J1 > > --- > Mallikarjun > > > On Thu, Feb 11, 2021 at 8:12 AM Mallikarjun > wrote: > >> Inline Reply >> >> On Wed, Feb 3, 2021 at 6:44 AM Sean Busbey wrote: >> >>> Hi Mallikarjun, >>> >>> Those goals sound worthwhile. >>> >>> Do you have a flow chart similar to the one you posted for the current >>> system but for the proposed solution? >>> >> >> This is what I am thinking --> https://ibb.co/KmH6Cwv >> >> >>> >>> How much will we need to change our existing test coverage to accommodate >>> the proposed solution? >>> >> >> Of the 38 tests, it looks like we might have to change a couple only. >> Will have to add more tests to cover parallel backup scenarios. >> >> >>> >>> How much will we need to update the existing reference guide section? >>> >>> >> Probably nothing. Interface as such will not change. >> >> >>> >>> On Sun, Jan 31, 2021, 04:59 Mallikarjun >>> wrote: >>> >>> > Bringing up this thread. >>> > >>> > On Mon, Jan 25, 2021, 3:38 PM Viraj Jasani wrote: >>> > >>> > > Thanks, the image is visible now.
>>> > > >>> > > >>> > > On Mon, 25 Jan 2021 at 1:58 PM, Mallikarjun < >>> mallik.v.ar...@gmail.com> >>> > > wrote: >>> > > >>> > > > Hope this link works --> https://ibb.co/hYjRpgP >>> > > > >>> > > > Inline reply >>> > > > On Mon, Jan 25, 2021 at 1:16 PM Viraj Jasani >>> > wrote: >>> > > > >>> > > > > Hi, >>> > > > > >>> > > > > Still not available :) >>> > > > > The attachments don’t work on mailing lists. You can try >>> uploading >>> > the >>> > > > > attachment on some public hosting site and provide the url to the >>> > same >>> > > > > here. >>> > > > > >>> > > > > Since I am not aware of the contents, I cannot confirm right >>> away but >>> > > if >>> > > > > the reviewer feels we should have the attachment on our github >>> repo: >>> > > > > hbase/dev-support/design-docs , good to upload the content there >>> > later. >>> > > > For >>> > > > > instance, pdf file can contain existing design and new design >>> > diagrams >>> > > > and >>> > > > > talk about pros and cons etc once we have things finalized. >>> > > > > >>> > > > > >>> > > > Since I wanted to open this for discussion, did not consider >>> placing it >>> > > in >>> > > > *hbase/dev_support/design-docs*. >>> > > > >>> > > > >>> > > > > >>> > > > > On Mon, 25 Jan 2021 at 12:13 PM, Mallikarjun < >>> > mallik.v.ar...@gmail.com >>> > > > >>> > > > > wrote: >>> > > > > >>> > > > > > Attached as image. Please let me know if it is availabe now. >>> > > > > > >>> > > > > > --- >>> > > > > > Mallikarjun >>> > > > > > >>> > > > > > >>> > > > > > On Mon, Jan 25, 2021 at 10:32 AM Sean Busbey < >>> bus...@apache.org> >>> > > > wrote: >>> > > > > > >>> > > > > >> Hi! >&g
[jira] [Created] (HBASE-26034) Add support to take multiple parallel backup
Mallikarjun created HBASE-26034: --- Summary: Add support to take multiple parallel backup Key: HBASE-26034 URL: https://issues.apache.org/jira/browse/HBASE-26034 Project: HBase Issue Type: Improvement Components: backup&restore Affects Versions: 3.0.0-alpha-2 Reporter: Mallikarjun Assignee: Mallikarjun Fix For: 3.0.0-alpha-2 -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Time for 3.0.0 release
For multi tenancy with favoured nodes, timeline looks unreasonable for 3.0. Can it be part of later 3.x releases? Or should it wait for 4.0? On Fri, May 21, 2021, 7:30 PM 张铎(Duo Zhang) wrote: > We already have the below big feature/changes for 3.0.0. > > Synchronous Replication > OpenTelemetry Tracing > Distributed MOB Compaction > Backup and Restore > Move RSGroup balancer to core > Reimplement sync client on async client > CPEPs on shaded proto > > There are also some ongoing works which target 3.0.0. > > Splittable meta > Move balancer code to hbase-balancer > Compaction offload > Replication offload > > Since now, we do not even have enough new features to cut a minor release > for 2.x, I think it is time to cut the 3.x release line now, and I think we > already have enough new features for a new major release. > > Here I plan to cut a branch-3 at the end of June and make our first > 3.0.0-alpha1 release, and finally make the 3.0.0 release by the end of > 2021. So if any of the above work can not be done before the end of June, > they will be moved out to 4.0.0. > > Thoughts? Thanks. >
[jira] [Created] (HBASE-25891) Simplify backup table to be able to maintain it
Mallikarjun created HBASE-25891: --- Summary: Simplify backup table to be able to maintain it Key: HBASE-25891 URL: https://issues.apache.org/jira/browse/HBASE-25891 Project: HBase Issue Type: Improvement Components: backup&restore Reporter: Mallikarjun Assignee: Mallikarjun More details will be added soon -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [DISCUSS] Next 2.x releases
Inline On Wed, May 5, 2021 at 4:51 AM Sean Busbey wrote: > My understanding is that backup work is not ready for inclusion in 2.x. > > The talk of removing it from the master branch and proposed adoption of the > feature through more involvement from some community members were not so > long ago. > > I am working on backup changes as per the discussion in email with subject *[DISCUSS] Hbase Backup design changes* Removing it entirely may not be a good idea, as we are using it in production and I see a good value addition with this feature instead of starting fresh. Maybe in a month or two, I will be able to close it. > On Tue, May 4, 2021, 15:49 Andrew Purtell wrote: > > > Correct me if I am mistaken but backup tests failing on master in > precommit > > is common enough to warrant ignoring them as unrelated. Is it not fully > > baked yet? > > > > +1 for backport of tracing. If we do the backport to branch-2 that would > be > > one new compelling reason for a 2.5.0 release. > > > > > > On Tue, May 4, 2021 at 9:34 AM Nick Dimiduk wrote: > > > > > On Fri, Apr 30, 2021 at 5:40 PM 张铎(Duo Zhang) > > > wrote: > > > > > > > Does anyone have interest in backporting HBASE-22120 to branch-2? > > > > > > > > > > Yes, I think there would be interest in getting your tracing effort > onto > > > branch-2 ; there's quite a few "watchers" on that Jira. > > > > > > What about the backup work? Has it stabilized enough for backport? > > > > > > Conversely, if we don't have a killer new feature for 2.5, does that > mean > > > it's time for 3.0? > > > > > > Andrew Purtell 于2021年5月1日周六 上午5:46写道: > > > > > > > > > Inline > > > > > > > > > > On Fri, Apr 30, 2021 at 2:11 PM Nick Dimiduk > > > > wrote: > > > > > > > > > > > Heya, > > > > > > > > > > > > I'd like to have a planning discussion around our 2.x releases. > 2.4 > > > > seems > > > > > > to be humming along nicely, I think we won't have a need for 2.3 > > much > > > > > > longer. 
Likewise, it seems like it should be time to start > planning > > > > 2.5, > > > > > > but when I look at the issues unique to that branch [0], I don't > > see > > > > the > > > > > > big new feature that justifies the new minor release. Rather, I > > see a > > > > > > number of items that should be backported to a 2.4.x. Do we have > a > > > big > > > > > new > > > > > > feature in the works? Should we consider backporting something > from > > > > > master? > > > > > > Or maybe there's enough minor changes on 2.5 to justify the > > > release... > > > > > but > > > > > > if so, which of them motivate users to upgrade? > > > > > > > > > > > > So, to summarize: > > > > > > > > > > > > - How much longer do we want to run 2.3? > > > > > > - What issues in the below query can be backported to 2.4? Any > > > > > volunteers? > > > > > > > > > > > > > > > > Thanks for starting this discussion, Nick. > > > > > > > > > > Looking at that report, issues like HBASE-24305, HBASE-25793, or > > > > > HBASE-25458 that clean up or reduce interfaces or refactor/move > > classes > > > > are > > > > > probably excluded from a patch release by our guidelines. > Conversely, > > > > they > > > > > would not provide much value if backported. > > > > > > > > > > I also agree that the motivation for 2.5 here is thin as of now. > > > > Refactors, > > > > > interface improvements, or deprecation removals will be > nice-to-haves > > > > when > > > > > there is a 2.5 someday. > > > > > > > > > > All the others in the report are either operational improvements > that > > > > would > > > > > be nice to backport or bug fixes that should be backported. > > > > > > > > > > There might be case by case issues with compatibility during the > pick > > > > > backs, but we can deal with them one at a time. > > > > > > > > > > If you are looking for a volunteer to do this, as 2.4 RM I am game. > > > > > > > > > > > > > > > - What's the big new feature that motivates 2.5? 
> > > > > > > > > > > > Thanks, > > > > > > Nick > > > > > > > > > > > > [0]: > > > > > > > > > > > > > > > > > > > > > > > > > > > https://issues.apache.org/jira/issues/?jql=project%20%3D%20HBASE%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion%20%3D%202.5.0%20AND%20fixVersion%20not%20in%20(2.4.0%2C%202.4.1%2C%202.4.2%2C%202.4.3)%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC > > > > > > > > > > > > > > > > > > > > > -- > > > > > Best regards, > > > > > Andrew > > > > > > > > > > Words like orphans lost among the crosstalk, meaning torn from > > truth's > > > > > decrepit hands > > > > >- A23, Crosstalk > > > > > > > > > > > > > > > > > > -- > > Best regards, > > Andrew > > > > Words like orphans lost among the crosstalk, meaning torn from truth's > > decrepit hands > >- A23, Crosstalk > > >
Re: DISCUSS: Remove hbase-backup from master?
Sure. I have assigned it to myself. Will look into it. Last time I checked, I did not find any failed tests and it was not hadoop3 --- Mallikarjun On Fri, May 14, 2021 at 2:34 AM Nick Dimiduk wrote: > Hi Mallikarjun, > > I just saw a bunch of backup tests fail on an unrelated PR build. I filed > HBASE-25888 and uploaded some logs. I have the full test-logs.zip file, but > it's too big to upload to jira. I linked it from the Jira, but the archive > will disappear when the PR is eventually closed. I would ping you from > Jira, but I didn't find any Jira user that seemed likely to be your > account. Would you mind taking a look? > > Thanks, > Nick > > On Wed, Dec 23, 2020 at 8:20 PM Mallikarjun > wrote: > > > Yea. I have noticed that. > > > > I have some patches ready and have to add unit tests. Will start raising > in > > a couple of weeks time. > > --- > > Mallikarjun > > > > > > On Thu, Dec 24, 2020 at 7:48 AM 张铎(Duo Zhang) > > wrote: > > > > > The UTs in the backup module are easy to fail with NPE, I've seen this > in > > > several pre commit results. > > > > > > Any progress here? > > > > > > mallik.v.ar...@gmail.com 于2020年12月3日周四 > > > 上午9:58写道: > > > > > > > On Tue, Dec 1, 2020 at 11:26 PM Stack wrote: > > > > > > > > > On Tue, Dec 1, 2020 at 6:38 AM mallik.v.ar...@gmail.com < > > > > > mallik.v.ar...@gmail.com> wrote: > > > > > > > > > > > On Tue, Dec 1, 2020 at 7:34 PM Sean Busbey > > > wrote: > > > > > > > > > > > > > One reason for moving it out of core is that at some point we > > will > > > > have > > > > > > > hbase 3 releases. Right now hbase 3 targeted things are in the > > > master > > > > > > > branch. If the feature is not sufficiently ready to be in a > > release > > > > > then > > > > > > > when the time for HBase 3 releases comes we'll have to do work > to > > > > pull > > > > > it > > > > > > > out of the release branch. 
If the feature needs to stay in the > > core > > > > > repo > > > > > > > that will necessitate that hbase 3 have a branch distinct from > > the > > > > > master > > > > > > > branch (which may or may not happen anyways). At that point we > > risk > > > > > > having > > > > > > > to do the work to remove the feature from a release branch > again > > > come > > > > > > hbase > > > > > > > 4. > > > > > > > > > > > > > > > > > > > > > I think a lot of this is moot if you'd like to start providing > > > > patches > > > > > > for > > > > > > > the things you've needed to use it. If there are gaps that you > > > think > > > > > can > > > > > > > trip folks up maybe we could label it "experimental" to provide > > > > better > > > > > > > context for others. > > > > > > > > > > > > > > > > > > > I will start putting effort in maintaining this feature. > > > > > > > > > > > > > > > > > FYI, seems like backup has a bunch of flakey tests. Might be worth > > > > looking > > > > > at. > > > > > > > > > > > > > Any reference I can get. They seem fine when I run tests. > > > > > > > > > > > > > Thanks, > > > > > S > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Dec 1, 2020, 07:48 mallik.v.ar...@gmail.com < > > > > > > > mallik.v.ar...@gmail.com> wrote: > > > > > > > > > > > > > > > On Tue, Dec 1, 2020 at 12:14 PM Stack > > wrote: > > > > > > > > > > > > > > > > > On Mon, Nov 30, 2020 at 9:30 PM mallik.v.ar...@gmail.com < > > > > > > > > > mallik.v.ar...@gmail.com> wrote: > > > > > > > > > > > > > > > > > > > Inline > > > > > > > > > > > > >
[jira] [Created] (HBASE-25870) Search only ancestors instead of entire history instead for a particular backup
Mallikarjun created HBASE-25870: --- Summary: Search only ancestors instead of entire history instead for a particular backup Key: HBASE-25870 URL: https://issues.apache.org/jira/browse/HBASE-25870 Project: HBase Issue Type: Bug Components: backup&restore Reporter: Mallikarjun Assignee: Mallikarjun -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [SURVEY] The current usage of favor node balancer across the community
Inline reply On Tue, Apr 27, 2021 at 1:03 AM Stack wrote: > On Mon, Apr 26, 2021 at 12:30 PM Stack wrote: > > > On Mon, Apr 26, 2021 at 8:10 AM Mallikarjun > > wrote: > > > >> We use FavoredStochasticBalancer, which by description says the same > thing > >> as FavoredNodeLoadBalancer. Ignoring that fact, problem appears to be > >> > >> > > > > Other concerns: > > > > * Hard-coded triplet of nodes that will inevitably rot as machines come > > and go (Are there tools for remediation?) > It doesn't really rot, if you think of it with the balancer responsible for assigning regions: 1. Every time a region is assigned to a particular regionserver, the balancer reassigns this triplet, so there is no scope for rot (the same logic applies to the WAL as well). (On compaction, HDFS blocks are pulled back if any spill over.) 2. We use hostnames only (so "come and go" does not mean new nodes, but the same hostnames). A couple of outstanding problems, though: 1. We couldn't increase the replication factor beyond 3, which has been fine so far for our use cases, but we have thought about fixing this. 2. The balancer doesn't understand the favored nodes construct, so a perfect balance of favored nodes among the rsgroup datanodes isn't possible; some variance, like a 10-20% difference, is expected. > > * A workaround for a facility that belongs in the NN > Probably; you can argue it both ways. HBase is the owner of the data and has the authority to dictate where a particular region replica sits. Benefits such as data locality staying close to 1 and rack awareness are well aligned to this strategy, and so on. Moreover, HDFS provides data pinning precisely for clients to make use of, doesn't it? > > * Opaque in operation > We haven't yet looked at wrapping these operations with metrics, so that they would no longer be opaque; see also the reasons mentioned in the point above. > > * My understanding was that the feature was never finished; in > particular > > the balancer wasn't properly wired-up (Happy to be incorrect here).
> > > > > One more concern was that the feature was dead/unused. You seem to refute > this notion of mine. > S > We have been using this for more than a year with hbase 2.1 in highly critical workloads for our company. And several years with hbase 1.2 as well with backporting rsgroup from master at that time. (2017-18 ish) And it has been very smooth operationally in hbase 2.1 > > > > > > > >> Going a step back. > >> > >> Did we ever consider giving a thought towards truely multi-tenant hbase? > >> > > > > Always. > > > > > >> Where each rsgroup has a group of datanodes and namespace tables data > >> created under that particular rsgroup would sit on those datanodes only? > >> We > >> have attempted to do that and have largely been very successful running > >> clusters of hundreds of terabytes with hundreds of > >> regionservers(datanodes) > >> per cluster. > >> > >> > > So isolation of load by node? (I believe this is where the rsgroup > feature > > came from originally; the desire for a deploy like you describe above. > > IIUC, its what Thiru and crew run). > > > > > > > >> 1. We use a modified version of RSGroupBasedFavoredNodeLoadBalancer > >> contributed by Thiruvel Thirumoolan --> > >> https://issues.apache.org/jira/browse/HBASE-15533 > >> > >> On each balance operation, while the region is moved around (or while > >> creating table), favored nodes are assigned based on the rsgroup that > >> region is pinned to. And hence data is pinned to those datanodes only > >> (Pinning favored nodes is best effort from the hdfs side, but there are > >> only a few exception scenarios where data will be spilled over and they > >> recover after a major compaction). > >> > >> > > Sounds like you have studied this deploy in operation. Write it up? Blog > > post on hbase.apache.org? > > > Definitely will write up. > > > > > >> 2. 
We have introduced several balancer cost functions to restore things > to > >> normalcy (multi tenancy with fn pinning) such as when a node is dead, or > >> when fn's are imbalanced within the same rsgroup, etc. > >> > >> 3. We had diverse workloads under the same cluster and WAL isolation > >> became > >> a requirement and we went ahead with similar philosophy mentioned in > line > >> 1. Where WAL's are created with FN pinning so that they are tied to > >> datanodes belonging to the same rsgroup. Some
Re: [SURVEY] The current usage of favor node balancer across the community
We use FavoredStochasticBalancer, which by description says the same thing as FavoredNodeLoadBalancer. Ignoring that fact, the problem appears to be: > favor node balancer is a problem, as it stores the favor node information > in hbase:meta. > Going a step back: did we ever consider truly multi-tenant hbase, where each rsgroup has a group of datanodes, and data of namespace tables created under that particular rsgroup would sit on those datanodes only? We have attempted to do that and have largely been very successful running clusters of hundreds of terabytes with hundreds of regionservers (datanodes) per cluster. 1. We use a modified version of RSGroupBasedFavoredNodeLoadBalancer contributed by Thiruvel Thirumoolan --> https://issues.apache.org/jira/browse/HBASE-15533 On each balance operation, while the region is moved around (or while creating a table), favored nodes are assigned based on the rsgroup that the region is pinned to, and hence data is pinned to those datanodes only. (Pinning favored nodes is best effort on the hdfs side, but there are only a few exception scenarios where data will be spilled over, and they recover after a major compaction.) 2. We have introduced several balancer cost functions to restore things to normalcy (multi tenancy with fn pinning), such as when a node is dead, or when fn's are imbalanced within the same rsgroup, etc. 3. We had diverse workloads under the same cluster and WAL isolation became a requirement, so we went ahead with the same philosophy mentioned in point 1, where WALs are created with FN pinning so that they are tied to datanodes belonging to the same rsgroup. Some discussion seems to have happened here --> https://issues.apache.org/jira/browse/HBASE-21641 There are several other enhancements we have worked on, such as rsgroup aware export snapshot, rsgroup aware regionmover, rsgroup aware cluster replication, etc. For the above use cases, we would need the fn information in hbase:meta.
If the use case seems to be a fit for how we would want hbase to be taken forward as one of the supported use cases, we are willing to contribute our changes back to the community. (I was planning to initiate this discussion anyway.) To strengthen the above use case, here is what one of our multi-tenant clusters looks like: RSGroups (Tenants): 21 (with tenant isolation) Regionservers: 275 Regions hosted: 6k Tables hosted: 87 Capacity: 250 TB (100 TB used) --- Mallikarjun On Mon, Apr 26, 2021 at 9:15 AM 张铎(Duo Zhang) wrote: > As you all know, we always want to reduce the size of the hbase-server > module. This time we want to separate the balancer related code to another > sub module. > > The design doc: > > https://docs.google.com/document/d/1T7WSgcQBJTtbJIjqi8sZYLxD2Z7JbIHx4TJaKKdkBbE/edit# > > You can see the bottom of the design doc, favor node balancer is a problem, > as it stores the favor node information in hbase:meta. Stack mentioned that > the feature is already dead, maybe we could just purge it from our code > base. > > So here we want to know if there are still some users in the community who > still use favor node balancer. Please share your experience and whether you > still want to use it. > > Thanks. >
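[Editor's note] The rsgroup-pinned favored-node assignment described in this thread can be sketched as follows. All class and method names here are hypothetical illustrations, not the actual RSGroupBasedFavoredNodeLoadBalancer API; the real balancer also weighs load, racks, and dead nodes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: pick a favored-node triplet for a region from the
// datanodes of that region's rsgroup, so HDFS block placement stays inside
// the tenant's nodes. Hypothetical names, not HBase's actual balancer API.
public class RsGroupFavoredNodesSketch {
    // Three favored nodes, matching the default HDFS replication factor
    // (this is also why the thread notes replication factor > 3 was a problem).
    static final int FAVORED_NODES_NUM = 3;

    // Spread regions across the group's hosts round-robin by region index.
    static List<String> pickFavoredNodes(List<String> groupHosts, int regionIndex) {
        if (groupHosts.size() < FAVORED_NODES_NUM) {
            throw new IllegalArgumentException("rsgroup has too few hosts");
        }
        List<String> picked = new ArrayList<>();
        for (int i = 0; i < FAVORED_NODES_NUM; i++) {
            picked.add(groupHosts.get((regionIndex + i) % groupHosts.size()));
        }
        return picked;
    }

    public static void main(String[] args) {
        List<String> tenantHosts = List.of("dn-1", "dn-2", "dn-3", "dn-4");
        System.out.println(pickFavoredNodes(tenantHosts, 0)); // [dn-1, dn-2, dn-3]
        System.out.println(pickFavoredNodes(tenantHosts, 3)); // [dn-4, dn-1, dn-2]
    }
}
```

Because hostnames are reused when nodes are replaced (as noted above), triplets chosen this way do not rot: every rebalance re-derives them from the current rsgroup membership.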
[jira] [Created] (HBASE-25784) Support for Parallel Backups enabling multi tenancy
Mallikarjun created HBASE-25784:
---
Summary: Support for Parallel Backups enabling multi tenancy
Key: HBASE-25784
URL: https://issues.apache.org/jira/browse/HBASE-25784
Project: HBase
Issue Type: Improvement
Components: backport
Reporter: Mallikarjun
Assignee: Mallikarjun

Proposed Design.

!https://i.ibb.co/vVV1BTs/Backup-Activity-Diagram.png|width=322,height=414!

-- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [DISCUSS] Hbase Backup design changes
Slight modification to previous version --> https://ibb.co/Nttx3J1 --- Mallikarjun On Thu, Feb 11, 2021 at 8:12 AM Mallikarjun wrote: > Inline Reply > > On Wed, Feb 3, 2021 at 6:44 AM Sean Busbey wrote: > >> Hi Mallikarjun, >> >> Those goals sound worthwhile. >> >> Do you have a flow chart similar to the one you posted for the current >> system but for the proposed solution? >> > > This is what I am thinking --> https://ibb.co/KmH6Cwv > > >> >> How much will we need to change our existing test coverage to accommodate >> the proposed solution? >> > > Of the 38 tests, it looks like we might have to change a couple only. > Will have to add more tests to cover parallel backup scenarios. > > >> >> How much will we need to update the existing reference guide section? >> >> > Probably nothing. Interface as such will not change. > > >> >> On Sun, Jan 31, 2021, 04:59 Mallikarjun wrote: >> >> > Bringing up this thread. >> > >> > On Mon, Jan 25, 2021, 3:38 PM Viraj Jasani wrote: >> > >> > > Thanks, the image is visible now. >> > > >> > > > Since I wanted to open this for discussion, did not consider >> placing it >> > > in >> > > *hbase/dev_support/design-docs*. >> > > >> > > Definitely, only after we come to concrete conclusion with the >> reviewer, >> > we >> > > should open up a PR. Until then this thread is anyways up for >> discussion. >> > > >> > > >> > > On Mon, 25 Jan 2021 at 1:58 PM, Mallikarjun > > >> > > wrote: >> > > >> > > > Hope this link works --> https://ibb.co/hYjRpgP >> > > > >> > > > Inline reply >> > > > On Mon, Jan 25, 2021 at 1:16 PM Viraj Jasani >> > wrote: >> > > > >> > > > > Hi, >> > > > > >> > > > > Still not available :) >> > > > > The attachments don’t work on mailing lists. You can try uploading >> > the >> > > > > attachment on some public hosting site and provide the url to the >> > same >> > > > > here. 
>> > > > > >> > > > > Since I am not aware of the contents, I cannot confirm right away >> but >> > > if >> > > > > the reviewer feels we should have the attachment on our github >> repo: >> > > > > hbase/dev-support/design-docs , good to upload the content there >> > later. >> > > > For >> > > > > instance, pdf file can contain existing design and new design >> > diagrams >> > > > and >> > > > > talk about pros and cons etc once we have things finalized. >> > > > > >> > > > > >> > > > Since I wanted to open this for discussion, did not consider >> placing it >> > > in >> > > > *hbase/dev_support/design-docs*. >> > > > >> > > > >> > > > > >> > > > > On Mon, 25 Jan 2021 at 12:13 PM, Mallikarjun < >> > mallik.v.ar...@gmail.com >> > > > >> > > > > wrote: >> > > > > >> > > > > > Attached as image. Please let me know if it is availabe now. >> > > > > > >> > > > > > --- >> > > > > > Mallikarjun >> > > > > > >> > > > > > >> > > > > > On Mon, Jan 25, 2021 at 10:32 AM Sean Busbey > > >> > > > wrote: >> > > > > > >> > > > > >> Hi! >> > > > > >> >> > > > > >> Thanks for the write up. unfortunately, your image for the >> > existing >> > > > > >> design didn't come through. Could you post it to some host and >> > link >> > > it >> > > > > >> here? >> > > > > >> >> > > > > >> On Sun, Jan 24, 2021 at 3:12 AM Mallikarjun < >> > > mallik.v.ar...@gmail.com >> > > > > >> > > > > >> wrote: >> > > > > >> > >> > > > > >> > Existing Design: >> > > > > >> > >> > > > > >> > >> > > > > >> > >> > > > > >> &
Re: [DISCUSS] Hbase Backup design changes
Inline Reply On Wed, Feb 3, 2021 at 6:44 AM Sean Busbey wrote: > Hi Mallikarjun, > > Those goals sound worthwhile. > > Do you have a flow chart similar to the one you posted for the current > system but for the proposed solution? > This is what I am thinking --> https://ibb.co/KmH6Cwv > > How much will we need to change our existing test coverage to accommodate > the proposed solution? > Of the 38 tests, it looks like we might have to change a couple only. Will have to add more tests to cover parallel backup scenarios. > > How much will we need to update the existing reference guide section? > > Probably nothing. Interface as such will not change. > > On Sun, Jan 31, 2021, 04:59 Mallikarjun wrote: > > > Bringing up this thread. > > > > On Mon, Jan 25, 2021, 3:38 PM Viraj Jasani wrote: > > > > > Thanks, the image is visible now. > > > > > > > Since I wanted to open this for discussion, did not consider placing > it > > > in > > > *hbase/dev_support/design-docs*. > > > > > > Definitely, only after we come to concrete conclusion with the > reviewer, > > we > > > should open up a PR. Until then this thread is anyways up for > discussion. > > > > > > > > > On Mon, 25 Jan 2021 at 1:58 PM, Mallikarjun > > > wrote: > > > > > > > Hope this link works --> https://ibb.co/hYjRpgP > > > > > > > > Inline reply > > > > On Mon, Jan 25, 2021 at 1:16 PM Viraj Jasani > > wrote: > > > > > > > > > Hi, > > > > > > > > > > Still not available :) > > > > > The attachments don’t work on mailing lists. You can try uploading > > the > > > > > attachment on some public hosting site and provide the url to the > > same > > > > > here. > > > > > > > > > > Since I am not aware of the contents, I cannot confirm right away > but > > > if > > > > > the reviewer feels we should have the attachment on our github > repo: > > > > > hbase/dev-support/design-docs , good to upload the content there > > later. 
> > > > For > > > > > instance, pdf file can contain existing design and new design > > diagrams > > > > and > > > > > talk about pros and cons etc once we have things finalized. > > > > > > > > > > > > > > Since I wanted to open this for discussion, did not consider placing > it > > > in > > > > *hbase/dev_support/design-docs*. > > > > > > > > > > > > > > > > > > On Mon, 25 Jan 2021 at 12:13 PM, Mallikarjun < > > mallik.v.ar...@gmail.com > > > > > > > > > wrote: > > > > > > > > > > > Attached as image. Please let me know if it is availabe now. > > > > > > > > > > > > --- > > > > > > Mallikarjun > > > > > > > > > > > > > > > > > > On Mon, Jan 25, 2021 at 10:32 AM Sean Busbey > > > > wrote: > > > > > > > > > > > >> Hi! > > > > > >> > > > > > >> Thanks for the write up. unfortunately, your image for the > > existing > > > > > >> design didn't come through. Could you post it to some host and > > link > > > it > > > > > >> here? > > > > > >> > > > > > >> On Sun, Jan 24, 2021 at 3:12 AM Mallikarjun < > > > mallik.v.ar...@gmail.com > > > > > > > > > > >> wrote: > > > > > >> > > > > > > >> > Existing Design: > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > Problem 1: > > > > > >> > > > > > > >> > With this design, Incremental and Full backup can't be run in > > > > parallel > > > > > >> and leading to degraded RPO's in case Full backup is of longer > > > > duration > > > > > esp > > > > > >> for large tables. > > > > > >> > > > > > > >> > Example: > > > > > >> > Expectation: Say you have a big table with 10 TB and your RPO > is > > > 60 > > > > > >> minutes and you are allowed to shi
Re: [DISCUSS] Hbase Backup design changes
Hi Sean, I will get back with the design changes and the answers to above questions in a few days time. --- Mallikarjun On Wed, Feb 3, 2021 at 6:44 AM Sean Busbey wrote: > Hi Mallikarjun, > > Those goals sound worthwhile. > > Do you have a flow chart similar to the one you posted for the current > system but for the proposed solution? > > How much will we need to change our existing test coverage to accommodate > the proposed solution? > > How much will we need to update the existing reference guide section? > > > On Sun, Jan 31, 2021, 04:59 Mallikarjun wrote: > > > Bringing up this thread. > > > > On Mon, Jan 25, 2021, 3:38 PM Viraj Jasani wrote: > > > > > Thanks, the image is visible now. > > > > > > > Since I wanted to open this for discussion, did not consider placing > it > > > in > > > *hbase/dev_support/design-docs*. > > > > > > Definitely, only after we come to concrete conclusion with the > reviewer, > > we > > > should open up a PR. Until then this thread is anyways up for > discussion. > > > > > > > > > On Mon, 25 Jan 2021 at 1:58 PM, Mallikarjun > > > wrote: > > > > > > > Hope this link works --> https://ibb.co/hYjRpgP > > > > > > > > Inline reply > > > > On Mon, Jan 25, 2021 at 1:16 PM Viraj Jasani > > wrote: > > > > > > > > > Hi, > > > > > > > > > > Still not available :) > > > > > The attachments don’t work on mailing lists. You can try uploading > > the > > > > > attachment on some public hosting site and provide the url to the > > same > > > > > here. > > > > > > > > > > Since I am not aware of the contents, I cannot confirm right away > but > > > if > > > > > the reviewer feels we should have the attachment on our github > repo: > > > > > hbase/dev-support/design-docs , good to upload the content there > > later. > > > > For > > > > > instance, pdf file can contain existing design and new design > > diagrams > > > > and > > > > > talk about pros and cons etc once we have things finalized. 
> > > > > > > > > > > > > > Since I wanted to open this for discussion, did not consider placing > it > > > in > > > > *hbase/dev_support/design-docs*. > > > > > > > > > > > > > > > > > > On Mon, 25 Jan 2021 at 12:13 PM, Mallikarjun < > > mallik.v.ar...@gmail.com > > > > > > > > > wrote: > > > > > > > > > > > Attached as image. Please let me know if it is availabe now. > > > > > > > > > > > > --- > > > > > > Mallikarjun > > > > > > > > > > > > > > > > > > On Mon, Jan 25, 2021 at 10:32 AM Sean Busbey > > > > wrote: > > > > > > > > > > > >> Hi! > > > > > >> > > > > > >> Thanks for the write up. unfortunately, your image for the > > existing > > > > > >> design didn't come through. Could you post it to some host and > > link > > > it > > > > > >> here? > > > > > >> > > > > > >> On Sun, Jan 24, 2021 at 3:12 AM Mallikarjun < > > > mallik.v.ar...@gmail.com > > > > > > > > > > >> wrote: > > > > > >> > > > > > > >> > Existing Design: > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > Problem 1: > > > > > >> > > > > > > >> > With this design, Incremental and Full backup can't be run in > > > > parallel > > > > > >> and leading to degraded RPO's in case Full backup is of longer > > > > duration > > > > > esp > > > > > >> for large tables. > > > > > >> > > > > > > >> > Example: > > > > > >> > Expectation: Say you have a big table with 10 TB and your RPO > is > > > 60 > > > > > >> minutes and you are allowed to ship the remote backup with 800 > > Mbps. > > > > And > > > > > >> you are allowed to take Full Backups once
Re: [DISCUSS] Hbase Backup design changes
Bringing up this thread. On Mon, Jan 25, 2021, 3:38 PM Viraj Jasani wrote: > Thanks, the image is visible now. > > > Since I wanted to open this for discussion, did not consider placing it > in > *hbase/dev_support/design-docs*. > > Definitely, only after we come to concrete conclusion with the reviewer, we > should open up a PR. Until then this thread is anyways up for discussion. > > > On Mon, 25 Jan 2021 at 1:58 PM, Mallikarjun > wrote: > > > Hope this link works --> https://ibb.co/hYjRpgP > > > > Inline reply > > On Mon, Jan 25, 2021 at 1:16 PM Viraj Jasani wrote: > > > > > Hi, > > > > > > Still not available :) > > > The attachments don’t work on mailing lists. You can try uploading the > > > attachment on some public hosting site and provide the url to the same > > > here. > > > > > > Since I am not aware of the contents, I cannot confirm right away but > if > > > the reviewer feels we should have the attachment on our github repo: > > > hbase/dev-support/design-docs , good to upload the content there later. > > For > > > instance, pdf file can contain existing design and new design diagrams > > and > > > talk about pros and cons etc once we have things finalized. > > > > > > > > Since I wanted to open this for discussion, did not consider placing it > in > > *hbase/dev_support/design-docs*. > > > > > > > > > > On Mon, 25 Jan 2021 at 12:13 PM, Mallikarjun > > > > wrote: > > > > > > > Attached as image. Please let me know if it is availabe now. > > > > > > > > --- > > > > Mallikarjun > > > > > > > > > > > > On Mon, Jan 25, 2021 at 10:32 AM Sean Busbey > > wrote: > > > > > > > >> Hi! > > > >> > > > >> Thanks for the write up. unfortunately, your image for the existing > > > >> design didn't come through. Could you post it to some host and link > it > > > >> here? 
> > > >> > > > >> On Sun, Jan 24, 2021 at 3:12 AM Mallikarjun < > mallik.v.ar...@gmail.com > > > > > > >> wrote: > > > >> > > > > >> > Existing Design: > > > >> > > > > >> > > > > >> > > > > >> > Problem 1: > > > >> > > > > >> > With this design, Incremental and Full backup can't be run in > > parallel > > > >> and leading to degraded RPO's in case Full backup is of longer > > duration > > > esp > > > >> for large tables. > > > >> > > > > >> > Example: > > > >> > Expectation: Say you have a big table with 10 TB and your RPO is > 60 > > > >> minutes and you are allowed to ship the remote backup with 800 Mbps. > > And > > > >> you are allowed to take Full Backups once in a week and rest of them > > > should > > > >> be incremental backups > > > >> > > > > >> > Shortcoming: With the above design, one can't run parallel backups > > and > > > >> whenever there is a full backup running (which takes roughly 25 > hours) > > > you > > > >> are not allowed to take incremental backups and that would be a > breach > > > in > > > >> your RPO. > > > >> > > > > >> > Proposed Solution: Barring some critical sections such as > modifying > > > >> state of the backup on meta tables, others can happen parallelly. > > > Leaving > > > >> incremental backups to be able to run based on older successful > full / > > > >> incremental backups and completion time of backup should be used > > > instead of > > > >> start time of backup for ordering. I have not worked on the full > > > redesign, > > > >> and will be doing so if this proposal seems acceptable for the > > > community. > > > >> > > > > >> > Problem 2: > > > >> > > > > >> > With one backup at a time, it fails easily for a multi-tenant > > system. > > > >> This poses following problems > > > >> > > > > >> > Admins will not be able to achieve required RPO's for their tables > > > >> because of dependence on other tenants present in the system. 
As one > > > tenant > > > >> doesn't have control over other tenants' table sizes and hence the > > > duration > > > >> of the backup > > > >> > Management overhead of setting up a right sequence to achieve > > required > > > >> RPO's for different tenants could be very hard. > > > >> > > > > >> > Proposed Solution: Same as previous proposal > > > >> > > > > >> > Problem 3: > > > >> > > > > >> > Incremental backup works on WAL's and > > > >> org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures that > > > WAL's > > > >> are never cleaned up until the next backup (Full / Incremental) is > > > taken. > > > >> This poses following problem > > > >> > > > > >> > WAL's can grow unbounded in case there are transient problems like > > > >> backup site facing issues or anything else until next backup > scheduled > > > goes > > > >> successful > > > >> > > > > >> > Proposed Solution: I can't think of anything better, but I see > this > > > can > > > >> be a potential problem. Also, one can force full backup if required > > WAL > > > >> files are missing for whatever other reasons not necessarily > mentioned > > > >> above. > > > >> > > > > >> > --- > > > >> > Mallikarjun > > > >> > > > > > > > > > >
Re: [DISCUSS] Hbase Backup design changes
Hope this link works --> https://ibb.co/hYjRpgP Inline reply On Mon, Jan 25, 2021 at 1:16 PM Viraj Jasani wrote: > Hi, > > Still not available :) > The attachments don’t work on mailing lists. You can try uploading the > attachment on some public hosting site and provide the url to the same > here. > > Since I am not aware of the contents, I cannot confirm right away but if > the reviewer feels we should have the attachment on our github repo: > hbase/dev-support/design-docs , good to upload the content there later. For > instance, pdf file can contain existing design and new design diagrams and > talk about pros and cons etc once we have things finalized. > > Since I wanted to open this for discussion, did not consider placing it in *hbase/dev_support/design-docs*. > > On Mon, 25 Jan 2021 at 12:13 PM, Mallikarjun > wrote: > > > Attached as image. Please let me know if it is availabe now. > > > > --- > > Mallikarjun > > > > > > On Mon, Jan 25, 2021 at 10:32 AM Sean Busbey wrote: > > > >> Hi! > >> > >> Thanks for the write up. unfortunately, your image for the existing > >> design didn't come through. Could you post it to some host and link it > >> here? > >> > >> On Sun, Jan 24, 2021 at 3:12 AM Mallikarjun > >> wrote: > >> > > >> > Existing Design: > >> > > >> > > >> > > >> > Problem 1: > >> > > >> > With this design, Incremental and Full backup can't be run in parallel > >> and leading to degraded RPO's in case Full backup is of longer duration > esp > >> for large tables. > >> > > >> > Example: > >> > Expectation: Say you have a big table with 10 TB and your RPO is 60 > >> minutes and you are allowed to ship the remote backup with 800 Mbps. 
And > >> you are allowed to take Full Backups once in a week and rest of them > should > >> be incremental backups > >> > > >> > Shortcoming: With the above design, one can't run parallel backups and > >> whenever there is a full backup running (which takes roughly 25 hours) > you > >> are not allowed to take incremental backups and that would be a breach > in > >> your RPO. > >> > > >> > Proposed Solution: Barring some critical sections such as modifying > >> state of the backup on meta tables, others can happen parallelly. > Leaving > >> incremental backups to be able to run based on older successful full / > >> incremental backups and completion time of backup should be used > instead of > >> start time of backup for ordering. I have not worked on the full > redesign, > >> and will be doing so if this proposal seems acceptable for the > community. > >> > > >> > Problem 2: > >> > > >> > With one backup at a time, it fails easily for a multi-tenant system. > >> This poses following problems > >> > > >> > Admins will not be able to achieve required RPO's for their tables > >> because of dependence on other tenants present in the system. As one > tenant > >> doesn't have control over other tenants' table sizes and hence the > duration > >> of the backup > >> > Management overhead of setting up a right sequence to achieve required > >> RPO's for different tenants could be very hard. > >> > > >> > Proposed Solution: Same as previous proposal > >> > > >> > Problem 3: > >> > > >> > Incremental backup works on WAL's and > >> org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures that > WAL's > >> are never cleaned up until the next backup (Full / Incremental) is > taken. 
> >> This poses following problem > >> > > >> > WAL's can grow unbounded in case there are transient problems like > >> backup site facing issues or anything else until next backup scheduled > goes > >> successful > >> > > >> > Proposed Solution: I can't think of anything better, but I see this > can > >> be a potential problem. Also, one can force full backup if required WAL > >> files are missing for whatever other reasons not necessarily mentioned > >> above. > >> > > >> > --- > >> > Mallikarjun > >> > > >
Re: [DISCUSS] Hbase Backup design changes
Attached as image. Please let me know if it is availabe now. --- Mallikarjun On Mon, Jan 25, 2021 at 10:32 AM Sean Busbey wrote: > Hi! > > Thanks for the write up. unfortunately, your image for the existing > design didn't come through. Could you post it to some host and link it > here? > > On Sun, Jan 24, 2021 at 3:12 AM Mallikarjun > wrote: > > > > Existing Design: > > > > > > > > Problem 1: > > > > With this design, Incremental and Full backup can't be run in parallel > and leading to degraded RPO's in case Full backup is of longer duration esp > for large tables. > > > > Example: > > Expectation: Say you have a big table with 10 TB and your RPO is 60 > minutes and you are allowed to ship the remote backup with 800 Mbps. And > you are allowed to take Full Backups once in a week and rest of them should > be incremental backups > > > > Shortcoming: With the above design, one can't run parallel backups and > whenever there is a full backup running (which takes roughly 25 hours) you > are not allowed to take incremental backups and that would be a breach in > your RPO. > > > > Proposed Solution: Barring some critical sections such as modifying > state of the backup on meta tables, others can happen parallelly. Leaving > incremental backups to be able to run based on older successful full / > incremental backups and completion time of backup should be used instead of > start time of backup for ordering. I have not worked on the full redesign, > and will be doing so if this proposal seems acceptable for the community. > > > > Problem 2: > > > > With one backup at a time, it fails easily for a multi-tenant system. > This poses following problems > > > > Admins will not be able to achieve required RPO's for their tables > because of dependence on other tenants present in the system. 
As one tenant > doesn't have control over other tenants' table sizes and hence the duration > of the backup > > Management overhead of setting up a right sequence to achieve required > RPO's for different tenants could be very hard. > > > > Proposed Solution: Same as previous proposal > > > > Problem 3: > > > > Incremental backup works on WAL's and > org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures that WAL's > are never cleaned up until the next backup (Full / Incremental) is taken. > This poses following problem > > > > WAL's can grow unbounded in case there are transient problems like > backup site facing issues or anything else until next backup scheduled goes > successful > > > > Proposed Solution: I can't think of anything better, but I see this can > be a potential problem. Also, one can force full backup if required WAL > files are missing for whatever other reasons not necessarily mentioned > above. > > > > --- > > Mallikarjun >
[DISCUSS] Hbase Backup design changes
*Existing Design:* [image: image.png]

*Problem 1:*

With this design, incremental and full backups can't run in parallel, leading to degraded RPOs when a full backup runs long, especially for large tables.

Example:
Expectation: say you have a big table of 10 TB, your RPO is 60 minutes, you are allowed to ship the remote backup at 800 Mbps, and you are allowed to take full backups once a week, with the rest being incremental backups.

Shortcoming: with the above design you can't run parallel backups, so while a full backup is running (which takes roughly 25 hours) you cannot take incremental backups, and that would be a breach of your RPO.

*Proposed Solution:* Barring some critical sections, such as modifying the state of the backup in the meta tables, the rest can happen in parallel. Incremental backups should be able to run based on older successful full or incremental backups, and the completion time of a backup, rather than its start time, should be used for ordering. I have not worked on the full redesign, and will do so if this proposal seems acceptable to the community.

*Problem 2:*

With one backup at a time, the system fails easily for multi-tenant use. This poses the following problems:

- Admins will not be able to achieve the required RPOs for their tables because of dependence on the other tenants in the system, as one tenant has no control over other tenants' table sizes and hence over backup durations.
- The management overhead of setting up the right sequence to achieve the required RPOs for different tenants could be very hard.

*Proposed Solution:* Same as the previous proposal.

*Problem 3:*

Incremental backup works on WALs, and org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures that WALs are never cleaned up until the next backup (full / incremental) is taken.
This poses the following problem:

- WALs can grow unbounded when there are transient problems, such as the backup site facing issues or anything else, until the next scheduled backup succeeds.

*Proposed Solution:* I can't think of anything better, but I see this can be a potential problem. Also, one can force a full backup if the required WAL files are missing for whatever other reason, not necessarily those mentioned above.

--- Mallikarjun
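The "roughly 25 hours" figure in Problem 1 follows from simple arithmetic, sketched below (assuming decimal units for TB/Mbps and a fully saturated link; the real duration in the thread presumably also includes snapshot/export overhead):

```python
# Back-of-the-envelope check of the full-backup duration in Problem 1:
# shipping a 10 TB table at the allowed 800 Mbps.
table_bytes = 10 * 10**12      # 10 TB, decimal units
link_bps = 800 * 10**6         # 800 Mbps
hours = table_bytes * 8 / link_bps / 3600
# ~27.8 hours at line rate, i.e. "roughly 25 hours" and far above a
# 60-minute RPO if incremental backups are blocked meanwhile.
assert 24 < hours < 30
```

This is why the proposal decouples incremental backups from an in-flight full backup: a day-long full backup would otherwise block every incremental in that window.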
[jira] [Resolved] (HBASE-14414) HBase Backup/Restore Phase 3
[ https://issues.apache.org/jira/browse/HBASE-14414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mallikarjun resolved HBASE-14414.
---
Resolution: Done

> HBase Backup/Restore Phase 3
>
> Key: HBASE-14414
> URL: https://issues.apache.org/jira/browse/HBASE-14414
> Project: HBase
> Issue Type: New Feature
> Affects Versions: 3.0.0-alpha-1
> Reporter: Vladimir Rodionov
> Assignee: Mallikarjun
> Priority: Major
> Labels: backup
>
> Umbrella ticket for Phase 3 of development

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-18892) B&R testing
[ https://issues.apache.org/jira/browse/HBASE-18892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mallikarjun resolved HBASE-18892.
---
Resolution: Resolved

> B&R testing
> ---
>
> Key: HBASE-18892
> URL: https://issues.apache.org/jira/browse/HBASE-18892
> Project: HBase
> Issue Type: Umbrella
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Priority: Major
>
> Backup & Restore testing umbrella, for all bugs discovered

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25501) Backup not using parameters such as bandwidth, workers, etc while exporting snapshot
Mallikarjun created HBASE-25501:
---
Summary: Backup not using parameters such as bandwidth, workers, etc while exporting snapshot
Key: HBASE-25501
URL: https://issues.apache.org/jira/browse/HBASE-25501
Project: HBase
Issue Type: Bug
Reporter: Mallikarjun
Assignee: Mallikarjun

-- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: DISCUSS: Remove hbase-backup from master?
Yea. I have noticed that. I have some patches ready and have to add unit tests. Will start raising in a couple of weeks time. --- Mallikarjun On Thu, Dec 24, 2020 at 7:48 AM 张铎(Duo Zhang) wrote: > The UTs in the backup module are easy to fail with NPE, I've seen this in > several pre commit results. > > Any progress here? > > mallik.v.ar...@gmail.com 于2020年12月3日周四 > 上午9:58写道: > > > On Tue, Dec 1, 2020 at 11:26 PM Stack wrote: > > > > > On Tue, Dec 1, 2020 at 6:38 AM mallik.v.ar...@gmail.com < > > > mallik.v.ar...@gmail.com> wrote: > > > > > > > On Tue, Dec 1, 2020 at 7:34 PM Sean Busbey > wrote: > > > > > > > > > One reason for moving it out of core is that at some point we will > > have > > > > > hbase 3 releases. Right now hbase 3 targeted things are in the > master > > > > > branch. If the feature is not sufficiently ready to be in a release > > > then > > > > > when the time for HBase 3 releases comes we'll have to do work to > > pull > > > it > > > > > out of the release branch. If the feature needs to stay in the core > > > repo > > > > > that will necessitate that hbase 3 have a branch distinct from the > > > master > > > > > branch (which may or may not happen anyways). At that point we risk > > > > having > > > > > to do the work to remove the feature from a release branch again > come > > > > hbase > > > > > 4. > > > > > > > > > > > > > > > I think a lot of this is moot if you'd like to start providing > > patches > > > > for > > > > > the things you've needed to use it. If there are gaps that you > think > > > can > > > > > trip folks up maybe we could label it "experimental" to provide > > better > > > > > context for others. > > > > > > > > > > > > > I will start putting effort in maintaining this feature. > > > > > > > > > > > FYI, seems like backup has a bunch of flakey tests. Might be worth > > looking > > > at. > > > > > > > Any reference I can get. They seem fine when I run tests. 
> > > > > > > Thanks, > > > S > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Dec 1, 2020, 07:48 mallik.v.ar...@gmail.com < > > > > > mallik.v.ar...@gmail.com> wrote: > > > > > > > > > > > On Tue, Dec 1, 2020 at 12:14 PM Stack wrote: > > > > > > > > > > > > > On Mon, Nov 30, 2020 at 9:30 PM mallik.v.ar...@gmail.com < > > > > > > > mallik.v.ar...@gmail.com> wrote: > > > > > > > > > > > > > > > Inline > > > > > > > > > > > > > > > > On Tue, Dec 1, 2020, 10:10 AM Andrew Purtell < > > > > > andrew.purt...@gmail.com > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > We are allowed to debate if this should be in the tree or > > not. > > > > > Given > > > > > > > the > > > > > > > > > lack of interest, as evidenced by incomplete state, lack of > > > > > release, > > > > > > > and > > > > > > > > > lack of contribution, it is more than fair to discuss > > removal. > > > > > > > > > > > > > > > > > > Here is my take: First of all, it is not released. There is > > no > > > > > > implied > > > > > > > > > roadmap or support. Second, there do not seem to be any > > active > > > > > > > > maintainers > > > > > > > > > or volunteers as such. Third, unless someone shows up with > > more > > > > > > patches > > > > > > > > for > > > > > > > > > it there will be no polish or maturing, there can be no > > > > > expectations > > > > > > of > > > > > > > > > further improvement. > > > > > > > > > > > > > > > > > > That said, this is open source. New code contribution will > > > change > > > > >
[jira] [Created] (HBASE-24931) Candidate Generator helper Action method ignoring 0th index region
Mallikarjun created HBASE-24931:
---
Summary: Candidate Generator helper Action method ignoring 0th index region
Key: HBASE-24931
URL: https://issues.apache.org/jira/browse/HBASE-24931
Project: HBase
Issue Type: Bug
Components: Balancer
Reporter: Mallikarjun
Assignee: Mallikarjun

Balancer candidate generators such as `LocalityBasedCandidateGenerator`, `RegionReplicaCandidateGenerator`, `RegionReplicaRackCandidateGenerator`, etc. use the helper method `getAction` to generate an action, and it ignores the 0th index of `fromRegion` and `toRegion`.

{code:java}
protected BaseLoadBalancer.Cluster.Action getAction(int fromServer, int fromRegion,
    int toServer, int toRegion) {
  if (fromServer < 0 || toServer < 0) {
    return BaseLoadBalancer.Cluster.NullAction;
  }
  if (fromRegion > 0 && toRegion > 0) {
    return new BaseLoadBalancer.Cluster.SwapRegionsAction(fromServer, fromRegion,
      toServer, toRegion);
  } else if (fromRegion > 0) {
    return new BaseLoadBalancer.Cluster.MoveRegionAction(fromRegion, fromServer, toServer);
  } else if (toRegion > 0) {
    return new BaseLoadBalancer.Cluster.MoveRegionAction(toRegion, toServer, fromServer);
  } else {
    return BaseLoadBalancer.Cluster.NullAction;
  }
}
{code}

Is this unintentional, or is there some particular reason?

-- This message was sent by Atlassian Jira (v8.3.4#803005)
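The suspected off-by-one can be demonstrated outside HBase with a minimal Python transcription of the same branch logic. This is a sketch, not the real HBase classes: `get_action` and the tuple "actions" are illustrative, and the `strict_gt` flag simply toggles between the reported `> 0` comparison and a `>= 0` alternative.

```python
# Transcription of getAction's branch structure to show that a region
# at index 0 never produces an action under the `> 0` checks.
def get_action(from_server, from_region, to_server, to_region, strict_gt=True):
    valid = (lambda r: r > 0) if strict_gt else (lambda r: r >= 0)
    if from_server < 0 or to_server < 0:
        return ("null",)
    if valid(from_region) and valid(to_region):
        return ("swap", from_server, from_region, to_server, to_region)
    elif valid(from_region):
        return ("move", from_region, from_server, to_server)
    elif valid(to_region):
        return ("move", to_region, to_server, from_server)
    return ("null",)

# Region index 0 chosen as the move candidate (toRegion = -1 meaning "none"):
# the strict `> 0` comparison silently drops the action...
assert get_action(1, 0, 2, -1) == ("null",)
# ...whereas a `>= 0` comparison would move region 0 as expected.
assert get_action(1, 0, 2, -1, strict_gt=False) == ("move", 0, 1, 2)
```

Note that with `>= 0` the sentinel value -1 still falls through to the null action, so only the treatment of index 0 changes in this sketch.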
[jira] [Created] (HBASE-24157) RSGroup Aware Export Snapshot
Mallikarjun created HBASE-24157:
---
Summary: RSGroup Aware Export Snapshot
Key: HBASE-24157
URL: https://issues.apache.org/jira/browse/HBASE-24157
Project: HBase
Issue Type: Improvement
Components: snapshots
Reporter: Mallikarjun

Exporting a snapshot to a destination cluster that is RSGroup aware can lead to data spilling outside the destination RSGroup. This improvement aims to make export snapshot aware of the destination RSGroup's nodes and place the data only on those boxes.

-- This message was sent by Atlassian Jira (v8.3.4#803005)