[jira] [Resolved] (HBASE-27931) Update hadoop.version from 3.3.5 to 3.3.6

2024-10-10 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-27931.
-
Resolution: Duplicate

The plan is to go directly to 3.4.0.


> Update hadoop.version from 3.3.5 to 3.3.6
> -
>
> Key: HBASE-27931
> URL: https://issues.apache.org/jira/browse/HBASE-27931
> Project: HBase
>  Issue Type: Improvement
>Reporter: Wei-Chiu Chuang
>    Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
>
> HBase's default Hadoop3 version is 3.2.4, but HBase already supports Hadoop 
> 3.3.x.
> The Hadoop 3.2 line has not been updated for over a year. It is perhaps time 
> to update the Hadoop dependency to the 3.3.x line. (I'll start a DISCUSS 
> thread if the test goes well.)
> The 3.3.6 RC is out, which fixed a bunch of CVEs, and I'd like to test HBase 
> against it. Additionally, Hadoop 3.3.6 will permit us to use non-HDFS as WAL 
> storage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Using Hadoop 3.3.6 and 3.4.0 as default versions

2024-10-08 Thread Istvan Toth
I have the PR up for the test changes (for master) in HBASE-28906.

You can see its output at the slightly hacked up test job + PR that I used
for development:
https://ci-hbase.apache.org/job/Test%20script%20for%20nightly%20script/job/HBASE-28846/

Also, these tests WILL fail until HBASE-28721
<https://github.com/apache/hbase/pull/6270> is fixed.
I have a PR up for HBASE-28721 <https://github.com/apache/hbase/pull/6270>
at https://github.com/apache/hbase/pull/6269
(and https://github.com/apache/hbase/pull/6270 for branch-2).



On Fri, Sep 27, 2024 at 7:34 PM Istvan Toth  wrote:

> I was too hasty to reply. You did not suggest dropping support for older
> releases in the released branches.
>
> Still, I would keep Hadoop 3.3.x support at least on branch-2.
> We effectively need to support Hadoop 3.3.x for 2.5 and 2.6 anyway, so
> supporting 3.3 on more branches only costs us some machine time for
> running the nightly tests.
>
> On Fri, Sep 27, 2024 at 7:26 PM Istvan Toth  wrote:
>
>> I am not sure that we should drop support for all releases with CVEs.
>> That could easily lead to not having any CVE-less Hadoop releases to
>> support for some periods of time.
>>
>> Once I get the proper backwards test matrix working, we could support
>> more releases in parallel.
>>
>> I would suggest deciding on a set of Hadoop releases when we release a
>> new minor version,
>> and trying to keep support for those for the life of that HBase minor
>> version, regardless of any CVEs. (We can of course add support for newer
>> Hadoop versions)
>>
>> We do something similar in Phoenix WRT HBase versions.
>>
>> If the default is the newest Hadoop release, then our binaries will be as
>> CVE-free as possible, while not forcing existing users to upgrade Hadoop
>> before they upgrade HBase.
>>
>> Istvan
>>
>>
>>
>>
>>
>>
>> On Fri, Sep 27, 2024 at 2:07 PM 张铎(Duo Zhang) 
>> wrote:
>>
>>> There is a CVE for hadoop version less than 3.4.0...
>>>
>>> https://nvd.nist.gov/vuln/detail/CVE-2024-23454
>>>
>>> Maybe we can only support hadoop 3.4.0 on master, branch-3 and
>>> branch-2(for hadoop 3) now?
>>>
>>> Thanks.
>>>
>>> Istvan Toth  于2024年9月27日周五 16:16写道:
>>> >
>>> > I'm still working on this, but I struggle a lot with the Jenkinsfile,
>>> and
>>> > some Yetus bugs also make testing hard.
>>> >
>>> > On Thu, Sep 19, 2024 at 9:25 AM Istvan Toth 
>>> wrote:
>>> >
>>> > > Master is still going to support 3.3.5, so master still needs to use
>>> > > reflection as well.
>>> > >
>>> > > The point of these changes is allowing HBase to default to a newer
>>> Hadoop
>>> > > version, without dropping support for older releases.
>>> > >
>>> > > Dropping support for 3.3.5 would be a different discussion, and I
>>> > > personally feel that it would be too early.
>>> > >
>>> > > Istvan
>>> > >
>>> > > On Wed, Sep 18, 2024 at 7:46 PM Wei-Chiu Chuang
>>> > >  wrote:
>>> > >
>>> > >> So I was wondering now that we'll be using Hadoop 3.4.0, if it's
>>> okay
>>> > >> to
>>> > >> port HBASE-27769 <https://issues.apache.org/jira/browse/HBASE-27769>
>>> to
>>> > >> the
>>> > >> master branch.
>>> > >> This will allow Ozone to be used by HBase. We are preparing Apache
>>> Ozone
>>> > >> 2.0 release and having a usable Apache HBase to work with is
>>> important. It
>>> > >> is working now with Cloudera's HBase work but I'd like to open up
>>> this
>>> > >> opportunity to the community as well.
>>> > >>
>>> > >> We can start with master, and then I can find a solution (something
>>> that
>>> > >> involves the use of reflection) and backport to lower branches.
>>> > >> Ultimately,
>>> > >> release a version of HBase with this feature.
>>> > >>
>>> > >> cc: Stephen.
>>> > >>
>>> > >> On Wed, Sep 18, 2024 at 12:08 AM Istvan Toth
>>> 
>>> > >> wrote:
>>> > >>
>>> > >> > Created a new ticket, as the old one was for 3.3.6 but we've
>>> agreed on
>>> > >

Re: [DISCUSS] Migrate from Jenkins to GitHub Actions for CI

2024-10-08 Thread Istvan Toth
I'm not sure about enabling github issues.

We already have the mailing lists, JIRA, and the pull requests to keep
track of, I'm afraid that adding another forum would overcomplicate things.

IMO migrating to GH actions and using GH issues are independent from each
other.

The current JIRA signup process is definitely bad: we are often expected to
make decisions based on cutesy usernames without
any real context, and we cannot even ask for more information.

Maybe we could add some kind of form to the docs, listing questions for
people trying to sign up that would be too much work
for a spammer to answer?

Istvan




On Tue, Oct 8, 2024 at 9:30 AM 张铎(Duo Zhang)  wrote:

> Oh, typo, PMCs -> PMC members
>
> 张铎(Duo Zhang)  于2024年10月8日周二 11:46写道:
> >
> > For me, I think we could first enable the github issues, for users to
> > ask questions and discuss things. And if there are actual bugs or
> > something which requires code changes, we could file a jira, and also
> > let the contributor register a jira account.
> > I think this also makes it easier for our PMC members to decide whether a
> > jira account is necessary for a given user, compared to the current
> > workflow. The private mailing list is full of jira registration
> > notifications, and it is hard to find other useful information...
> >
> > And on moving to github actions, in general I'm +1 on this. We should
> > try to follow modern ways.
> >
> > And on the funding side, we still have 10 machines; we could contact
> > INFRA to see how to make use of these machines if we switch to github
> > actions.
> >
> > Thanks.
> >
> > Istvan Toth  于2024年10月2日周三 17:31写道:
> > >
> > > I've been working on modifying the existing Jenkinsfile, and it has
> been a
> > > horrible experience, especially as I'm trying to mix declarative and
> > > scripted syntax.
> > > I think from a usability standpoint GH actions would be a win.
> > >
> > > On the other hand, our Jenkinsfiles don't do that much, as most of the
> > > actual CI process is performed via Yetus, so migration shouldn't be a
> huge
> > > amount of work.
> > >
> > > I seem to recall seeing similar discussions on ASF mailing lists, but I
> > > haven't followed them closely.
> > >
> > > Istvan
> > >
> > > On Wed, Oct 2, 2024 at 11:23 AM Nick Dimiduk 
> wrote:
> > >
> > > > Heya,
> > > >
> > > > I'd like to take the community temperature on migrating our build
> infra
> > > > from the ci-hbase.a.o Jenkins instance to something built on GitHub
> > > > Actions. I have several reasons that justify this proposal.
> > > >
> > > > As some of you may know, our community funding has reduced and we
> will no
> > > > longer be able to sustain the current fleet of build infrastructure.
> So,
> > > > one motivation for this proposal is cost-cutting: I think that we'll
> be
> > > > able to operate at lower costs if we can migrate to a
> provisioned-as-needed
> > > > model of consumption.
> > > >
> > > > My second reason is an optimistic appeal to a larger contributor
> base. I
> > > > suspect that if we can modernize our infrastructure then we will
> increase
> > > > the pool of contributors who might be able to participate in this
> area. I
> > > > believe that GH Actions (and systems like it) is more prevalent in
> the
> > > > industry than Jenkins, which means that more people already have
> experience
> > > > with the platform and more people will feel compelled to offer
> support to
> > > > an OSS project that uses the platform as a means of growing their own
> > > > skillset and as a means of bolstering their CVs.
> > > >
> > > > Dove-tailed into reason two is reason three: I believe that there is
> a
> > > > large community of folks who are developing GitHub Actions on its
> > > > marketplace. We would effectively open ourselves up to more
> off-the-shelf
> > > > offerings and those offerings would be in our hands directly. By
> contrast,
> > > > I don't think there's as much development in Jenkins plugins, and the
> > > > process of adding a new plugin to our Jenkins instance requires
> filing an
> > > > INFRA ticket.
> > > >
> > > > These are my motivations. I'm still not clear on what's possible yet
> for
> > > > ASF proje

[jira] [Created] (HBASE-28906) Run nightly tests with multiple Hadoop 3 versions

2024-10-07 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28906:
---

 Summary: Run nightly tests with multiple Hadoop 3 versions
 Key: HBASE-28906
 URL: https://issues.apache.org/jira/browse/HBASE-28906
 Project: HBase
  Issue Type: Sub-task
  Components: integration tests, test
Reporter: Istvan Toth
Assignee: Istvan Toth








Re: [VOTE] The first release candidate for Apache HBase 2.6.1 is available

2024-10-03 Thread Istvan Toth
That's a different issue; these incompatibilities are between 2.6.0 and
2.6.1.

The compatibility promises are different between minor and patch versions.

Istvan

On Thu, Oct 3, 2024 at 6:36 PM Nihal Jain  wrote:

> Ah yes, hbase-connectors was also failing due to this incompatibility: "
> HBASE-28790 
> hbase-connectors
> fails to build with hbase 2.6.0".
>
> On Thu, 3 Oct 2024 at 6:24 PM, Nick Dimiduk  wrote:
>
> > I think that we also have source-incompatible additions. The new methods
> > setRequestAttributes on AsyncBufferedMutatorBuilder will result in a
> > compilation failure for anyone providing their own implementation of that
> > class. These came with HBASE-28001: Add request attribute support to
> > BufferedMutator (#6076)
> >
> > On Tue, Oct 1, 2024 at 10:51 AM Nick Dimiduk 
> wrote:
> >
> > > Please vote on this Apache hbase release candidate,
> > > hbase-2.6.1RC0
> > >
> > > The VOTE will remain open for at least 72 hours.
> > >
> > > [ ] +1 Release this package as Apache hbase 2.6.1
> > > [ ] -1 Do not release this package because ...
> > >
> > > The tag to be voted on is 2.6.1RC0:
> > >
> > >   https://github.com/apache/hbase/tree/2.6.1RC0
> > >
> > > This tag currently points to git reference
> > >
> > >   ea9ffa81213bfe2d8d764838c7b962c2151624f1
> > >
> > > The release files, including signatures, digests, as well as CHANGES.md
> > > and RELEASENOTES.md included in this RC can be found at:
> > >
> > >   https://dist.apache.org/repos/dist/dev/hbase/2.6.1RC0/
> > >
> > > Maven artifacts are available in a staging repository at:
> > >
> > >
> > https://repository.apache.org/content/repositories/orgapachehbase-1558/
> > >
> > > Maven artifacts for hadoop3 are available in a staging repository at:
> > >
> > >
> > https://repository.apache.org/content/repositories/orgapachehbase-1559/
> > >
> > > Artifacts were signed with the 0xEF4EBF27 key which can be found in:
> > >
> > >   https://downloads.apache.org/hbase/KEYS
> > >
> > > To learn more about Apache hbase, please see
> > >
> > >   http://hbase.apache.org/
> > >
> > > Thanks,
> > > Your HBase Release Manager
> > >
> >
>


-- 
*István Tóth* | Sr. Staff Software Engineer
*Email*: st...@cloudera.com
cloudera.com 


Re: [VOTE] The first release candidate for Apache HBase 2.6.1 is available

2024-10-03 Thread Istvan Toth
I see a similar issue when trying to build Phoenix with it:

[ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-compiler-plugin:3.11.0:compile
> (default-compile) on project phoenix-core-server: Compilation failure:
> Compilation failure:
> [ERROR]
> /home/stoty/workspaces/apache-phoenix/phoenix/phoenix-core-server/src/main/java/org/apache/phoenix/coprocessor/DelegateRegionCoprocessorEnvironment.java:[41,8]
> org.apache.phoenix.coprocessor.DelegateRegionCoprocessorEnvironment is not
> abstract and does not override abstract method
> checkBatchQuota(org.apache.hadoop.hbase.regionserver.Region,int,int) in
> org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment
> [ERROR]
> /home/stoty/workspaces/apache-phoenix/phoenix/phoenix-core-server/src/main/java/org/apache/phoenix/iterate/SnapshotScanner.java:[201,47]
>  is not abstract
> and does not override abstract method
> checkBatchQuota(org.apache.hadoop.hbase.regionserver.Region,int,int) in
> org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment
>

RegionCoprocessorEnvironment is @InterfaceAudience.LimitedPrivate, so this
should probably also be treated in a similar way.

We certainly can fix this in Phoenix, but only in a new release.

Istvan
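The breakage discussed in this thread (a new abstract method on an interface that downstream projects implement) can be illustrated with a minimal sketch. All names here are hypothetical, not the actual HBase interfaces; the point is only to show why a `default` implementation preserves compatibility for existing implementors while a plain abstract method does not.

```java
// Hypothetical sketch of interface evolution and binary compatibility.
// "CoprocEnv" and "checkQuota" are stand-in names, not real HBase APIs.
public class CompatSketch {

    // Imagine this interface shipped in version X with only region().
    public interface CoprocEnv {
        String region();

        // Method added in version X+1. Because it has a default body,
        // implementors compiled against version X still load and compile.
        // Declared without the default, OldImplementor below would fail
        // to compile (and fail at runtime with AbstractMethodError).
        default boolean checkQuota(int numWrites, int numReads) {
            return true; // permissive fallback, an assumption of this sketch
        }
    }

    // An implementor written against version X; it knows nothing about
    // checkQuota, yet it still satisfies the evolved interface.
    public static class OldImplementor implements CoprocEnv {
        @Override
        public String region() {
            return "test-region";
        }
    }

    public static void main(String[] args) {
        CoprocEnv env = new OldImplementor();
        System.out.println(env.region() + " quotaOk=" + env.checkQuota(1, 1));
    }
}
```

Note that this only helps `@InterfaceAudience.Public`-style client interfaces where a sensible default exists; for LimitedPrivate coprocessor hooks the right default behavior is a separate design question.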


On Thu, Oct 3, 2024 at 10:21 AM Nick Dimiduk  wrote:

> -1
>
> In reviewing the compatibility report, I think that we have a breaking
> change to the AsyncTable interface [0]. We have introduced a new method
> without a default implementation. I believe that this is contrary to our
> Client Binary Compatibility [1] statement:
>
> > Client code written to APIs available in a given patch release can run
> unchanged (no recompilation needed) against the new jars of later patch
> versions.
>
> I believe that this change was introduced via HBASE-28770: Support partial
> results in AggregateImplementation and AsyncAggregationClient (#6167) [2].
>
> Let's see if we can add a default implementation that preserves binary
> compatibility.
>
> Thanks,
> Nick
>
> [0]:
>
> https://dist.apache.org/repos/dist/dev/hbase/2.6.1RC0/api_compare_2.6.0_to_2.6.1RC0.html#Type_Binary_Problems_Medium
> [1]: https://hbase.apache.org/book.html#hbase.versioning
> [2]: https://issues.apache.org/jira/browse/HBASE-28770
>
> On Tue, Oct 1, 2024 at 10:51 AM Nick Dimiduk  wrote:
>
> > Please vote on this Apache hbase release candidate,
> > hbase-2.6.1RC0
> >
> > The VOTE will remain open for at least 72 hours.
> >
> > [ ] +1 Release this package as Apache hbase 2.6.1
> > [ ] -1 Do not release this package because ...
> >
> > The tag to be voted on is 2.6.1RC0:
> >
> >   https://github.com/apache/hbase/tree/2.6.1RC0
> >
> > This tag currently points to git reference
> >
> >   ea9ffa81213bfe2d8d764838c7b962c2151624f1
> >
> > The release files, including signatures, digests, as well as CHANGES.md
> > and RELEASENOTES.md included in this RC can be found at:
> >
> >   https://dist.apache.org/repos/dist/dev/hbase/2.6.1RC0/
> >
> > Maven artifacts are available in a staging repository at:
> >
> >
> https://repository.apache.org/content/repositories/orgapachehbase-1558/
> >
> > Maven artifacts for hadoop3 are available in a staging repository at:
> >
> >
> https://repository.apache.org/content/repositories/orgapachehbase-1559/
> >
> > Artifacts were signed with the 0xEF4EBF27 key which can be found in:
> >
> >   https://downloads.apache.org/hbase/KEYS
> >
> > To learn more about Apache hbase, please see
> >
> >   http://hbase.apache.org/
> >
> > Thanks,
> > Your HBase Release Manager
> >
>




[jira] [Resolved] (HBASE-26701) KafkaProxy enablePeer parameter mismatch

2024-10-03 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-26701.
-
Fix Version/s: hbase-connectors-1.1.0
   Resolution: Fixed

Committed.
Thanks for the patch [~xiaozhang] and the review [~nihaljain.cs].

> KafkaProxy enablePeer parameter mismatch
> 
>
> Key: HBASE-26701
> URL: https://issues.apache.org/jira/browse/HBASE-26701
> Project: HBase
>  Issue Type: Bug
>  Components: hbase-connectors
>Reporter: Xiao Zhang
>Assignee: Xiao Zhang
>Priority: Trivial
> Fix For: hbase-connectors-1.1.0
>
>
> In the KafkaProxy class, the enable-peer option is registered as "e":
> {code:java}
> options.addOption("e", "enablepeer", false,
> "enable peer on startup (defaults to false)"); {code}
> but it actually uses "a":
> {code:java}
> if (commandLine.hasOption('a')){
>   createPeer=true;
> }
> if (commandLine.hasOption("a")){
>   enablePeer=true;
> }{code}
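The fix is simply to test the same option name that was registered. A minimal, self-contained sketch of the corrected logic follows; it uses a plain `Set` of parsed flag names as a hypothetical stand-in for the Commons CLI `CommandLine`, so the names and types here are illustrative only.

```java
import java.util.Set;

// Simplified stand-in for the HBASE-26701 fix. The option is registered
// under short name "e" / long name "enablepeer", but the original code
// tested for "a", so passing -e had no effect.
public class KafkaProxyOptionSketch {

    // Corrected check: look up the names the option was registered with.
    public static boolean enablePeer(Set<String> parsedFlags) {
        return parsedFlags.contains("e") || parsedFlags.contains("enablepeer");
    }

    public static void main(String[] args) {
        // -e now enables the peer; the unrelated "a" flag no longer does.
        System.out.println(enablePeer(Set.of("e"))); // true
        System.out.println(enablePeer(Set.of("a"))); // false
    }
}
```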





Re: [DISCUSS] Migrate from Jenkins to GitHub Actions for CI

2024-10-02 Thread Istvan Toth
I've been working on modifying the existing Jenkinsfile, and it has been a
horrible experience, especially as I'm trying to mix declarative and
scripted syntax.
I think from a usability standpoint GH actions would be a win.

On the other hand, our Jenkinsfiles don't do that much, as most of the
actual CI process is performed via Yetus, so migration shouldn't be a huge
amount of work.

I seem to recall seeing similar discussions on ASF mailing lists, but I
haven't followed them closely.

Istvan

On Wed, Oct 2, 2024 at 11:23 AM Nick Dimiduk  wrote:

> Heya,
>
> I'd like to take the community temperature on migrating our build infra
> from the ci-hbase.a.o Jenkins instance to something built on GitHub
> Actions. I have several reasons that justify this proposal.
>
> As some of you may know, our community funding has reduced and we will no
> longer be able to sustain the current fleet of build infrastructure. So,
> one motivation for this proposal is cost-cutting: I think that we'll be
> able to operate at lower costs if we can migrate to a provisioned-as-needed
> model of consumption.
>
> My second reason is an optimistic appeal to a larger contributor base. I
> suspect that if we can modernize our infrastructure then we will increase
> the pool of contributors who might be able to participate in this area. I
> believe that GH Actions (and systems like it) is more prevalent in the
> industry than Jenkins, which means that more people already have experience
> with the platform and more people will feel compelled to offer support to
> an OSS project that uses the platform as a means of growing their own
> skillset and as a means of bolstering their CVs.
>
> Dove-tailed into reason two is reason three: I believe that there is a
> large community of folks who are developing GitHub Actions on its
> marketplace. We would effectively open ourselves up to more off-the-shelf
> offerings and those offerings would be in our hands directly. By contrast,
> I don't think there's as much development in Jenkins plugins, and the
> process of adding a new plugin to our Jenkins instance requires filing an
> INFRA ticket.
>
> These are my motivations. I'm still not clear on what's possible yet for
> ASF projects. I have filed an INFRA ticket, requesting whatever is
> necessary for us to start an experiment. Indeed, I believe that there are
> some major limitations on the current implementation provided by the ASF,
> and as far as I can tell, only one project with a build footprint that
> resembles HBase has pursued this effort. I've catalogued the applicable
> information that I've found so far on that issue.
>
> https://issues.apache.org/jira/browse/INFRA-26170
>
> Thanks,
> Nick
>




[jira] [Resolved] (HBASE-26841) Replace log4j with reload4j for hbase-connector

2024-10-02 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-26841.
-
Resolution: Duplicate

We've already moved to log4j2.

> Replace log4j with reload4j for hbase-connector
> ---
>
> Key: HBASE-26841
> URL: https://issues.apache.org/jira/browse/HBASE-26841
> Project: HBase
>  Issue Type: Task
>  Components: hbase-connectors
>Affects Versions: 1.0.0
>Reporter: Tak-Lon (Stephen) Wu
>Assignee: Tak-Lon (Stephen) Wu
>Priority: Major
> Fix For: 1.1.0
>
>
> As a result of the [discussion thread of replacing log4j with 
> reload4j|https://lists.apache.org/thread/kfmjg6zmvdjqcwolj0oh634nzv42y806] 
> and HBASE-26691, we have replaced log4j with reload4j in branch-2. 
> As hbase-connectors is part of the community, this JIRA proposes to align 
> with the active hbase release and replace log4j with reload4j before the next 
> release of hbase-connectors.





Re: [DISCUSS] Using Hadoop 3.3.6 and 3.4.0 as default versions

2024-09-27 Thread Istvan Toth
I was too hasty to reply. You did not suggest dropping support for older
releases in the released branches.

Still, I would keep Hadoop 3.3.x support at least on branch-2.
We effectively need to support Hadoop 3.3.x for 2.5 and 2.6 anyway, so
supporting 3.3 on more branches only costs us some machine time for
running the nightly tests.

On Fri, Sep 27, 2024 at 7:26 PM Istvan Toth  wrote:

> I am not sure that we should drop support for all releases with CVEs.
> That could easily lead to not having any CVE-less Hadoop releases to
> support for some periods of time.
>
> Once I get the proper backwards test matrix working, we could support
> more releases in parallel.
>
> I would suggest deciding on a set of Hadoop releases when we release a new
> minor version,
> and trying to keep support for those for the life of that HBase minor
> version, regardless of any CVEs. (We can of course add support for newer
> Hadoop versions)
>
> We do something similar in Phoenix WRT HBase versions.
>
> If the default is the newest Hadoop release, then our binaries will be as
> CVE-free as possible, while not forcing existing users to upgrade Hadoop
> before they upgrade HBase.
>
> Istvan
>
>
>
>
>
>
> On Fri, Sep 27, 2024 at 2:07 PM 张铎(Duo Zhang) 
> wrote:
>
>> There is a CVE for hadoop version less than 3.4.0...
>>
>> https://nvd.nist.gov/vuln/detail/CVE-2024-23454
>>
>> Maybe we can only support hadoop 3.4.0 on master, branch-3 and
>> branch-2(for hadoop 3) now?
>>
>> Thanks.
>>
>> Istvan Toth  于2024年9月27日周五 16:16写道:
>> >
>> > I'm still working on this, but I struggle a lot with the Jenkinsfile,
>> and
>> > some Yetus bugs also make testing hard.
>> >
>> > On Thu, Sep 19, 2024 at 9:25 AM Istvan Toth  wrote:
>> >
>> > > Master is still going to support 3.3.5, so master still needs to use
>> > > reflection as well.
>> > >
>> > > The point of these changes is allowing HBase to default to a newer
>> Hadoop
>> > > version, without dropping support for older releases.
>> > >
>> > > Dropping support for 3.3.5 would be a different discussion, and I
>> > > personally feel that it would be too early.
>> > >
>> > > Istvan
>> > >
>> > > On Wed, Sep 18, 2024 at 7:46 PM Wei-Chiu Chuang
>> > >  wrote:
>> > >
>> > >> So I was wondering now that we'll be using Hadoop 3.4.0, if it's
>> okay
>> > >> to
>> > >> port HBASE-27769 <https://issues.apache.org/jira/browse/HBASE-27769>
>> to
>> > >> the
>> > >> master branch.
>> > >> This will allow Ozone to be used by HBase. We are preparing Apache
>> Ozone
>> > >> 2.0 release and having a usable Apache HBase to work with is
>> important. It
>> > >> is working now with Cloudera's HBase work but I'd like to open up
>> this
>> > >> opportunity to the community as well.
>> > >>
>> > >> We can start with master, and then I can find a solution (something
>> that
>> > >> involves the use of reflection) and backport to lower branches.
>> > >> Ultimately,
>> > >> release a version of HBase with this feature.
>> > >>
>> > >> cc: Stephen.
>> > >>
>> > >> On Wed, Sep 18, 2024 at 12:08 AM Istvan Toth
>> 
>> > >> wrote:
>> > >>
>> > >> > Created a new ticket, as the old one was for 3.3.6 but we've
>> agreed on
>> > >> > 3.4.0, and expanded the scope.
>> > >> >
>> > >> > https://issues.apache.org/jira/browse/HBASE-28846
>> > >> >
>> > >> > On Tue, Sep 17, 2024 at 3:47 PM 张铎(Duo Zhang) <
>> palomino...@gmail.com>
>> > >> > wrote:
>> > >> >
>> > >> > > I think we could start the work from branch-2.6? Branch-2.5
>> should
>> > >> > > reach its EOL soon, after 2.6.x is stable enough.
>> > >> > >
>> > >> > > In this way we only need to deal with 3.3.x and 3.4.x.
>> > >> > >
>> > >> > > Istvan Toth  于2024年9月17日周二 16:56写道:
>> > >> > > >
>> > >> > > > Thanks for the assessment, Wei-Chiu.
>> > >> > > >
>> > >> > > > Transitive dependency updates in Hadoop are normal (desired
>> e

Re: [DISCUSS] Using Hadoop 3.3.6 and 3.4.0 as default versions

2024-09-27 Thread Istvan Toth
I am not sure that we should drop support for all releases with CVEs.
That could easily lead to not having any CVE-less Hadoop releases to
support for some periods of time.

Once I get the proper backwards test matrix working, we could support
more releases in parallel.

I would suggest deciding on a set of Hadoop releases when we release a new
minor version,
and trying to keep support for those for the life of that HBase minor
version, regardless of any CVEs. (We can of course add support for newer
Hadoop versions)

We do something similar in Phoenix WRT HBase versions.

If the default is the newest Hadoop release, then our binaries will be as
CVE-free as possible, while not forcing existing users to upgrade Hadoop
before they upgrade HBase.

Istvan






On Fri, Sep 27, 2024 at 2:07 PM 张铎(Duo Zhang)  wrote:

> There is a CVE for hadoop version less than 3.4.0...
>
> https://nvd.nist.gov/vuln/detail/CVE-2024-23454
>
> Maybe we can only support hadoop 3.4.0 on master, branch-3 and
> branch-2(for hadoop 3) now?
>
> Thanks.
>
> Istvan Toth  于2024年9月27日周五 16:16写道:
> >
> > I'm still working on this, but I struggle a lot with the Jenkinsfile, and
> > some Yetus bugs also make testing hard.
> >
> > On Thu, Sep 19, 2024 at 9:25 AM Istvan Toth  wrote:
> >
> > > Master is still going to support 3.3.5, so master still needs to use
> > > reflection as well.
> > >
> > > The point of these changes is allowing HBase to default to a newer
> Hadoop
> > > version, without dropping support for older releases.
> > >
> > > Dropping support for 3.3.5 would be a different discussion, and I
> > > personally feel that it would be too early.
> > >
> > > Istvan
> > >
> > > On Wed, Sep 18, 2024 at 7:46 PM Wei-Chiu Chuang
> > >  wrote:
> > >
> > >> So I was wondering now that we'll be using Hadoop 3.4.0, if it's
> okay
> > >> to
> > >> port HBASE-27769 <https://issues.apache.org/jira/browse/HBASE-27769>
> to
> > >> the
> > >> master branch.
> > >> This will allow Ozone to be used by HBase. We are preparing Apache
> Ozone
> > >> 2.0 release and having a usable Apache HBase to work with is
> important. It
> > >> is working now with Cloudera's HBase work but I'd like to open up this
> > >> opportunity to the community as well.
> > >>
> > >> We can start with master, and then I can find a solution (something
> that
> > >> involves the use of reflection) and backport to lower branches.
> > >> Ultimately,
> > >> release a version of HBase with this feature.
> > >>
> > >> cc: Stephen.
> > >>
> > >> On Wed, Sep 18, 2024 at 12:08 AM Istvan Toth
> 
> > >> wrote:
> > >>
> > >> > Created a new ticket, as the old one was for 3.3.6 but we've agreed
> on
> > >> > 3.4.0, and expanded the scope.
> > >> >
> > >> > https://issues.apache.org/jira/browse/HBASE-28846
> > >> >
> > >> > On Tue, Sep 17, 2024 at 3:47 PM 张铎(Duo Zhang) <
> palomino...@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > I think we could start the work from branch-2.6? Branch-2.5 should
> > >> > > reach its EOL soon, after 2.6.x is stable enough.
> > >> > >
> > >> > > In this way we only need to deal with 3.3.x and 3.4.x.
> > >> > >
> > >> > > Istvan Toth  于2024年9月17日周二 16:56写道:
> > >> > > >
> > >> > > > Thanks for the assessment, Wei-Chiu.
> > >> > > >
> > >> > > > Transitive dependency updates in Hadoop are normal (desired
> even),
> > >> > that's
> > >> > > > something that HBase needs to manage.
> > >> > > >
> > >> > > > As for the test:
> > >> > > >
> > >> > > > - Duo's suggestion is to extend the Hadoop compatibility tests,
> and
> > >> run
> > >> > > > them with multiple Hadoop 3 releases.
>> > > > Looking at the nightly results, those tests are fast; it takes
> 14
> > >> > minutes
> > >> > > > for Hadoop2 and Hadoop3.
>> > > > I've peeked into hbase_nightly_pseudo-distributed-test.sh, the
> > >> tests
> > >> > > there
> > >> > > > are indeed quite minimal,
&

[jira] [Resolved] (HBASE-28645) Add build information to the REST server version endpoint

2024-09-27 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28645.
-
Fix Version/s: 3.0.0-beta-2
   2.6.1
   2.5.11
   Resolution: Fixed

Committed to all active branches.
Thanks for the patch [~paksyd], and for the review [~nihaljain.cs].

> Add build information to the REST server version endpoint
> -
>
> Key: HBASE-28645
> URL: https://issues.apache.org/jira/browse/HBASE-28645
> Project: HBase
>  Issue Type: New Feature
>  Components: REST
>    Reporter: Istvan Toth
>Assignee: Dávid Paksy
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.0.0-beta-2, 2.6.1, 2.5.11
>
>
> There is currently no way to check the REST server version / build number 
> remotely.
> The */version/cluster* endpoint takes the version from master (fair enough),
> and the */version/rest* does not include the build information.
> We should add a version field to the /version/rest endpoint, which reports 
> the version of the REST server component.
> We should also log this at startup, just like we log the cluster version now.
> We may have to add and store the version in the hbase-rest code during build, 
> similarly to how we do it for the other components.
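One common way to let a component report its own build version is to read the Implementation-Version that the build writes into the jar manifest. The sketch below shows that pattern; the class and method names are hypothetical and this is not the actual hbase-rest code.

```java
// Hedged sketch: report a component's own build version by reading the
// Implementation-Version attribute from the jar manifest via the Package
// API. Names here are illustrative, not real HBase identifiers.
public class RestVersionSketch {

    public static String componentVersion() {
        Package p = RestVersionSketch.class.getPackage();
        String v = (p != null) ? p.getImplementationVersion() : null;
        // Falls back to "unknown" when not running from a built jar
        // (e.g. from an IDE's classes directory).
        return (v != null) ? v : "unknown";
    }

    public static void main(String[] args) {
        // Could feed both the /version/rest payload and a startup log line.
        System.out.println("rest server version: " + componentVersion());
    }
}
```

An alternative, closer to what the issue suggests for other HBase components, is to filter a version string into a generated source or properties file at build time.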





Re: [DISCUSS] Using Hadoop 3.3.6 and 3.4.0 as default versions

2024-09-27 Thread Istvan Toth
I'm still working on this, but I struggle a lot with the Jenkinsfile, and
some Yetus bugs also make testing hard.

On Thu, Sep 19, 2024 at 9:25 AM Istvan Toth  wrote:

> Master is still going to support 3.3.5, so master still needs to use
> reflection as well.
>
> The point of these changes is allowing HBase to default to a newer Hadoop
> version, without dropping support for older releases.
>
> Dropping support for 3.3.5 would be a different discussion, and I
> personally feel that it would be too early.
>
> Istvan
>
> On Wed, Sep 18, 2024 at 7:46 PM Wei-Chiu Chuang
>  wrote:
>
>> So I was wondering now that we'll be using Hadoop 3.4.0, if it's okay
>> to
>> port HBASE-27769 <https://issues.apache.org/jira/browse/HBASE-27769> to
>> the
>> master branch.
>> This will allow Ozone to be used by HBase. We are preparing Apache Ozone
>> 2.0 release and having a usable Apache HBase to work with is important. It
>> is working now with Cloudera's HBase work but I'd like to open up this
>> opportunity to the community as well.
>>
>> We can start with master, and then I can find a solution (something that
>> involves the use of reflection) and backport to lower branches.
>> Ultimately,
>> release a version of HBase with this feature.
>>
>> cc: Stephen.
>>
>> On Wed, Sep 18, 2024 at 12:08 AM Istvan Toth 
>> wrote:
>>
>> > Created a new ticket, as the old one was for 3.3.6 but we've agreed on
>> > 3.4.0, and expanded the scope.
>> >
>> > https://issues.apache.org/jira/browse/HBASE-28846
>> >
>> > On Tue, Sep 17, 2024 at 3:47 PM 张铎(Duo Zhang) 
>> > wrote:
>> >
>> > > I think we could start the work from branch-2.6? Branch-2.5 should
>> > > reach its EOL soon, after 2.6.x is stable enough.
>> > >
>> > > In this way we only need to deal with 3.3.x and 3.4.x.
>> > >
>> > > Istvan Toth  于2024年9月17日周二 16:56写道:
>> > > >
>> > > > Thanks for the assessment, Wei-Chiu.
>> > > >
>> > > > Transitive dependency updates in Hadoop are normal (desired even),
>> > that's
>> > > > something that HBase needs to manage.
>> > > >
>> > > > As for the test:
>> > > >
>> > > > - Duo's suggestion is to extend the Hadoop compatibility tests, and
>> run
>> > > > them with multiple Hadoop 3 releases.
>> > > > Looking at the nightly results, those tests are fast, it takes 14
>> > minutes
>> > > > for Hadoop2 and Hadoop3.
>> > > > I've peeked into hbase_nightly_pseudo-distributed-test.sh , the
>> tests
>> > > there
>> > > > are indeed quite minimal,
>> > > > more of a smoke test, and seem to be targeted to check the shaded
>> > > artifacts.
>> > > >
>> > > > - Nick's suggestion is to run DevTests and set the Hadoop version.
>> > > > The runAllTests step in the current nightly takes 8+ hours.
>> > > > On my 6+8 core laptop, my last attempt failed after 90 minutes, so
>> > let's
>> > > > say the full run takes 120 minutes.
>> > > >
>> > > > I don't know how many free resources HBase has, but if we can
>> utilize
>> > > > another VM per branch we could run the dev tests with four HBase
>> > > versions,
>> > > > and still finish about the same time as the full test job does.
>> > > >
>> > > > We don't need to test with the default version, as we already run
>> the
>> > > full
>> > > > suite for that one.
>> > > >
>> > > > Assuming that we officially support 3.4.0 on all active branches,
>> and
>> > > also
>> > > > default to 3.4.0 on all branches, and trusting Hadoop's
>> compatibility
>> > so
>> > > > that we don't need to test interim
>> > > > patch releases within a minor version, we could go with these
>> versions:
>> > > >
>> > > > branch-2.5 : 3.2.3, 3.2.4, 3.3.2, 3.3.6
>> > > > branch-2.6 : 3.3.5, 3.3.6
>> > > > branch-3: 3.3.5, 3.3.6
>> > > > branch-4: 3.3.5, 3.3.6
>> > > >
>> > > > If we trust Hadoop not to break compatibility in patch releases,  we
>> > > could
>> > > > reduce this to only the oldest patch releases:

[jira] [Resolved] (HBASE-28886) Web site misses Reference guide for 2.5 and 2.6

2024-09-27 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28886.
-
Resolution: Not A Problem

> Web site misses Reference guide for 2.5 and 2.6
> ---
>
> Key: HBASE-28886
> URL: https://issues.apache.org/jira/browse/HBASE-28886
> Project: HBase
>  Issue Type: Bug
>  Components: documentation
>    Reporter: Istvan Toth
>Assignee: Dávid Paksy
>Priority: Major
>
> 2.3 and 2.4 do have the reference guide links in the menu, but 2.5 and 2.6 
> do not.





[jira] [Created] (HBASE-28889) Ignored flags in Yetus nightly runs

2024-09-27 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28889:
---

 Summary: Ignored flags in Yetus nightly runs
 Key: HBASE-28889
 URL: https://issues.apache.org/jira/browse/HBASE-28889
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: Istvan Toth


{noformat}
22:08:23  |  -0  | yetus  |   0m  6s   | Unprocessed flag(s): 
22:08:23  |  ||| --blanks-tabs-ignore-file
22:08:23  |  ||| --blanks-eol-ignore-file
22:08:23  |  ||| --author-ignore-list
{noformat}

The flags seem to be still present in the Yetus source code.
Maybe the modules are no longer getting activated ?





[jira] [Created] (HBASE-28886) Web site misses Reference guide for 2.5 and 2.6

2024-09-26 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28886:
---

 Summary: Web site misses Reference guide for 2.5 and 2.6
 Key: HBASE-28886
 URL: https://issues.apache.org/jira/browse/HBASE-28886
 Project: HBase
  Issue Type: Bug
  Components: documentation
Reporter: Istvan Toth


2.3 and 2.4 do have the reference guide links in the menu, but 2.5 and 2.6 
do not.







Re: [DISCUSS] Reverting hbase rest protobuf package name

2024-09-25 Thread Istvan Toth
Hi Nick,

Thanks for looking into this.

Actually, the branch-2 changes have already been committed.
The REST interface doesn't use any Protobuf 2 specific features, so there
shouldn't be any wire incompatibilities.

I forgot about the branch-3 changes, but I will try to get a PR up in the
next few weeks.
(After I'm done with the Hadoop packaging changes)

Istvan


On Wed, Sep 25, 2024 at 2:33 PM Nick Dimiduk  wrote:

> Hi Istvan,
>
> I think that your assessment is correct and your proposal makes sense to
> me. Let's keep it clear that this is a public interface.
>
> For the branch-2 changes, I'm less sure. Well, I agree in principle, but
> I'm not sure if this can be achieved without a breaking change. I'm happy
> to be proven wrong.
>
> Good on you!
>
> Thanks,
> Nick
>
> On Fri, Jul 12, 2024 at 12:42 PM Istvan Toth 
> wrote:
>
> > HBASE-23975 was the original ticket.
> >
> > My guess is that since hbase-shaded-protocol was already set up to do the
> > compiling and shading, moving it there was the easiest solution.
> > I guess that the same logic was behind the rename: since every other
> class
> > there uses the .shaded. package, change the REST messages the same way.
> >
> > regards
> > Istvan
> >
> >
> >
> >
> > On Fri, Jul 12, 2024 at 9:48 AM 张铎(Duo Zhang) 
> > wrote:
> >
> > > In which jira we did this moving? Are there any reasons why we did
> > > this in the past?
> > >
> > > Istvan Toth  于2024年7月12日周五 03:57写道:
> > > >
> > > > Hi!
> > > >
> > > > While working on HBASE-28725, I realized that in HBase 3+ the REST
> > > protobuf
> > > > definition files have been moved to hbase-shaded-protobuf, and the
> > > package
> > > > name has also been renamed.
> > > >
> > > > While I fully agree with the move to using the thirdparty protobuf
> > > library
> > > > (in fact I'd like to backport that change to 2.x), I think that
> moving
> > > the
> > > > .proto files and renaming the package was not a good idea.
> > > >
> > > > The REST interface does not use the HBase patched features of the
> > > protobuf
> > > > library, and if we want to maintain any pretense that the REST
> protobuf
> > > > encoding is usable by non-java code, then we should not use it in the
> > > > future either.
> > > >
> > > > (If we ever decide to use the patched features for performance
> reasons,
> > > we
> > > > will need to define new protobuf messages for that anyway)
> > > >
> > > > Protobuf does not use the package name on the wire, so wire
> > compatibility
> > > > is not an issue.
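The wire-compatibility point can be made concrete by hand-encoding a field: protobuf serializes each field as a tag (field number and wire type) plus the payload, so package, message, and field names never reach the wire. A standalone illustrative sketch, not HBase or protobuf-library code:

```java
import java.io.ByteArrayOutputStream;

public class WireTagSketch {
    // Encodes a varint field the way protobuf does: tag = (fieldNumber << 3) | wireType,
    // where wire type 0 means varint. Only numbers, never names, are serialized.
    static byte[] encodeVarintField(int fieldNumber, long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write((fieldNumber << 3) | 0);  // tag byte for this field, wire type varint
        while ((value & ~0x7FL) != 0) {     // 7 payload bits per byte, MSB = "more follows"
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encodeVarintField(1, 300);
        for (byte b : bytes) {
            System.out.printf("%02x ", b);  // prints: 08 ac 02
        }
        System.out.println();
    }
}
```

Renaming the Java package of the generated messages changes none of these bytes, which is why the revert is wire-safe.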
> > > >
> > > > In the unlikely case that someone has implemented an independent REST
> > > > client that uses protobuf encoding, this will also ensure
> compatibility
> > > > with the 3.0+ .protoc definitions.
> > > >
> > > > My proposal is:
> > > >
> > > > HBASE-28726 <https://issues.apache.org/jira/browse/HBASE-28726>
> Revert
> > > REST
> > > > protobuf package to org.apache.hadoop.hbase.shaded.rest
> > > > *This applies only to branch-3+:*
> > > > 1. Move the REST .proto files and compiling back to the hbase-rest
> > module
> > > > (but use the same protoc compiler that we use now)
> > > > 2. Revert the package name of the protobuf messages to the original
> > > > 3. No other changes, we still use the thirdparty protobuf library.
> > > >
> > > > The other issue is that on HBase 2.x the REST client still requires
> > > > unshaded protobuf 2.5.0 which brings back all the protobuf library
> > > > conflicts that were fixed in 3.0 and by hbase-shaded-client. To fix
> > this,
> > > > my proposal is:
> > > >
> > > > HBASE-28725 <https://issues.apache.org/jira/browse/HBASE-28725> Use
> > > > thirdparty protobuf for REST interface in HBase 2.x
> > > > *This applies only to branch-2.x:*
> > > > 1. Backport the code changes that use the thirdparty protobuf library
> > for
> > > > REST to branch-2.x
> > > >
> > > > With these two changes, the REST code would be almost identical on
> > every
> > > > branch, easing maintenance.
> > > >
> > > > What do you think ?
> > > >
> > > > Istvan
> > >
> >
> >
> > --
> > *István Tóth* | Sr. Staff Software Engineer
> > *Email*: st...@cloudera.com
> > cloudera.com <https://www.cloudera.com>
> > [image: Cloudera] <https://www.cloudera.com/>
> > [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> > Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> Cloudera
> > on LinkedIn] <https://www.linkedin.com/company/cloudera>
> > --
> > --
> >
>




[jira] [Resolved] (HBASE-25140) HBase test mini cluster is working only with Hadoop 2.8.0 - 3.0.3

2024-09-20 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-25140.
-
Resolution: Cannot Reproduce

This is known to work on all Hadoop 3 versions up to 3.3.6 now.

3.4.0 does not work, but that's a different issue tracked elsewhere.

> HBase test mini cluster is working only with Hadoop 2.8.0 - 3.0.3
> -
>
> Key: HBASE-25140
> URL: https://issues.apache.org/jira/browse/HBASE-25140
> Project: HBase
>  Issue Type: Bug
>  Components: documentation, hadoop2, test
>Affects Versions: 2.2.3
>Reporter: Miklos Gergely
>Priority: Major
> Attachments: mvn-1.log
>
>
> Running HBaseTestingUtility.startMiniCluster() on HBase 2.2.3 works only with 
> hadoop version range 2.8.0 - 3.0.3, for example with 2.4.1 the following 
> exception occurs:
>  
> {code:java}
> 21:49:04,124 [RS:0;71af2d647bb3:35715] ERROR 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper [] - 
> Couldn't properly initialize access to HDFS internals. Please update your WAL 
> Provider to not make use of the 'asyncfs' provider. See HBASE-16110 for more 
> information.21:49:04,124 [RS:0;71af2d647bb3:35715] ERROR 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper [] - 
> Couldn't properly initialize access to HDFS internals. Please update your WAL 
> Provider to not make use of the 'asyncfs' provider. See HBASE-16110 for more 
> information.java.lang.NoSuchMethodException: 
> org.apache.hadoop.hdfs.DFSClient.beginFileLease(long, 
> org.apache.hadoop.hdfs.DFSOutputStream) at 
> java.lang.Class.getDeclaredMethod(Class.java:2130) ~[?:1.8.0_242] at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.createLeaseManager(FanOutOneBlockAsyncDFSOutputHelper.java:198)
>  ~[hbase-server-2.2.3.jar:2.2.3] at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.(FanOutOneBlockAsyncDFSOutputHelper.java:274)
>  [hbase-server-2.2.3.jar:2.2.3] at java.lang.Class.forName0(Native Method) 
> ~[?:1.8.0_242] at java.lang.Class.forName(Class.java:264) [?:1.8.0_242] at 
> org.apache.hadoop.hbase.wal.AsyncFSWALProvider.load(AsyncFSWALProvider.java:136)
>  [hbase-server-2.2.3.jar:2.2.3] at 
> org.apache.hadoop.hbase.wal.WALFactory.getProviderClass(WALFactory.java:136) 
> [hbase-server-2.2.3.jar:2.2.3] at 
> org.apache.hadoop.hbase.wal.WALFactory.getProvider(WALFactory.java:175) 
> [hbase-server-2.2.3.jar:2.2.3] at 
> org.apache.hadoop.hbase.wal.WALFactory.(WALFactory.java:198) 
> [hbase-server-2.2.3.jar:2.2.3] at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1871)
>  [hbase-server-2.2.3.jar:2.2.3] at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1589)
>  [hbase-server-2.2.3.jar:2.2.3] at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.handleReportForDutyResponse(MiniHBaseCluster.java:157)
>  [hbase-server-2.2.3-tests.jar:2.2.3] at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1001)
>  [hbase-server-2.2.3.jar:2.2.3] at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:184)
>  [hbase-server-2.2.3-tests.jar:2.2.3] at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:130)
>  [hbase-server-2.2.3-tests.jar:2.2.3] at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:168)
>  [hbase-server-2.2.3-tests.jar:2.2.3] at 
> java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_242] at 
> javax.security.auth.Subject.doAs(Subject.java:360) [?:1.8.0_242] at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1536)
>  [hadoop-common-2.4.1.jar:?] at 
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:341) 
> [hbase-common-2.2.3.jar:2.2.3] at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:165)
>  [hbase-server-2.2.3-tests.jar:2.2.3] at 
> java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]
> {code}
> Also upon failure during maven run it would be great if the actual exception 
> would be displayed, not just that "Master not initialized after 20ms".
>  
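The failure above comes from a reflective lookup whose target signature changed across Hadoop releases: `Class.getDeclaredMethod` matches parameter types exactly, so a probe written for one `DFSClient.beginFileLease` signature throws `NoSuchMethodException` on another. A self-contained sketch of that probing pattern, with standard-library classes standing in for the Hadoop ones:

```java
import java.lang.reflect.Method;
import java.util.List;

public class MethodProbeSketch {
    // Returns the method if the exact signature exists, otherwise null so the
    // caller can try the signature used by another release (or disable the feature,
    // as the asyncfs WAL provider does when no known signature matches).
    static Method probe(Class<?> clazz, String name, Class<?>... paramTypes) {
        try {
            return clazz.getDeclaredMethod(name, paramTypes);
        } catch (NoSuchMethodException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // String.valueOf(long) exists; there is no overload taking exactly a List.
        System.out.println(probe(String.class, "valueOf", long.class) != null);   // true
        System.out.println(probe(String.class, "valueOf", List.class) != null);   // false
    }
}
```

Probing every supported signature up front (and surfacing the real exception when all fail) is also what the reporter's last request — showing the actual cause instead of "Master not initialized" — amounts to.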





Re: [DISCUSS] Using Hadoop 3.3.6 and 3.4.0 as default versions

2024-09-19 Thread Istvan Toth
Master is still going to support 3.3.5, so master still needs to use
reflection as well.

The point of these changes is allowing Hbase to default to a newer Hadoop
version, without dropping support for older releases.

Dropping support for 3.3.5 would be a different discussion, and I
personally feel that it would be too early.

Istvan

On Wed, Sep 18, 2024 at 7:46 PM Wei-Chiu Chuang
 wrote:

> So I was wondering now that we'll be using Hadoop 3.4.0, if it's okay to
> port HBASE-27769 <https://issues.apache.org/jira/browse/HBASE-27769> to
> the
> master branch.
> This will allow Ozone to be used by HBase. We are preparing Apache Ozone
> 2.0 release and having a usable Apache HBase to work with is important. It
> is working now with Cloudera's HBase work but I'd like to open up this
> opportunity to the community as well.
>
> We can start with master, and then I can find a solution (something that
> involves the use of reflection) and backport to lower branches. Ultimately,
> release a version of HBase with this feature.
>
> cc: Stephen.
>
> On Wed, Sep 18, 2024 at 12:08 AM Istvan Toth 
> wrote:
>
> > Created a new ticket, as the old one was for 3.3.6 but we've agreed on
> > 3.4.0, and expanded the scope.
> >
> > https://issues.apache.org/jira/browse/HBASE-28846
> >
> > On Tue, Sep 17, 2024 at 3:47 PM 张铎(Duo Zhang) 
> > wrote:
> >
> > > I think we could start the work from branch-2.6? Branch-2.5 should
> > > reach its EOL soon, after 2.6.x is stable enough.
> > >
> > > In this way we only need to deal with 3.3.x and 3.4.x.
> > >
> > > Istvan Toth  于2024年9月17日周二 16:56写道:
> > > >
> > > > Thanks for the assessment, Wei-Chiu.
> > > >
> > > > Transitive dependency updates in Hadoop are normal (desired even),
> > that's
> > > > something that HBase needs to manage.
> > > >
> > > > As for the test:
> > > >
> > > > - Duo's suggestion is to extend the Hadoop compatibility tests, and
> run
> > > > them with multiple Hadoop 3 releases.
> > > > Looking at the nightly results, those tests are fast, it takes 14
> > minutes
> > > > for Hadoop2 and Hadoop3.
> > > > I've peeked into hbase_nightly_pseudo-distributed-test.sh , the tests
> > > there
> > > > are indeed quite minimal,
> > > > more of a smoke test, and seem to be targeted to check the shaded
> > > artifacts.
> > > >
> > > > - Nick's suggestion is to run DevTests and set the Hadoop version.
> > > > The runAllTests step in the current nightly takes 8+ hours.
> > > > On my 6+8 core laptop, my last attempt failed after 90 minutes, so
> > let's
> > > > say the full run takes 120 minutes.
> > > >
> > > > I don't know how many free resources HBase has, but if we can utilize
> > > > another VM per branch we could run the dev tests with four HBase
> > > versions,
> > > > and still finish about the same time as the full test job does.
> > > >
> > > > We don't need to test with the default version, as we already run the
> > > full
> > > > suite for that one.
> > > >
> > > > Assuming that we officially support 3.4.0 on all active branches, and
> > > also
> > > > default to 3.4.0 on all branches, and trusting Hadoop's compatibility
> > so
> > > > that we don't need to test interim
> > > > patch releases within a minor version, we could go with these
> versions:
> > > >
> > > > branch-2.5 : 3.2.3, 3.2.4, 3.3.2, 3.3.6
> > > > branch-2.6 : 3.3.5, 3.3.6
> > > > branch-3: 3.3.5, 3.3.6
> > > > branch-4: 3.3.5, 3.3.6
> > > >
> > > > If we trust Hadoop not to break compatibility in patch releases,  we
> > > could
> > > > reduce this to only the oldest patch releases:
> > > >
> > > > branch-2.5 : 3.2.3,  3.3.2
> > > > branch-2.6 : 3.3.5
> > > > branch-3: 3.3.5
> > > > branch-4: 3.3.5
> > > >
> > > > or if we trust it not to break compatibility in specific minor versions,
> > we
> > > > could further reduce it to just the oldest supported release:
> > > >
> > > > branch-2.5 : 3.2.3
> > > > branch-2.6 : 3.3.5
> > > > branch-3: 3.3.5
> > > > branch-4: 3.3.5
> > > >
> > > > Of course running every devTest is overkill, 

Re: [DISCUSS] Using Hadoop 3.3.6 and 3.4.0 as default versions

2024-09-18 Thread Istvan Toth
Created a new ticket, as the old one was for 3.3.6 but we've agreed on
3.4.0, and expanded the scope.

https://issues.apache.org/jira/browse/HBASE-28846

On Tue, Sep 17, 2024 at 3:47 PM 张铎(Duo Zhang)  wrote:

> I think we could start the work from branch-2.6? Branch-2.5 should
> reach its EOL soon, after 2.6.x is stable enough.
>
> In this way we only need to deal with 3.3.x and 3.4.x.
>
> Istvan Toth  于2024年9月17日周二 16:56写道:
> >
> > Thanks for the assessment, Wei-Chiu.
> >
> > Transitive dependency updates in Hadoop are normal (desired even), that's
> > something that HBase needs to manage.
> >
> > As for the test:
> >
> > - Duo's suggestion is to extend the Hadoop compatibility tests, and run
> > them with multiple Hadoop 3 releases.
> > Looking at the nightly results, those tests are fast, it takes 14 minutes
> > for Hadoop2 and Hadoop3.
> > I've peeked into hbase_nightly_pseudo-distributed-test.sh , the tests
> there
> > are indeed quite minimal,
> > more of a smoke test, and seem to be targeted to check the shaded
> artifacts.
> >
> > - Nick's suggestion is to run DevTests and set the Hadoop version.
> > The runAllTests step in the current nightly takes 8+ hours.
> > On my 6+8 core laptop, my last attempt failed after 90 minutes, so let's
> > say the full run takes 120 minutes.
> >
> > I don't know how many free resources HBase has, but if we can utilize
> > another VM per branch we could run the dev tests with four HBase
> versions,
> > and still finish about the same time as the full test job does.
> >
> > We don't need to test with the default version, as we already run the
> full
> > suite for that one.
> >
> > Assuming that we officially support 3.4.0 on all active branches, and
> also
> > default to 3.4.0 on all branches, and trusting Hadoop's compatibility so
> > that we don't need to test interim
> > patch releases within a minor version, we could go with these versions:
> >
> > branch-2.5 : 3.2.3, 3.2.4, 3.3.2, 3.3.6
> > branch-2.6 : 3.3.5, 3.3.6
> > branch-3: 3.3.5, 3.3.6
> > branch-4: 3.3.5, 3.3.6
> >
> > If we trust Hadoop not to break compatibility in patch releases,  we
> could
> > reduce this to only the oldest patch releases:
> >
> > branch-2.5 : 3.2.3,  3.3.2
> > branch-2.6 : 3.3.5
> > branch-3: 3.3.5
> > branch-4: 3.3.5
> >
> > or if we trust it not to break compatibility in specific minor versions, we
> > could further reduce it to just the oldest supported release:
> >
> > branch-2.5 : 3.2.3
> > branch-2.6 : 3.3.5
> > branch-3: 3.3.5
> > branch-4: 3.3.5
> >
> > Of course running every devTest is overkill, as the vast majority of
> the
> > tests use the same set of Hadoop APIs and features, and we'd only really
> > need to run the tests that cover that feature set.
> > Figuring out a subset of tests that exercise the full Hadoop API (that we
> > use) is a hard and error-prone task, so if we have the resources, we can
> > just brute force it with devTests.
> >
> > As a base for further discussion:
> >
> > Let's take the first (first and last supported patch level for each minor
> > release) set of versions,
> > and run both the pseudo-distributed tests and the devTests on them.
> >
> > Does that sound good ? Do we have the resources for that ? Do we have a
> > better idea ?
> >
> > Istvan
> >
> > On Mon, Sep 16, 2024 at 7:20 PM Wei-Chiu Chuang 
> wrote:
> >
> > > I strive to meet that stated compatibility goal when I release Hadoop.
> > > But we don't have a rigorous compatibility/upgrade test in Hadoop so
> YMMV
> > > (we now have in Ozone!)
> > >
> > > There are so many gotchas that it really depends on the RM to do the
> > > hardwork, checking protobuf definitions, running API compat report,
> > > compiling against downstream applications.
> > > The other thing is thirdparty dependency update. Whenever I bump Netty
> or
> > > Jetty version, new transitive dependencies slip in as part of the
> update,
> > > which sometimes break HBase because of the dependency check in shading.
> > >
> > > On Mon, Sep 16, 2024 at 4:48 AM Istvan Toth  >
> > > wrote:
> > >
> > > > On Wed, Sep 11, 2024 at 4:30 PM 张铎(Duo Zhang)  >
> > > > wrote:
> > > >
> > > > > There is a problem that, usually, you can use an old hadoop client
> to
> > > > > communica

[jira] [Created] (HBASE-28846) Change the default Hadoop3 version to 3.4.0, and add tests to make sure HBase works with earlier supported Hadoop version

2024-09-18 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28846:
---

 Summary: Change the default Hadoop3 version to 3.4.0, and add 
tests to make sure HBase works with earlier supported Hadoop version
 Key: HBASE-28846
 URL: https://issues.apache.org/jira/browse/HBASE-28846
 Project: HBase
  Issue Type: Improvement
  Components: hadoop3, test
Affects Versions: 3.0.0-beta-1, 2.6.0, 4.0.0-alpha-1, 2.7.0
Reporter: Istvan Toth
Assignee: Istvan Toth








Re: [DISCUSS] For 3.0.0-beta-2 and the final 3.0.0 release

2024-09-17 Thread Istvan Toth
I'd like to make sure that HBASE-28431 makes it into 3.0.

Nihal has a PR up at https://github.com/apache/hbase/pull/6258 for the
diagnostics code refactor (would be helpful if you could take a look)
and I have an older WIP PR for the Hadoop-less assembly.
I think we can wrap that project up in the next 2-3 weeks.

On Tue, Sep 17, 2024 at 4:41 PM Duo Zhang  wrote:

> I've done several rounds of API cleanup in HBASE-24888 and now for me
> I think it is enough for making the new major release.
>
> I've filed HBASE-28844 for aligning the commit history and jira
> issues, it will be a huge project for a major release and may take
> several weeks.
>
> After that, I think we are good to go. And I will also try to find
> some resources to run several rounds of ITBLL and also some upgrading
> tests. Finally, after everything is OK, I will release 3.0.0-beta-2,
> and then cut branch-3.0 from the 3.0.0-beta-2 tag and make the 3.0.0
> release soon.
>
> Hope I can get all these things done within the year 2024.
>
> Thanks.
>




Re: [DISCUSS] Using Hadoop 3.3.6 and 3.4.0 as default versions

2024-09-17 Thread Istvan Toth
Thanks for the assessment, Wei-Chiu.

Transitive dependency updates in Hadoop are normal (desired even), that's
something that HBase needs to manage.

As for the test:

- Duo's suggestion is to extend the Hadoop compatibility tests, and run
them with multiple Hadoop 3 releases.
Looking at the nightly results, those tests are fast, it takes 14 minutes
for Hadoop2 and Hadoop3.
I've peeked into hbase_nightly_pseudo-distributed-test.sh , the tests there
are indeed quite minimal,
more of a smoke test, and seem to be targeted to check the shaded artifacts.

- Nick's suggestion is to run DevTests and set the Hadoop version.
The runAllTests step in the current nightly takes 8+ hours.
On my 6+8 core laptop, my last attempt failed after 90 minutes, so let's
say the full run takes 120 minutes.

I don't know how many free resources HBase has, but if we can utilize
another VM per branch we could run the dev tests with four HBase versions,
and still finish about the same time as the full test job does.

We don't need to test with the default version, as we already run the full
suite for that one.

Assuming that we officially support 3.4.0 on all active branches, and also
default to 3.4.0 on all branches, and trusting Hadoop's compatibility so
that we don't need to test interim
patch releases within a minor version, we could go with these versions:

branch-2.5 : 3.2.3, 3.2.4, 3.3.2, 3.3.6
branch-2.6 : 3.3.5, 3.3.6
branch-3: 3.3.5, 3.3.6
branch-4: 3.3.5, 3.3.6

If we trust Hadoop not to break compatibility in patch releases,  we could
reduce this to only the oldest patch releases:

branch-2.5 : 3.2.3,  3.3.2
branch-2.6 : 3.3.5
branch-3: 3.3.5
branch-4: 3.3.5

or if we trust it not to break compatibility in specific minor versions, we
could further reduce it to just the oldest supported release:

branch-2.5 : 3.2.3
branch-2.6 : 3.3.5
branch-3: 3.3.5
branch-4: 3.3.5

Of course running every devTest is overkill, as the vast majority of the
tests use the same set of Hadoop APIs and features, and we'd only really
need to run the tests that cover that feature set.
Figuring out a subset of tests that exercise the full Hadoop API (that we
use) is a hard and error-prone task, so if we have the resources, we can
just brute force it with devTests.

As a base for further discussion:

Let's take the first (first and last supported patch level for each minor
release) set of versions,
and run both the pseudo-distributed tests and the devTests on them.

Does that sound good ? Do we have the resources for that ? Do we have a
better idea ?

Istvan

On Mon, Sep 16, 2024 at 7:20 PM Wei-Chiu Chuang  wrote:

> I strive to meet that stated compatibility goal when I release Hadoop.
> But we don't have a rigorous compatibility/upgrade test in Hadoop so YMMV
> (we now have in Ozone!)
>
> There are so many gotchas that it really depends on the RM to do the
> hardwork, checking protobuf definitions, running API compat report,
> compiling against downstream applications.
> The other thing is thirdparty dependency update. Whenever I bump Netty or
> Jetty version, new transitive dependencies slip in as part of the update,
> which sometimes break HBase because of the dependency check in shading.
>
> On Mon, Sep 16, 2024 at 4:48 AM Istvan Toth 
> wrote:
>
> > On Wed, Sep 11, 2024 at 4:30 PM 张铎(Duo Zhang) 
> > wrote:
> >
> > > There is a problem that, usually, you can use an old hadoop client to
> > > communicate with a new hadoop server, but not vice versa.
> > >
> >
> > Do we have examples of that ?
> >
> >
> >
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html
> > specifically states otherwise:
> >
> > In addition to the limitations imposed by being Stable
> > <
> >
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/InterfaceClassification.html#Stable
> > >,
> > Hadoop’s wire protocols MUST also be forward compatible across minor
> > releases within a major version according to the following:
> >
> >- Client-Server compatibility MUST be maintained so as to allow users
> to
> >continue using older clients even after upgrading the server (cluster)
> > to a
> >later version (or vice versa). For example, a Hadoop 2.1.0 client
> > talking
> >to a Hadoop 2.3.0 cluster.
> >- Client-Server compatibility MUST be maintained so as to allow users
> to
> >upgrade the client before upgrading the server (cluster). For
> example, a
> >Hadoop 2.4.0 client talking to a Hadoop 2.3.0 cluster. This allows
> >deployment of client-side bug fixes ahead of full cluster upgrades.
> Note
> >that new cluster features invoked by new client APIs or shell commands

Re: [DISCUSS] Using Hadoop 3.3.6 and 3.4.0 as default versions

2024-09-16 Thread Istvan Toth
OK, I'm gonna look into downloading multiple Hadoop 3 versions, and running
those tests with each one.





On Mon, Sep 16, 2024 at 3:08 PM 张铎(Duo Zhang)  wrote:

> And if we can make sure the compatibility, I agree that we could
> depend on the newest possible hadoop version by default. As you said,
> it can reduce most transitive security issues.
>
> There are still 3 security issues on master branch because of netty 3,
> which should be fixed in 3.4.0.
>
> 张铎(Duo Zhang)  于2024年9月16日周一 21:03写道:
> >
> > There is a devTests profile in our pom, we can make use of it first.
> >
> > And on integration tests, I mean this one
> >
> >
> https://github.com/apache/hbase/blob/4446d297112899dab59c0952489457c4419366d3/dev-support/Jenkinsfile#L755
> >
> > We could extend this test to test different combinations.
> >
> > Istvan Toth  于2024年9月16日周一 19:48写道:
> > >
> > > On Wed, Sep 11, 2024 at 4:30 PM 张铎(Duo Zhang) 
> wrote:
> > >
> > > > There is a problem that, usually, you can use an old hadoop client to
> > > > communicate with a new hadoop server, but not vice versa.
> > > >
> > >
> > > Do we have examples of that ?
> > >
> > >
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html
> > > specifically states otherwise:
> > >
> > > In addition to the limitations imposed by being Stable
> > > <
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/InterfaceClassification.html#Stable
> >,
> > > Hadoop’s wire protocols MUST also be forward compatible across minor
> > > releases within a major version according to the following:
> > >
> > >- Client-Server compatibility MUST be maintained so as to allow
> users to
> > >continue using older clients even after upgrading the server
> (cluster) to a
> > >later version (or vice versa). For example, a Hadoop 2.1.0 client
> talking
> > >to a Hadoop 2.3.0 cluster.
> > >- Client-Server compatibility MUST be maintained so as to allow
> users to
> > >upgrade the client before upgrading the server (cluster). For
> example, a
> > >Hadoop 2.4.0 client talking to a Hadoop 2.3.0 cluster. This allows
> > >deployment of client-side bug fixes ahead of full cluster upgrades.
> Note
> > >that new cluster features invoked by new client APIs or shell
> commands will
> > >not be usable. YARN applications that attempt to use new APIs
> (including
> > >new fields in data structures) that have not yet been deployed to
> the
> > >cluster can expect link exceptions.
> > >- Client-Server compatibility MUST be maintained so as to allow
> > >upgrading individual components without upgrading others. For
> example,
> > >upgrade HDFS from version 2.1.0 to 2.2.0 without upgrading
> MapReduce.
> > >- Server-Server compatibility MUST be maintained so as to allow
> mixed
> > >versions within an active cluster so the cluster may be upgraded
> without
> > >downtime in a rolling fashion.
> > >
> > > Admittedly, I don't have a lot of experience with mismatched Hadoop
> > > versions, but my proposal should be covered by the second clause.
> > >
> > > Usage of newer APIs should be caught when compiling with older Hadoop
> > > versions.
> > > The only risk I can see is when we use a new feature which was added
> > > without changing the API signature (such as adding a new constant
> value for
> > > some new behaviour)
> > >
> > >
> > > > When deploying HBase, HBase itself acts as a client of hadoop, that's
> > > > why we always stay on the oldest support hadoop version.
> > > >
> > > >
> > > Not true for 2.6, which according to the docs supports Hadoop 3.2 but
> > > defaults to Hadoop 3.3.
> > >
> > >
> > > > For me, technically I think bumping to the newest patch release of a
> > > > minor release should be fine, which is the proposal 1.
> > > >
> > > > But the current hadoopcheck is not enough, since it can only ensure
> > > > that there is no complation error.
> > > > Maybe we should also run some simple dev tests in the hadoopcheck
> > > > stage, and in integration tests, we should try to build with all the
> > > > support hadoop version and run the basic read write tests.
> > >
> > >
> > > Do 

Re: [DISCUSS] Using Hadoop 3.3.6 and 3.4.0 as default versions

2024-09-16 Thread Istvan Toth
On Wed, Sep 11, 2024 at 4:30 PM 张铎(Duo Zhang)  wrote:

> There is a problem that, usually, you can use an old hadoop client to
> communicate with a new hadoop server, but not vice versa.
>

Do we have examples of that ?

https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html
specifically states otherwise:

In addition to the limitations imposed by being Stable
<https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/InterfaceClassification.html#Stable>,
Hadoop’s wire protocols MUST also be forward compatible across minor
releases within a major version according to the following:

   - Client-Server compatibility MUST be maintained so as to allow users to
   continue using older clients even after upgrading the server (cluster) to a
   later version (or vice versa). For example, a Hadoop 2.1.0 client talking
   to a Hadoop 2.3.0 cluster.
   - Client-Server compatibility MUST be maintained so as to allow users to
   upgrade the client before upgrading the server (cluster). For example, a
   Hadoop 2.4.0 client talking to a Hadoop 2.3.0 cluster. This allows
   deployment of client-side bug fixes ahead of full cluster upgrades. Note
   that new cluster features invoked by new client APIs or shell commands will
   not be usable. YARN applications that attempt to use new APIs (including
   new fields in data structures) that have not yet been deployed to the
   cluster can expect link exceptions.
   - Client-Server compatibility MUST be maintained so as to allow
   upgrading individual components without upgrading others. For example,
   upgrade HDFS from version 2.1.0 to 2.2.0 without upgrading MapReduce.
   - Server-Server compatibility MUST be maintained so as to allow mixed
   versions within an active cluster so the cluster may be upgraded without
   downtime in a rolling fashion.

Admittedly, I don't have a lot of experience with mismatched Hadoop
versions, but my proposal should be covered by the second clause.

Usage of newer APIs should be caught when compiling with older Hadoop
versions.
The only risk I can see is when we use a new feature which was added
without changing the API signature (such as adding a new constant value for
some new behaviour).
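The "reflection hacks" mentioned later in the thread are the usual guard against exactly this: probe once for the newer API and fall back to the legacy code path when it is absent. A stdlib-only sketch of the pattern (the probed method here, `String.isBlank`, is just a stand-in for a hypothetical new Hadoop API):

```java
import java.lang.reflect.Method;

public class FeatureProbe {
    // Cache the probe result so the reflection lookup runs only once.
    private static final Method NEW_API;

    static {
        Method m;
        try {
            // Stand-in for a method that only newer versions of a
            // dependency provide.
            m = String.class.getMethod("isBlank");
        } catch (NoSuchMethodException e) {
            m = null; // older runtime: use the legacy code path
        }
        NEW_API = m;
    }

    public static boolean isBlank(String s) throws Exception {
        if (NEW_API != null) {
            return (Boolean) NEW_API.invoke(s); // new API path
        }
        return s.trim().isEmpty(); // legacy fallback, same semantics
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isBlank("  "));
        System.out.println(isBlank("x"));
    }
}
```

Both branches produce the same result, so the calling code compiles against the old API but still exercises the new one when it is on the classpath.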


> When deploying HBase, HBase itself acts as a client of hadoop, that's
> why we always stay on the oldest support hadoop version.
>
>
Not true for 2.6, which according to the docs supports Hadoop 3.2 but
defaults to Hadoop 3.3.


> For me, technically I think bumping to the newest patch release of a
> minor release should be fine, which is the proposal 1.
>
> But the current hadoopcheck is not enough, since it can only ensure
> that there is no complation error.
> Maybe we should also run some simple dev tests in the hadoopcheck
> stage, and in integration tests, we should try to build with all the
> support hadoop version and run the basic read write tests.


Do we need to test all versions?
If we test with, say, 3.3.0 and 3.3.6, do we need to test with 3.3.[1-5]?
Or if we test with 3.2.5 and 3.3.6, do we need to test with any of the
interim versions?

Basically, how much do we trust Hadoop to keep to its compatibility rules ?

Running a limited number of tests should not be a problem.
Should we add a new test category, so that they can be easily started from
Maven ?

Can you suggest some tests that we should run for the compatibility check ?


> Thanks.
>
> Istvan Toth  wrote on Wed, Sep 11, 2024 at 21:05:
> >
> > Let me summarize my take of the discussion so far:
> >
> > There are two aspects to the HBase version we build with:
> > 1. Source code quality/compatibility
> > 2. Security and usability of the public binary assemblies and (shaded)
> > hbase maven artifacts.
> >
> > 1. Source code quality/compatibility
> >
> > AFAICT we have the following hard goals:
> > 1.a : Ensure that HBase compiles and runs well with the earlier supported
> > Hadoop version on the given branch
> > 1.b: Ensure that HBase compiles and runs well with the latest supported
> > Hadoop version on the given branch
> >
> > In my opinion we should also strive for these goals:
> > 1.c: Aim to officially support the newest possible Hadoop releases
> > 1.d: Take advantage  of new features in newer Hadoop versions
> >
> > 2. Public binary usability wish list:
> >
> > 2.a: We want them to work OOB for as many use cases as possible
> > 2.b: We want them to work as well as possible
> > 2.c: We want to have as few CVEs in them as possible
> > 2.d: We want to make upgrades as painless as possible, especially for
> patch
> > releases
> >
> > The factor that Hadoop does not have an explicit end-of-life policy of
> > course complicates things.
> >
> > Our current policy seems to be

[jira] [Resolved] (HBASE-26511) hbase shell config override option does not work

2024-09-16 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-26511.
-
Resolution: Fixed

Since branch-2.4 is EOL, and the fix is present on all active branches, we can 
close this.

> hbase shell config override option does not work
> 
>
> Key: HBASE-26511
> URL: https://issues.apache.org/jira/browse/HBASE-26511
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 3.0.0-alpha-2
>    Reporter: Istvan Toth
>Assignee: Tushar Raina
>Priority: Major
>
> According to the docs, we should be able to specify properties on the command 
> line with the -D switch.
> https://hbase.apache.org/book.html#_overriding_configuration_starting_the_hbase_shell
> However:
> {noformat}
> ./bin/hbase shell 
> -Dhbase.zookeeper.quorum=ZK0.remote.cluster.example.org,ZK1.remote.cluster.example.org,ZK2.remote.cluster.example.org
>  -Draining=false
> Usage: shell [OPTIONS] [SCRIPTFILE [ARGUMENTS]]
>  -d | --debugSet DEBUG log levels.
>  -h | --help This help.
>  -n | --noninteractive   Do not run within an IRB session and exit with 
> non-zero
>  status on first error.
>  --top-level-defsCompatibility flag to export HBase shell commands 
> onto
>  Ruby's main object
>  -Dkey=value Pass hbase-*.xml Configuration overrides. For 
> example, to
>  use an alternate zookeeper ensemble, pass:
>-Dhbase.zookeeper.quorum=zookeeper.example.org
>  For faster fail, pass the below and vary the values:
>-Dhbase.client.retries.number=7
>-Dhbase.ipc.client.connect.max.retries=3
> classpath:/jar-bootstrap.rb: invalid option -- b
> GetoptLong::InvalidOption: invalid option -- b
>   set_error at 
> uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/getoptlong.rb:395
> get at 
> uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/getoptlong.rb:572
>each at 
> uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/getoptlong.rb:603
>loop at org/jruby/RubyKernel.java:1442
>each at 
> uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/getoptlong.rb:602
>   at classpath:/jar-bootstrap.rb:98
> {noformat}
> Something is broken in the command line parsing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-26935) Update httpcomponents to version 5.1

2024-09-15 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-26935.
-
  Assignee: Istvan Toth
Resolution: Duplicate

I opened this twice.

> Update httpcomponents to version 5.1
> 
>
> Key: HBASE-26935
> URL: https://issues.apache.org/jira/browse/HBASE-26935
> Project: HBase
>  Issue Type: Improvement
>    Reporter: Istvan Toth
>    Assignee: Istvan Toth
>Priority: Major
>
> HTTPComponents 5 is a major rewrite.
> One of the main improvements is that it uses slf4j for logging, instead of 
> log4j.
>  





Re: [DISCUSS] Using Hadoop 3.3.6 and 3.4.0 as default versions

2024-09-11 Thread Istvan Toth
Let me summarize my take of the discussion so far:

There are two aspects to the HBase version we build with:
1. Source code quality/compatibility
2. Security and usability of the public binary assemblies and (shaded)
hbase maven artifacts.

1. Source code quality/compatibility

AFAICT we have the following hard goals:
1.a : Ensure that HBase compiles and runs well with the earlier supported
Hadoop version on the given branch
1.b: Ensure that HBase compiles and runs well with the latest supported
Hadoop version on the given branch

In my opinion we should also strive for these goals:
1.c: Aim to officially support the newest possible Hadoop releases
1.d: Take advantage  of new features in newer Hadoop versions

2. Public binary usability wish list:

2.a: We want them to work OOB for as many use cases as possible
2.b: We want them to work as well as possible
2.c: We want to have as few CVEs in them as possible
2.d: We want to make upgrades as painless as possible, especially for patch
releases

The factor that Hadoop does not have an explicit end-of-life policy of
course complicates things.

Our current policy seems to be that we pick a Hadoop version to build with
when releasing a minor version,
and stay on that version until there is a newer patch release of that
minor version with direct CVE fixes.
This does not seem to be an absolute, for example the recently released
HBase 2.4.18 still defaults to Hadoop 3.1.2,
which has several old CVEs, many of which are reportedly fixed in 3.1.3 and
3.1.4.

My proposals are:

Proposal 1:

Whenever a new Hadoop patch release is released for a minor version, then
unless it breaks source compatibility, we should automatically update the
default Hadoop version for
all branches that use the same minor version.
The existing hadoopcheck mechanism should be good enough to guarantee that
we do not break compatibility with the earlier patch releases.

This would ensure that the binaries use the latest and greatest Hadoop (of
that minor branch) and that users of the binaries get the latest fixes,
both CVE and functionality wise, and
the binaries also get the transitive CVE fixes in that release.
For example, if we did this we could use the new feature in 3.3.6 in
HBASE-27769 (via reflection) and also test it, thereby improving Ozone
support.

On the other hand we minimize changes and maximize compatibility by
sticking to the same Hadoop minor release.

Proposal 2:

We should default to the latest hadoop version (currently 3.4.0) on
unreleased branches.
This should ensure that when we do release we default to the latest
version, and we've tested it as thoroughly as possible.

Again, the existing hadoopcheck mechanism should ensure that we do not
break compatibility with earlier supported versions.

Istvan




On Mon, Sep 9, 2024 at 9:41 PM Nick Dimiduk  wrote:

> Yes, we’ll use reflection to make use of APIs introduced in newer HDFS
> versions than the stated dependency until the stated dependency finally
> catches up.
>
> On Mon, 9 Sep 2024 at 19:55, Wei-Chiu Chuang  wrote:
>
> > Reflection is probably the way to go to ensure maximum compatibility TBH
> >
> > On Mon, Sep 9, 2024 at 10:40 AM Istvan Toth 
> > wrote:
> >
> > > Stephen Wu has kindly sent me the link for the previous email thread:
> > > https://lists.apache.org/thread/2k4tvz3wpg06sgkynkhgvxrodmj86vsj
> > >
> > > Reading it, I cannot see anything there that would contraindicate
> > upgrading
> > > to 3.3.6 from 3.3.5, at least on the branches that already default to
> > > 3.3.5, i.e. 2.6+.
> > >
> > > At first glance, the new logic in HBASE-27769 could also be implemented
> > > with the usual reflection hacks, while preserving the old logic for
> > Hadoop
> > > 3.3.5 and earlier.
> > >
> > > Thanks,
> > > Istvan
> > >
> > >
> > >
> > > On Mon, Sep 9, 2024 at 1:42 PM Istvan Toth  wrote:
> > >
> > > > Thanks for your reply, Nick.
> > > >
> > > > There are no listed direct CVEs in either Hadoop 3.2.4 or 3.3.5, but
> > > there
> > > > are CVEs in their transitive dependencies.
> > > >
> > > > My impression is that rather than shipping the oldest 'safe' version,
> > > > HBase does seem to update the default Hadoop version to the
> latest-ish
> > at
> > > > the time of the start
> > > > of the release process, otherwise 2.6 would still default to 3.2.4.
> > > (HBase
> > > > 2.6 release was already underway when Hadoop 3.4.0 was released)
> > > >
> > > > For now, we (Phoenix) have resorted to dependency managing transitive
> > > > dependencies coming in (only) via Hadoop in Phoenix,
> > > > bu

Re: [DISCUSS] Using Hadoop 3.3.6 and 3.4.0 as default versions

2024-09-09 Thread Istvan Toth
Thanks for your reply, Nick.

There are no listed direct CVEs in either Hadoop 3.2.4 or 3.3.5, but there
are CVEs in their transitive dependencies.

My impression is that rather than shipping the oldest 'safe' version, HBase
does seem to update the default Hadoop version to the latest-ish at the
time of the start
of the release process, otherwise 2.6 would still default to 3.2.4. (HBase
2.6 release was already underway when Hadoop 3.4.0 was released)

For now, we (Phoenix) have resorted to dependency managing transitive
dependencies coming in (only) via Hadoop in Phoenix,
but that is a slippery slope, and adds a layer of uncertainty, as it may
introduce incompatibilities in Hadoop that we don't have tests for.
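For reference, the transitive-dependency pinning described above is usually done with Maven's dependencyManagement in the consumer pom; the artifact and version below are purely illustrative, not a recommendation:

```xml
<!-- Illustrative only: pin a dependency that arrives transitively via
     Hadoop to a newer, CVE-fixed version. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.codehaus.jettison</groupId>
      <artifactId>jettison</artifactId>
      <version>1.5.4</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

As the email notes, this overrides the version Hadoop was actually tested with, which is exactly the "layer of uncertainty" being described.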

Our situation is similar to that of the HBase shaded artifacts, where we
ship a huge uberjar that includes much of both HBase and Hadoop on top of
(or rather below) Phoenix,
similar to the hbase-shaded-client jar.

I will look into the hadoop check CI tests that you've mentioned, then I
will try to resurrect HBASE-27931, and if I don't find any issues, and
there are no objections, then
I will put a PR to update the unreleased version to default to 3.4.0.

Istvan

On Mon, Sep 9, 2024 at 11:06 AM Nick Dimiduk  wrote:

> My understanding of our hadoop dependency policy is that we ship poms with
> hadoop versions pinned to the oldest compatible, "safe" version that is
> supported. Our test infrastructure has a "hadoop check" procedure that does
> some validation against other patch release versions.
>
> I don't know if anyone has done a CVE sweep recently. If there are new
> CVEs, we do bump the minimum supported version specified in the pom as part
> of patch releases. These changes need to include a pretty thorough
> compatibility check so that we can include release notes about any
> introduced incompatibilities.
>
> I am in favor of a dependency bump so as to address known CVEs as best as
> we reasonably can.
>
> Thanks,
> Nick
>
> On Mon, Sep 9, 2024 at 10:59 AM Istvan Toth  wrote:
>
> > Hi!
> >
> > I'm working on building the Phoenix uberjars with newer Hadoop versions
> by
> > default to improve its CVE stance, and I realized that HBase itself does
> > not use the latest releases.
> >
> > branch-2.5 defaults to 3.2.4
> > branch-2.6 and later defaults to 3.3.5
> >
> > I can kind of understand that we don't want to bump the minor version for
> > branch-2.5 from the one it was released with.
> >
> > However, I don't see the rationale for not upgrading branch-2.6 to at
> least
> > 3.3.6, and the unreleased branches (branch-2, branch-3, master) to 3.4.0.
> >
> > I found a mention of wanting to stay off the latest patch release
> > HBASE-27931, but I could not figure if it has a technical reason, or if
> > this is a written (or unwritten) policy.
> >
> > best regards
> > Istvan
> >
>


-- 
*István Tóth* | Sr. Staff Software Engineer
*Email*: st...@cloudera.com
cloudera.com <https://www.cloudera.com>
[image: Cloudera] <https://www.cloudera.com/>
[image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera
on LinkedIn] <https://www.linkedin.com/company/cloudera>
--
--


[DISCUSS] Using Hadoop 3.3.6 and 3.4.0 as default versions

2024-09-09 Thread Istvan Toth
Hi!

I'm working on building the Phoenix uberjars with newer Hadoop versions by
default to improve its CVE stance, and I realized that HBase itself does
not use the latest releases.

branch-2.5 defaults to 3.2.4
branch-2.6 and later defaults to 3.3.5

I can kind of understand that we don't want to bump the minor version for
branch-2.5 from the one it was released with.

However, I don't see the rationale for not upgrading branch-2.6 to at least
3.3.6, and the unreleased branches (branch-2, branch-3, master) to 3.4.0.

I found a mention of wanting to stay off the latest patch release
HBASE-27931, but I could not figure if it has a technical reason, or if
this is a written (or unwritten) policy.

best regards
Istvan


Re: About removing the preWALWrite and postWALWrite methods in WALObserver

2024-08-12 Thread Istvan Toth
As far as Phoenix is concerned, only postWALWrite is used, and only in an
integration test, which should be easy to change if
we can still access the required data.

WALAnnotationIT.AnnotatedWALObserver.postWALWrite(ObserverContext, RegionInfo, WALKey, WALEdit)

However, I wonder if exposing the internal WAL related types in a method
that specifically hooks into WAL processing is really a problem.

Istvan.


On Mon, Aug 5, 2024 at 5:04 PM Duo Zhang  wrote:

> These two methods are marked as deprecated and the plan is to remove
> them in 3.0.0 release.
>
> But the javadoc says
>
> To be replaced with an alternative that does not expose
> InterfaceAudience classes such as WALKey and WALEdit.
>
> But there are no such methods for now.
>
> So I wonder whether there are actual usages of these two methods in
> the community. If so, I think we need to change the deprecation cycle
> to remove them in 4.0.0, until we introduce the alternate methods.
>
> Thoughts?
>
> Thanks.
>




[jira] [Created] (HBASE-28761) Expose HTTP context in REST Client

2024-07-30 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28761:
---

 Summary: Expose HTTP context in REST Client
 Key: HBASE-28761
 URL: https://issues.apache.org/jira/browse/HBASE-28761
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth


We already expose the Apache HTTP Client object in the REST client, but we 
specify the context for each call separately, so it is not possible to retrieve 
it.

Add a getter and setter for the stickyContext object.

The use case for this is copying session cookies between clients to avoid 
re-authentication by each client object, but this may also be useful for 
debugging purposes.
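A toy sketch of how the proposed getter/setter could be used to share an authenticated session between clients. All class and method names below are illustrative stand-ins for the eventual API (the real patch would expose the Apache HttpClient context object), not the actual hbase-rest client:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the sticky HTTP context holding session cookies.
class StickyContext {
    final List<String> cookies = new ArrayList<>();
}

// Stand-in for the REST client with the proposed accessors.
class RestClient {
    private StickyContext stickyContext = new StickyContext();
    StickyContext getStickyContext() { return stickyContext; }
    void setStickyContext(StickyContext ctx) { this.stickyContext = ctx; }
}

public class CookieShareDemo {
    public static void main(String[] args) {
        RestClient a = new RestClient();
        a.getStickyContext().cookies.add("SESSIONID=abc"); // set by first login
        RestClient b = new RestClient();
        // Share the authenticated context so client b skips re-authentication.
        b.setStickyContext(a.getStickyContext());
        System.out.println(b.getStickyContext().cookies.get(0));
    }
}
```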





Re: [DISCUSS] About the add methods in WALEdit

2024-07-15 Thread Istvan Toth
Yes, Phoenix does call WalEdit.add() in

org.apache.phoenix.hbase.index.Indexer.preBatchMutateWithExceptions(ObserverContext,
MiniBatchOperationInProgress)

On Sat, Jul 13, 2024 at 4:24 PM Duo Zhang  wrote:

> I filed HBASE-28719 to change the internal type of WALEdit to
> ExtendedCell but still keep the public methods to accept Cell since it
> is marked as IA.LimitedPrivate for coprocessor and replication.
>
> In the javadoc of WALEdit, we said that the class is read only for
> coprocessor and customized replication, and mark all the add methods
> IA.Private. But in HRegion's implementation, we do call the
> coprocessor methods with an empty WALEdit, and then collect the cells
> in this WALEdit for adding them to the WALEdit we actually write out
> later.
>
> So I wonder whether the javadoc reflects the real usage here...
>
> Do we have any real usage which adds cells to WALEdit? If so, I think
> we should redesign the visibility of these methods a bit.
>
> Feedbacks are welcome.
>
> Thanks.
>




Re: [DISCUSS] Reverting hbase rest protobuf package name

2024-07-12 Thread Istvan Toth
HBASE-23975 was the original ticket.

My guess is that since hbase-shaded-protocol was already set up to do the
compiling and shading, moving it there was the easiest solution.
I guess that the same logic was behind the rename: since every other class
there uses the .shaded. package, change the REST messages the same way.

regards
Istvan




On Fri, Jul 12, 2024 at 9:48 AM 张铎(Duo Zhang)  wrote:

> In which jira we did this moving? Are there any reasons why we did
> this in the past?
>
> Istvan Toth  wrote on Fri, Jul 12, 2024 at 03:57:
> >
> > Hi!
> >
> > While working on HBASE-28725, I realized that in HBase 3+ the REST
> protobuf
> > definition files have been moved to hbase-shaded-protobuf, and the
> package
> > name has also been renamed.
> >
> > While I fully agree with the move to using the thirdparty protobuf
> library
> > (in fact I'd like to backport that change to 2.x), I think that moving
> the
> > .proto files and renaming the package was not a good idea.
> >
> > The REST interface does not use the HBase patched features of the
> protobuf
> > library, and if we want to maintain any pretense that the REST protobuf
> > encoding is usable by non-java code, then we should not use it in the
> > future either.
> >
> > (If we ever decide to use the patched features for performance reasons,
> we
> > will need to define new protobuf messages for that anyway)
> >
> > Protobuf does not use the package name on the wire, so wire compatibility
> > is not an issue.
> >
> > In the unlikely case that someone has implemented an independent REST
> > client that uses protobuf encoding, this will also ensure compatibility
> > with the 3.0+ .protoc definitions.
> >
> > My proposal is:
> >
> > HBASE-28726 <https://issues.apache.org/jira/browse/HBASE-28726> Revert
> REST
> > protobuf package to org.apache.hadoop.hbase.shaded.rest
> > *This applies only to branch-3+:*
> > 1. Move the REST .proto files and compiling back to the hbase-rest module
> > (but use the same protoc compiler that we use now)
> > 2. Revert the package name of the protobuf messages to the original
> > 3. No other changes, we still use the thirdparty protobuf library.
> >
> > The other issue is that on HBase 2.x the REST client still requires
> > unshaded protobuf 2.5.0 which brings back all the protobuf library
> > conflicts that were fixed in 3.0 and by hbase-shaded-client. To fix this,
> > my proposal is:
> >
> > HBASE-28725 <https://issues.apache.org/jira/browse/HBASE-28725> Use
> > thirdparty protobuf for REST interface in HBase 2.x
> > *This applies only to branch-2.x:*
> > 1. Backport the code changes that use the thirdparty protobuf library for
> > REST to branch-2.x
> >
> > With these two changes, the REST code would be almost identical on every
> > branch, easing maintenance.
> >
> > What do you think ?
> >
> > Istvan
>




[DISCUSS] Reverting hbase rest protobuf package name

2024-07-11 Thread Istvan Toth
Hi!

While working on HBASE-28725, I realized that in HBase 3+ the REST protobuf
definition files have been moved to hbase-shaded-protobuf, and the package
name has also been renamed.

While I fully agree with the move to using the thirdparty protobuf library
(in fact I'd like to backport that change to 2.x), I think that moving the
.proto files and renaming the package was not a good idea.

The REST interface does not use the HBase patched features of the protobuf
library, and if we want to maintain any pretense that the REST protobuf
encoding is usable by non-java code, then we should not use it in the
future either.

(If we ever decide to use the patched features for performance reasons, we
will need to define new protobuf messages for that anyway)

Protobuf does not use the package name on the wire, so wire compatibility
is not an issue.
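To illustrate why the rename is wire-safe: in a definition like the sketch below (message shape illustrative), only the field numbers and types determine the encoded bytes; the package line only affects the generated class names.

```proto
syntax = "proto2";
// Renaming this package changes generated Java class names only;
// the wire format is defined by the field numbers and types below.
package org.apache.hadoop.hbase.rest;

message Version {
  optional string restVersion = 1;
  optional string jvmVersion = 2;
}
```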

In the unlikely case that someone has implemented an independent REST
client that uses protobuf encoding, this will also ensure compatibility
with the 3.0+ .protoc definitions.

My proposal is:

HBASE-28726  Revert REST
protobuf package to org.apache.hadoop.hbase.shaded.rest
*This applies only to branch-3+:*
1. Move the REST .proto files and compiling back to the hbase-rest module
(but use the same protoc compiler that we use now)
2. Revert the package name of the protobuf messages to the original
3. No other changes, we still use the thirdparty protobuf library.

The other issue is that on HBase 2.x the REST client still requires
unshaded protobuf 2.5.0 which brings back all the protobuf library
conflicts that were fixed in 3.0 and by hbase-shaded-client. To fix this,
my proposal is:

HBASE-28725  Use
thirdparty protobuf for REST interface in HBase 2.x
*This applies only to branch-2.x:*
1. Backport the code changes that use the thirdparty protobuf library for
REST to branch-2.x

With these two changes, the REST code would be almost identical on every
branch, easing maintenance.

What do you think ?

Istvan


[jira] [Created] (HBASE-28726) Revert REST protobuf package to org.apache.hadoop.hbase.shaded.rest

2024-07-11 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28726:
---

 Summary: Revert REST protobuf package to 
org.apache.hadoop.hbase.shaded.rest
 Key: HBASE-28726
 URL: https://issues.apache.org/jira/browse/HBASE-28726
 Project: HBase
  Issue Type: Bug
  Components: REST
Reporter: Istvan Toth


In Hbase 3+, the package name of the REST messages has been renamed to 
org.apache.hadoop.hbase.shaded.rest from org.apache.hadoop.hbase.rest 

These definitions are only used by REST, and have nothing to do with standard 
HBase RPC communication.

I propose reverting the package name.
We may also want to move the protobuf definitions back to the hbase-rest module.





[jira] [Created] (HBASE-28725) Use thirdparty protobuf for REST interface in HBase 2.x

2024-07-11 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28725:
---

 Summary: Use thirdparty protobuf for REST interface in HBase 2.x
 Key: HBASE-28725
 URL: https://issues.apache.org/jira/browse/HBASE-28725
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth
Assignee: Istvan Toth


This change has already been done in branch 3+ as part of the protobuf 2.5 
removal,
We just need to backport it to 2.x.

This removes the requirement of having unshaded protobuf 2.5.0 on the 
hbase-rest client classpath.






[jira] [Created] (HBASE-28717) Support FuzzyRowFilter in REST interface

2024-07-08 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28717:
---

 Summary: Support FuzzyRowFilter in REST interface
 Key: HBASE-28717
 URL: https://issues.apache.org/jira/browse/HBASE-28717
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth
Assignee: Istvan Toth


This is similar to MultiRowRangeFilter, but needs a new DTO to represent the 
keys with masks.





Re: [DISCUSS] Move our official slack channel to the one in the-asf.slack.com

2024-07-08 Thread Istvan Toth
+1

On Sun, Jul 7, 2024 at 8:55 PM Pankaj Kumar  wrote:

> +1
>
> On Sun, Jul 7, 2024 at 9:24 PM Bryan Beaudreault 
> wrote:
>
> > +1 sounds good to me
> >
> > On Sun, Jul 7, 2024 at 11:07 AM Duo Zhang  wrote:
> >
> > > As I mentioned in another thread, now slack will hide the comments
> > > before 90 days in the current apache-hbase.slack.com, which is really
> > > not good for finding useful discussions.
> > >
> > > According to the documentation here
> > >
> > > https://infra.apache.org/slack.html
> > >
> > > We could invite people which do not have at apache dot org email
> > > address as a guest to the slack channel, so there is no concerns about
> > > only committers can join the channel.
> > >
> > > Thoughts?
> > >
> > > Thanks.
> > >
> >
>




[jira] [Resolved] (HBASE-28646) Use Streams to unmarshall protobuf REST data

2024-06-17 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28646.
-
Fix Version/s: 2.7.0
   3.0.0-beta-2
   2.6.1
   2.5.9
   Resolution: Fixed

Committed to active branches.

> Use Streams to unmarshall protobuf REST data
> 
>
> Key: HBASE-28646
> URL: https://issues.apache.org/jira/browse/HBASE-28646
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>    Reporter: Istvan Toth
>    Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> We've recently optimized REST marshalling by using streams directly.
> We should do the same for unmarshalling.
> The easy part is the server side, as that affects only a small set files.
> However, we should also support streams on the client side, which requires 
> duplicating each method the returns / expects a byte array to also work with 
> streams.





[jira] [Created] (HBASE-28671) Add close method to REST client

2024-06-17 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28671:
---

 Summary: Add close method to REST client
 Key: HBASE-28671
 URL: https://issues.apache.org/jira/browse/HBASE-28671
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth
Assignee: Istvan Toth








[jira] [Created] (HBASE-28670) Marker interface for Cells which may have backing ByteBuffers

2024-06-16 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28670:
---

 Summary: Marker interface for Cells which may have backing 
ByteBuffers
 Key: HBASE-28670
 URL: https://issues.apache.org/jira/browse/HBASE-28670
 Project: HBase
  Issue Type: Improvement
Reporter: Istvan Toth


We often need to handle cells that may have backing ByteBuffers.
The easy thing to do would be checking whether they are ByteBufferExtendedCell, 
but CellWrapper, OnheapDecodedCell and TagRewriteCell may also delegate to a 
ByteBufferExtendedCell.

Having a marker interface that indicates that a cell is guaranteed to be 
on-heap, or that it may not be fully on-heap, would make it easier and faster 
to check whether we need to clone them. (We only need one; it may be either 
negative or positive.)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] About the cleanup of our Cell related methods

2024-06-16 Thread Istvan Toth
Hi Duo,

Yes, it does:

[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-compiler-plugin:3.11.0:compile
(default-compile) on project phoenix-core-client: Compilation failure:
Compilation failure:
[ERROR]
/home/stoty/workspaces/apache-phoenix/phoenix/phoenix-core-client/src/main/java/org/apache/phoenix/util/PhoenixKeyValueUtil.java:[262,93]
incompatible types: java.util.List<Cell> cannot be
converted to java.util.List<ExtendedCell>
[ERROR]
/home/stoty/workspaces/apache-phoenix/phoenix/phoenix-core-client/src/main/java/org/apache/phoenix/hbase/index/util/IndexManagementUtil.java:[248,69]
incompatible types: java.util.List<Cell> cannot be
converted to java.util.List<ExtendedCell>

Now, we haven't actually released (TBH we haven't even committed it) Hbase
2.6 support yet, so technically this doesn't break any released Phoenix
code.

I played around with it a bit, and it looks like those two errors can be
fixed in a backwards-compatible manner with some work.

Opened https://issues.apache.org/jira/browse/PHOENIX-7331 to track that
work.


On Mon, Jun 17, 2024 at 3:12 AM 张铎(Duo Zhang)  wrote:

> Hi, Istvan, do you have any findings? Does this break Phoenix on
> branch-2.6?
>
> Thanks.
>
> 张铎(Duo Zhang)  于2024年6月14日周五 09:32写道:
> >
> > The PR for branch-2
> >
> > https://github.com/apache/hbase/pull/5985
> >
> > The PR for branch-2.6
> >
> > https://github.com/apache/hbase/pull/5990
> >
> > At least the UTs are all fine.
> >
> > The current PR does not do much incompatible change, the only extra
> > check is we require filters to return ExtendedCell instead of Cell.
> >
> > Thanks.
> >
> > 张铎(Duo Zhang)  于2024年6月13日周四 14:33写道:
> > >
> > > OK, let me open PRs for branch-2 and branch-2.6 too.
> > >
> > > Istvan Toth  于2024年6月13日周四 14:02写道:
> > > >
> > > > I am also concerned about this change on branch-2.
> > > > Some of our code  does implement getSequenceId() , so doing this on
> 2.5 and
> > > > 2.6  would definitely break the existing releases, and require some
> changes
> > > > and new compatibility modules on the Phoenix side.
> > > >
> > > > Phoenix usually uses the KeyValue type internally, which already
> extends
> > > > ExtendedCell, so it's probably not a huge problem, but we'd like to
> check
> > > > before this is merged to branch-2.
> > > > I'd like to ask for a heads-up when we have a PR for branch-2 so
> that we
> > > > can check what breaks.
> > > > (We currently only support branch-2.6, so ideally we'd have a
> branch-2.6 PR
> > > > to check, but we can probably get branch-2 to work to check the
> change
> > > > without a lot of work)
> > > >
> > > > We also have a POC branch for HBase 3, so we can also check Phoenix
> with a
> > > > branch-3 version of this change.
> > > >
> > > > The worst case scenario for Phoenix would be having to open a
> different
> > > > branch for hbase 3.x support, we really want to avoid that.
> > > > As long as the change can be papered over with a compatibility
> module,
> > > > we're mostly fine.
> > > >
> > > > Istvan
> > > >
> > > >
> > > >
> > > > On Thu, Jun 13, 2024 at 5:45 AM 张铎(Duo Zhang) 
> wrote:
> > > >
> > > > > In fact, I do not think CPs can really return their own Cell
> > > > > implementation which is not an ExtendedCell, we will call
> > > > > PrivateCellUtil.setSequenceId and setTimestamp at server side, if
> the
> > > > > Cell is not an ExtendedCell we will throw IOException...
> > > > >
> > > > > I guess the only place where it is safe to return a customized
> Cell is
> > > > > in filter implementation, in getNextCellHint, where we will only
> use
> > > > > it to seek the scanners without storing it anywhere...
> > > > >
> > > > > Reid Chan  于2024年6月13日周四 10:56写道:
> > > > > >
> > > > > > It may affect Phoenix, as it is CPs related, at least for
> branch-2.
> > > > > >
> > > > > > Let's wait and see if there are any comments from them
> > > > > >
> > > > > >
> > > > > > ---
> > > > > >
> > > > > > Best regards,
> > > > > > R.C
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Jun 11, 2024 at 4:

[jira] [Created] (HBASE-28662) Removing missing scanner via REST should return 404

2024-06-13 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28662:
---

 Summary: Removing missing scanner via REST should return 404
 Key: HBASE-28662
 URL: https://issues.apache.org/jira/browse/HBASE-28662
 Project: HBase
  Issue Type: Bug
  Components: REST
Reporter: Istvan Toth
Assignee: Istvan Toth


We do not handle the case when the user is trying to remove a missing scanner, 
and let an NPE bubble up to Jersey.

We should check or catch the error and return 404 instead.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] About the cleanup of our Cell related methods

2024-06-12 Thread Istvan Toth
I am also concerned about this change on branch-2.
Some of our code  does implement getSequenceId() , so doing this on 2.5 and
2.6  would definitely break the existing releases, and require some changes
and new compatibility modules on the Phoenix side.

Phoenix usually uses the KeyValue type internally, which already extends
ExtendedCell, so it's probably not a huge problem, but we'd like to check
before this is merged to branch-2.
I'd like to ask for a heads-up when we have a PR for branch-2 so that we
can check what breaks.
(We currently only support branch-2.6, so ideally we'd have a branch-2.6 PR
to check, but we can probably get branch-2 to work to check the change
without a lot of work)

We also have a POC branch for HBase 3, so we can also check Phoenix with a
branch-3 version of this change.

The worst case scenario for Phoenix would be having to open a different
branch for hbase 3.x support, we really want to avoid that.
As long as the change can be papered over with a compatibility module,
we're mostly fine.

Istvan



On Thu, Jun 13, 2024 at 5:45 AM 张铎(Duo Zhang)  wrote:

> In fact, I do not think CPs can really return their own Cell
> implementation which is not an ExtendedCell, we will call
> PrivateCellUtil.setSequenceId and setTimestamp at server side, if the
> Cell is not an ExtendedCell we will throw IOException...
>
> I guess the only place where it is safe to return a customized Cell is
> in filter implementation, in getNextCellHint, where we will only use
> it to seek the scanners without storing it anywhere...
>
> Reid Chan  于2024年6月13日周四 10:56写道:
> >
> > It may affect Phoenix, as it is CPs related, at least for branch-2.
> >
> > Let's wait and see if there are any comments from them
> >
> >
> > ---
> >
> > Best regards,
> > R.C
> >
> >
> >
> > On Tue, Jun 11, 2024 at 4:19 PM Duo Zhang  wrote:
> >
> > > We have several deprecated methods in Cell interface, like
> > > getSequenceId and tag related ones, which are marked as deprecated
> > > since 2.0.0 and should be removed in 3.0.0. Most of them are marked as
> > > deprecated in HBASE-19112.
> > >
> > > After investigating, I found that it is not an easy work...
> > >
> > > We have 3 levels for the Cell interface
> > >
> > > Cell -> RawCell -> ExtendedCell
> > >
> > > Where Cell is the Public API for client side usage, RawCell is for CPs
> > > where we expose the tag related APIs, and ExtendedCell is for server
> > > side usage, where we expose all the internal stuff, like sequence id.
> > >
> > > In HBASE-19550, we introduced a CellWrapper as we think maybe CPs will
> > > return a Cell which is not an ExtendedCell at server side, we need to
> > > wrap it so at server side we always get an ExtendedCell. So if we
> > > remove the tag and sequence id related methods, CellWrapper will be
> > > broken.
> > >
> > > For me, I do not think we need the CellWrapper. In general, all actual
> > > Cell implementation classes in our code base implement the
> > > ExtendedCell interface, i.e, we can simply consider all Cells are
> > > actually ExtendedCells. The only reason we introduce the Cell
> > > interface, is to hide internal stuff from client public API, does not
> > > mean we want users to implement their own Cell.
> > >
> > > So the plan is to change server side code use ExtendedCell as much as
> > > possible in HBASE-28644, a draft PR is already there for review[1].
> > > This will be landed to both 3.x and at least branch-2(I'm not sure if
> > > it worth to also land on branch-2.6 and branch-2.5).
> > > And starting from 3.0.0, we should make clear that all Cell instances
> > > should be created via the CellBuilder related APIs, other Cell
> > > implementations are not allowed. So on 3.0.0, we should treat Cell as
> > > ExtendedCell everywhere in our code base, and also remove the
> > > CellWrapper class, then we could finally remove the deprecated methods
> > > in the Cell interface.
> > >
> > > Thoughts?
> > >
> > > Thanks.
> > >
> > > 1. https://github.com/apache/hbase/pull/5976
> > >
>


-- 
*István Tóth* | Sr. Staff Software Engineer
*Email*: st...@cloudera.com
cloudera.com 


[jira] [Created] (HBASE-28650) REST multiget endpoint returns 500 error if no rows are specified

2024-06-10 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28650:
---

 Summary: REST multiget endpoint returns 500 error if no rows are 
specified
 Key: HBASE-28650
 URL: https://issues.apache.org/jira/browse/HBASE-28650
 Project: HBase
  Issue Type: Bug
  Components: REST
Reporter: Istvan Toth


Should return 404 instead.

Need to check if there are any "row" params before trying to iterate over them

{noformat}
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hbase.rest.MultiRowResource.get(MultiRowResource.java:100)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hbase.thirdparty.org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)
at 
org.apache.hbase.thirdparty.org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:134)
at 
org.apache.hbase.thirdparty.org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:177)
at {noformat}
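A minimal sketch of the guard described in the issue (hypothetical names; the real fix belongs in MultiRowResource): check the "row" query parameters before iterating, and answer 404 instead of letting an NPE surface as a 500.

```java
import java.util.Collections;
import java.util.List;

public class MultiGetGuard {
    static final int SC_NOT_FOUND = 404;
    static final int SC_OK = 200;

    // Stand-in for MultiRowResource.get(): the params list may be null when
    // the client supplies no "row" query parameters at all.
    static int handleMultiGet(List<String> rowParams) {
        if (rowParams == null || rowParams.isEmpty()) {
            return SC_NOT_FOUND; // instead of NPE -> 500
        }
        for (String row : rowParams) {
            // ... fetch each requested row ...
        }
        return SC_OK;
    }

    public static void main(String[] args) {
        System.out.println(handleMultiGet(null));                    // 404
        System.out.println(handleMultiGet(Collections.emptyList())); // 404
        System.out.println(handleMultiGet(List.of("r1", "r2")));     // 200
    }
}
```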




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28649) Wrong properties are used to set up SSL for REST Client Kerberos authenticator

2024-06-10 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28649.
-
Fix Version/s: 2.7.0
   3.0.0-beta-2
   2.6.1
   2.5.9
   Resolution: Fixed

Committed to all active branches.

> Wrong properties are used to set up SSL for REST Client Kerberos authenticator
> --
>
> Key: HBASE-28649
> URL: https://issues.apache.org/jira/browse/HBASE-28649
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>    Reporter: Istvan Toth
>    Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> We are setting the ssl keystore properties, when we should be setting the 
> truststore ones.
> This results in SPENGO negotiation failing with custom truststores.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28649) Wrong properties are used to set up SSL for REST Client Kerberos authenticator

2024-06-10 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28649:
---

 Summary: Wrong properties are used to set up SSL for REST Client 
Kerberos authenticator
 Key: HBASE-28649
 URL: https://issues.apache.org/jira/browse/HBASE-28649
 Project: HBase
  Issue Type: Bug
  Components: REST
Reporter: Istvan Toth
Assignee: Istvan Toth


We are setting the ssl keystore properties, when we should be setting the 
truststore ones.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28647) Support streams in org.apache.hadoop.hbase.rest.client.Client

2024-06-09 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28647:
---

 Summary: Support streams in 
org.apache.hadoop.hbase.rest.client.Client
 Key: HBASE-28647
 URL: https://issues.apache.org/jira/browse/HBASE-28647
 Project: HBase
  Issue Type: Improvement
Reporter: Istvan Toth


Support using stream for sending/receiving data in 
org.apache.hadoop.hbase.rest.client.Client .

Also update tests to use the new methods.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28646) Use Streams to unmarshall protobuf REST data

2024-06-09 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28646:
---

 Summary: Use Streams to unmarshall protobuf REST data
 Key: HBASE-28646
 URL: https://issues.apache.org/jira/browse/HBASE-28646
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth


We've recently optimized REST marshalling by using streams directly.

We should do the same for unmarshalling.

The easy part is the server side, as that affects only a small set of files.

However, we should also support streams on the client side, which requires 
duplicating each method that returns / expects a byte array to also work with 
streams.
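The byte-array / stream duality described above can be sketched as follows (hypothetical method names; a trivial separator counter stands in for the protobuf unmarshaller): each byte[]-based method gains a stream-based twin, with the array variant delegating to the stream one so callers can avoid materializing the whole payload.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamUnmarshalDemo {
    // Existing style: caller hands over the full payload as a byte array.
    static int countCells(byte[] data) throws IOException {
        return countCells(new ByteArrayInputStream(data));
    }

    // New style: parse directly from the stream. Counting ';' separators is
    // a stand-in for real protobuf cell parsing.
    static int countCells(InputStream in) throws IOException {
        int count = 0, b;
        while ((b = in.read()) != -1) {
            if (b == ';') count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countCells("a;b;c;".getBytes())); // 3
    }
}
```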



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28645) Add build information to the REST server version endpoint

2024-06-09 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28645:
---

 Summary: Add build information to the REST server version endpoint
 Key: HBASE-28645
 URL: https://issues.apache.org/jira/browse/HBASE-28645
 Project: HBase
  Issue Type: New Feature
  Components: REST
Reporter: Istvan Toth


There is currently no way to check the REST server version / build number 
remotely.

The */version/cluster* endpoint takes the version from master (fair enough),
and the */version/rest* does not include the build information.

We should add a version field to the /version/rest endpoint, which reports the 
version of the REST server component.

We should also log this at startup, just like we log the cluster version now.

We may have to add and store the version in the hbase-rest code during build, 
similarly to how we do it for the other components.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28540) Cache Results in org.apache.hadoop.hbase.rest.client.RemoteHTable.Scanner

2024-06-07 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28540.
-
Fix Version/s: 2.7.0
   3.0.0-beta-2
   2.6.1
   2.5.9
   Resolution: Fixed

> Cache Results in org.apache.hadoop.hbase.rest.client.RemoteHTable.Scanner
> -
>
> Key: HBASE-28540
> URL: https://issues.apache.org/jira/browse/HBASE-28540
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>    Reporter: Istvan Toth
>    Assignee: Istvan Toth
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> The implementation of org.apache.hadoop.hbase.rest.client.RemoteHTable.Scanner
> is very inefficient, as the standard next() method makes a separate HTTP 
> request for each row.
> Performance can be improved by not specifying the row count in the REST call 
> and caching the returned Results.
> Chunk size can still be influenced by scan.setBatch();
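The caching idea from the issue can be sketched like this (hypothetical types; Strings stand in for Results): one HTTP round trip fills a local queue, and next() drains it instead of issuing a request per row.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

public class CachingScannerDemo {
    interface RestFetcher { List<String> fetchChunk(); } // one HTTP call

    static class Scanner {
        private final RestFetcher fetcher;
        private final Queue<String> cache = new ArrayDeque<>();
        Scanner(RestFetcher f) { this.fetcher = f; }

        // Serve rows from the local cache; refetch a chunk only when empty.
        String next() {
            if (cache.isEmpty()) {
                List<String> chunk = fetcher.fetchChunk();
                if (chunk == null || chunk.isEmpty()) return null; // end of scan
                cache.addAll(chunk);
            }
            return cache.poll();
        }
    }

    public static void main(String[] args) {
        // Fake server: one chunk of three rows, then end-of-scan.
        Queue<List<String>> chunks = new ArrayDeque<>(
            List.of(List.of("r1", "r2", "r3")));
        Scanner s = new Scanner(() -> chunks.poll());
        System.out.println(s.next() + " " + s.next() + " " + s.next() + " " + s.next());
    }
}
```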



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Marking Filters based on their hinting / row stateful behaviours.

2024-06-04 Thread Istvan Toth
Committed the discussed fix as HBASE-28622 .
Thank you Andrew for discussing this, and Duo for the review.

Istvan

On Fri, May 31, 2024 at 2:36 PM Istvan Toth  wrote:

> It turns out that ColumnPaginationFilter is both row stateful and can
> return a seek hint.
> I have removed the HintingFilter marker from it to preserve the correct
> operation.
>
> With this change, ColumnPaginationFilter is no worse off than it was, but
> the rest of the hinting
> filters will work correctly.
>
> On Fri, May 31, 2024 at 9:32 AM Istvan Toth  wrote:
>
>> This is indeed quite a small change.
>> The PR is up at https://github.com/apache/hbase/pull/5955
>>
>> On Wed, May 29, 2024 at 10:07 AM Istvan Toth  wrote:
>>
>>> Thanks for the detailed reply, Andrew.
>>>
>>> I was also considering default methods, but it turns out that Filter is
>>> not an interface, but an abstract class, so it doesn't apply.
>>>
>>> Children not implementing a marker interface or marker method would
>>> inherit the marker method implementation from the closest parent the same
>>> way they would inherit the marker interface, so I think they are equivalent
>>> in this aspect, too.
>>>
>>> I think that marker interface(s) and overridable non-abstract getter(s)
>>> in Filter are mostly equivalent from both logical and source compatibility
>>> aspects.
>>> The only difference is that the marker interfaces cannot be removed in a
>>> subclass, while the getter can be overridden anywhere, but with well-chosen
>>> defaults it shouldn't be much of a limitation.
>>>
>>> Now that I think about it, we could cache the markers' values in an
>>> array when creating the filter lists, so even the cost of looking them up
>>> doesn't matter as it wouldn't happen in the hot code path.
>>>
>>> Using the marker interfaces is more elegant, and discourages problematic
>>> subclassing, so I am leaning towards that.
>>>
>>> Istvan
>>>
>>> On Wed, May 29, 2024 at 2:30 AM Andrew Purtell 
>>> wrote:
>>>
>>>> Actually source compatibility with default methods would be fine too. I
>>>> forget this is the main reason default methods were invented. The code
>>>> of
>>>> derived classes would not need to be changed, unless the returned value
>>>> of
>>>> the new method should be changed, and this is no worse than having a
>>>> marker
>>>> interface, which would also require code changes to implement
>>>> non-default
>>>> behaviors.
>>>>
>>>> A marker interface does remain as an option. It might make a difference
>>>> in
>>>> chained use cases. Consider a chain of filter instances that mixes
>>>> derived
>>>> code that is unaware of isHinting() and base code that is. The filter
>>>> chain
>>>> can be examined for the presence or absence of the marker interface and
>>>> would not need to rely on every filter in the chain passing return
>>>> values
>>>> of isHinting back.
>>>>
>>>> Marker interfaces can also be added to denote stateful or stateless
>>>> filters, if distinguishing between them would be useful, perhaps down
>>>> the
>>>> road.
>>>>
>>>> On Tue, May 28, 2024 at 5:13 PM Andrew Purtell 
>>>> wrote:
>>>>
>>>> > I think you've clearly put a lot of time into the analysis and it is
>>>> > plausible.
>>>> >
>>>> > Adding isHinting as a default method will preserve binary
>>>> compatibility.
>>>> > Source compatibility for derived custom filters would be broken
>>>> though and
>>>> > that probably prevents this going back into a releasing code line.
>>>> >
>>>> > Have you considered adding a marker interface instead? That would
>>>> preserve
>>>> > both source and binary compatibility. It wouldn't require any changes
>>>> to
>>>> > derived custom filters. A runtime instanceof test would determine if
>>>> the
>>>> > filter is a hinting filter or not. No need for a new method, default
>>>> or
>>>> > otherwise.
>>>> >
>>>> > On Tue, May 28, 2024 at 12:41 AM Istvan Toth 
>>>> wrote:
>>>> >
>>>> >> I have recently opened HBASE-28622
>>>> >> 

Re: [DISCUSS] Marking Filters based on their hinting / row stateful behaviours.

2024-05-31 Thread Istvan Toth
It turns out that ColumnPaginationFilter is both row stateful and can
return a seek hint.
I have removed the HintingFilter marker from it to preserve the correct
operation.

With this change, ColumnPaginationFilter is no worse off than it was, but
the rest of the hinting
filters will work correctly.
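For illustration, a minimal sketch of the marker-interface approach discussed in this thread (the filter class names are hypothetical): the filter-list setup runs the instanceof check once and caches it, keeping it out of the per-cell hot path.

```java
import java.util.List;

interface HintingFilter { } // marker: filter may return SEEK_NEXT_USING_HINT

abstract class Filter { }

class FuzzyRowLikeFilter extends Filter implements HintingFilter { }
class PaginationLikeFilter extends Filter { } // row-stateful, no marker

public class MarkerDemo {
    // Cache the marker check when the filter list is built, so the
    // per-cell code only reads a boolean array.
    static boolean[] cacheHintingFlags(List<Filter> filters) {
        boolean[] hinting = new boolean[filters.size()];
        for (int i = 0; i < filters.size(); i++) {
            hinting[i] = filters.get(i) instanceof HintingFilter;
        }
        return hinting;
    }

    public static void main(String[] args) {
        boolean[] flags = cacheHintingFlags(
            List.of(new FuzzyRowLikeFilter(), new PaginationLikeFilter()));
        System.out.println(flags[0] + " " + flags[1]);
    }
}
```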

On Fri, May 31, 2024 at 9:32 AM Istvan Toth  wrote:

> This is indeed quite a small change.
> The PR is up at https://github.com/apache/hbase/pull/5955
>
> On Wed, May 29, 2024 at 10:07 AM Istvan Toth  wrote:
>
>> Thanks for the detailed reply, Andrew.
>>
>> I was also considering default methods, but it turns out that Filter is
>> not an interface, but an abstract class, so it doesn't apply.
>>
>> Children not implementing a marker interface or marker method would
>> inherit the marker method implementation from the closest parent the same
>> way they would inherit the marker interface, so I think they are equivalent
>> in this aspect, too.
>>
>> I think that marker interface(s) and overridable non-abstract getter(s)
>> in Filter are mostly equivalent from both logical and source compatibility
>> aspects.
>> The only difference is that the marker interfaces cannot be removed in a
>> subclass, while the getter can be overridden anywhere, but with well-chosen
>> defaults it shouldn't be much of a limitation.
>>
>> Now that I think about it, we could cache the markers' values in an array
>> when creating the filter lists, so even the cost of looking them up doesn't
>> matter as it wouldn't happen in the hot code path.
>>
>> Using the marker interfaces is more elegant, and discourages problematic
>> subclassing, so I am leaning towards that.
>>
>> Istvan
>>
>> On Wed, May 29, 2024 at 2:30 AM Andrew Purtell 
>> wrote:
>>
>>> Actually source compatibility with default methods would be fine too. I
>>> forget this is the main reason default methods were invented. The code of
>>> derived classes would not need to be changed, unless the returned value
>>> of
>>> the new method should be changed, and this is no worse than having a
>>> marker
>>> interface, which would also require code changes to implement non-default
>>> behaviors.
>>>
>>> A marker interface does remain as an option. It might make a difference
>>> in
>>> chained use cases. Consider a chain of filter instances that mixes
>>> derived
>>> code that is unaware of isHinting() and base code that is. The filter
>>> chain
>>> can be examined for the presence or absence of the marker interface and
>>> would not need to rely on every filter in the chain passing return values
>>> of isHinting back.
>>>
>>> Marker interfaces can also be added to denote stateful or stateless
>>> filters, if distinguishing between them would be useful, perhaps down the
>>> road.
>>>
>>> On Tue, May 28, 2024 at 5:13 PM Andrew Purtell 
>>> wrote:
>>>
>>> > I think you've clearly put a lot of time into the analysis and it is
>>> > plausible.
>>> >
>>> > Adding isHinting as a default method will preserve binary
>>> compatibility.
>>> > Source compatibility for derived custom filters would be broken though
>>> and
>>> > that probably prevents this going back into a releasing code line.
>>> >
>>> > Have you considered adding a marker interface instead? That would
>>> preserve
>>> > both source and binary compatibility. It wouldn't require any changes
>>> to
>>> > derived custom filters. A runtime instanceof test would determine if
>>> the
>>> > filter is a hinting filter or not. No need for a new method, default or
>>> > otherwise.
>>> >
>>> > On Tue, May 28, 2024 at 12:41 AM Istvan Toth  wrote:
>>> >
>>> >> I have recently opened HBASE-28622
>>> >> <https://issues.apache.org/jira/browse/HBASE-28622> , which has
>>> turned
>>> >> out
>>> >> to be another aspect of the problem discussed in HBASE-20565
>>> >> <https://issues.apache.org/jira/browse/HBASE-20565> .
>>> >>
>>> >> The problem is discussed in detail in HBASE-20565
>>> >> <https://issues.apache.org/jira/browse/HBASE-20565> , but it boils
>>> down
>>> >> to
>>> >> the API design decision that the filters returning
>>> SEEK_NEXT_USING_HINT
> >

Re: [DISCUSS] Marking Filters based on their hinting / row stateful behaviours.

2024-05-31 Thread Istvan Toth
This is indeed quite a small change.
The PR is up at https://github.com/apache/hbase/pull/5955

On Wed, May 29, 2024 at 10:07 AM Istvan Toth  wrote:

> Thanks for the detailed reply, Andrew.
>
> I was also considering default methods, but it turns out that Filter is
> not an interface, but an abstract class, so it doesn't apply.
>
> Children not implementing a marker interface or marker method would
> inherit the marker method implementation from the closest parent the same
> way they would inherit the marker interface, so I think they are equivalent
> in this aspect, too.
>
> I think that marker interface(s) and overridable non-abstract getter(s) in
> Filter are mostly equivalent from both logical and source compatibility
> aspects.
> The only difference is that the marker interfaces cannot be removed in a
> subclass, while the getter can be overridden anywhere, but with well-chosen
> defaults it shouldn't be much of a limitation.
>
> Now that I think about it, we could cache the markers' values in an array
> when creating the filter lists, so even the cost of looking them up doesn't
> matter as it wouldn't happen in the hot code path.
>
> Using the marker interfaces is more elegant, and discourages problematic
> subclassing, so I am leaning towards that.
>
> Istvan
>
> On Wed, May 29, 2024 at 2:30 AM Andrew Purtell 
> wrote:
>
>> Actually source compatibility with default methods would be fine too. I
>> forget this is the main reason default methods were invented. The code of
>> derived classes would not need to be changed, unless the returned value of
>> the new method should be changed, and this is no worse than having a
>> marker
>> interface, which would also require code changes to implement non-default
>> behaviors.
>>
>> A marker interface does remain as an option. It might make a difference in
>> chained use cases. Consider a chain of filter instances that mixes derived
>> code that is unaware of isHinting() and base code that is. The filter
>> chain
>> can be examined for the presence or absence of the marker interface and
>> would not need to rely on every filter in the chain passing return values
>> of isHinting back.
>>
>> Marker interfaces can also be added to denote stateful or stateless
>> filters, if distinguishing between them would be useful, perhaps down the
>> road.
>>
>> On Tue, May 28, 2024 at 5:13 PM Andrew Purtell 
>> wrote:
>>
>> > I think you've clearly put a lot of time into the analysis and it is
>> > plausible.
>> >
>> > Adding isHinting as a default method will preserve binary compatibility.
>> > Source compatibility for derived custom filters would be broken though
>> and
>> > that probably prevents this going back into a releasing code line.
>> >
>> > Have you considered adding a marker interface instead? That would
>> preserve
>> > both source and binary compatibility. It wouldn't require any changes to
>> > derived custom filters. A runtime instanceof test would determine if the
>> > filter is a hinting filter or not. No need for a new method, default or
>> > otherwise.
>> >
>> > On Tue, May 28, 2024 at 12:41 AM Istvan Toth  wrote:
>> >
>> >> I have recently opened HBASE-28622
>> >> <https://issues.apache.org/jira/browse/HBASE-28622> , which has turned
>> >> out
>> >> to be another aspect of the problem discussed in HBASE-20565
>> >> <https://issues.apache.org/jira/browse/HBASE-20565> .
>> >>
>> >> The problem is discussed in detail in HBASE-20565
>> >> <https://issues.apache.org/jira/browse/HBASE-20565> , but it boils
>> down
>> >> to
>> >> the API design decision that the filters returning SEEK_NEXT_USING_HINT
>> >> rely on filterCell() getting called.
>> >>
>> >> On the other hand, some filters maintain an internal row state that
>> sets
>> >> counters for calls of filterCell(), which interacts with the results of
>> >> previous filters in a filterList.
>> >>
>> >> When filters return different results for filterRowkey(), then filters
>> >> returning  SEEK_NEXT_USING_HINT that have returned false must have
>> >> filterCell() called, otherwise the scan will degenerate into a full
>> scan.
>> >>
>> >> On the other hand, filters that maintain an internal row state must
>> only
>> >> be
>> >> called if all previous filters have INCLUDEed the Cell, otherwise th

[jira] [Resolved] (HBASE-28629) Using JDK17 resulted in regionserver reportForDuty failing

2024-05-30 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28629.
-
Resolution: Information Provided

HBase 2.1.1 has been end-of-life for a long time.

Use an active release line, such as 2.5 or 2.6.

> Using JDK17 resulted in regionserver reportForDuty failing
> --
>
> Key: HBASE-28629
> URL: https://issues.apache.org/jira/browse/HBASE-28629
> Project: HBase
>  Issue Type: Bug
>  Components: netty, regionserver, rpc
>Affects Versions: 2.1.1
> Environment: test environment:
> mem:32G
> hadoop version:2.7.2
> core:40
> hbase version:2.1.1
>Reporter: 高建达
>Priority: Major
> Attachments: image-2024-05-30-16-23-34-561.png, 
> image-2024-05-30-17-00-45-266.png, image-2024-05-30-17-02-18-965.png
>
>
> I am currently adapting HBase 2.1.1 to JDK 17 and have encountered some 
> issues: 1) java.lang.NoSuchFieldException: modifiers; 2) Unable to make 
> static boolean java.nio.Bits.unaligned() accessible: module java.base does 
> not "opens java.nio" to unnamed module; 3) RegionServer HRegionServer: error 
> telling master we are up. Problem 1 is solved through HBASE-25516 ([JDK17] 
> reflective access Field.class.getDeclaredField("modifiers") not supported - 
> ASF JIRA). Problem 2 is solved by adding the --add-opens 
> java.base/java.lang=ALL-UNNAMED --add-opens 
> java.base/java.lang.reflect=ALL-UNNAMED --add-opens 
> java.base/java.nio=ALL-UNNAMED parameters. However, I currently have no idea 
> how to fix problem 3. How can I handle this? The master is running normally.
> regionserver:
> !image-2024-05-30-16-23-34-561.png!
> master:
> !image-2024-05-30-17-02-18-965.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28628) Use Base64.getUrlEncoder().withoutPadding() in REST tests

2024-05-29 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28628:
---

 Summary: Use Base64.getUrlEncoder().withoutPadding() in REST tests
 Key: HBASE-28628
 URL: https://issues.apache.org/jira/browse/HBASE-28628
 Project: HBase
  Issue Type: Bug
  Components: REST
Reporter: Istvan Toth


The encoder returned by java.util.Base64.getUrlEncoder() is unsuitable for the 
purpose.

To get an encoder that is actually usable in URLs, 
java.util.Base64.getUrlEncoder().withoutPadding() must be used.

The relevant Java bug is https://bugs.openjdk.org/browse/JDK-8026330; however, 
instead of fixing the encoder, Java decided to keep the broken default and add 
the .withoutPadding() method as a way to get a working one.

Due to sheer luck (or rather bad luck), this is not triggered in our tests, but 
anyone using them as a template will be in for a ride when hit by this problem.
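The difference is easy to demonstrate (a self-contained sketch, not taken from the tests): the default URL encoder still emits '=' padding, which is not safe to embed in a URL without percent-encoding.

```java
import java.util.Base64;

public class Base64UrlDemo {
    public static void main(String[] args) {
        byte[] rowKey = "row1".getBytes();
        // Default URL encoder still appends '=' padding characters.
        String padded = Base64.getUrlEncoder().encodeToString(rowKey);
        // The padding-free variant is safe to embed in a URL path segment.
        String unpadded = Base64.getUrlEncoder().withoutPadding()
            .encodeToString(rowKey);
        System.out.println(padded);   // cm93MQ==
        System.out.println(unpadded); // cm93MQ
    }
}
```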




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28627) REST ScannerModel doesn't support includeStartRow/includeStopRow

2024-05-29 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28627:
---

 Summary: REST ScannerModel doesn't support 
includeStartRow/includeStopRow
 Key: HBASE-28627
 URL: https://issues.apache.org/jira/browse/HBASE-28627
 Project: HBase
  Issue Type: Bug
  Components: REST
 Environment: includeStartRow/includeStopRow should be transparently 
supported.
The current behaviour is limited and confusing.

The only problem is that adding them may break backwards compatibility.
Need to test if the XML unmarshaller can handle nonexistent fields.

Reporter: Istvan Toth








[jira] [Resolved] (HBASE-28623) Scan with MultiRowRangeFilter very slow

2024-05-29 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28623.
-
Resolution: Won't Fix

> Scan with MultiRowRangeFilter very slow
> ---
>
> Key: HBASE-28623
> URL: https://issues.apache.org/jira/browse/HBASE-28623
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 2.4.14
>Reporter: chaijunjie
>Priority: Major
>
> when scanning a big table ({*}more than 500 regions{*}) with 
> {*}MultiRowRangeFilter{*}, it is very slow...
> it seems to {*}scan all regions{*}...
> for example:
> we scan 3 ranges..
> startRow: 097_28220_ stopRow: 097_28220_~
> startRow: 098_28221_ stopRow: 098_28221_~
> startRow: 099_28222_ stopRow: 099_28222_~
> and enable TRACE log in hbase client
> we find there are too many scans
> {code:java}
>  1713987938886.93886cc52eea6200518feb7ebce7e1a4.', STARTKEY => '', ENDKEY => 
> '000_2147757104_4641'}
>     行 139: 1716188377677.a2e0d724dd73196d81ecbfb58c77b611.', STARTKEY => 
> '000_2147757104_4641', ENDKEY => '000_21
>     行 162: 1716188377677.b377942c957c300286afcb763f0dd338.', STARTKEY => 
> '000_2148042968_3081', ENDKEY => '000_21
>     行 185: 1714319482833.4e5bfdfb6f2bcf381681726429bf2adb.', STARTKEY => 
> '000_2148518165_26648', ENDKEY => '000_3
>     行 197: 1715031138715.36bac123de7eec3c4c08a775d592f387.', STARTKEY => 
> '000_389786_4001', ENDKEY => '000_434112
>     行 211: 1715031138715.2dc9f1a78f532454ce8381ff9738e93e.', STARTKEY => 
> '000_434112_88683', ENDKEY => '000~'}
>     行 225: 1713890960521.94e341a71b5b3e98569809d7a0f4354e.', STARTKEY => 
> '000~', ENDKEY => '001_2147735632_4395'}
>     行 250: 1716239834572.3061c9f457b91ed40c938d801f8cac5f.', STARTKEY => 
> '001_2147735632_4395', ENDKEY => '001_21
>     行 264: 1716239834572.e56a4d6aae43b5d42561e4ee6f0e3132.', STARTKEY => 
> '001_2148043057_5975', ENDKEY => '001_23
>     行 278: 1714252181329.5de683912a8120bae9f37833fb286a30.', STARTKEY => 
> '001_238065_2439', ENDKEY => '001_400433
>     行 292: 1714858026179.941a4921968267374876b52fdb33a1d7.', STARTKEY => 
> '001_400433_45599', ENDKEY => '001_43429
>     行 306: 1714858026179.16e7de83bd7944e9d23b3568b14eaf9c.', STARTKEY => 
> '001_434296_34588', ENDKEY => '001~'}
>     行 331: 1714082282269.6853c99dc6d17b2340e04307e5492d58.', STARTKEY => 
> '001~', ENDKEY => '002_2147741550_785'}
>     行 345: 1714463331546.80f60ef11f1d337bcc09d7f24d390b28.', STARTKEY => 
> '002_2147741550_785', ENDKEY => '002_214
>     行 359: 1714463331546.9281d964d08863aab2745f8331c148ad.', STARTKEY => 
> '002_2148386148_27094', ENDKEY => '002_4
>     行 373: 1714685085875.2affd725c347399ad8c77eabd0a5d4f2.', STARTKEY => 
> '002_400185_74884', ENDKEY => '002_45861
>     行 387: 1714685085875.910cbc03d1d8571f1eda21e3441f9359.', STARTKEY => 
> '002_458618_25467', ENDKEY => '002~'}
>     行 401: 1714065682984.2358541c9c8d3f2f8c4496a1fd350c6c.', STARTKEY => 
> '002~', ENDKEY => '003_2147739809_4985'}
>     行 415: 1716251410111.c60662b46cabd2cd0638d39796f11827.', STARTKEY => 
> '003_2147739809_4985', ENDKEY => '003_21
>     行 429: 1716251410111.016507ab001379f86acdf0c40a5b93be.', STARTKEY => 
> '003_2148024128_3054', ENDKEY => '003_21
>     行 443: 1714348539371.e7a41938549f7384192edd059d7e4a3e.', STARTKEY => 
> '003_2148386097_25973', ENDKEY => '003_3
>     行 457: 1714925889818.a6c3c09cddd2c3e359c0f1497a302d6d.', STARTKEY => 
> '003_396959_86147', ENDKEY => '003_45861
>     行 471: 1714925889818.eb98caf696d333714fc917c95839ea8e.', STARTKEY => 
> '003_458619_61964', ENDKEY => '003~'}
>     行 485: 1713919439849.22b315f87ea850b2f1b052ccacf40a5c.', STARTKEY => 
> '003~', ENDKEY => '004_2147804164_6378'}
>     行 499: 1714553829364.ee60c3e63e43e18487afa3ebd9db7890.', STARTKEY => 
> '004_2147804164_6378', ENDKEY => '004_21
>     行 516: 1714553829364.30e09f836793166fb64f1799b63c56fc.', STARTKEY => 
> '004_2148363241_1674

[jira] [Created] (HBASE-28626) MultiRowRangeFilter deserialization fails in org.apache.hadoop.hbase.rest.model.ScannerModel

2024-05-29 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28626:
---

 Summary: MultiRowRangeFilter deserialization fails in 
org.apache.hadoop.hbase.rest.model.ScannerModel
 Key: HBASE-28626
 URL: https://issues.apache.org/jira/browse/HBASE-28626
 Project: HBase
  Issue Type: Bug
  Components: REST
Reporter: Istvan Toth
Assignee: Istvan Toth


org.apache.hadoop.hbase.filter.MultiRowRangeFilter.BasicRowRange has several 
getters that have no corresponding setters. 

Jackson serializes the pseudo-getters' values, but when it tries to 
deserialize, there are no corresponding setters and it errors out.

{noformat}
com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized 
field "ascendingOrder" (class 
org.apache.hadoop.hbase.filter.MultiRowRangeFilter$RowRange), not marked as 
ignorable (4 known properties: "startRow", "startRowInclusive", "stopRow", 
"stopRowInclusive"])
 at [Source: 
(String)"{"type":"FilterList","op":"MUST_PASS_ALL","comparator":null,"value":null,"filters":[{"type":"MultiRowRangeFilter","op":null,"comparator":null,"value":null,"filters":null,"limit":null,"offset":null,"family":null,"qualifier":null,"ifMissing":null,"latestVersion":null,"minColumn":null,"minColumnInclusive":null,"maxColumn":null,"maxColumnInclusive":null,"dropDependentColumn":null,"chance":null,"prefixes":null,"ranges":[{"startRow":"MQ==","startRowInclusive":true,"stopRow":"MQ==","stopRowInclusive":t"[truncated
 553 chars]; line: 1, column: 526] (through reference chain: 
org.apache.hadoop.hbase.rest.model.ScannerModel$FilterModel["filters"]->java.util.ArrayList[0]->org.apache.hadoop.hbase.rest.model.ScannerModel$FilterModel["ranges"]->java.util.ArrayList[0]->org.apache.hadoop.hbase.filter.MultiRowRangeFilter$RowRange["ascendingOrder"])
at 
com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException.from(UnrecognizedPropertyException.java:61)
at 
com.fasterxml.jackson.databind.DeserializationContext.handleUnknownProperty(DeserializationContext.java:1127)
at 
com.fasterxml.jackson.databind.deser.std.StdDeserializer.handleUnknownProperty(StdDeserializer.java:2036)
at 
com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownProperty(BeanDeserializerBase.java:1700)
at 
com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownVanilla(BeanDeserializerBase.java:1678)
at 
com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:320)
at 
com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:177)
at 
com.fasterxml.jackson.databind.deser.std.CollectionDeserializer._deserializeFromArray(CollectionDeserializer.java:355)
at 
com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:244)
at 
com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:28)
at 
com.fasterxml.jackson.databind.deser.impl.FieldProperty.deserializeAndSet(FieldProperty.java:138)
at 
com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:314)
at 
com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:177)
at 
com.fasterxml.jackson.databind.deser.std.CollectionDeserializer._deserializeFromArray(CollectionDeserializer.java:355)
at 
com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:244)
at 
com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.deserialize(CollectionDeserializer.java:28)
at 
com.fasterxml.jackson.databind.deser.impl.FieldProperty.deserializeAndSet(FieldProperty.java:138)
at 
com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:314)
at 
com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:177)
at 
com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)
at 
com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4674)
at 
com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3629)
at 
com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3597)
at 
org.apache.hadoop.hbase.rest.model.ScannerM
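The root cause can be reproduced without Jackson: a bean-style pseudo-getter with no matching setter is a read-only property, so introspection-based deserializers have nowhere to put the value on the way back in. A minimal sketch (the RowRange class below is a stand-in, not the real HBase one):

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;

public class GetterOnlyDemo {
    // Stand-in for MultiRowRangeFilter.RowRange: one normal property plus a
    // derived pseudo-getter that has no corresponding setter.
    public static class RowRange {
        private byte[] startRow = new byte[0];
        public byte[] getStartRow() { return startRow; }
        public void setStartRow(byte[] b) { this.startRow = b; }
        public boolean isAscendingOrder() { return true; } // no setter
    }

    public static void main(String[] args) throws IntrospectionException {
        for (PropertyDescriptor pd : Introspector
                .getBeanInfo(RowRange.class, Object.class).getPropertyDescriptors()) {
            // "ascendingOrder" shows up as readable but not writable, which is
            // exactly what makes round-tripping serializers error out.
            System.out.println(pd.getName() + " writable=" + (pd.getWriteMethod() != null));
        }
    }
}
```

The usual fixes are to ignore such properties on serialization or to tolerate unknown properties on deserialization.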

Re: [DISCUSS] Marking Filters based on their hinting / row stateful behaviours.

2024-05-29 Thread Istvan Toth
Thanks for the detailed reply, Andrew.

I was also considering default methods, but it turns out that Filter is not
an interface, but an abstract class, so it doesn't apply.

Children not implementing a marker interface or marker method would
inherit the marker method implementation from the closest parent the same
way they would inherit the marker interface, so I think they are equivalent
in this aspect, too.

I think that marker interface(s) and overridable non-abstract getter(s) in
Filter are mostly equivalent from both logical and source compatibility
aspects.
The only difference is that the marker interfaces cannot be removed in a
subclass, while the getter can be overridden anywhere, but with well-chosen
defaults it shouldn't be much of a limitation.

Now that I think about it, we could cache the markers' values in an array
when creating the filter lists, so even the cost of looking them up doesn't
matter as it wouldn't happen in the hot code path.

Using the marker interfaces is more elegant, and discourages problematic
subclassing, so I am leaning towards that.

Istvan

On Wed, May 29, 2024 at 2:30 AM Andrew Purtell  wrote:

> Actually source compatibility with default methods would be fine too. I
> forget this is the main reason default methods were invented. The code of
> derived classes would not need to be changed, unless the returned value of
> the new method should be changed, and this is no worse than having a marker
> interface, which would also require code changes to implement non-default
> behaviors.
>
> A marker interface does remain as an option. It might make a difference in
> chained use cases. Consider a chain of filter instances that mixes derived
> code that is unaware of isHinting() and base code that is. The filter chain
> can be examined for the presence or absence of the marker interface and
> would not need to rely on every filter in the chain passing return values
> of isHinting back.
>
> Marker interfaces can also be added to denote stateful or stateless
> filters, if distinguishing between them would be useful, perhaps down the
> road.
>
> On Tue, May 28, 2024 at 5:13 PM Andrew Purtell 
> wrote:
>
> > I think you've clearly put a lot of time into the analysis and it is
> > plausible.
> >
> > Adding isHinting as a default method will preserve binary compatibility.
> > Source compatibility for derived custom filters would be broken though
> and
> > that probably prevents this going back into a releasing code line.
> >
> > Have you considered adding a marker interface instead? That would
> preserve
> > both source and binary compatibility. It wouldn't require any changes to
> > derived custom filters. A runtime instanceof test would determine if the
> > filter is a hinting filter or not. No need for a new method, default or
> > otherwise.
> >
> > On Tue, May 28, 2024 at 12:41 AM Istvan Toth  wrote:
> >
> >> I have recently opened HBASE-28622
> >> <https://issues.apache.org/jira/browse/HBASE-28622> , which has turned
> >> out
> >> to be another aspect of the problem discussed in HBASE-20565
> >> <https://issues.apache.org/jira/browse/HBASE-20565> .
> >>
> >> The problem is discussed in detail in HBASE-20565
> >> <https://issues.apache.org/jira/browse/HBASE-20565> , but it boils down
> >> to
> >> the API design decision that the filters returning SEEK_NEXT_USING_HINT
> >> rely on filterCell() getting called.
> >>
> >> On the other hand, some filters maintain an internal row state that sets
> >> counters for calls of filterCell(), which interacts with the results of
> >> previous filters in a filterList.
> >>
> >> When filters return different results for filterRowkey(), then filters
> >> returning  SEEK_NEXT_USING_HINT that have returned false must have
> >> filterCell() called, otherwise the scan will degenerate into a full
> scan.
> >>
> >> On the other hand, filters that maintain an internal row state must only
> >> be
> >> called if all previous filters have INCLUDEed the Cell, otherwise their
> >> internal state will be off. (This still has caveats, as described in
> >> HBASE-20565 <https://issues.apache.org/jira/browse/HBASE-20565>)
> >>
> >> In my opinion, the current code from HBASE-20565
> >> <https://issues.apache.org/jira/browse/HBASE-20565> strikes a bad
> balance
> >> between features, as while it fixes some use cases for row stateful
> >> filters, it also often negates the performance benefits of the filters
> >> providing hints, which in practice makes them unusa

[DISCUSS] Marking Filters based on their hinting / row stateful behaviours.

2024-05-28 Thread Istvan Toth
I have recently opened HBASE-28622
 , which has turned out
to be another aspect of the problem discussed in HBASE-20565
 .

The problem is discussed in detail in HBASE-20565
 , but it boils down to
the API design decision that the filters returning SEEK_NEXT_USING_HINT
rely on filterCell() getting called.

On the other hand, some filters maintain an internal row state that sets
counters for calls of filterCell(), which interacts with the results of
previous filters in a filterList.

When filters return different results for filterRowKey(), then filters
returning SEEK_NEXT_USING_HINT that have returned false must have
filterCell() called, otherwise the scan will degenerate into a full scan.

On the other hand, filters that maintain an internal row state must only be
called if all previous filters have INCLUDEed the Cell, otherwise their
internal state will be off. (This still has caveats, as described in
HBASE-20565 )

In my opinion, the current code from HBASE-20565
 strikes a bad balance
between features, as while it fixes some use cases for row stateful
filters, it also often negates the performance benefits of the filters
providing hints, which in practice makes them unusable in many filter list
combinations.

Without completely re-designing the filter system, I think that the best
solution would be adding a method to distinguish the filters that can
return hints from the rest of them. (This was also suggested in HBASE-20565
 , but it was not
implemented)

In theory, we have four combinations of hinting and row stateful filters,
but currently we have no filters that are both hinting and row stateful,
and I don't think that there is valid use case for those. The ones that are
neither hinting nor stateful could be handled as either, but treating them
as non-hinting seems faster.

Once we have that, we can improve the filterList behaviour a lot:
- in filterRowKey(), if any hinting filter returns false, then we could
return false
- in filterCell(), rather than returning on the first non-include result,
we could process the remaining hinting filters, while skipping the
non-hinting ones.

The code changes are minimal, we just need to add a new method like
isHinting() to the Filter class, and change the above two methods.

We could add this even in 2.5, by defaulting isHinting() to return false in
the Filter class, which would preserve the current API and behaviour for
existing custom filters.

I was looking at it from the AND filter perspective, but if needed, similar
changes could be made to the OR filter.

What do you think ?
Is this a good idea ?

Istvan
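A minimal sketch of the two options being weighed (all names below are hypothetical, not the actual HBase API): an overridable isHinting() getter on the Filter base class defaulting to false, an equivalent marker interface, and caching the values in an array when the filter list is built so the check stays out of the hot filterCell() path:

```java
import java.util.ArrayList;
import java.util.List;

public class HintingSketch {
    // Hypothetical marker: implementors may return SEEK_NEXT_USING_HINT.
    interface HintingFilter {}

    static abstract class Filter {
        // Option 1: overridable getter; defaulting to false preserves the
        // current behaviour of existing custom filters.
        boolean isHinting() { return false; }
    }

    // Stand-ins for a row-stateful filter and a hinting filter.
    static class StatefulFilter extends Filter {}
    static class SeekingFilter extends Filter implements HintingFilter {
        @Override boolean isHinting() { return true; }
    }

    public static void main(String[] args) {
        List<Filter> filters = new ArrayList<>();
        filters.add(new StatefulFilter());
        filters.add(new SeekingFilter());
        // Option 2: cache marker values once at list-construction time, so
        // filterCell() can skip non-hinting filters with an array lookup.
        boolean[] hinting = new boolean[filters.size()];
        for (int i = 0; i < filters.size(); i++) {
            hinting[i] = filters.get(i) instanceof HintingFilter;
        }
        System.out.println(hinting[0] + " " + hinting[1]); // false true
    }
}
```

As the thread notes, the two mechanisms are largely interchangeable; the marker interface discourages subclasses from silently flipping the behaviour.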


[jira] [Created] (HBASE-28622) FilterListWithAND can swallow SEEK_NEXT_USING_HINT

2024-05-27 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28622:
---

 Summary: FilterListWithAND can swallow SEEK_NEXT_USING_HINT
 Key: HBASE-28622
 URL: https://issues.apache.org/jira/browse/HBASE-28622
 Project: HBase
  Issue Type: Bug
  Components: Filters
Reporter: Istvan Toth
Assignee: Istvan Toth


org.apache.hadoop.hbase.filter.FilterListWithAND.filterRowKey(Cell) will return 
true if ANY of the filters returns true for Filter#filterRowKey().

However, the SEEK_NEXT_USING_HINT mechanism relies on filterRowKey() returning 
false, so that filterCell() can return SEEK_NEXT_USING_HINT.

If none of the filters matches, but one of them returns true for 
filterRowKey(), then the filter(s) that returned false never get a chance to 
return SEEK_NEXT_USING_HINT in filterCell(), and instead of seeking, 
FilterListWithAND will do a very slow full scan.






[jira] [Created] (HBASE-28621) PrefixFilter should use SEEK_NEXT_USING_HINT

2024-05-27 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28621:
---

 Summary: PrefixFilter should use SEEK_NEXT_USING_HINT 
 Key: HBASE-28621
 URL: https://issues.apache.org/jira/browse/HBASE-28621
 Project: HBase
  Issue Type: Improvement
  Components: Filters
Reporter: Istvan Toth
Assignee: Istvan Toth


Looking at PrefixFilter, I have noticed that it doesn't use the 
SEEK_NEXT_USING_HINT mechanism.

AFAICT, we could safely set the prefix as the next row hint, which could be a 
huge performance win.

Of course, ideally the user would set the scan startRow to the prefix, which 
avoids the problem; if the user doesn't, we effectively do a full scan 
until the prefix is reached.
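The bounds themselves are cheap to compute. A sketch of a hypothetical helper (not the HBase implementation) that derives the row key immediately after all rows sharing a prefix, the same kind of key a caller could pass as an exclusive stopRow alongside startRow = prefix:

```java
import java.util.Arrays;

public class PrefixScanBounds {
    // Hypothetical helper: the smallest row key that sorts after every key
    // starting with 'prefix' (increment the last non-0xFF byte and truncate).
    static byte[] stopRowForPrefix(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                byte[] stop = Arrays.copyOf(prefix, i + 1);
                stop[i]++;
                return stop;
            }
        }
        return new byte[0]; // all 0xFF: empty stop row means "end of table"
    }

    public static void main(String[] args) {
        // startRow = prefix, stopRow = stopRowForPrefix(prefix) bounds the
        // scan to exactly the prefixed keys instead of a full table scan.
        System.out.println(new String(stopRowForPrefix("row-1".getBytes()))); // row-2
    }
}
```

A hinting PrefixFilter would use the prefix itself as the seek hint whenever the current row still sorts before it.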





[jira] [Created] (HBASE-28613) Use streaming when marshalling protobuf REST output

2024-05-23 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28613:
---

 Summary: Use streaming when marshalling protobuf REST output
 Key: HBASE-28613
 URL: https://issues.apache.org/jira/browse/HBASE-28613
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth
Assignee: Istvan Toth


We are currently marshalling protobuf into a byte array, and then send that to 
the client.
This is both slow and memory intensive.

Using streaming instead results in huge perf improvements. In my benchmark, 
the wall clock time was almost halved, while the REST server CPU usage was 
reduced by 40%.

wall clock: 120s -> 65s
Total REST CPU: 300s -> 180s
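The shape of the change can be illustrated with plain java.io types (a generic sketch, not the actual REST server code): instead of materializing the marshalled payload as a byte[] and then writing it, the marshaller writes straight into the response stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class StreamingMarshalDemo {
    interface Marshaller {
        void writeTo(OutputStream out) throws IOException;
    }

    // Before: the whole payload is buffered in memory, then copied once more
    // by toByteArray() before anything reaches the client.
    static byte[] marshalToArray(Marshaller m) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        m.writeTo(buf);
        return buf.toByteArray();
    }

    // After: bytes flow directly into the (possibly chunked) response stream,
    // so peak memory no longer scales with the response size.
    static void marshalToStream(Marshaller m, OutputStream response) throws IOException {
        m.writeTo(response);
    }

    public static void main(String[] args) throws IOException {
        Marshaller m = out -> out.write("cells...".getBytes());
        System.out.println(marshalToArray(m).length); // 8
        marshalToStream(m, System.out);
        System.out.println();
    }
}
```

Protobuf messages already expose a writeTo(OutputStream)-style method, which is what makes this a small change on the server side.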






[jira] [Resolved] (HBASE-28501) Support non-SPNEGO authentication methods and implement session handling in REST java client library

2024-05-21 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28501.
-
Resolution: Fixed

> Support non-SPNEGO authentication methods and implement session handling in 
> REST java client library
> 
>
> Key: HBASE-28501
> URL: https://issues.apache.org/jira/browse/HBASE-28501
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>    Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.4.18, 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> The current Java client only supports the SPNEGO authentication method.
> This does not support the case when an application proxy like Apache Knox 
> performs AAA conversion from BASIC/DIGEST to Kerberos authentication.
> Add support for BASIC username/password auth in the client.
> Generally, the authentication code in the client looks quite backwards; it 
> seems that most of the Kerberos / auth cookie code duplicates HttpClient 
> functionality. AFAICT, setting HttpClient up (or letting the user set it up) and 
> letting it handle authentication by itself would be a better and more generic 
> solution.
> -Also add support for specifying a prefix for the URL path.-





[jira] [Resolved] (HBASE-28553) SSLContext not used for Kerberos auth negotiation in rest client

2024-05-16 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28553.
-
Resolution: Duplicate

Fix included in HBASE-28501

> SSLContext not used for Kerberos auth negotiation in rest client
> 
>
> Key: HBASE-28553
> URL: https://issues.apache.org/jira/browse/HBASE-28553
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>    Reporter: Istvan Toth
>    Assignee: Istvan Toth
>Priority: Major
>
> The included REST client now supports specifying a Trust store for SSL 
> connections.
> However, the configured SSL library is not used when the Kerberos negotiation 
> is performed by the Hadoop library, which uses its own client.
> We need to set up the Hadoop auth process to use the same SSLContext.





[jira] [Created] (HBASE-28597) Support native Cell format for protobuf in REST server and client

2024-05-15 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28597:
---

 Summary: Support native Cell format for protobuf in REST server 
and client
 Key: HBASE-28597
 URL: https://issues.apache.org/jira/browse/HBASE-28597
 Project: HBase
  Issue Type: Wish
  Components: REST
Reporter: Istvan Toth


REST currently uses its own (outdated) CellSetModel format for transferring 
cells.

This is fine for XML and JSON, which are slow anyway and even slower at handling 
byte arrays, and which are expected to be used in cases where simple client code 
that does not depend on the HBase Java libraries is more important than raw 
performance.

However, we perform the same marshalling and unmarshalling when we are using 
protobuf, which doesn't really add value, but eats up resources.

We could add a new encoding for Results which uses the native cell format in 
protobuf, by simply dumping the binary cell bytestreams into the REST response 
body.

This should save a lot of resources on the server side, and would be either 
faster, or the same speed on the client.

As an additional advantage, the resulting Cells would be of native HBase Cell 
type instead of the REST Cell type.







Re: [DISCUSS] Dropping Java 8 support in HBase 3

2024-05-07 Thread Istvan Toth
I'd expect the automated backporting process to only work for fairly
trivial patches which do not use protobuf, etc.
More involved patches would need manual work anyway.

If we want to make sure that everything compiles with JDK8, it's easier to
just compile the master branch with JDK8 (along with 11/17),
and fail the CI check if it doesn't.

We need to find a balance between using the new Java features and keeping
the workload manageable.
We could keep compiling master with JDK8 for a year or two, and when
activity on the 2.x branches tapers off, we could remove that restriction.


On Tue, May 7, 2024 at 3:56 PM Andrew Purtell 
wrote:

> I also like the suggestion to have CI help us here too.
>
> > On May 7, 2024, at 9:42 AM, Bryan Beaudreault 
> wrote:
> >
> > I'm nervous about creating more big long-term divergences between the
> > branches. Already I sometimes get caught up on HBaseTestingUtil vs
> > HBaseTestingUtility. And we all know the burden of maintaining the old
> > HTable impl.
> >
> > I'm not sure if this is a useful suggestion since it would require
> someone
> > to do a good deal of work, but I wonder if we could automate backport
> > testing a bit. Our yetus checks already check the patch, maybe it could
> > apply the patch to branch-2. This would increase the cost of master
> branch
> > PRs but maybe speed us up overall.
> >
> >> On Tue, May 7, 2024 at 9:21 AM 张铎(Duo Zhang) 
> wrote:
> >>
> >> The problem is that, if we only compile and run tests on JDK11+, the
> >> contributors may implicitly use some JDK11+ only features and
> >> introduce difference when backporting to branch-2.x.
> >>
> >> Maybe a possible policy is that, once a patch should go into
> >> branch-2.x too, before mering the master PR, we should make sure the
> >> contributor open a PR for branch-2.x too, so we can catch the
> >> differences between the 2 PRs, and whether to align them.
> >>
> >> WDYT?
> >>
> >> Thanks.
> >>
> >> Andrew Purtell  于2024年5月7日周二 20:20写道:
> >>>
> >>> I don’t expect 2.x to wind down for up to several more years. We will
> be
> >>> still using it in production at my employer for a long time and I would
> >>> continue my role as RM for 2.x as needed. HBase 3 is great but not GA
> yet
> >>> and then some users will want to wait one to a couple years before
> >> adopting
> >>> the new major version, especially if migration is not seamless. (We
> even
> >>> faced breaking changes in a minor upgrade from 2.4 to 2.5 that brought
> >> down
> >>> a cluster during a rolling upgrade, so there should be no expectation
> of
> >> a
> >>> seamless upgrade.) My plan is to continue releasing 2.x until, like
> with
> >>> 1.x, the commits to branch-2 essentially stop, or until the PMC stops
> >>> allowing release of the candidates.
> >>>
> >>> Perhaps we do not need to do a total ban on use of 11 features. We
> should
> >>> allow a case by case discussion. We can minimize their scope and even
> >>> potentially offer multiversion support like we do with Unsafe access
> >>> utility classes in hbase-thirdparty. There are no planned uses of new
> 11+
> >>> APIs and features now anyhow.
> >>>
> >>>
> >>> On Tue, May 7, 2024 at 7:40 AM 张铎(Duo Zhang) 
> >> wrote:
> >>>
> >>>> For me I think Istvan's plan is also acceptable.
> >>>>
> >>>> So in conclusion, we should
> >>>>
> >>>> 1. Jump to JDK11/JDK17(we could start a new thread to discuss this,
> >>>> maybe also on the user mailing list)
> >>>> 2. Claim and also make sure 3.x does not work with JDK8
> >>>> 3. Introduce a policy to only allow JDK8 features on master and
> >>>> branch-3.x for a while(maybe still keep the release version as 8?)
> >>>>
> >>>> Any other suggestions?
> >>>>
> >>>> Thanks.
> >>>>
> >>>> Istvan Toth  于2024年4月30日周二 12:45写道:
> >>>>>
> >>>>> Spring is a good argument for JDK17.
> >>>>>
> >>>>> Duo's suggestion is a great step forward, firmly stating that JDK8
> >> is not
> >>>>> officially supported solves most of our expected future CVE problems.
> >>>>>
> >>>>> However, I think that ripping off the bandaid, and m

[jira] [Resolved] (HBASE-28556) Reduce memory copying in Rest server when serializing CellModel to Protobuf

2024-05-06 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28556.
-
Fix Version/s: 2.4.18
   3.0.0
   2.7.0
   2.6.1
   2.5.9
   Resolution: Fixed

Committed to all active branches.
Thanks for the review [~zhangduo].

> Reduce memory copying in Rest server when serializing CellModel to Protobuf
> ---
>
> Key: HBASE-28556
> URL: https://issues.apache.org/jira/browse/HBASE-28556
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>    Reporter: Istvan Toth
>    Assignee: Istvan Toth
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.4.18, 3.0.0, 2.7.0, 2.6.1, 2.5.9
>
>
> The REST server does a lot of unnecessary copying, which could be avoided at 
> least for protobuf encoding.
> - -It uses ByteStringer to handle ByteBuffer backed Cells. However, it uses 
> the client API, so it should never encounter ByteBuffer backed cells.-
> - It clones everything from the cells (sometimes multiple times) before 
> serializing to protobuf.
> We could mimic the structure in Cell, with array, offset and length for each 
> field, in CellModel and use the appropriate protobuf setters to avoid the 
> extra copies.
> There may or may not be a way to do the same for JSON and XML via jax-rs, I 
> don't know the frameworks well enough to tell, but if not, we could just do 
> the copying in the getters for them, which would not make things worse.





[jira] [Created] (HBASE-28561) Add separate fields for column family and qualifier in REST message format

2024-05-01 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28561:
---

 Summary: Add separate fields for column family and qualifier in 
REST message format
 Key: HBASE-28561
 URL: https://issues.apache.org/jira/browse/HBASE-28561
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth


The current format uses the archaic column field, which requires extra 
processing and copying at both the server and client side.

We need to:
- Add a version field to the requests, to be enabled by clients that support 
the new format
- Add the new fields to the JSON, XML and protobuf formats, and logic to use 
them.

This should be doable in a backwards-compatible manner, with the server falling 
back to the old format if it receives an unversioned request.





[jira] [Resolved] (HBASE-28523) Use a single get call in REST multiget endpoint

2024-04-30 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28523.
-
Resolution: Fixed

Committed to all active branches.

> Use a single get call in REST multiget endpoint
> ---
>
> Key: HBASE-28523
> URL: https://issues.apache.org/jira/browse/HBASE-28523
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>    Reporter: Istvan Toth
>    Assignee: Istvan Toth
>Priority: Major
>  Labels: beginner, pull-request-available
> Fix For: 2.4.18, 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> The REST multiget endpoint currently issues a separate HBase GET operation 
> for each key.
> Use the method that accepts a list of keys instead.
> That should be faster.





Re: [DISCUSS] Dropping Java 8 support in HBase 3

2024-04-29 Thread Istvan Toth
Spring is a good argument for JDK17.

Duo's suggestion is a great step forward, firmly stating that JDK8 is not
officially supported solves most of our expected future CVE problems.

However, I think that ripping off the bandaid, and making sure that HBase 3
does not work with Java 8 would be better.
It's easier to accept such a change in a major version than in a minor
version.

IMO users that are so conservative that they are still using Java 8 are
unlikely to be first movers to a new major release anyway.

I think that the following upgrade path would be optimal:

- User stays on (supported) Hbase 2.x until ready to upgrade Java
- User upgrades to Java 11/17 with the same HBase
- User upgrades to Hbase 3.x

As noted, we will need to support 2.x for some time anyway (just like 1.x
was supported for a long time).

As for the backporting issues:
We could make it a policy to avoid using Java 11+ features in Hbase code
until 2.x supports winds down.
This has worked quite well for Phoenix with Java 7 / Java 8.









On Tue, Apr 30, 2024 at 3:59 AM 张铎(Duo Zhang)  wrote:

> AFAIK spring 6 and spring-boot 3 have jumped to java17 directly, so if we
> want to upgrade, I also suggest that we jump to java 17 directly.
>
> While upgrading to java 17 can reduce our compatibility work on branch-3+,
> considering the wide usage of java 8, I think we still need to support
> branch-2 for several years, and this will increase the compatibility work
> as the code between branch-3+ and branch-2.x becomes more and more
> different.
>
> So for me, a workable solution is
>
> 1. We first claim that branch-3+ will move minimum java support to 11 or
> 17.
> 2. Start to move the compilation to java 11 or 17, but still keep release
> version 8, and still keep the pre commit pipeline to run java 8, 11, 17, to
> minimum our compatibility work before we have the first 3.0.0 release.
> 3. Cut branch-3.0 and release 3.0.0, so we have a 3.0.0 release, actually
> which can still run on java 8, so it will be easier for our users to
> upgrade to 3.x and reduce our pressure on maintaining branch-2, especially
> do not need to back port new features there.
> 4. Start to move the release version to 11 or 17 on branch-3+, and prepare
> for 3.1.0 release, which will be the real 11 or 17 only release.
>
> Thanks.
>
> Bryan Beaudreault 于2024年4月30日 周二02:54写道:
>
> > I am a huge +1 for dropping java8.
> >
> > One reason I would suggest going to 17 is that it seems so hard to change
> > these things given our long development cycle on major releases. There
> are
> > some nice language features in 17, but more importantly is that the
> initial
> > release of java11 was released 6 years ago and java17 released 3 years.
> > Java21 is already released as well. So I could see java17 being widely
> > available enough that we could jump "in the middle" rather than to the
> > oldest LTS.
> >
> > I will say that we're already running java 21 on all of our hbase/hadoop
> in
> > prod (70 clusters, 7k regionservers). I know not every organization can
> be
> > that aggressive, and I wouldn't suggest jumping to 21 in the codebase.
> Just
> > pointing it out in terms of basic support already existing and being
> > stable.
> >
> > On Mon, Apr 29, 2024 at 2:33 PM Andrew Purtell  >
> > wrote:
> >
> > > I also agree that mitigation of security problems in dependencies will
> be
> > > increasingly difficult, as we cannot expect our dependencies to
> continue
> > to
> > > support Java 8. They might, but as time goes on it is less likely.
> > >
> > > A minimum of Java 11 makes a lot of sense. This is where the center of
> > > gravity of the Java ecosystem is, probably.
> > >
> > > A minimum of 17 is aggressive and I don’t see the point unless there
> is a
> > > feature in 17 that we would like to base an improvement on.
> > >
> > > > On Apr 29, 2024, at 1:23 PM, chrajeshbab...@gmail.com wrote:
> > > >
> > > > Hi!
> > > >
> > > > With 3.0 on the horizon, we could look into bumping the minimum
> > required
> > > > Java version for HBase.
> > > >
> > > > The last discussion I could find was four years ago, when dropping
> 8.0
> > > > support was rejected.
> > > >
> > > > https://lists.apache.org/thread/ph8xry0x37cvjj89fp2jk1k48yb7gs46
> > > >
> > > > Now it's four years later, and the end of OpenJDK support for Java 8
> > and
> > > 11
> > > > are much closer.
> > > > (Oracle public support is so short that I consider that irrelevant)
> > > >
> > > > Some critical dependencies (like Jetty) have ended even regular
> > security
> > > > support for Java 8.
> > > >
> > > > By supporting Java 8 we are also limiting ourselves to using an
> > > > already 10 year old Java release, ignoring any developments in the
> > > > language.
> > > > year old Java release, ignoring any developments in the language.
> > > >
> > > > My take is that with the current dogmatic emphasis on CVE mitigation
> > > > the benefits of bumping the required JDK version outweigh the costs
> > > > even for the legacy install base, especially as it's getting harder and 

[jira] [Created] (HBASE-28556) Reduce memory copying in Rest server when converting CellModel to Protobuf

2024-04-29 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28556:
---

 Summary: Reduce memory copying in Rest server when converting 
CellModel to Protobuf
 Key: HBASE-28556
 URL: https://issues.apache.org/jira/browse/HBASE-28556
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth


The REST server does a lot of unnecessary copying, which could be avoided at 
least for protobuf encoding.

- It uses ByteStringer to handle ByteBuffer-backed Cells. However, it uses the 
client API, so it should never encounter ByteBuffer-backed cells.
- It clones everything from the cells (sometimes multiple times) before 
serializing to protobuf.

We could mimic the structure in Cell, with array, offset and length for each 
field, and use the appropriate protobuf setters to avoid the extra copies.

There may or may not be a way to do the same for JSON and XML via jax-rs, I 
don't know the frameworks well enough to tell, but if not, we could just do the 
copying in the getters for them.
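The copy-avoidance idea can be illustrated without HBase or protobuf itself: cloning a field out of a backing array allocates and duplicates the bytes, while an (array, offset, length) view shares them — which is what Cell exposes and what slice-accepting setters make usable without a copy. The sketch below is illustrative only; the layout string and offsets are made up, and this is not the REST server code:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Illustration only: clone-per-field vs. referencing the same bytes via
// (array, offset, length). The wrap shares the backing array; the clone
// allocates and copies.
public class CopyVsWrap {
    public static void main(String[] args) {
        byte[] backing = "rowkey|family|qualifier|value".getBytes();
        int off = 24, len = 5;                       // the "value" slice

        // clone: new allocation, bytes duplicated
        byte[] clone = Arrays.copyOfRange(backing, off, off + len);

        // wrap: no payload allocation, shares the backing array
        ByteBuffer view = ByteBuffer.wrap(backing, off, len);

        System.out.println(new String(clone));
        System.out.println(view.hasArray() && view.array() == backing);
    }
}
```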






[jira] [Created] (HBASE-28553) SSLContext not used for Kerberos auth negotiation in rest client

2024-04-25 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28553:
---

 Summary: SSLContext not used for Kerberos auth negotiation in rest 
client
 Key: HBASE-28553
 URL: https://issues.apache.org/jira/browse/HBASE-28553
 Project: HBase
  Issue Type: Bug
  Components: REST
Reporter: Istvan Toth
Assignee: Istvan Toth


The included REST client now supports specifying a Trust store for SSL 
connections.
However, the configured SSL library is not used when the Kerberos negotiation 
is performed by the Hadoop library, which uses its own client.

We need to set up the Hadoop auth process to use the same SSLContext.





Re: [VOTE] The second release candidate for 2.6.0 (RC3) is available

2024-04-24 Thread Istvan Toth
I can merge https://github.com/apache/hbase/pull/5852 as soon as I get a
review on it for the above issue.

best regards
Istvan


On Thu, Apr 25, 2024 at 4:14 AM 张铎(Duo Zhang)  wrote:

> HBASE-25818 introduced a breaking change, it removed the SCAN_FILTER
> field, and introduced two new fields in
> org.apache.hadoop.hbase.rest.Constants.
>
> But unfortunately, org.apache.hadoop.hbase.rest.Constants is IA.Public
> so we can not remove its field without a deprecation cycle...
>
> Bryan Beaudreault  于2024年4月25日周四 09:21写道:
> >
> > Please vote on this Apache hbase release candidate,
> > hbase-2.6.0RC3
> >
> > The VOTE will remain open for at least 72 hours.
> >
> > [ ] +1 Release this package as Apache hbase 2.6.0
> > [ ] -1 Do not release this package because ...
> >
> > The tag to be voted on is 2.6.0RC3:
> >
> >   https://github.com/apache/hbase/tree/2.6.0RC3
> >
> > This tag currently points to git reference
> >
> >   df3343989d02966752ce7562546619f86a36169a
> >
> > The release files, including signatures, digests, as well as CHANGES.md
> > and RELEASENOTES.md included in this RC can be found at:
> >
> >   https://dist.apache.org/repos/dist/dev/hbase/2.6.0RC3/
> >
> > Maven artifacts are available in a staging repository at:
> >
> >
> https://repository.apache.org/content/repositories/orgapachehbase-1540/
> >
> > Maven artifacts for hadoop3 are available in a staging repository at:
> >
> >
> https://repository.apache.org/content/repositories/orgapachehbase-1541/
> >
> > Artifacts were signed with the 0x74EFF462 key which can be found in:
> >
> >   https://downloads.apache.org/hbase/KEYS
> >
> > To learn more about Apache hbase, please see
> >
> >   http://hbase.apache.org/
> >
> > Thanks,
> > Your HBase Release Manager
>


-- 
*István Tóth* | Sr. Staff Software Engineer
*Email*: st...@cloudera.com
cloudera.com 


[jira] [Created] (HBASE-28550) Provide working benchmark tool for REST server

2024-04-24 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28550:
---

 Summary: Provide working benchmark tool for REST server
 Key: HBASE-28550
 URL: https://issues.apache.org/jira/browse/HBASE-28550
 Project: HBase
  Issue Type: Umbrella
  Components: REST
Reporter: Istvan Toth


This is an umbrella ticket for the individual changes.

The goal is to be able to test the REST server's performance either directly 
or via Knox or other proxies / load balancers, and to compare the results with 
those obtained via the native client.





[jira] [Created] (HBASE-28544) org.apache.hadoop.hbase.rest.PerformanceEvaluation does not evaluate REST performance

2024-04-22 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28544:
---

 Summary: org.apache.hadoop.hbase.rest.PerformanceEvaluation does 
not evaluate REST performance
 Key: HBASE-28544
 URL: https://issues.apache.org/jira/browse/HBASE-28544
 Project: HBase
  Issue Type: Bug
  Components: REST
Reporter: Istvan Toth


org.apache.hadoop.hbase.rest.PerformanceEvaluation only uses the REST interface 
for Admin tasks like creating tables.

All data access is done via the native RPC client, which makes the whole tool a 
big red herring.





[jira] [Created] (HBASE-28543) org.apache.hadoop.hbase.rest.PerformanceEvaluation does not read hbase-site.xml

2024-04-22 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28543:
---

 Summary: org.apache.hadoop.hbase.rest.PerformanceEvaluation does 
not read hbase-site.xml
 Key: HBASE-28543
 URL: https://issues.apache.org/jira/browse/HBASE-28543
 Project: HBase
  Issue Type: Bug
  Components: REST
Reporter: Istvan Toth
Assignee: Istvan Toth


I am trying to run org.apache.hadoop.hbase.rest.PerformanceEvaluation.
It cannot connect to the ZK quorum specified in hbase-site.xml.

It implements the Configurable interface incorrectly.
Fixing the Configurable implementation results in connecting to ZK properly.
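The Configurable contract (in Hadoop, `org.apache.hadoop.conf.Configurable` with `setConf`/`getConf`) amounts to keeping and returning the Configuration the framework injects — which carries the merged hbase-site.xml settings. A tool that ignores the injected conf never sees the site file. The sketch below uses a minimal stand-in interface so it is self-contained; all names are invented and this is not the actual PerformanceEvaluation fix:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for the Hadoop Configurable contract. The framework calls
// setConf() with the merged configuration; the implementation must keep it
// and hand the same instance back from getConf().
interface MiniConfigurable {
    void setConf(Map<String, String> conf);
    Map<String, String> getConf();
}

class EvaluationTool implements MiniConfigurable {
    private Map<String, String> conf;

    @Override
    public void setConf(Map<String, String> conf) {
        this.conf = conf;            // keep the injected conf — the fix
    }

    @Override
    public Map<String, String> getConf() {
        return conf;                 // hand back the same instance
    }

    String zkQuorum() {
        return conf.get("hbase.zookeeper.quorum");
    }
}

public class ConfigurableSketch {
    public static void main(String[] args) {
        Map<String, String> siteConf = new HashMap<>();
        siteConf.put("hbase.zookeeper.quorum", "zk1,zk2,zk3");

        EvaluationTool tool = new EvaluationTool();
        tool.setConf(siteConf);      // framework-style injection
        System.out.println(tool.zkQuorum());
    }
}
```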





[jira] [Created] (HBASE-28540) Cache Results in org.apache.hadoop.hbase.rest.client.RemoteHTable.Scanner

2024-04-22 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28540:
---

 Summary: Cache Results in 
org.apache.hadoop.hbase.rest.client.RemoteHTable.Scanner
 Key: HBASE-28540
 URL: https://issues.apache.org/jira/browse/HBASE-28540
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth
Assignee: Istvan Toth


The implementation of org.apache.hadoop.hbase.rest.client.RemoteHTable.Scanner
is very inefficient, as the standard next() method makes a separate HTTP 
request for each row.

Performance can be improved by not specifying the row count in the REST call 
and caching the returned Results.

Chunk size can still be influenced by scan.setBatch();
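The caching idea — fetch a chunk of Results in one HTTP request and serve subsequent next() calls from a local queue — can be sketched without the REST client itself. Everything below (the `Fetcher` interface, the chunk sizes, the request counter) is invented for illustration:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Sketch of the caching scanner: one HTTP round trip per chunk instead of
// one per row. Fetcher stands in for the REST scanner endpoint call.
public class CachingScannerSketch {
    interface Fetcher { List<String> fetchChunk(); }   // one HTTP request

    static class CachingScanner {
        private final Fetcher fetcher;
        private final Deque<String> cache = new ArrayDeque<>();
        int httpRequests = 0;

        CachingScanner(Fetcher fetcher) { this.fetcher = fetcher; }

        String next() {
            if (cache.isEmpty()) {
                httpRequests++;                        // only on cache miss
                cache.addAll(fetcher.fetchChunk());
            }
            return cache.poll();
        }
    }

    public static void main(String[] args) {
        // the "server" returns one chunk of 4 rows per request
        Iterator<List<String>> chunks = List.of(
            List.of("r1", "r2", "r3", "r4"),
            List.of("r5", "r6", "r7", "r8")).iterator();
        CachingScanner scanner = new CachingScanner(
            () -> chunks.hasNext() ? chunks.next() : List.of());

        for (int i = 0; i < 8; i++) {
            scanner.next();                            // 8 rows consumed
        }
        System.out.println("requests: " + scanner.httpRequests);
    }
}
```

Eight rows are served with only two requests; a per-row implementation would have made eight.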






[jira] [Resolved] (HBASE-28500) Rest Java client library assumes stateless servers

2024-04-17 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28500.
-
Resolution: Fixed

> Rest Java client library assumes stateless servers
> --
>
> Key: HBASE-28500
> URL: https://issues.apache.org/jira/browse/HBASE-28500
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>    Reporter: Istvan Toth
>    Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.4.18, 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2, 2.5.9
>
>
> The Rest Java client library accepts a list of rest servers, and does random 
> load balancing between them for each request.
> This does not work for scans, which do have state on the rest server instance.





[jira] [Reopened] (HBASE-28500) Rest Java client library assumes stateless servers

2024-04-16 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth reopened HBASE-28500:
-

The spotbugs warning makes the daily builds go red.
Gonna push an addendum for it.

> Rest Java client library assumes stateless servers
> --
>
> Key: HBASE-28500
> URL: https://issues.apache.org/jira/browse/HBASE-28500
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>    Reporter: Istvan Toth
>    Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.4.18, 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2, 2.5.9
>
>
> The Rest Java client library accepts a list of rest servers, and does random 
> load balancing between them for each request.
> This does not work for scans, which do have state on the rest server instance.





[jira] [Created] (HBASE-28526) hbase-rest jar does not work with hbase-shaded-client with protobuf encoding

2024-04-16 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28526:
---

 Summary: hbase-rest jar does not work with hbase-shaded-client 
with protobuf encoding
 Key: HBASE-28526
 URL: https://issues.apache.org/jira/browse/HBASE-28526
 Project: HBase
  Issue Type: Bug
  Components: REST
Reporter: Istvan Toth


When trying to decode a protobuf-encoded CellSet, I get 
{noformat}
Exception in thread "main" java.lang.NoSuchMethodError: 
org.apache.hadoop.hbase.protobuf.ProtobufUtil.mergeFrom(Lcom/google/protobuf/Message$Builder;[B)V
at 
org.apache.hadoop.hbase.rest.model.CellSetModel.getObjectFromMessage(CellSetModel.java:129)
at RestClientExample.getMulti(RestClientExample.java:191)
at RestClientExample.start(RestClientExample.java:138)
at RestClientExample.main(RestClientExample.java:124)

{noformat}

Seems to be caused by relocating protobuf 2.5 in hbase-shaded-client.

It works fine with the unrelocated client, i.e. when using the 
{noformat}
export CLASSPATH=`hbase --internal-classpath classpath`:
{noformat}
command to set up the classpath for the client.






[jira] [Created] (HBASE-28525) Document all REST endpoints

2024-04-15 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28525:
---

 Summary: Document all REST endpoints
 Key: HBASE-28525
 URL: https://issues.apache.org/jira/browse/HBASE-28525
 Project: HBase
  Issue Type: Improvement
  Components: documentation, REST
Reporter: Istvan Toth


The new features added in HBASE-28518 do not have documentation.
While reviewing, I also found other undocumented interfaces, like TableScan, 
and options like globbed gets.





[jira] [Resolved] (HBASE-28518) Allow specifying a filter for the REST multiget endpoint

2024-04-15 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28518.
-
Fix Version/s: 2.6.0
   2.4.18
   4.0.0-alpha-1
   2.7.0
   3.0.0-beta-2
   2.5.9
   Resolution: Fixed

Committed to all active branches.
Thanks for the review [~ankit].

> Allow specifying a filter for the REST multiget endpoint
> 
>
> Key: HBASE-28518
> URL: https://issues.apache.org/jira/browse/HBASE-28518
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>    Reporter: Istvan Toth
>    Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.4.18, 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2, 2.5.9
>
>
> The native HBase API allows specifying Filters for get operations.
> The REST interface does not currently expose this functionality.
> Add a parameter to the multiget endpoint to allow specifying filters.





[jira] [Resolved] (HBASE-28524) Backport HBASE-28174 to branch-2.4 and branch-2.5

2024-04-15 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28524.
-
Fix Version/s: 2.4.18
   2.5.9
 Release Note: Done.
   Resolution: Fixed

> Backport HBASE-28174 to branch-2.4 and branch-2.5
> -
>
> Key: HBASE-28524
> URL: https://issues.apache.org/jira/browse/HBASE-28524
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.4.17, 2.5.8
>    Reporter: Istvan Toth
>    Assignee: Istvan Toth
>Priority: Major
> Fix For: 2.4.18, 2.5.9
>
>
> The changes are backwards compatible and the REST interface is super limited 
> without them.





[jira] [Created] (HBASE-28524) Backport HBASE-28174 to branch-2.4 and branch-2.5

2024-04-15 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28524:
---

 Summary: Backport HBASE-28174 to branch-2.4 and branch-2.5
 Key: HBASE-28524
 URL: https://issues.apache.org/jira/browse/HBASE-28524
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.5.8, 2.4.17
Reporter: Istvan Toth
Assignee: Istvan Toth


The changes are backwards compatible and the REST interface is super limited 
without them.





[jira] [Created] (HBASE-28523) Use a single get call in REST multiget endpoint

2024-04-14 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28523:
---

 Summary: Use a single get call in REST multiget endpoint
 Key: HBASE-28523
 URL: https://issues.apache.org/jira/browse/HBASE-28523
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth


The REST multiget endpoint currently issues a separate HBase GET operation for 
each key.

Use the method that accepts a list of keys instead.
That should be faster.





[jira] [Created] (HBASE-28518) Allow specifying a filter for the REST multiget endpoint

2024-04-12 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28518:
---

 Summary: Allow specifying a filter for the REST multiget endpoint
 Key: HBASE-28518
 URL: https://issues.apache.org/jira/browse/HBASE-28518
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth
Assignee: Istvan Toth


The native HBase API allows specifying Filters for get operations.
The REST interface does not currently expose this functionality.

Add a parameter to the multiget endpoint to allow specifying filters.






[jira] [Created] (HBASE-28504) Implement eviction logic for scanners in Rest APIs to prevent scanner leakage

2024-04-08 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28504:
---

 Summary: Implement eviction logic for scanners in Rest APIs to 
prevent scanner leakage
 Key: HBASE-28504
 URL: https://issues.apache.org/jira/browse/HBASE-28504
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth
Assignee: Istvan Toth


The REST API maintains a map of _ScannerInstanceResource_s (which are 
ultimately tracking Scanner objects).

The user is supposed to delete these after use, but if for any reason they are 
not deleted, these objects are retained indefinitely.

Implement logic to evict old scanners automatically.
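An idle-timeout sweep over the scanner map is one way to do this. The sketch below is illustrative only — the names, the timeout, and the map layout are invented, not the actual HBASE-28504 implementation:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of idle-timeout eviction for a scanner map: entries not touched
// within timeoutMillis are removed (the real code would also close the
// underlying Scanner).
public class ScannerEvictionSketch {
    static class ScannerEntry {
        volatile long lastAccessMillis;
        ScannerEntry(long lastAccessMillis) { this.lastAccessMillis = lastAccessMillis; }
    }

    static final Map<String, ScannerEntry> scanners = new ConcurrentHashMap<>();

    // remove scanners idle longer than timeoutMillis; returns evicted count
    static int evictIdle(long nowMillis, long timeoutMillis) {
        int evicted = 0;
        Iterator<Map.Entry<String, ScannerEntry>> it = scanners.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, ScannerEntry> e = it.next();
            if (nowMillis - e.getValue().lastAccessMillis > timeoutMillis) {
                it.remove();     // leaked scanner reclaimed here
                evicted++;
            }
        }
        return evicted;
    }

    public static void main(String[] args) {
        long now = 100_000;
        scanners.put("fresh", new ScannerEntry(now - 1_000));   // recently used
        scanners.put("stale", new ScannerEntry(now - 90_000));  // leaked
        int evicted = evictIdle(now, 60_000);
        System.out.println("evicted: " + evicted);
        System.out.println(scanners.containsKey("fresh"));
    }
}
```

A periodic chore thread would call the sweep; each client access would refresh lastAccessMillis.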





[jira] [Created] (HBASE-28501) Support non-SPNEGO authentication methods in REST java client library

2024-04-08 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28501:
---

 Summary: Support non-SPNEGO authentication methods in REST java 
client library
 Key: HBASE-28501
 URL: https://issues.apache.org/jira/browse/HBASE-28501
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth


The current Java client only supports the SPNEGO authentication method.

This does not support the case when an application proxy like Apache Knox 
performs AAA conversion from BASIC/DIGEST to Kerberos authentication.

Add support for BASIC username/password auth to the client.

Generally, the authentication code in the client looks quite backwards: it 
seems that most of the Kerberos / auth cookie code duplicates HttpClient 
functionality. AFAICT setting HttpClient up (or letting the user set it up) and 
letting it handle authentication by itself would be a better and more generic 
solution.






[jira] [Created] (HBASE-28500) Rest Java client library assumes stateless servers

2024-04-08 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28500:
---

 Summary: Rest Java client library assumes stateless servers
 Key: HBASE-28500
 URL: https://issues.apache.org/jira/browse/HBASE-28500
 Project: HBase
  Issue Type: Bug
  Components: REST
Reporter: Istvan Toth


The Rest Java client library accepts a list of rest servers, and does random 
load balancing between them for each request.
This does not work for scans, which do have state on the rest server instance.





[jira] [Reopened] (HBASE-28489) Implement HTTP session support in REST server and client

2024-04-08 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth reopened HBASE-28489:
-

My assumption that the REST interface is stateless was incorrect.
Scan objects are maintained on the REST server, so sticky sessions are a must 
for any kind of HA/LB solution.

> Implement HTTP session support in REST server and client
> 
>
> Key: HBASE-28489
> URL: https://issues.apache.org/jira/browse/HBASE-28489
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>    Reporter: Istvan Toth
>    Assignee: Istvan Toth
>Priority: Major
>
> The REST server (and java client) currently does not implement sessions.
> While it is not necessary for the REST API to work, implementing sessions 
> would be a big improvement in throughput and resource usage.
> * It would make load balancing with sticky sessions possible (though it's not 
> really needed for REST)
> * It would save the overhead of performing authentication for each request
>  The gains are particularly big when using SPNEGO:
> * The full SPNEGO handshake can be skipped for subsequent requests
> * When Knox performs SPNEGO authentication for the proxied client, it 
> accesses the identity store each time. When the session is set, this step is 
> only performed on the initial request.
> The same change has resulted in spectacular performance improvements for 
> Phoenix Query Server when implemented in Avatica.





[jira] [Created] (HBASE-28499) Use the latest Httpclient/Httpcore 5.x in HBase

2024-04-08 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28499:
---

 Summary: Use the latest Httpclient/Httpcore 5.x  in HBase
 Key: HBASE-28499
 URL: https://issues.apache.org/jira/browse/HBASE-28499
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth


HttpClient 4.x is not actively maintained.

We use Httpclient directly in the REST client code, and in the tests for 
several modules.

Httpclient 4.5 is a transitive dependency at least from Hadoop and Thrift, but 
httpclient 5.x uses a separate java package, so 4.5 and 5.x  should be able to 
co-exist fine.

As of now, Httpclient 4.5 is in maintenance mode:
https://hc.apache.org/status.html






[jira] [Resolved] (HBASE-28489) Implement HTTP session support in REST server and client

2024-04-07 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-28489.
-
Resolution: Invalid

Nothing to do, all relevant cases work already.

> Implement HTTP session support in REST server and client
> 
>
> Key: HBASE-28489
> URL: https://issues.apache.org/jira/browse/HBASE-28489
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>    Reporter: Istvan Toth
>    Assignee: Istvan Toth
>Priority: Major
>
> The REST server (and java client) currently does not implement sessions.
> While it is not necessary for the REST API to work, implementing sessions 
> would be a big improvement in throughput and resource usage.
> * It would make load balancing with sticky sessions possible (though it's not 
> really needed for REST)
> * It would save the overhead of performing authentication for each request
>  The gains are particularly big when using SPNEGO:
> * The full SPNEGO handshake can be skipped for subsequent requests
> * When Knox performs SPNEGO authentication for the proxied client, it 
> accesses the identity store each time. When the session is set, this step is 
> only performed on the initial request.
> The same change has resulted in spectacular performance improvements for 
> Phoenix Query Server when implemented in Avatica.





[jira] [Created] (HBASE-28489) Implement HTTP session support in REST server and client

2024-04-05 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28489:
---

 Summary: Implement HTTP session support in REST server and client
 Key: HBASE-28489
 URL: https://issues.apache.org/jira/browse/HBASE-28489
 Project: HBase
  Issue Type: Improvement
  Components: REST
Reporter: Istvan Toth
Assignee: Istvan Toth


The REST server (and java client) currently does not implement sessions.

While it does not seem to be necessary for the REST API to work, implementing 
sessions would be a big improvement in throughput and resource usage.

* It would make load balancing with sticky sessions possible
* It would save the overhead of performing authentication for each call

 The gains are particularly big when using SPNEGO:

* The full SPNEGO handshake can be skipped for subsequent requests
* When Knox performs SPNEGO authentication for the proxied client, it accesses 
the identity store each time. When the session is set, this step is only 
performed on the initial request.

The same change has resulted in spectacular performance improvements for 
Phoenix Query Server when implemented in Avatica.





Re: [ANNOUNCE] New HBase committer Istvan Toth

2024-04-03 Thread Istvan Toth
Thank you!

I'm looking forward to working with you on HBase.

Istvan

On Wed, Apr 3, 2024 at 7:00 AM Nihal Jain  wrote:

> Congratulations Istvan. Welcome !
>
> On Wed, 3 Apr 2024, 01:53 Rushabh Shah,  .invalid>
> wrote:
>
> > Congratulations Istvan, welcome !!
> >
> >
> > On Tue, Apr 2, 2024 at 4:23 AM Duo Zhang  wrote:
> >
> > > On behalf of the Apache HBase PMC, I am pleased to announce that
> > > Istvan Toth(stoty)
> > > has accepted the PMC's invitation to become a committer on the
> > > project. We appreciate all
> > > of Istvan Toth's generous contributions thus far and look forward to
> > > his continued involvement.
> > >
> > > Congratulations and welcome, Istvan Toth!
> > >
> > > On behalf of the Apache HBase PMC, I am pleased to announce that Istvan
> > > Toth has accepted our invitation to become a committer on the Apache
> > > HBase project. We thank Istvan Toth for his ongoing contributions to the
> > > HBase project and look forward to him taking on more responsibility in
> > > the future.
> > >
> > > Welcome, Istvan Toth!
> > >
> >
>




Re: Aiming for 2.6.0RC0 tomorrow

2024-03-21 Thread Istvan Toth
The *hbase classpath* and *hbase mapredcp* command outputs do include the
respective  *hbase-shaded-client-byo-hadoop* and *hbase-shaded-mapreduce*
 jars.

At least the 'hbase mapredcp' jars are used by both Spark and Hive
integration, and expected to be available on the node filesystem.
We also plan to switch the Phoenix connectors to that.

Having those two jars in a separate assembly would require further
configuration when installing HBase to tell it
where to find them, so that the classpath commands can include them.

If something needs to be removed, I propose the full fat (
*hbase-shaded-client*) shaded client JAR.
That is never returned by the hbase command AFAIK, and is also the largest
in size.
(I plan to remove that one from the upcoming Hadoop-less assembly as well)

Istvan

On Fri, Mar 22, 2024 at 4:55 AM 张铎(Duo Zhang)  wrote:

> Tested locally, after removing hbase-example from tarball, the hadoop3
> tarball is about 351MB.
>
> So you could try to include this commit to publish again, to see if this
> helps.
>
> Thanks.
>
> 张铎(Duo Zhang)  于2024年3月22日周五 09:18写道:
> >
> > If we exclude hbase-example from the binaries, will it be smaller enough
> to fit?
> >
> > We already commit the changes to master I believe. Let me see if we
> > can cherry-pick them and commit to branch-2.6 as well.
> >
> > Thanks.
> >
> > Bryan Beaudreault  于2024年3月22日周五 07:35写道:
> > >
> > > Thanks, I filed
> > > https://issues.apache.org/jira/browse/INFRA-25634
> > >
> > > On Thu, Mar 21, 2024 at 5:46 PM Andrew Purtell 
> wrote:
> > >
> > > > The hadoop3 bin tarball for 2.5.8 is 352.8MB. Perhaps we have just
> barely
> > > > and recently crossed a threshold. File an INFRA JIRA and ask about
> it.
> > > > Perhaps some limit can be increased, or maybe they will ask us to
> live
> > > > within it.
> > > >
> > > > Related, looking at the 2.5.8 hadoop3 bin tarball, the majority of
> the bulk
> > > > is ./lib/shaded-clients/ . The shaded clients are certainly useful
> but
> > > > probably are not the most popular options when taking a dependency on
> > > > HBase. Perhaps we can package these separately. We could exclude
> them from
> > > > the convenience tarballs as they will still be available from the
> Apache
> > > > Maven repository.
> > > >
> > > > On Thu, Mar 21, 2024 at 2:33 PM Bryan Beaudreault <
> bbeaudrea...@apache.org
> > > > >
> > > > wrote:
> > > >
> > > > > I got most of the way through, but failed during publish-dist:
> > > > >
> > > > > Transmitting file data ..svn: E175002: Commit failed (details
> follow):
> > > > > svn: E175002: PUT request on
> > > > >
> > > > >
> > > >
> '/repos/dist/!svn/txr/68050-1le9/dev/hbase/2.6.0RC0/hbase-2.6.0-hadoop3-bin.tar.gz'
> > > > > failed
> > > > >
> > > > > Running manually, it looks to be a Request Entity Too Large. The
> file in
> > > > > question is 356MB. Anyone have any experience with this?
> > > > >
> > > > > On Thu, Mar 21, 2024 at 2:19 AM 张铎(Duo Zhang) <
> palomino...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > HBASE-28444 has been resolved.
> > > > > >
> > > > > > Please go ahead to cut 2.6.0RC0, really a long journey :)
> > > > > >
> > > > > > 张铎(Duo Zhang)  于2024年3月20日周三 14:29写道:
> > > > > > >
> > > > > > > There is a security issue for zookeeper, but simply upgrading
> > > > > > > zookeeper will break a test.
> > > > > > >
> > > > > > > Pelase see HBASE-28444 for more details.
> > > > > > >
> > > > > > > I think we should get this in before cutting the RC.
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > Bryan Beaudreault  于2024年3月19日周二
> 23:51写道:
> > > > > > > >
> > > > > > > > I've finished auditing fixVersions and run ITBLL for an
> extended
> > > > > > period of
> > > > > > > > time in a real cluster. I'm not aware of any open blockers.
> So
> > > > > > tomorrow I'm
> > > > > > > > going to start generating the RC0.
> > > > > > > >
> > > > > > > > Please let me know if you have any concerns or reason for
> delay.
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Andrew
> > > >
> > > > Unrest, ignorance distilled, nihilistic imbeciles -
> > > > It's what we’ve earned
> > > > Welcome, apocalypse, what’s taken you so long?
> > > > Bring us the fitting end that we’ve been counting on
> > > >- A23, Welcome, Apocalypse
> > > >
>




[jira] [Created] (HBASE-28431) Cleaning up binary assemblies and diagnostic tools

2024-03-08 Thread Istvan Toth (Jira)
Istvan Toth created HBASE-28431:
---

 Summary: Cleaning up binary assemblies and diagnostic tools
 Key: HBASE-28431
 URL: https://issues.apache.org/jira/browse/HBASE-28431
 Project: HBase
  Issue Type: Umbrella
Affects Versions: 3.0.0-beta-1
Reporter: Istvan Toth


As discussed on the mailing list, the current binary assembly has several 
problems.

The discussed improvements:
* Provide assembly versions without transitive Hadoop dependencies
* Remove test JARs and their dependencies from the assemblies
* Move useful diagnostic tools into the runtime jars
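A quick sanity check along these lines could be scripted against a candidate assembly tarball. The sketch below is illustrative only: the entry-name patterns are assumptions about what "test JARs" and "transitive Hadoop dependencies" look like in the tarball, and the demo tarball is fabricated in a temp directory rather than a real HBase release.

```python
import io
import os
import tarfile
import tempfile


def forbidden_entries(tarball_path):
    """List assembly entries the cleanup aims to drop:
    test jars and bundled Hadoop jars (name-based heuristic)."""
    bad = []
    with tarfile.open(tarball_path) as tar:
        for name in tar.getnames():
            base = name.rsplit("/", 1)[-1]
            if base.endswith("-tests.jar"):
                bad.append(name)
            elif base.startswith("hadoop-") and base.endswith(".jar"):
                bad.append(name)
    return bad


# Demo against a fabricated tarball; the jar names are examples, not
# the actual contents of an HBase assembly.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "hbase-demo-bin.tar.gz")
    with tarfile.open(path, "w:gz") as tar:
        for entry in ["lib/hbase-server-2.6.0.jar",
                      "lib/hbase-server-2.6.0-tests.jar",
                      "lib/hadoop-common-3.3.6.jar"]:
            info = tarfile.TarInfo(entry)  # size defaults to 0 bytes
            tar.addfile(info, io.BytesIO(b""))
    found = forbidden_entries(path)

print(found)
```

Run against a real `hbase-*-bin.tar.gz`, an empty result would indicate the assembly already meets the first two goals above.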



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Removing tests and/or Hadoop from the binary assemblies

2024-03-08 Thread Istvan Toth
Thank you Nihal.
I'm not very familiar with the tools in the test code, so you can probably
plan that work better.
I just have some generic steps in mind:
* Identify all the tools / scripts in the test jars
* Identify and analyze their dependencies (compared to the current runtime
deps)
* Decide which ones to move to the runtime JARs.
* Move them to the runtime code (or perhaps a separate module)
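The first "identify all the tools" step could start from something as simple as scanning the tests jar for class names that look like tools. A rough sketch: the jar here is built in memory as a stand-in for a real `hbase-*-tests.jar`, and the name filter is only a heuristic — real detection would inspect each class for a `main` method.

```python
import io
import re
import zipfile

# Stand-in for an hbase-*-tests.jar; in practice you would open the real
# jar from the assembly's lib/ directory (class names below are examples).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as jar:
    for entry in [
        "org/apache/hadoop/hbase/PerformanceEvaluation.class",
        "org/apache/hadoop/hbase/util/LoadTestTool.class",
        "org/apache/hadoop/hbase/client/TestAdmin.class",
    ]:
        jar.writestr(entry, b"")

# Crude heuristic: names that look like operator tools rather than unit tests.
TOOL_NAME = re.compile(r"(Tool|Evaluation|Monkey)\.class$")

with zipfile.ZipFile(buf) as jar:
    tools = [name[: -len(".class")].replace("/", ".")
             for name in jar.namelist()
             if TOOL_NAME.search(name)]

print(tools)
```

The resulting class list would then feed the dependency analysis in the second step.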

I have created https://issues.apache.org/jira/browse/HBASE-28431 as an
umbrella ticket to organize the sub-tasks.

Istvan

On Fri, Mar 8, 2024 at 7:06 PM Nihal Jain  wrote:

> Sure I will be able to take up. Please create tasks with necessary details
> or let me know if you want me to create.
>
> On Fri, 8 Mar 2024, 12:45 Istvan Toth,  wrote:
>
> > Thanks for volunteering, Nihal.
> >
> > I could work on the Hadoop-less assemblies, and you could work on
> > cleaning up the test jars.
> > Would that work for you ?
> > I know that I'm picking the smaller part, but it turns out that I won't
> > have as much time to work on this as I hoped.
> >
> > (Unless there are other volunteers, of course)
> >
> > Istvan
> >
> > On Wed, Mar 6, 2024 at 7:03 PM Istvan Toth  wrote:
> >
> > > We seem to be in agreement in principle, however the devil is in the
> > > details.
> > >
> > > The first step should be moving the diagnostic tools out of the test
> > > jars. Are there any tools we don't want to move out?
> > > Do the diagnostic tools pull in extra dependencies compared to the
> > > current runtime JARs, and if they do, what are those?
> > > I haven't thought of the chaosmonkey tests yet, do those have specific
> > > additional dependencies / scripts?
> > >
> > > Should we move the tools simply to the normal jars, or should we move
> > > them to a new module (could be called hbase-diagnostics)?
> > >
> > > Istvan
> > >
> > > On Tue, Mar 5, 2024 at 7:10 PM Bryan Beaudreault
> > > <bbeaudrea...@apache.org> wrote:
> > >
> > >> I'm +0 on hbase-examples, but +100 on any improvements we can make to
> > >> ltt/pe/chaos/minicluster/etc. It's extremely frustrating how much
> > >> reliance we have on test jars both generally but also specifically
> > >> around these core test executables. Unfortunately I haven't had time to
> > >> dedicate to these frustrations myself, but happy to help with review,
> > >> etc.
> > >>
> > >> On Tue, Mar 5, 2024 at 1:03 PM Nihal Jain  wrote:
> > >>
> > >> > Thank you for bringing this up.
> > >> >
> > >> > +1 for this change.
> > >> >
> > >> > In fact, some time back, we had faced a similar problem. Security
> > >> > scans found that we were bundling some vulnerable hadoop test jar. To
> > >> > deal with that we had to make a change in our internal HBase fork to
> > >> > exclude all HBase and Hadoop test jars from assembly. This helped us
> > >> > get rid of the vulnerable jar.
> > >> > (Although I hadn't dealt with test scope dependencies there.)
> > >> >
> > >> > But, I have been thinking of pushing this change in Apache HBase, just
> > >> > wasn't sure if this was even acceptable. It's great to see the same has
> > >> > been brought up here today.
> > >> >
> > >> > We hadn't dealt with the ltt, pe etc. tools and wrote a script to
> > >> > download them on demand to avoid massive code change in internal fork.
> > >> > But I have a +1 on the idea of identifying and moving all such tools
> > >> > to a new module. This would be great and make things easier for us as
> > >> > well.
> > >> >
> > >> > Also, a way we could help new users easily get started, in case we
> > >> > completely stop bundling hadoop jars, is by providing a script which
> > >> > starts a hbase cluster in a single node setup. In fact I had written a
> > >> > simple script sometime back that automates this process given a release
> > >> > link for both. It first downloads Hadoop and HBase binaries and then
> > >> > starts both with the hbase root directory set to be on hdfs. We could pro
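Reduced to its essentials, a bootstrap script of the kind described could look like the sketch below. This is not the script the message refers to: every version number, URL, and path is a placeholder, and it only builds and prints the commands (a dry run) rather than executing them.

```python
def bootstrap_commands(hadoop_ver="3.3.6", hbase_ver="2.6.0"):
    """Commands a single-node setup script would run: fetch both tarballs,
    unpack them, format/start HDFS, then start HBase on top of it.
    All versions, URLs, and paths are illustrative placeholders."""
    mirror = "https://downloads.apache.org"  # placeholder mirror
    hadoop_tgz = f"hadoop-{hadoop_ver}.tar.gz"
    hbase_tgz = f"hbase-{hbase_ver}-bin.tar.gz"
    return [
        f"curl -LO {mirror}/hadoop/common/hadoop-{hadoop_ver}/{hadoop_tgz}",
        f"curl -LO {mirror}/hbase/{hbase_ver}/{hbase_tgz}",
        f"tar xzf {hadoop_tgz}",
        f"tar xzf {hbase_tgz}",
        # hbase.rootdir must point at the local HDFS before HBase starts
        f"hadoop-{hadoop_ver}/bin/hdfs namenode -format",
        f"hadoop-{hadoop_ver}/sbin/start-dfs.sh",
        f"hbase-{hbase_ver}/bin/start-hbase.sh",
    ]


for cmd in bootstrap_commands():
    print(cmd)
```

Wrapping the command list this way keeps the ordering constraint (HDFS up before HBase) explicit and easy to test.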
