If we don't want to publish the testing assembly, we could also just add a
profile to hbase-assembly, so that the testing assembly can be built from the
same module by enabling the profile.
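
A minimal sketch of what such a profile might look like in the hbase-assembly
pom.xml. The profile id, the activation property, the execution id, and the
descriptor path are all hypothetical names chosen for illustration; none of
them come from an existing patch:

```xml
<!-- Sketch only: ids, property name and descriptor path are assumptions. -->
<profile>
  <id>testing-assembly</id>
  <activation>
    <property>
      <!-- hypothetical switch: mvn package -Dhbase.testing.assembly -->
      <name>hbase.testing.assembly</name>
    </property>
  </activation>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-assembly-plugin</artifactId>
        <executions>
          <execution>
            <id>testing-tarball</id>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
            <configuration>
              <descriptors>
                <!-- hypothetical descriptor that adds hbase-it and the test jars -->
                <descriptor>src/main/assembly/testing.xml</descriptor>
              </descriptors>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</profile>
```

With something along these lines, a default build would skip the testing
assembly, and it would only be produced when the profile is enabled explicitly
(for example with -Ptesting-assembly, or by setting the property).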


On Mon, Mar 17, 2025 at 7:22 AM Istvan Toth <st...@cloudera.com> wrote:

> I have opened HBASE-29187 for this.
>
> Should we publish this testing assembly for releases?
> Should this testing assembly be built by default, or should we gate it
> with some property?
>
> Istvan
>
>
> On Sat, Mar 15, 2025 at 7:13 PM Istvan Toth <st...@cloudera.com> wrote:
>
>> I will work on it next week.
>>
>> On Sat, Mar 15, 2025 at 3:01 PM 张铎(Duo Zhang) <palomino...@gmail.com>
>> wrote:
>>
>>> Do we have an issue for it?
>>>
>>> At least for releasing 3.0.0, we need to run ITBLL...
>>>
>>> Thanks.
>>>
>>> On Sat, Mar 15, 2025 at 12:32, Istvan Toth <st...@apache.org> wrote:
>>> >
>>> > Yes, we've discussed this issue, but deferred solving it.
>>> >
>>> > IMO the easiest way is to add a third assembly which is functionally
>>> > equivalent to the 2.x assembly.
>>> > That could work for running hbase-it, ITBLL, chaos monkey, etc.
>>> > We'd also have to decide whether to build it by default, and whether we
>>> > want to publish it as part of official releases.
>>> >
>>> > In theory we could also make a delta assembly that only includes the
>>> > additional test-related stuff, but I'm afraid that would require a lot
>>> > of maintenance.
>>> > We could also add a script/maven target that downloads test-related JARs
>>> > from maven, but keeping that one up to date would also be problematic.
>>> >
>>> > Istvan
>>> >
>>> > On Sat, Mar 15, 2025 at 4:51 AM 张铎(Duo Zhang) <palomino...@gmail.com>
>>> > wrote:
>>> >
>>> > > After this change, we cannot run ITBLL on 3.0.0 because hbase-it is
>>> > > also excluded...
>>> > >
>>> > > I tried manually copying all the test jars and the hbase-it jar to the
>>> > > lib directory, but it did not work. I guess we are still missing
>>> > > several hadoop jars...
>>> > >
>>> > > So what is the suggested way to run ITBLL after this change?
>>> > >
>>> > > Thanks.
>>> > >
>>> > > On Mon, Jan 20, 2025 at 14:20, Istvan Toth <st...@cloudera.com.invalid> wrote:
>>> > > >
>>> > > > This is almost done.
>>> > > >
>>> > > > The final outstanding patch is
>>> > > > https://github.com/apache/hbase/pull/5766
>>> > > > for the new Hadoop-less assembly.
>>> > > >
>>> > > > Could you please review it?
>>> > > >
>>> > > >
>>> > > >
>>> > > > On Sat, Mar 9, 2024 at 8:48 AM Nihal Jain <nihaljain...@gmail.com>
>>> > > > wrote:
>>> > > >
>>> > > > > I have created sub-tasks with the necessary details in the umbrella
>>> > > > > jira. I will take them up in the coming days, and will add more
>>> > > > > sub-tasks later if needed.
>>> > > > >
>>> > > > > Regards
>>> > > > > Nihal
>>> > > > >
>>> > > > > On Sat, 9 Mar 2024, 11:53 Istvan Toth, <st...@cloudera.com.invalid>
>>> > > > > wrote:
>>> > > > >
>>> > > > > > Thank you Nihal.
>>> > > > > > I'm not very familiar with the tools in the test code, so you can
>>> > > > > > probably plan that work better.
>>> > > > > > I just have some generic steps in mind:
>>> > > > > > * Identify all the tools / scripts in the test jars
>>> > > > > > * Identify and analyze their dependencies (compared to the current
>>> > > > > >   runtime deps)
>>> > > > > > * Decide which ones to move to the runtime JARs.
>>> > > > > > * Move them to the runtime code (or perhaps a separate module)
>>> > > > > >
>>> > > > > > I have created https://issues.apache.org/jira/browse/HBASE-28431 as
>>> > > > > > an umbrella ticket to organize the sub-tasks.
>>> > > > > >
>>> > > > > > Istvan
>>> > > > > >
>>> > > > > > On Fri, Mar 8, 2024 at 7:06 PM Nihal Jain <nihaljain...@gmail.com>
>>> > > > > > wrote:
>>> > > > > >
>>> > > > > > > Sure, I will be able to take this up. Please create tasks with the
>>> > > > > > > necessary details, or let me know if you want me to create them.
>>> > > > > > >
>>> > > > > > > On Fri, 8 Mar 2024, 12:45 Istvan Toth, <st...@cloudera.com.invalid>
>>> > > > > > > wrote:
>>> > > > > > >
>>> > > > > > > > Thanks for volunteering, Nihal.
>>> > > > > > > >
>>> > > > > > > > I could work on the Hadoop-less variant and the assemblies, and
>>> > > > > > > > you could work on cleaning up the test jars.
>>> > > > > > > > Would that work for you?
>>> > > > > > > > I know that I'm picking the smaller part, but it turns out that I
>>> > > > > > > > won't have as much time to work on this as I hoped.
>>> > > > > > > >
>>> > > > > > > > (Unless there are other volunteers, of course)
>>> > > > > > > >
>>> > > > > > > > Istvan
>>> > > > > > > >
>>> > > > > > > > On Wed, Mar 6, 2024 at 7:03 PM Istvan Toth <st...@cloudera.com>
>>> > > > > > > > wrote:
>>> > > > > > > >
>>> > > > > > > > > We seem to be in agreement in principle; however, the devil is
>>> > > > > > > > > in the details.
>>> > > > > > > > >
>>> > > > > > > > > The first step should be moving the diagnostic tools out of the
>>> > > > > > > > > test jars.
>>> > > > > > > > > Are there any tools we don't want to move out?
>>> > > > > > > > > Do the diagnostic tools pull in extra dependencies compared to
>>> > > > > > > > > the current runtime JARs, and if they do, what are those?
>>> > > > > > > > > I haven't thought about the chaosmonkey tests yet; do those have
>>> > > > > > > > > specific additional dependencies / scripts?
>>> > > > > > > > >
>>> > > > > > > > > Should we move the tools simply to the normal jars, or should we
>>> > > > > > > > > move them to a new module (could be called hbase-diagnostics)?
>>> > > > > > > > >
>>> > > > > > > > > Istvan
>>> > > > > > > > >
>>> > > > > > > > > On Tue, Mar 5, 2024 at 7:10 PM Bryan Beaudreault
>>> > > > > > > > > <bbeaudrea...@apache.org> wrote:
>>> > > > > > > > >
>>> > > > > > > > >> I'm +0 on hbase-examples, but +1000000 on any improvements we
>>> > > > > > > > >> can make to ltt/pe/chaos/minicluster/etc. It's extremely
>>> > > > > > > > >> frustrating how much reliance we have on test jars, both
>>> > > > > > > > >> generally and specifically around these core test executables.
>>> > > > > > > > >> Unfortunately I haven't had time to dedicate to these
>>> > > > > > > > >> frustrations myself, but I'm happy to help with review, etc.
>>> > > > > > > > >>
>>> > > > > > > > >> On Tue, Mar 5, 2024 at 1:03 PM Nihal Jain <nihaljain...@gmail.com>
>>> > > > > > > > >> wrote:
>>> > > > > > > > >>
>>> > > > > > > > >> > Thank you for bringing this up.
>>> > > > > > > > >> >
>>> > > > > > > > >> > +1 for this change.
>>> > > > > > > > >> >
>>> > > > > > > > >> > In fact, some time back, we faced a similar problem. Security
>>> > > > > > > > >> > scans found that we were bundling a vulnerable hadoop test jar.
>>> > > > > > > > >> > To deal with that, we had to make a change in our internal HBase
>>> > > > > > > > >> > fork to exclude all HBase and Hadoop test jars from the
>>> > > > > > > > >> > assembly. This helped us get rid of the vulnerable jar.
>>> > > > > > > > >> > (Although I hadn't dealt with test scope dependencies there.)
>>> > > > > > > > >> >
>>> > > > > > > > >> > I have been thinking of pushing this change to Apache HBase, but
>>> > > > > > > > >> > wasn't sure if it was even acceptable. It's great to see the
>>> > > > > > > > >> > same thing brought up here today.
>>> > > > > > > > >> >
>>> > > > > > > > >> > We hadn't dealt with the ltt, pe, etc. tools, and wrote a script
>>> > > > > > > > >> > to download them on demand to avoid a massive code change in the
>>> > > > > > > > >> > internal fork. But I am +1 on the idea of identifying and moving
>>> > > > > > > > >> > all such tools to a new module.
>>> > > > > > > > >> > This would be great and make things easier for us as well.
>>> > > > > > > > >> >
>>> > > > > > > > >> > Also, a way we could help new users easily get started, in case
>>> > > > > > > > >> > we completely stop bundling hadoop jars, is by providing a
>>> > > > > > > > >> > script which starts an hbase cluster in a single-node setup. In
>>> > > > > > > > >> > fact, I wrote a simple script some time back that automates this
>>> > > > > > > > >> > process given a release link for both. It first downloads the
>>> > > > > > > > >> > Hadoop and HBase binaries and then starts both with the hbase
>>> > > > > > > > >> > root directory set to be on hdfs. We could provide something
>>> > > > > > > > >> > similar to help new users get started easily.
>>> > > > > > > > >> >
>>> > > > > > > > >> > Although I am also +1 on the idea to provide both variants as
>>> > > > > > > > >> > mentioned by Nick, which might not even need any such script.
>>> > > > > > > > >> >
>>> > > > > > > > >> > Also, I am willing to volunteer to help with this effort. Please
>>> > > > > > > > >> > let me know if anything is needed.
>>> > > > > > > > >> >
>>> > > > > > > > >> > Thanks,
>>> > > > > > > > >> > Nihal
>>> > > > > > > > >> >
>>> > > > > > > > >> >
>>> > > > > > > > >> > On Tue, 5 Mar 2024, 15:35 Nick Dimiduk, <ndimi...@apache.org>
>>> > > > > > > > >> > wrote:
>>> > > > > > > > >> >
>>> > > > > > > > >> > > This would be great cleanup, big +1 from me for all three of
>>> > > > > > > > >> > > these adjustments, including the promotion of pe, ltt, and
>>> > > > > > > > >> > > friends out of the test scope.
>>> > > > > > > > >> > >
>>> > > > > > > > >> > > I believe that we included hbase test jars because we used to
>>> > > > > > > > >> > > freely mix classes needed for minicluster between runtime and
>>> > > > > > > > >> > > test jars, which in turn relied on Hadoop minicluster
>>> > > > > > > > >> > > capabilities. The big cleanup around HBaseTestingUtil/it
>>> > > > > > > > >> > > addressed much (or all) of these issues on branch-3.
>>> > > > > > > > >> > >
>>> > > > > > > > >> > > I believe that we include a Hadoop distribution in our assembly
>>> > > > > > > > >> > > because that makes it easy for a new user to download our
>>> > > > > > > > >> > > release bin.tgz and get started immediately with learning. I
>>> > > > > > > > >> > > guess it’s high time that we work out the with- and
>>> > > > > > > > >> > > without-Hadoop variants.
>>> > > > > > > > >> > >
>>> > > > > > > > >> > > Thanks,
>>> > > > > > > > >> > > Nick
>>> > > > > > > > >> > >
>>> > > > > > > > >> > > On Tue, 5 Mar 2024 at 09:14, Istvan Toth <st...@apache.org>
>>> > > > > > > > >> > > wrote:
>>> > > > > > > > >> > >
>>> > > > > > > > >> > > > DISCLAIMER: I don't have a patch ready, or even an elegant way
>>> > > > > > > > >> > > > mapped out to achieve this; this is about discussing whether we
>>> > > > > > > > >> > > > even want to make these changes.
>>> > > > > > > > >> > > > These are also substantial changes, but they could be targeted
>>> > > > > > > > >> > > > for HBase 3.0.
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > One issue I have noticed is that we ship test jars and test
>>> > > > > > > > >> > > > dependencies in the assembly.
>>> > > > > > > > >> > > > I can't see anyone using those, but it bloats the assembly and
>>> > > > > > > > >> > > > classpath, and adds unnecessary JARs with possible CVE issues
>>> > > > > > > > >> > > > (for example Kerby, which is a Hadoop minicluster dependency).
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > My proposal is to exclude the test jars and the test scope
>>> > > > > > > > >> > > > dependencies from the assembly.
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > The advantages would be:
>>> > > > > > > > >> > > > * Smaller distro size
>>> > > > > > > > >> > > > * Faster startup (this is marginal)
>>> > > > > > > > >> > > > * Fewer CVE-prone JARs in the binary assemblies
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > The other issue is that the assembly includes much of the
>>> > > > > > > > >> > > > Hadoop distribution.
>>> > > > > > > > >> > > > The basic assumption in all scripts and instructions is that
>>> > > > > > > > >> > > > the node has a fully configured Hadoop installation, and we
>>> > > > > > > > >> > > > include it in the classpath of HBase.
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > If that is true, then there is no reason to include Hadoop in
>>> > > > > > > > >> > > > the assembly; HBase and its direct dependencies should be
>>> > > > > > > > >> > > > enough.
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > One could argue that it would simplify the client side, which
>>> > > > > > > > >> > > > is true to some extent (though 95% of the client distro use
>>> > > > > > > > >> > > > cases are served better by simply using hbase-shaded-client).
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > We could either remove the Hadoop libraries from either or
>>> > > > > > > > >> > > > both of the assemblies unconditionally, or provide two
>>> > > > > > > > >> > > > variants for either or both assemblies, one with Hadoop
>>> > > > > > > > >> > > > included, and one without it.
>>> > > > > > > > >> > > > Spark already does this; it has binary distributions both with
>>> > > > > > > > >> > > > and without Hadoop.
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > The advantages would be:
>>> > > > > > > > >> > > > * Smaller distro size
>>> > > > > > > > >> > > > * Faster startup (this is marginal)
>>> > > > > > > > >> > > > * Less chance of conflicts with the Hadoop jars
>>> > > > > > > > >> > > > * Fewer CVE-prone JARs in the binary assemblies
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > Thirdly, we could consider excluding the full-fat
>>> > > > > > > > >> > > > org.apache.hbase:hbase-shaded-client JAR from the Hadoop-less
>>> > > > > > > > >> > > > binary assemblies. It is not used by the assembly, and AFAIK
>>> > > > > > > > >> > > > it is not included in any of the 'hbase classpath' command
>>> > > > > > > > >> > > > variants.
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > This would make sure that no Hadoop libraries are included
>>> > > > > > > > >> > > > (even in shaded form) and would make the HBase distribution
>>> > > > > > > > >> > > > fully insulated from Hadoop's CVE issues.
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > (The full-fat hbase-shaded-client works best as a direct
>>> > > > > > > > >> > > > build-time dependency anyway)
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > best regards
>>> > > > > > > > >> > > > Istvan
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > >
>>> > > > > > > > >> >
>>> > > > > > > > >>
>>> > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > > > --
>>> > > > > > > > > *István Tóth* | Sr. Staff Software Engineer
>>> > > > > > > > > *Email*: st...@cloudera.com
>>> > > > > > > > > cloudera.com <https://www.cloudera.com>
>>> > > > > > > > > [image: Cloudera] <https://www.cloudera.com/>
>>> > > > > > > > > [image: Cloudera on Twitter] <
>>> https://twitter.com/cloudera>
>>> > > > > [image:
>>> > > > > > > > > Cloudera on Facebook] <https://www.facebook.com/cloudera
>>> >
>>> > > [image:
>>> > > > > > > > > Cloudera on LinkedIn] <
>>> > > https://www.linkedin.com/company/cloudera>
>>> > > > > > > > > ------------------------------
>>> > > > > > > > > ------------------------------
>>> > > > > > > > >
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > > >
>>> > >
>>>
>>
>>
>>
>
>
>


-- 
*István Tóth* | Sr. Staff Software Engineer
*Email*: st...@cloudera.com
cloudera.com <https://www.cloudera.com>