Thanks for volunteering, Nihal.

I could work on the Hadoop-less assemblies, and you could work on
cleaning up the test jars.
Would that work for you?
I know that I'm picking the smaller part, but it turns out that I won't
have as much time to work on this as I hoped.

(Unless there are other volunteers, of course)

Istvan

On Wed, Mar 6, 2024 at 7:03 PM Istvan Toth <st...@cloudera.com> wrote:

> We seem to be in agreement in principle; however, the devil is in the
> details.
>
> The first step should be moving the diagnostic tools out of the test jars.
> Are there any tools we don't want to move out?
> Do the diagnostic tools pull in extra dependencies compared to the current
> runtime JARs, and if they do, what are those?
> I haven't thought about the chaosmonkey tests yet; do those have specific
> additional dependencies or scripts?
>
> Should we simply move the tools to the normal jars, or should we move them
> to a new module (could be called hbase-diagnostics)?
>
> Istvan
>
> On Tue, Mar 5, 2024 at 7:10 PM Bryan Beaudreault <bbeaudrea...@apache.org>
> wrote:
>
>> I'm +0 on hbase-examples, but +1000000 on any improvements we can make to
>> ltt/pe/chaos/minicluster/etc. It's extremely frustrating how much we rely
>> on test jars, both generally and specifically for these core test
>> executables. Unfortunately I haven't had time to dedicate to these
>> frustrations myself, but I'm happy to help with review, etc.
>>
>> On Tue, Mar 5, 2024 at 1:03 PM Nihal Jain <nihaljain...@gmail.com> wrote:
>>
>> > Thank you for bringing this up.
>> >
>> > +1 for this change.
>> >
>> > In fact, some time back, we faced a similar problem. Security scans found
>> > that we were bundling a vulnerable Hadoop test jar. To deal with that, we
>> > had to change our internal HBase fork to exclude all HBase and Hadoop
>> > test jars from the assembly. This helped us get rid of the vulnerable jar.
>> > (Although I didn't deal with test scope dependencies there.)
>> >
>> > I have been thinking of pushing this change to Apache HBase, but I
>> > wasn't sure whether it would even be acceptable. It's great to see the
>> > same thing brought up here today.
>> >
>> > We didn't deal with the ltt, pe, etc. tools; instead we wrote a script to
>> > download them on demand, to avoid a massive code change in our internal
>> > fork. But I am +1 on the idea of identifying and moving all such tools to
>> > a new module. This would be great and would make things easier for us as
>> > well.
>> >
>> > Also, if we completely stop bundling Hadoop jars, one way we could help
>> > new users get started easily is by providing a script which starts an
>> > HBase cluster in a single-node setup. In fact, I wrote a simple script
>> > some time back that automates this process given a release link for
>> > both. It first downloads the Hadoop and HBase binaries and then starts
>> > both, with the HBase root directory set to be on HDFS. We could provide
>> > something similar to help new users get started easily.
>> >
>> > That said, I am also +1 on the idea of providing both variants, as
>> > mentioned by Nick, which might not even need any such script.
>> >
>> > Also, I am willing to volunteer to help with this effort. Please let me
>> > know if anything is needed.
>> >
>> > Thanks,
>> > Nihal
>> >
>> >
>> > On Tue, 5 Mar 2024, 15:35 Nick Dimiduk, <ndimi...@apache.org> wrote:
>> >
>> > > This would be a great cleanup; big +1 from me for all three of these
>> > > adjustments, including the promotion of pe, ltt, and friends out of the
>> > > test scope.
>> > >
>> > > I believe that we included hbase test jars because we used to freely mix
>> > > classes needed for minicluster between runtime and test jars, which in
>> > > turn relied on Hadoop minicluster capabilities. The big cleanup around
>> > > HBaseTestingUtil/it addressed much (or all) of these issues on branch-3.
>> > >
>> > > I believe that we include a Hadoop distribution in our assembly because
>> > > that makes it easy for a new user to download our release bin.tgz and
>> > > get started immediately with learning. I guess it’s high time that we
>> > > work out the with- and without-Hadoop variants.
>> > >
>> > > Thanks,
>> > > Nick
>> > >
>> > > On Tue, 5 Mar 2024 at 09:14, Istvan Toth <st...@apache.org> wrote:
>> > >
>> > > > DISCLAIMER: I don't have a patch ready, or even an elegant way mapped
>> > > > out to achieve this; this is about discussing whether we even want to
>> > > > make these changes.
>> > > > These are also substantial changes, but they could be targeted for
>> > > > HBase 3.0.
>> > > >
>> > > > One issue I have noticed is that we ship test jars and test
>> > > > dependencies in the assembly.
>> > > > I can't see anyone using those, but it bloats the assembly and
>> > > > classpath, and adds unnecessary JARs with possible CVE issues (for
>> > > > example Kerby, which is a Hadoop minicluster dependency).
>> > > >
>> > > > My proposal is to exclude the test jars and the test scope
>> > > > dependencies from the assembly.
>> > > >
>> > > > The advantages would be:
>> > > > * Smaller distro size
>> > > > * Faster startup (this is marginal)
>> > > > * Fewer CVE-prone JARs in the binary assemblies
>> > > >
>> > > > The other issue is that the assembly includes much of the Hadoop
>> > > > distribution.
>> > > > The basic assumption in all scripts and instructions is that the node
>> > > > has a fully configured Hadoop installation, and we include it in the
>> > > > classpath of HBase.
>> > > >
>> > > > If that is true, then there is no reason to include Hadoop in the
>> > > > assembly; HBase and its direct dependencies should be enough.
>> > > >
>> > > > One could argue that it would simplify the client side, which is true
>> > > > to some extent (though 95% of the client distro use cases are served
>> > > > better by simply using hbase-shaded-client).
>> > > >
>> > > > We could either remove the Hadoop libraries from either or both of the
>> > > > assemblies unconditionally, or provide two variants for either or both
>> > > > assemblies, one with Hadoop included, and one without it.
>> > > > Spark already does this; it has binary distributions both with and
>> > > > without Hadoop.
>> > > >
>> > > > The advantages would be:
>> > > > * Smaller distro size
>> > > > * Faster startup (this is marginal)
>> > > > * Less chance of conflicts with the Hadoop jars
>> > > > * Fewer CVE-prone JARs in the binary assemblies
>> > > >
>> > > >
>> > > > Thirdly, we could consider excluding the
>> > > > full-fat org.apache.hbase:hbase-shaded-client JAR from the Hadoop-less
>> > > > binary assemblies. It is not used by the assembly, and AFAIK it is not
>> > > > included in any of the 'hbase classpath' command variants.
>> > > >
>> > > > This would make sure that no Hadoop libraries are included (even in
>> > > > shaded form) and would make the HBase distribution fully insulated
>> > > > from Hadoop's CVE issues.
>> > > >
>> > > > (The full-fat hbase-shaded-client works best as a direct build-time
>> > > > dependency anyway)
>> > > >
>> > > > best regards
>> > > > Istvan
>> > > >
>> > >
>> >
>>
>
>
> --
> *István Tóth* | Sr. Staff Software Engineer
> *Email*: st...@cloudera.com
> cloudera.com <https://www.cloudera.com>
> [image: Cloudera] <https://www.cloudera.com/>
> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> Cloudera on LinkedIn] <https://www.linkedin.com/company/cloudera>
> ------------------------------
> ------------------------------
>

