If we don't want to publish the testing assembly, we could also just add a profile to hbase-assembly, so that the testing assembly can be built from the same module by enabling the profile.
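
For illustration, such a profile could look roughly like the sketch below. The profile id `testing-assembly`, the execution id, and the descriptor path `src/main/assembly/testing.xml` are hypothetical names chosen for this example, not something that exists in the hbase-assembly POM today; only the maven-assembly-plugin `single` goal and its `descriptors` configuration are standard Maven.

```xml
<!-- Hypothetical sketch: an opt-in profile inside hbase-assembly/pom.xml -->
<profile>
  <!-- Assumed profile id; not defined in the current build -->
  <id>testing-assembly</id>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-assembly-plugin</artifactId>
        <executions>
          <execution>
            <!-- Assumed execution id for the extra tarball -->
            <id>testing-tarball</id>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
            <configuration>
              <descriptors>
                <!-- Assumed descriptor that also pulls in hbase-it and the test jars -->
                <descriptor>src/main/assembly/testing.xml</descriptor>
              </descriptors>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</profile>
```

With something like this in place, the testing tarball would only be built when the profile is enabled, e.g. `mvn package -Ptesting-assembly` (again assuming that profile id), and the default build would stay unchanged.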

On Mon, Mar 17, 2025 at 7:22 AM Istvan Toth <st...@cloudera.com> wrote:

> I have opened HBASE-29187 for this.
>
> Should we publish this testing assembly for releases ?
> Should this testing assembly be built by default, or should we gate it
> with some property ?
>
> Istvan
>
> On Sat, Mar 15, 2025 at 7:13 PM Istvan Toth <st...@cloudera.com> wrote:
>
>> I will work on it next week.
>>
>> On Sat, Mar 15, 2025 at 3:01 PM 张铎(Duo Zhang) <palomino...@gmail.com>
>> wrote:
>>
>>> Do we have an issue for it?
>>>
>>> At least for releasing 3.0.0, we need to run ITBLL...
>>>
>>> Thanks.
>>>
>>> Istvan Toth <st...@apache.org> wrote on Sat, Mar 15, 2025 at 12:32:
>>> >
>>> > Yes, we've discussed this issue, but deferred solving it.
>>> >
>>> > IMO the easiest way is to add a third assembly which is functionally
>>> > equivalent to the 2.x assembly.
>>> > That could work for running hbase-it, ITBLL, chaos monkey, etc.
>>> > We'd also have to decide whether to build it by default, and whether we
>>> > want to publish it as part of official releases.
>>> >
>>> > In theory we could also make a delta assembly that only includes the
>>> > additional test related stuff, but I'm afraid that that would require
>>> > a lot of maintenance.
>>> > We could also add a script/maven target that downloads test-related JARs
>>> > from maven, but keeping that one up to date would also be problematic.
>>> >
>>> > Istvan
>>> >
>>> > On Sat, Mar 15, 2025 at 4:51 AM 张铎(Duo Zhang) <palomino...@gmail.com>
>>> > wrote:
>>> >
>>> > > After this change, we can not run ITBLL on 3.0.0 because hbase-it is
>>> > > also excluded...
>>> > >
>>> > > I tried manually copying all the tests jar and hbase-it jar to the lib
>>> > > directory but it did not work, I guess we still missed several hadoop
>>> > > jars...
>>> > >
>>> > > So what is the suggested way to run ITBLL after this change?
>>> > >
>>> > > Thanks.
>>> > >
>>> > > Istvan Toth <st...@cloudera.com.invalid> wrote on Mon, Jan 20, 2025 at 14:20:
>>> > > >
>>> > > > This is almost done.
>>> > > >
>>> > > > The final outstanding patch is
>>> > > > https://github.com/apache/hbase/pull/5766
>>> > > > for the new Hadoop-less assembly.
>>> > > >
>>> > > > Could you please review it ?
>>> > > >
>>> > > > On Sat, Mar 9, 2024 at 8:48 AM Nihal Jain <nihaljain...@gmail.com>
>>> > > > wrote:
>>> > > >
>>> > > > > I have created sub tasks with necessary details in the umbrella
>>> > > > > jira. Will take them up in coming days. Also will add more sub
>>> > > > > tasks later if needed.
>>> > > > >
>>> > > > > Regards
>>> > > > > Nihal
>>> > > > >
>>> > > > > On Sat, 9 Mar 2024, 11:53 Istvan Toth, <st...@cloudera.com.invalid>
>>> > > > > wrote:
>>> > > > >
>>> > > > > > Thank you Nihal.
>>> > > > > > I'm not very familiar with the tools in the test code, so you can
>>> > > > > > probably plan that work better.
>>> > > > > > I just have some generic steps in mind:
>>> > > > > > * Identify all the tools / scripts in the test jars
>>> > > > > > * Identify and analyze their dependencies (compared to the
>>> > > > > >   current runtime deps)
>>> > > > > > * Decide which ones to move to the runtime JARs.
>>> > > > > > * Move them to the runtime code (or perhaps a separate module)
>>> > > > > >
>>> > > > > > I have created https://issues.apache.org/jira/browse/HBASE-28431
>>> > > > > > as an umbrella ticket to organize the sub-tasks.
>>> > > > > >
>>> > > > > > Istvan
>>> > > > > >
>>> > > > > > On Fri, Mar 8, 2024 at 7:06 PM Nihal Jain <nihaljain...@gmail.com>
>>> > > > > > wrote:
>>> > > > > >
>>> > > > > > > Sure I will be able to take up. Please create tasks with
>>> > > > > > > necessary details or let me know if you want me to create.
>>> > > > > > >
>>> > > > > > > On Fri, 8 Mar 2024, 12:45 Istvan Toth, <st...@cloudera.com.invalid>
>>> > > > > > > wrote:
>>> > > > > > >
>>> > > > > > > > Thanks for volunteering, Nihal.
>>> > > > > > > >
>>> > > > > > > > I could work on the Hadoop-less, and assemblies, and you could
>>> > > > > > > > work on cleaning up the test jars.
>>> > > > > > > > Would that work for you ?
>>> > > > > > > > I know that I'm picking the smaller part, but it turns out
>>> > > > > > > > that I won't have as much time to work on this as I hoped.
>>> > > > > > > >
>>> > > > > > > > (Unless there are other volunteers, of course)
>>> > > > > > > >
>>> > > > > > > > Istvan
>>> > > > > > > >
>>> > > > > > > > On Wed, Mar 6, 2024 at 7:03 PM Istvan Toth <st...@cloudera.com>
>>> > > > > > > > wrote:
>>> > > > > > > >
>>> > > > > > > > > We seem to be in agreement in principle, however the devil
>>> > > > > > > > > is in the details.
>>> > > > > > > > >
>>> > > > > > > > > The first step should be moving the diagnostic tools out of
>>> > > > > > > > > the test jars.
>>> > > > > > > > > Are there any tools we don't want to move out ?
>>> > > > > > > > > Do the diagnostic tools pull in extra dependencies compared
>>> > > > > > > > > to the current runtime JARs, and if they do, what are those ?
>>> > > > > > > > > I haven't thought of the chaosmonkey tests yet, do those
>>> > > > > > > > > have specific additional dependencies / scripts ?
>>> > > > > > > > >
>>> > > > > > > > > Should we move the tools simply to the normal jars, or
>>> > > > > > > > > should we move them to a new module (could be called
>>> > > > > > > > > hbase-diagnostics) ?
>>> > > > > > > > >
>>> > > > > > > > > Istvan
>>> > > > > > > > >
>>> > > > > > > > > On Tue, Mar 5, 2024 at 7:10 PM Bryan Beaudreault
>>> > > > > > > > > <bbeaudrea...@apache.org> wrote:
>>> > > > > > > > >
>>> > > > > > > > >> I'm +0 on hbase-examples, but +1000000 on any improvements
>>> > > > > > > > >> we can make to ltt/pe/chaos/minicluster/etc. It's extremely
>>> > > > > > > > >> frustrating how much reliance we have on test jars both
>>> > > > > > > > >> generally but also specifically around these core test
>>> > > > > > > > >> executables. Unfortunately I haven't had time to dedicate
>>> > > > > > > > >> to these frustrations myself, but happy to help with
>>> > > > > > > > >> review, etc.
>>> > > > > > > > >>
>>> > > > > > > > >> On Tue, Mar 5, 2024 at 1:03 PM Nihal Jain
>>> > > > > > > > >> <nihaljain...@gmail.com> wrote:
>>> > > > > > > > >>
>>> > > > > > > > >> > Thank you for bringing this up.
>>> > > > > > > > >> >
>>> > > > > > > > >> > +1 for this change.
>>> > > > > > > > >> >
>>> > > > > > > > >> > In fact, some time back, we had faced similar problem.
>>> > > > > > > > >> > Security scans found that we were bundling some
>>> > > > > > > > >> > vulnerable hadoop test jar. To deal with that we had to
>>> > > > > > > > >> > make a change in our internal HBase fork to exclude all
>>> > > > > > > > >> > HBase and Hadoop test jars from assembly. This helped us
>>> > > > > > > > >> > get rid of vulnerable jar. (Although I hadn't dealt with
>>> > > > > > > > >> > test scope dependencies there.)
>>> > > > > > > > >> >
>>> > > > > > > > >> > But, I have been thinking of pushing this change in
>>> > > > > > > > >> > Apache HBase, just wasn't sure if this was even
>>> > > > > > > > >> > acceptable. It's great to see same has been brought up
>>> > > > > > > > >> > here today.
>>> > > > > > > > >> >
>>> > > > > > > > >> > We hadn't dealt with the ltt, pe etc. tools and wrote a
>>> > > > > > > > >> > script to download them on demand to avoid massive code
>>> > > > > > > > >> > change in internal fork. But I have a +1 on the idea of
>>> > > > > > > > >> > identifying and moving all such tools to a new module.
>>> > > > > > > > >> > This would be great and make things easier for us as
>>> > > > > > > > >> > well.
>>> > > > > > > > >> >
>>> > > > > > > > >> > Also, a way we could help new users easily get started,
>>> > > > > > > > >> > in case we completely stop bundling hadoop jars, is by
>>> > > > > > > > >> > providing a script which starts a hbase cluster in a
>>> > > > > > > > >> > single node setup. In fact I had written a simple script
>>> > > > > > > > >> > sometime back that automates this process given a
>>> > > > > > > > >> > release link for both. It first downloads Hadoop and
>>> > > > > > > > >> > HBase binaries and then starts both with the hbase root
>>> > > > > > > > >> > directory set to be on hdfs. We could provide something
>>> > > > > > > > >> > similar to help new users to get started easily.
>>> > > > > > > > >> >
>>> > > > > > > > >> > Although I am also +1 on the idea to provide both
>>> > > > > > > > >> > variants as mentioned by Nick, which might not even need
>>> > > > > > > > >> > any such script.
>>> > > > > > > > >> >
>>> > > > > > > > >> > Also, I am willing to volunteer for help towards this
>>> > > > > > > > >> > effort. Please let me know if anything is needed.
>>> > > > > > > > >> >
>>> > > > > > > > >> > Thanks,
>>> > > > > > > > >> > Nihal
>>> > > > > > > > >> >
>>> > > > > > > > >> > On Tue, 5 Mar 2024, 15:35 Nick Dimiduk,
>>> > > > > > > > >> > <ndimi...@apache.org> wrote:
>>> > > > > > > > >> >
>>> > > > > > > > >> > > This would be great cleanup, big +1 from me for all
>>> > > > > > > > >> > > three of these adjustments, including the promotion of
>>> > > > > > > > >> > > pe, ltt, and friends out of the test scope.
>>> > > > > > > > >> > >
>>> > > > > > > > >> > > I believe that we included hbase test jars because we
>>> > > > > > > > >> > > used to freely mix classes needed for minicluster
>>> > > > > > > > >> > > between runtime and test jars, which in turn relied on
>>> > > > > > > > >> > > Hadoop minicluster capabilities. The big cleanup
>>> > > > > > > > >> > > around HBaseTestingUtil/it addressed much (or all) of
>>> > > > > > > > >> > > these issues on branch-3.
>>> > > > > > > > >> > >
>>> > > > > > > > >> > > I believe that we include a Hadoop distribution in our
>>> > > > > > > > >> > > assembly because that makes it easy for a new user to
>>> > > > > > > > >> > > download our release bin.tgz and get started
>>> > > > > > > > >> > > immediately with learning. I guess it’s high time that
>>> > > > > > > > >> > > we work out the with- and without-Hadoop variants.
>>> > > > > > > > >> > >
>>> > > > > > > > >> > > Thanks,
>>> > > > > > > > >> > > Nick
>>> > > > > > > > >> > >
>>> > > > > > > > >> > > On Tue, 5 Mar 2024 at 09:14, Istvan Toth
>>> > > > > > > > >> > > <st...@apache.org> wrote:
>>> > > > > > > > >> > >
>>> > > > > > > > >> > > > DISCLAIMER: I don't have a patch ready, or even an
>>> > > > > > > > >> > > > elegant way mapped out to achieve this, this is
>>> > > > > > > > >> > > > about discussing whether we even want to make these
>>> > > > > > > > >> > > > changes.
>>> > > > > > > > >> > > > These are also substantial changes, but they could
>>> > > > > > > > >> > > > be targeted for HBase 3.0.
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > One issue I have noticed is that we ship test jars
>>> > > > > > > > >> > > > and test dependencies in the assembly.
>>> > > > > > > > >> > > > I can't see anyone using those, but it bloats the
>>> > > > > > > > >> > > > assembly and classpath, and adds unnecessary JARs
>>> > > > > > > > >> > > > with possible CVE issues. (for example Kerby which
>>> > > > > > > > >> > > > is a Hadoop minicluster dependency)
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > My proposal is to exclude the test jars and the test
>>> > > > > > > > >> > > > scope dependencies from the assembly.
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > The advantages would be:
>>> > > > > > > > >> > > > * Smaller distro size
>>> > > > > > > > >> > > > * Faster startup (this is marginal)
>>> > > > > > > > >> > > > * Less CVE-prone JARs in the binary assemblies
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > The other issue is that the assembly includes much
>>> > > > > > > > >> > > > of the Hadoop distribution.
>>> > > > > > > > >> > > > The basic assumption in all scripts and instructions
>>> > > > > > > > >> > > > is that the node has a fully configured Hadoop
>>> > > > > > > > >> > > > installation, and we include it in the classpath of
>>> > > > > > > > >> > > > HBase.
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > If that is true, then there is no reason to include
>>> > > > > > > > >> > > > Hadoop in the assembly, HBase and its direct
>>> > > > > > > > >> > > > dependencies should be enough.
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > One could argue that it would simplify the client
>>> > > > > > > > >> > > > side, which is true to some extent (though 95% of
>>> > > > > > > > >> > > > the client distro use cases are served better by
>>> > > > > > > > >> > > > simply using hbase-shaded-client).
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > We could either remove the Hadoop libraries from
>>> > > > > > > > >> > > > either or both of the assemblies unconditionally, or
>>> > > > > > > > >> > > > provide two variants for either or both assemblies,
>>> > > > > > > > >> > > > one with Hadoop included, and one without it.
>>> > > > > > > > >> > > > Spark already does this, it has binary distributions
>>> > > > > > > > >> > > > both with and without Hadoop.
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > The advantages would be:
>>> > > > > > > > >> > > > * Smaller distro size
>>> > > > > > > > >> > > > * Faster startup (this is marginal)
>>> > > > > > > > >> > > > * Less chance of conflicts with the Hadoop jars
>>> > > > > > > > >> > > > * Less CVE-prone JARs in the binary assemblies
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > Thirdly, we could consider excluding the full-fat
>>> > > > > > > > >> > > > org.apache.hbase:hbase-shaded-client JAR from the
>>> > > > > > > > >> > > > Hadoop-less binary assemblies. It is not used by the
>>> > > > > > > > >> > > > assembly, and AFAIK it is not included in any of the
>>> > > > > > > > >> > > > 'hbase classpath' command variants.
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > This would make sure that no Hadoop libraries are
>>> > > > > > > > >> > > > included (even in shaded form) and would make the
>>> > > > > > > > >> > > > HBase distribution fully insulated from Hadoop's CVE
>>> > > > > > > > >> > > > issues.
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > (The full-fat hbase-shaded-client works best as
>>> > > > > > > > >> > > > direct build-time dependency anyway)
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > best regards
>>> > > > > > > > >> > > > Istvan
>>> > > > > > > > >
>>> > > > > > > > > --
>>> > > > > > > > > *István Tóth* | Sr.
Staff Software Engineer
>>> > > > > > > > > *Email*: st...@cloudera.com
>>> > > > > > > > > cloudera.com <https://www.cloudera.com>

--
*István Tóth* | Sr. Staff Software Engineer
*Email*: st...@cloudera.com
cloudera.com <https://www.cloudera.com>