I have opened HBASE-29187 for this.

Should we publish this testing assembly for releases?
Should this testing assembly be built by default, or should we gate it behind a property?

Istvan

On Sat, Mar 15, 2025 at 7:13 PM Istvan Toth <st...@cloudera.com> wrote:

> I will work on it next week.
>
> On Sat, Mar 15, 2025 at 3:01 PM 张铎(Duo Zhang) <palomino...@gmail.com> wrote:
>
>> Do we have an issue for it?
>>
>> At least for releasing 3.0.0, we need to run ITBLL...
>>
>> Thanks.
>>
>> Istvan Toth <st...@apache.org> wrote on Sat, Mar 15, 2025 at 12:32:
>> >
>> > Yes, we've discussed this issue, but deferred solving it.
>> >
>> > IMO the easiest way is to add a third assembly which is functionally equivalent to the 2.x assembly.
>> > That could work for running hbase-it, ITBLL, chaos monkey, etc.
>> > We'd also have to decide whether to build it by default, and whether we want to publish it as part of official releases.
>> >
>> > In theory we could also make a delta assembly that only includes the additional test-related stuff, but I'm afraid that would require a lot of maintenance.
>> > We could also add a script/Maven target that downloads test-related JARs from Maven, but keeping that one up to date would also be problematic.
>> >
>> > Istvan
>> >
>> > On Sat, Mar 15, 2025 at 4:51 AM 张铎(Duo Zhang) <palomino...@gmail.com> wrote:
>> >
>> > > After this change, we cannot run ITBLL on 3.0.0 because hbase-it is also excluded...
>> > >
>> > > I tried manually copying all the test jars and the hbase-it jar to the lib directory but it did not work, I guess we still missed several Hadoop jars...
>> > >
>> > > So what is the suggested way to run ITBLL after this change?
>> > >
>> > > Thanks.
>> > >
>> > > Istvan Toth <st...@cloudera.com.invalid> wrote on Mon, Jan 20, 2025 at 14:20:
>> > > >
>> > > > This is almost done.
>> > > >
>> > > > The final outstanding patch is https://github.com/apache/hbase/pull/5766 for the new Hadoop-less assembly.
>> > > >
>> > > > Could you please review it?
>> > > >
>> > > > On Sat, Mar 9, 2024 at 8:48 AM Nihal Jain <nihaljain...@gmail.com> wrote:
>> > > >
>> > > > > I have created sub-tasks with the necessary details in the umbrella Jira. I will take them up in the coming days, and will add more sub-tasks later if needed.
>> > > > >
>> > > > > Regards
>> > > > > Nihal
>> > > > >
>> > > > > On Sat, 9 Mar 2024, 11:53 Istvan Toth, <st...@cloudera.com.invalid> wrote:
>> > > > >
>> > > > > > Thank you Nihal.
>> > > > > > I'm not very familiar with the tools in the test code, so you can probably plan that work better.
>> > > > > > I just have some generic steps in mind:
>> > > > > > * Identify all the tools / scripts in the test jars
>> > > > > > * Identify and analyze their dependencies (compared to the current runtime deps)
>> > > > > > * Decide which ones to move to the runtime JARs
>> > > > > > * Move them to the runtime code (or perhaps a separate module)
>> > > > > >
>> > > > > > I have created https://issues.apache.org/jira/browse/HBASE-28431 as an umbrella ticket to organize the sub-tasks.
>> > > > > >
>> > > > > > Istvan
>> > > > > >
>> > > > > > On Fri, Mar 8, 2024 at 7:06 PM Nihal Jain <nihaljain...@gmail.com> wrote:
>> > > > > >
>> > > > > > > Sure, I will be able to take this up. Please create tasks with the necessary details, or let me know if you want me to create them.
>> > > > > > >
>> > > > > > > On Fri, 8 Mar 2024, 12:45 Istvan Toth, <st...@cloudera.com.invalid> wrote:
>> > > > > > >
>> > > > > > > > Thanks for volunteering, Nihal.
>> > > > > > > >
>> > > > > > > > I could work on the Hadoop-less, and assemblies, and you could work on cleaning up the test jars.
>> > > > > > > > Would that work for you?
>> > > > > > > > I know that I'm picking the smaller part, but it turns out that I won't have as much time to work on this as I hoped.
>> > > > > > > >
>> > > > > > > > (Unless there are other volunteers, of course)
>> > > > > > > >
>> > > > > > > > Istvan
>> > > > > > > >
>> > > > > > > > On Wed, Mar 6, 2024 at 7:03 PM Istvan Toth <st...@cloudera.com> wrote:
>> > > > > > > >
>> > > > > > > > > We seem to be in agreement in principle, however the devil is in the details.
>> > > > > > > > >
>> > > > > > > > > The first step should be moving the diagnostic tools out of the test jars.
>> > > > > > > > > Are there any tools we don't want to move out?
>> > > > > > > > > Do the diagnostic tools pull in extra dependencies compared to the current runtime JARs, and if they do, what are those?
>> > > > > > > > > I haven't thought of the chaosmonkey tests yet, do those have specific additional dependencies / scripts?
>> > > > > > > > >
>> > > > > > > > > Should we move the tools simply to the normal jars, or should we move them to a new module (could be called hbase-diagnostics)?
>> > > > > > > > >
>> > > > > > > > > Istvan
>> > > > > > > > >
>> > > > > > > > > On Tue, Mar 5, 2024 at 7:10 PM Bryan Beaudreault <bbeaudrea...@apache.org> wrote:
>> > > > > > > > >
>> > > > > > > > >> I'm +0 on hbase-examples, but +1000000 on any improvements we can make to ltt/pe/chaos/minicluster/etc. It's extremely frustrating how much reliance we have on test jars both generally but also specifically around these core test executables.
>> > > > > > > > >> Unfortunately I haven't had time to dedicate to these frustrations myself, but happy to help with review, etc.
>> > > > > > > > >>
>> > > > > > > > >> On Tue, Mar 5, 2024 at 1:03 PM Nihal Jain <nihaljain...@gmail.com> wrote:
>> > > > > > > > >>
>> > > > > > > > >> > Thank you for bringing this up.
>> > > > > > > > >> >
>> > > > > > > > >> > +1 for this change.
>> > > > > > > > >> >
>> > > > > > > > >> > In fact, some time back, we had faced a similar problem. Security scans found that we were bundling a vulnerable Hadoop test jar. To deal with that, we had to make a change in our internal HBase fork to exclude all HBase and Hadoop test jars from the assembly. This helped us get rid of the vulnerable jar. (Although I hadn't dealt with test scope dependencies there.)
>> > > > > > > > >> >
>> > > > > > > > >> > But, I have been thinking of pushing this change in Apache HBase, just wasn't sure if this was even acceptable. It's great to see the same has been brought up here today.
>> > > > > > > > >> >
>> > > > > > > > >> > We hadn't dealt with the ltt, pe etc. tools and wrote a script to download them on demand to avoid a massive code change in the internal fork. But I have a +1 on the idea of identifying and moving all such tools to a new module. This would be great and make things easier for us as well.
>> > > > > > > > >> >
>> > > > > > > > >> > Also, a way we could help new users easily get started, in case we completely stop bundling Hadoop jars, is by providing a script which starts an HBase cluster in a single-node setup. In fact, I had written a simple script some time back that automates this process given a release link for both. It first downloads the Hadoop and HBase binaries and then starts both with the HBase root directory set to be on HDFS. We could provide something similar to help new users get started easily.
>> > > > > > > > >> >
>> > > > > > > > >> > Although I am also +1 on the idea to provide both variants as mentioned by Nick, which might not even need any such script.
>> > > > > > > > >> >
>> > > > > > > > >> > Also, I am willing to volunteer to help with this effort. Please let me know if anything is needed.
>> > > > > > > > >> >
>> > > > > > > > >> > Thanks,
>> > > > > > > > >> > Nihal
>> > > > > > > > >> >
>> > > > > > > > >> > On Tue, 5 Mar 2024, 15:35 Nick Dimiduk, <ndimi...@apache.org> wrote:
>> > > > > > > > >> >
>> > > > > > > > >> > > This would be great cleanup, big +1 from me for all three of these adjustments, including the promotion of pe, ltt, and friends out of the test scope.
>> > > > > > > > >> > >
>> > > > > > > > >> > > I believe that we included hbase test jars because we used to freely mix classes needed for minicluster between runtime and test jars, which in turn relied on Hadoop minicluster capabilities. The big cleanup around HBaseTestingUtil/it addressed much (or all) of these issues on branch-3.
>> > > > > > > > >> > >
>> > > > > > > > >> > > I believe that we include a Hadoop distribution in our assembly because that makes it easy for a new user to download our release bin.tgz and get started immediately with learning. I guess it’s high time that we work out the with- and without-Hadoop variants.
>> > > > > > > > >> > >
>> > > > > > > > >> > > Thanks,
>> > > > > > > > >> > > Nick
>> > > > > > > > >> > >
>> > > > > > > > >> > > On Tue, 5 Mar 2024 at 09:14, Istvan Toth <st...@apache.org> wrote:
>> > > > > > > > >> > >
>> > > > > > > > >> > > > DISCLAIMER: I don't have a patch ready, or even an elegant way mapped out to achieve this, this is about discussing whether we even want to make these changes.
>> > > > > > > > >> > > > These are also substantial changes, but they could be targeted for HBase 3.0.
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > One issue I have noticed is that we ship test jars and test dependencies in the assembly.
>> > > > > > > > >> > > > I can't see anyone using those, but it bloats the assembly and classpath, and adds unnecessary JARs with possible CVE issues. (for example Kerby which is a Hadoop minicluster dependency)
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > My proposal is to exclude the test jars and the test scope dependencies from the assembly.
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > The advantages would be:
>> > > > > > > > >> > > > * Smaller distro size
>> > > > > > > > >> > > > * Faster startup (this is marginal)
>> > > > > > > > >> > > > * Fewer CVE-prone JARs in the binary assemblies
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > The other issue is that the assembly includes much of the Hadoop distribution.
>> > > > > > > > >> > > > The basic assumption in all scripts and instructions is that the node has a fully configured Hadoop installation, and we include it in the classpath of HBase.
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > If that is true, then there is no reason to include Hadoop in the assembly, HBase and its direct dependencies should be enough.
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > One could argue that it would simplify the client side, which is true to some extent (though 95% of the client distro use cases are served better by simply using hbase-shaded-client).
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > We could either remove the Hadoop libraries from either or both of the assemblies unconditionally, or provide two variants for either or both assemblies, one with Hadoop included, and one without it.
>> > > > > > > > >> > > > Spark already does this, it has binary distributions both with and without Hadoop.
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > The advantages would be:
>> > > > > > > > >> > > > * Smaller distro size
>> > > > > > > > >> > > > * Faster startup (this is marginal)
>> > > > > > > > >> > > > * Less chance of conflicts with the Hadoop jars
>> > > > > > > > >> > > > * Fewer CVE-prone JARs in the binary assemblies
>> > > > > > > > >> > > >
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > Thirdly, we could consider excluding the full-fat org.apache.hbase:hbase-shaded-client JAR from the Hadoop-less binary assemblies. It is not used by the assembly, and AFAIK it is not included in any of the 'hbase classpath' command variants.
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > This would make sure that no Hadoop libraries are included (even in shaded form) and would make the HBase distribution fully insulated from Hadoop's CVE issues.
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > (The full-fat hbase-shaded-client works best as a direct build-time dependency anyway)
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > best regards
>> > > > > > > > >> > > > Istvan
>> > > > > > > > >
>> > > > > > > > > --
>> > > > > > > > > *István Tóth* | Sr. Staff Software Engineer
>> > > > > > > > > *Email*: st...@cloudera.com
>> > > > > > > > > cloudera.com <https://www.cloudera.com>