Thanks for the great input all. See below:
On Wed, Apr 12, 2017 at 9:01 AM, Nick Dimiduk <ndimi...@gmail.com> wrote:

> On Wed, Apr 12, 2017 at 8:28 AM Josh Elser <els...@apache.org> wrote:
>
> > Sean Busbey wrote:
> > > On Tue, Apr 11, 2017 at 11:43 PM Nick Dimiduk <ndimi...@gmail.com> wrote:
> > >
> > >>> This effort is about our internals. We have a mess of other
> > >>> components all up inside us such as HDFS, etc., each with their
> > >>> own sets of dependencies many of which we have in common. This
> > >>> project is about making it so we can upgrade at a rate
> > >>> independent of when our upstreamers choose to change.
> > >>

(I'd add to the above that we can upgrade libs w/o breaking downstreamers
also -- but this point becomes an intrinsic later in the thread)

> > >> If the above quote is true, then I think what we want is a set of
> > >> shaded Hadoop client libs that we can depend on so as to not get
> > >> all the transitive deps. Hadoop doesn't provide it, but we could do
> > >> so ourselves with (yet another) module in our project. Assuming,
> > >> that is, the upstream client interfaces are well defined and don't
> > >> leak stuff we care about.

We should do this too (I think you've identified the big 'if' w/ the above
identified assumption). As you say later, "... it's time we firm up the
boundaries between us and Hadoop.". There is some precedent with
hadoop-compat-* modules. Hadoop would be relocated?

Spitballing, IIUC, I think this would be a big job (once per version and
the vagaries of hadoop/spark) with no guarantee of success on other end
because of assumption you call out. Do I have this right?

...

> Isolating our clients from our deps is best served by our shaded modules.
> What do you think about turning things on their head: for 2.0 the
> hbase-client jar is the shaded artifact by default, not the other way
> around? We have cleanup to get our deps out of our public interfaces in
> order to make this work.
>

We should do this at least going forward. hbase2 is the opportunity.
Testing and doc is all that is needed? I added it to our hbase2 description
doc as a deliverable (though not a blocker).

> This proposal of an external shaded dependencies module sounds like an
> attempt to solve both concerns at once. It would isolate ourselves from
> Hadoop's deps, and it would isolate our clients from our deps. However, it
> doesn't isolate our clients from Hadoop's deps, so our users don't really
> gain anything from it. I also argue that it creates an unreasonable release
> engineering burden on our project. I'm also not clear on the implications
> to downstreamers who extend us with coprocessors.
>

Other than a missing 'quick-fix' descriptor, you call what is proposed well
....except where you think the prebuild will be burdensome. Here I think
otherwise as I think releases will be rare, there is nought 'new' in a
release but packaged 3rd-party libs, and verification/vote by PMCers should
be a simple affair.

Do you agree that the fixing-what-we-leak-of-hadoop-to-downstreamers is
distinct from the narrower task proposed here where we are trying to
unhitch ourselves of the netty/guava hadoop uses? (Currently we break
against hadoop3 because of netty incompat., HADOOP-13866, which we might be
able to solve w/ exclusions.....but....). The two tasks can be run in
parallel?

For CPs, they should bring their own bedding and towels and not be trying
to use ours. On the plus-side, we could upgrade core 3rd-party libs and the
CP would keep working.

St.Ack