+1 to the goal. I'm hugely in favor of not doing the same shading work every time for dependencies we know we'll use.
This also means that if we end up pulling in transitive dependencies we don't want in any particular module we can avoid having to adjust our repackaging strategy for that module - which I have run into face-first in the past.

On Tue, Oct 17, 2017 at 9:48 AM, Kenneth Knowles <[email protected]> wrote:

> Hi all,
>
> Shading is a big part of how we keep our dependencies sane in Beam. But
> downsides: shading is super slow, causes massive jar bloat, and kind of
> hard to get right because artifacts and namespaces are not 1-to-1.
>
> I know that some communities distribute their own shaded distributions of
> dependencies. I had a thought about doing something similar that I wanted
> to throw out there for people to poke holes in.
>
> To set the scene, here is how I view shading:
>
>  - A module has public dependencies and private dependencies.
>  - Public deps are used for data interchange; users must share these deps.
>  - Private deps are just functionality and can be hidden (in our case,
>    relocated + bundled)
>  - It isn't necessarily that simple, because public and private deps might
>    interact in higher-order ways ("public" is contagious)
>
> Shading is an implementation detail of expressing these characteristics. We
> use shading selectively because of its downsides I mentioned above.
>
> But what about this idea: Introduce shaded deps as a single separate
> artifact.
>
>  - sdks/java/private-deps: bundled uber jar with relocated versions of
>    everything we want to shade
>
>  - sdks/java/core and sdks/java/harness: no relocation or bundling -
>    depends on `beam-sdks-java-private-deps` and imports like
>    `org.apache.beam.sdk.private.com.google.common` directly (this is what
>    they are rewritten to)
>
> Some benefits
>
>  - much faster builds of other modules
>  - only one shaded uber jar
>  - rare/no rebuilds of the uber jar
>  - can use maven enforcer to forbid imports like com.google.common
>  - configuration all in one place
>  - no automated rewriting of our real code, which has led to some major
>    confusion
>  - easy to implement incrementally
>
> Downsides:
>
>  - plenty of effort work to get there
>  - unclear how many different such deps modules we need; sharing them could
>    get weird
>  - if we hit a roadblock, we will have committed a lot of time
>
> Just something I was musing as I spent another evening waiting for slow
> builds to try to confirm changes to brittle poms.
>
> Kenn
>
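
To make the relocation piece concrete, here is a rough sketch of what the shade configuration in sdks/java/private-deps might look like. It is only an illustration, not a tested pom: the plugin version and the rest of the module are omitted, and the shaded prefix `org.apache.beam.sdk.repackaged` is a hypothetical placeholder. One caveat worth checking: `private` is a reserved word in Java, so a prefix like `org.apache.beam.sdk.private` could not appear in a hand-written import statement, and a different name would be needed if modules import the relocated packages directly.

```xml
<!-- Sketch only: would live in the pom of the proposed private-deps module. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- Original package of the private dependency (Guava here). -->
            <pattern>com.google.common</pattern>
            <!-- Hypothetical relocated prefix; the real name is undecided. -->
            <shadedPattern>org.apache.beam.sdk.repackaged.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With something like this in a single module, sdks/java/core and sdks/java/harness would carry no shade configuration of their own and would simply depend on the resulting jar.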
