Hi all,

Shading is a big part of how we keep our dependencies sane in Beam. But it has downsides: shading is super slow, causes massive jar bloat, and is kind of hard to get right because artifacts and namespaces are not 1-to-1.

I know that some communities distribute their own shaded distributions of dependencies. I had a thought about doing something similar that I wanted to throw out there for people to poke holes in.

To set the scene, here is how I view shading:

 - A module has public dependencies and private dependencies.
 - Public deps are used for data interchange; users must share these deps.
 - Private deps are just functionality and can be hidden (in our case, relocated + bundled).
 - It isn't necessarily that simple, because public and private deps might interact in higher-order ways ("public" is contagious).

Shading is an implementation detail of expressing these characteristics. We use shading selectively because of the downsides I mentioned above.

But what about this idea: introduce shaded deps as a single separate artifact.

 - sdks/java/private-deps: bundled uber jar with relocated versions of everything we want to shade
 - sdks/java/core and sdks/java/harness: no relocation or bundling; each depends on `beam-sdks-java-private-deps` and imports like `org.apache.beam.sdk.private.com.google.common` directly (this is what they are rewritten to)

Some benefits:

 - much faster builds of other modules
 - only one shaded uber jar
 - rare/no rebuilds of the uber jar
 - can use Maven enforcer to forbid imports like com.google.common
 - configuration all in one place
 - no automated rewriting of our real code, which has led to some major confusion
 - easy to implement incrementally

Downsides:

 - plenty of work to get there
 - unclear how many different such deps modules we need; sharing them could get weird
 - if we hit a roadblock, we will have committed a lot of time

Just something I was musing on as I spent another evening waiting for slow builds to try to confirm changes to brittle poms.

Kenn
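
P.S. To make the relocation and "forbid the original imports" ideas concrete, here is a rough Java sketch of the prefix mapping involved. It is only an illustration, not Beam or Maven code: the class name `RelocationSketch`, its methods, and the single relocation rule are all hypothetical. One caveat worth flagging: `private` is a reserved word in Java and cannot appear as a package segment in an import, so the sketch assumes a `privatedeps` segment instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of prefix-based package relocation; not Beam or Maven code. */
public class RelocationSketch {
    // Assumed prefix for relocated packages. "private" is a Java keyword,
    // so this sketch uses "privatedeps" as the package segment.
    static final String SHADED_PREFIX = "org.apache.beam.sdk.privatedeps.";

    // Original package prefix -> relocated prefix (illustrative, one rule only).
    static final Map<String, String> RELOCATIONS = new LinkedHashMap<>();
    static {
        RELOCATIONS.put("com.google.common.", SHADED_PREFIX + "com.google.common.");
    }

    /** Returns the relocated class name, or the input unchanged if no rule matches. */
    static String relocate(String className) {
        for (Map.Entry<String, String> rule : RELOCATIONS.entrySet()) {
            if (className.startsWith(rule.getKey())) {
                return rule.getValue() + className.substring(rule.getKey().length());
            }
        }
        return className;
    }

    /** True if an import line still points at an original (unrelocated) package. */
    static boolean isForbiddenImport(String importLine) {
        String trimmed = importLine.trim();
        if (!trimmed.startsWith("import ")) {
            return false;
        }
        // Drop the "import" keyword and an optional "static" modifier.
        String target = trimmed.substring("import ".length())
                .replaceFirst("^static\\s+", "")
                .trim();
        for (String original : RELOCATIONS.keySet()) {
            if (target.startsWith(original)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(relocate("com.google.common.collect.ImmutableList"));
        System.out.println(isForbiddenImport("import com.google.common.collect.ImmutableList;"));
    }
}
```

In the proposal, the first half of this (the mapping) would live only in the build of the private-deps jar, and the second half (the check) is the kind of rule we would want enforced at build time for every other module.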