Hi all,
Shading is a big part of how we keep our dependencies sane in Beam. But it
has downsides: shading is super slow, causes massive jar bloat, and is kind
of hard to get right because artifacts and namespaces are not 1-to-1.
I know that some communities distribute their own shaded distributions of
dependencies. I had a thought about doing something similar that I wanted
to throw out there for people to poke holes in.
To set the scene, here is how I view shading:
- A module has public dependencies and private dependencies.
- Public deps are used for data interchange; users must share these deps.
- Private deps are just functionality and can be hidden (in our case,
relocated + bundled)
- It isn't necessarily that simple, because public and private deps might
interact in higher-order ways ("public" is contagious)
Shading is an implementation detail of expressing these characteristics. We
use shading selectively because of the downsides I mentioned above.
But what about this idea: Introduce shaded deps as a single separate
artifact.
- sdks/java/private-deps: bundled uber jar with relocated versions of
everything we want to shade
- sdks/java/core and sdks/java/harness: no relocation or bundling - they
depend on `beam-sdks-java-private-deps` and import packages like
`org.apache.beam.sdk.private.com.google.common` directly (this is what they
are rewritten to)
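To make the naming concrete, here is a minimal sketch of the mapping the uber jar would apply. `Relocator`, `PREFIX` and `RELOCATED` are made-up names for illustration, and the single-entry package list is an assumption, not our actual shading config. One thing worth checking: `private` is a reserved word in Java, so a package segment with that exact name cannot appear in an import statement in source; the prefix is copied from the example above purely as a string.

```java
import java.util.List;

/** Hypothetical sketch of package relocation; not Beam's or any plugin's actual code. */
final class Relocator {
  // Prefix copied from the example import in this email (illustrative only).
  static final String PREFIX = "org.apache.beam.sdk.private.";

  // Assumed list of packages to relocate; the real list would live in private-deps.
  static final List<String> RELOCATED = List.of("com.google.common");

  /** Returns the relocated name, or the name unchanged if its package is not relocated. */
  static String relocate(String className) {
    for (String pkg : RELOCATED) {
      // Match the package itself or anything below it, but not look-alikes
      // such as com.google.commonx.
      if (className.equals(pkg) || className.startsWith(pkg + ".")) {
        return PREFIX + className;
      }
    }
    return className;
  }
}
```

With this, `Relocator.relocate("com.google.common.collect.ImmutableList")` yields the prefixed name, while `java.util.List` passes through untouched. The point is that the mapping lives in one place and the other modules just use the resulting names.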
Some benefits:
- much faster builds of other modules
- only one shaded uber jar
- rare/no rebuilds of the uber jar
- can use maven enforcer to forbid imports like com.google.common
- configuration all in one place
- no automated rewriting of our real code, which has led to some major
confusion
- easy to implement incrementally
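On the point about forbidding imports, this is roughly the check I have in mind, written as a standalone sketch rather than as an actual enforcer rule (I have not looked at how it would be wired into the build). `ForbiddenImportCheck` and `violations` are hypothetical names, and it only looks at `import` lines, not fully qualified uses inside code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical standalone check; a real build would wire an equivalent rule into the build tool. */
final class ForbiddenImportCheck {
  // Un-relocated package that modules should no longer import directly.
  static final String FORBIDDEN = "com.google.common";

  /** Returns the import lines in the given source text that reference the forbidden package. */
  static List<String> violations(String source) {
    List<String> found = new ArrayList<>();
    for (String line : source.split("\n")) {
      String trimmed = line.trim();
      // Covers both regular and static imports of the forbidden package.
      if (trimmed.startsWith("import " + FORBIDDEN + ".")
          || trimmed.startsWith("import static " + FORBIDDEN + ".")) {
        found.add(trimmed);
      }
    }
    return found;
  }
}
```

A file that imports the relocated name passes, while one that imports `com.google.common.collect.ImmutableList` directly is reported. Failing the build on a non-empty result is what would keep the un-relocated names from creeping back in.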
Downsides:
- plenty of work to get there
- unclear how many different such deps modules we need; sharing them could
get weird
- if we hit a roadblock, we will have committed a lot of time
Just something I was musing on as I spent another evening waiting for slow
builds to try to confirm changes to brittle poms.
Kenn