Not sure we'll be able to replace them all. Things like guava Table and Multimap don't have great replacements in Java8.
On Wed, Jan 31, 2018 at 10:11 PM, Jean-Baptiste Onofré <[email protected]> wrote: > +1, it was on my TODO for a while waiting the Java8 update. > > Regards > JB > > On 02/01/2018 06:56 AM, Romain Manni-Bucau wrote: > > Why not dropping guava for all beam codebase? With java 8 it is quite > easy to do > > it and avoid a bunch of conflicts. Did it in 2 projects with quite a > good result. > > > > Le 1 févr. 2018 06:50, "Lukasz Cwik" <[email protected] > > <mailto:[email protected]>> a écrit : > > > > Make sure to include the guava version in the artifact name so that > we can > > have multiple vendored versions. > > > > On Wed, Jan 31, 2018 at 9:16 PM, Kenneth Knowles <[email protected] > > <mailto:[email protected]>> wrote: > > > > I didn't have time for this, but it just bit me. We definitely > have > > Guava on the API surface of runner support code in ways that get > > incompatibly shaded. I will probably start "1a" by making a > shaded > > library org.apache.beam:vendored-guava and starting to use it. > It sounds > > like there is generally unanimous support for that much, anyhow. > > > > Kenn > > > > On Wed, Dec 13, 2017 at 8:31 AM, Aljoscha Krettek < > [email protected] > > <mailto:[email protected]>> wrote: > > > > Thanks Ismaël for bringing up this discussion again! > > > > I would be in favour of 1) and more specifically of 1a) > > > > Aljoscha > > > > > >> On 12. Dec 2017, at 18:56, Lukasz Cwik <[email protected] > >> <mailto:[email protected]>> wrote: > >> > >> You can always run tests on post shaded artifacts instead > of the > >> compiled classes, it just requires us to change our maven > surefire > >> / gradle test configurations but it is true that most IDEs > would > >> behave better with a dependency jar unless you delegate all > the > >> build/test actions to the build system and then it won't > matter. > >> > >> On Mon, Dec 11, 2017 at 9:05 PM, Kenneth Knowles < > [email protected] > >> <mailto:[email protected]>> wrote: > >> > >> There's also, with additional overhead, > >> > >> 1a) A relocated and shipped package for each thing we > want to > >> relocate. I think this has also been tried outside > Beam... > >> > >> Pros: > >> * all the pros of 1) plus no bloat beyond what is > necessary > >> Cons: > >> * abandons whitelist approach for public deps, > reverting to > >> blacklist approach for trouble things like guava, so a > bit > >> less principled > >> > >> For both 1) and 1a) I would add: > >> > >> Pros: > >> * clearly readable dependency since code will `import > >> org.apache.beam.private.guava21` and IDEs will > understand it > >> is a distinct lilbrary > >> * can run tests on unpackaged classes, as long as deps > are > >> shaded or provided as jars > >> * no mysterious action at a distance from inherited > configuration > >> Cons: > >> * need to adjust imports > >> > >> Kenn > >> > >> On Mon, Dec 11, 2017 at 9:57 AM, Lukasz Cwik < > [email protected] > >> <mailto:[email protected]>> wrote: > >> > >> I would suggest that either we use: > >> 1) A common deps package containing shaded > dependencies > >> allows for > >> Pros > >> * doesn't require the user to build an uber jar > >> Risks > >> * dependencies package will keep growing even if > something > >> is or isn't needed by all of Apache Beam leading to > a > >> large jar anyways negating any space savings > >> > >> 2) Shade within each module to a common location > like > >> org.apache.beam.relocated.guava.... > >> Pros > >> * you only get the shaded dependencies of the > things that > >> are required > >> * its one less dependency for users to manage > >> Risks > >> * requires an uber jar to be built to get the space > >> savings (either by a user or a distribution of > Apache > >> Beam) otherwise we negate any space savings. > >> > >> If we either use a common relocation scheme or a > >> dependencies jar then each relocation should > specifically > >> contain the version number of the package because > we would > >> like to allow for us to be using two different > versions of > >> the same library. > >> > >> For the common deps package approach, should we > check in > >> code where the imports contain the relocated > location > >> (e.g. import org.apache.beam.guava.20.0.com > >> <http://org.apache.beam.guava.20.0.com/ > >.google.common.collect.ImmutableList)? > >> > >> > >> On Mon, Dec 11, 2017 at 8:47 AM, Jean-Baptiste > Onofré > >> <[email protected] <mailto:[email protected]>> wrote: > >> > >> Thanks for bringing that back. > >> > >> Indeed guava is shaded in different uber-jar. > Maybe we > >> can have a common deps module that we include > once > >> (but the user will have to explicitly define > the dep) ? > >> > >> Basically, what do you propose for protobuf > >> (unfortunately, I don't see an obvious) ? > >> > >> Regards > >> JB > >> > >> > >> On 12/11/2017 05:35 PM, Ismaël Mejía wrote: > >> > >> Hello, I wanted to bring back this subject > because > >> I think we should > >> take action on this and at least first have > a > >> shaded version of guava. > >> I was playing with a toy project and I did > the > >> procedure we use to > >> submit jars to a Hadoop cluster via > Flink/Spark > >> which involves > >> creating an uber jar and I realized that > the size > >> of the jar was way > >> bigger than I expected, and the fact that > we shade > >> guava in every > >> module contributes to this. I found guava > shaded on: > >> > >> sdks/java/core > >> runners/core-construction-java > >> runners/core-java > >> model/job-management > >> runners/spark > >> sdks/java/io/hadoop-file-system > >> sdks/java/io/kafka > >> > >> This means at least 6 times more of the > size it > >> should which counts in > >> around 15MB more (2.4MB*6 deps) of extra > weight > >> that we can simply > >> reduce by using a shaded version of the > library. > >> > >> I add this point to the previous ones > mentioned > >> during the discussion > >> because this goes against the end user > experience > >> and affects us all > >> (devs/users). > >> > >> Another question is if we should shade (and > how) > >> protocol buffers > >> because now with the portability work we are > >> exposing it closer to the > >> end users. I say this because I also found > an > >> issue while running a > >> job on YARN with the spark runner because > >> hadoop-common includes > >> protobuf-java 2 and I had to explicitly > provide > >> protocol-buffers 3 as > >> a dependency to be able to use triggers > (note the > >> Spark runner > >> translates them using some method from > >> runners/core-java). Since > >> hadoop-common is provided in the cluster > with the > >> older version of > >> protobuf, I am afraid that this will bite > us in > >> the future. > >> > >> Ismaël > >> > >> ps. There is already a JIRA for that > shading for > >> protobuf on > >> hadoop-common but this is not coming until > version > >> 3 is out. > >> https://issues.apache.org/ > jira/browse/HADOOP-13136 > >> <https://issues.apache.org/ > jira/browse/HADOOP-13136> > >> > >> ps2. Extra curious situation is to see that > the > >> dataflow-runner ends > >> up having guava shaded twice via its shaded > version on > >> core-construction-java. > >> > >> ps3. Of course this message means a > de-facto +1 at > >> least to do it for > >> guava and evaluate it for other libs. > >> > >> > >> On Tue, Oct 17, 2017 at 7:29 PM, Lukasz Cwik > >> <[email protected] > >> <mailto:[email protected]>> wrote: > >> > >> An issue to call out is how to deal > with our > >> generated code (.avro and > >> .proto) as I don't believe those > plugins allow > >> you to generate code using a > >> shaded package prefix on imports. > >> > >> On Tue, Oct 17, 2017 at 10:28 AM, > Thomas Groh > >> <[email protected] > >> <mailto:[email protected]>> > >> wrote: > >> > >> +1 to the goal. I'm hugely in favor > of not > >> doing the same shading work > >> every time for dependencies we know > we'll use. > >> > >> This also means that if we end up > pulling > >> in transitive dependencies we > >> don't want in any particular module > we can > >> avoid having to adjust our > >> repackaging strategy for that > module - > >> which I have run into face-first in > >> the past. > >> > >> On Tue, Oct 17, 2017 at 9:48 AM, > Kenneth > >> Knowles <[email protected] > >> <mailto:[email protected]>> > >> wrote: > >> > >> Hi all, > >> > >> Shading is a big part of how we > keep > >> our dependencies sane in Beam. > But > >> downsides: shading is super > slow, > >> causes massive jar bloat, and > kind of > >> hard to get right because > artifacts > >> and namespaces are not 1-to-1. > >> > >> I know that some communities > >> distribute their own shaded > >> distributions of > >> dependencies. I had a thought > about > >> doing something similar that I > wanted > >> to throw out there for people > to poke > >> holes in. > >> > >> To set the scene, here is how I > view > >> shading: > >> > >> - A module has public > dependencies > >> and private dependencies. > >> - Public deps are used for > data > >> interchange; users must share > these > >> > >> deps. > >> > >> - Private deps are just > >> functionality and can be hidden > (in > >> our case, > >> relocated + bundled) > >> - It isn't necessarily that > simple, > >> because public and private deps > >> > >> might > >> > >> interact in higher-order ways > >> ("public" is contagious) > >> > >> Shading is an implementation > detail of > >> expressing these > characteristics. > >> > >> We > >> > >> use shading selectively because > of its > >> downsides I mentioned above. > >> > >> But what about this idea: > Introduce > >> shaded deps as a single separate > >> artifact. > >> > >> - sdks/java/private-deps: > bundled > >> uber jar with relocated > versions of > >> everything we want to shade > >> > >> - sdks/java/core and > >> sdks/java/harness: no > relocation or > >> bundling - > >> depends on > >> `beam-sdks-java-private-deps` > and > >> imports like > >> `org.apache.beam.sdk.private. > com.google.common` > >> directly (this is what > >> they > >> are rewritten to > >> > >> Some benefits > >> > >> - much faster builds of other > modules > >> - only one shaded uber jar > >> - rare/no rebuilds of the > uber jar > >> - can use maven enforcer to > forbid > >> imports like com.google.common > >> - configuration all in one > place > >> - no automated rewriting of > our real > >> code, which has led to some > major > >> confusion > >> - easy to implement > incrementally > >> > >> Downsides: > >> > >> - plenty of effort work to > get there > >> - unclear how many different > such > >> deps modules we need; sharing > them > >> > >> could > >> > >> get weird > >> - if we hit a roadblock, we > will > >> have committed a lot of time > >> > >> Just something I was musing as > I spent > >> another evening waiting for slow > >> builds to try to confirm > changes to > >> brittle poms. > >> > >> Kenn > >> > >> > >> > >> -- > >> Jean-Baptiste Onofré > >> [email protected] <mailto:[email protected] > > > >> http://blog.nanthrax.net < > http://blog.nanthrax.net/> > >> Talend - http://www.talend.com < > http://www.talend.com/> > >> > >> > >> > >> > > > > > > > > -- > Jean-Baptiste Onofré > [email protected] > http://blog.nanthrax.net > Talend - http://www.talend.com >
