Why not dropping guava for all beam codebase? With java 8 it is quite easy to do it and avoid a bunch of conflicts. Did it in 2 projects with quite a good result.
Le 1 févr. 2018 06:50, "Lukasz Cwik" <lc...@google.com> a écrit : > Make sure to include the guava version in the artifact name so that we can > have multiple vendored versions. > > On Wed, Jan 31, 2018 at 9:16 PM, Kenneth Knowles <k...@google.com> wrote: > >> I didn't have time for this, but it just bit me. We definitely have Guava >> on the API surface of runner support code in ways that get incompatibly >> shaded. I will probably start "1a" by making a shaded library >> org.apache.beam:vendored-guava and starting to use it. It sounds like there >> is generally unanimous support for that much, anyhow. >> >> Kenn >> >> On Wed, Dec 13, 2017 at 8:31 AM, Aljoscha Krettek <aljos...@apache.org> >> wrote: >> >>> Thanks Ismaël for bringing up this discussion again! >>> >>> I would be in favour of 1) and more specifically of 1a) >>> >>> Aljoscha >>> >>> >>> On 12. Dec 2017, at 18:56, Lukasz Cwik <lc...@google.com> wrote: >>> >>> You can always run tests on post shaded artifacts instead of the >>> compiled classes, it just requires us to change our maven surefire / gradle >>> test configurations but it is true that most IDEs would behave better with >>> a dependency jar unless you delegate all the build/test actions to the >>> build system and then it won't matter. >>> >>> On Mon, Dec 11, 2017 at 9:05 PM, Kenneth Knowles <k...@google.com> wrote: >>> >>>> There's also, with additional overhead, >>>> >>>> 1a) A relocated and shipped package for each thing we want to relocate. >>>> I think this has also been tried outside Beam... >>>> >>>> Pros: >>>> * all the pros of 1) plus no bloat beyond what is necessary >>>> Cons: >>>> * abandons whitelist approach for public deps, reverting to blacklist >>>> approach for trouble things like guava, so a bit less principled >>>> >>>> For both 1) and 1a) I would add: >>>> >>>> Pros: >>>> * clearly readable dependency since code will `import >>>> org.apache.beam.private.guava21` and IDEs will understand it is a >>>> distinct lilbrary >>>> * can run tests on unpackaged classes, as long as deps are shaded or >>>> provided as jars >>>> * no mysterious action at a distance from inherited configuration >>>> Cons: >>>> * need to adjust imports >>>> >>>> Kenn >>>> >>>> On Mon, Dec 11, 2017 at 9:57 AM, Lukasz Cwik <lc...@google.com> wrote: >>>> >>>>> I would suggest that either we use: >>>>> 1) A common deps package containing shaded dependencies allows for >>>>> Pros >>>>> * doesn't require the user to build an uber jar >>>>> Risks >>>>> * dependencies package will keep growing even if something is or isn't >>>>> needed by all of Apache Beam leading to a large jar anyways negating any >>>>> space savings >>>>> >>>>> 2) Shade within each module to a common location like >>>>> org.apache.beam.relocated.guava.... >>>>> Pros >>>>> * you only get the shaded dependencies of the things that are required >>>>> * its one less dependency for users to manage >>>>> Risks >>>>> * requires an uber jar to be built to get the space savings (either by >>>>> a user or a distribution of Apache Beam) otherwise we negate any space >>>>> savings. >>>>> >>>>> If we either use a common relocation scheme or a dependencies jar then >>>>> each relocation should specifically contain the version number of the >>>>> package because we would like to allow for us to be using two different >>>>> versions of the same library. >>>>> >>>>> For the common deps package approach, should we check in code where >>>>> the imports contain the relocated location (e.g. import >>>>> org.apache.beam.guava.20.0.com.google.common.collect.ImmutableList)? >>>>> >>>>> >>>>> On Mon, Dec 11, 2017 at 8:47 AM, Jean-Baptiste Onofré <j...@nanthrax.net >>>>> > wrote: >>>>> >>>>>> Thanks for bringing that back. >>>>>> >>>>>> Indeed guava is shaded in different uber-jar. Maybe we can have a >>>>>> common deps module that we include once (but the user will have to >>>>>> explicitly define the dep) ? >>>>>> >>>>>> Basically, what do you propose for protobuf (unfortunately, I don't >>>>>> see an obvious) ? >>>>>> >>>>>> Regards >>>>>> JB >>>>>> >>>>>> >>>>>> On 12/11/2017 05:35 PM, Ismaël Mejía wrote: >>>>>> >>>>>>> Hello, I wanted to bring back this subject because I think we should >>>>>>> take action on this and at least first have a shaded version of >>>>>>> guava. >>>>>>> I was playing with a toy project and I did the procedure we use to >>>>>>> submit jars to a Hadoop cluster via Flink/Spark which involves >>>>>>> creating an uber jar and I realized that the size of the jar was way >>>>>>> bigger than I expected, and the fact that we shade guava in every >>>>>>> module contributes to this. I found guava shaded on: >>>>>>> >>>>>>> sdks/java/core >>>>>>> runners/core-construction-java >>>>>>> runners/core-java >>>>>>> model/job-management >>>>>>> runners/spark >>>>>>> sdks/java/io/hadoop-file-system >>>>>>> sdks/java/io/kafka >>>>>>> >>>>>>> This means at least 6 times more of the size it should which counts >>>>>>> in >>>>>>> around 15MB more (2.4MB*6 deps) of extra weight that we can simply >>>>>>> reduce by using a shaded version of the library. >>>>>>> >>>>>>> I add this point to the previous ones mentioned during the discussion >>>>>>> because this goes against the end user experience and affects us all >>>>>>> (devs/users). >>>>>>> >>>>>>> Another question is if we should shade (and how) protocol buffers >>>>>>> because now with the portability work we are exposing it closer to >>>>>>> the >>>>>>> end users. I say this because I also found an issue while running a >>>>>>> job on YARN with the spark runner because hadoop-common includes >>>>>>> protobuf-java 2 and I had to explicitly provide protocol-buffers 3 as >>>>>>> a dependency to be able to use triggers (note the Spark runner >>>>>>> translates them using some method from runners/core-java). Since >>>>>>> hadoop-common is provided in the cluster with the older version of >>>>>>> protobuf, I am afraid that this will bite us in the future. >>>>>>> >>>>>>> Ismaël >>>>>>> >>>>>>> ps. There is already a JIRA for that shading for protobuf on >>>>>>> hadoop-common but this is not coming until version 3 is out. >>>>>>> https://issues.apache.org/jira/browse/HADOOP-13136 >>>>>>> >>>>>>> ps2. Extra curious situation is to see that the dataflow-runner ends >>>>>>> up having guava shaded twice via its shaded version on >>>>>>> core-construction-java. >>>>>>> >>>>>>> ps3. Of course this message means a de-facto +1 at least to do it for >>>>>>> guava and evaluate it for other libs. >>>>>>> >>>>>>> >>>>>>> On Tue, Oct 17, 2017 at 7:29 PM, Lukasz Cwik < >>>>>>> lc...@google.com.invalid> wrote: >>>>>>> >>>>>>>> An issue to call out is how to deal with our generated code (.avro >>>>>>>> and >>>>>>>> .proto) as I don't believe those plugins allow you to generate code >>>>>>>> using a >>>>>>>> shaded package prefix on imports. >>>>>>>> >>>>>>>> On Tue, Oct 17, 2017 at 10:28 AM, Thomas Groh < >>>>>>>> tg...@google.com.invalid> >>>>>>>> wrote: >>>>>>>> >>>>>>>> +1 to the goal. I'm hugely in favor of not doing the same shading >>>>>>>>> work >>>>>>>>> every time for dependencies we know we'll use. >>>>>>>>> >>>>>>>>> This also means that if we end up pulling in transitive >>>>>>>>> dependencies we >>>>>>>>> don't want in any particular module we can avoid having to adjust >>>>>>>>> our >>>>>>>>> repackaging strategy for that module - which I have run into >>>>>>>>> face-first in >>>>>>>>> the past. >>>>>>>>> >>>>>>>>> On Tue, Oct 17, 2017 at 9:48 AM, Kenneth Knowles < >>>>>>>>> k...@google.com.invalid> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Hi all, >>>>>>>>>> >>>>>>>>>> Shading is a big part of how we keep our dependencies sane in >>>>>>>>>> Beam. But >>>>>>>>>> downsides: shading is super slow, causes massive jar bloat, and >>>>>>>>>> kind of >>>>>>>>>> hard to get right because artifacts and namespaces are not 1-to-1. >>>>>>>>>> >>>>>>>>>> I know that some communities distribute their own shaded >>>>>>>>>> distributions of >>>>>>>>>> dependencies. I had a thought about doing something similar that >>>>>>>>>> I wanted >>>>>>>>>> to throw out there for people to poke holes in. >>>>>>>>>> >>>>>>>>>> To set the scene, here is how I view shading: >>>>>>>>>> >>>>>>>>>> - A module has public dependencies and private dependencies. >>>>>>>>>> - Public deps are used for data interchange; users must share >>>>>>>>>> these >>>>>>>>>> >>>>>>>>> deps. >>>>>>>>> >>>>>>>>>> - Private deps are just functionality and can be hidden (in our >>>>>>>>>> case, >>>>>>>>>> relocated + bundled) >>>>>>>>>> - It isn't necessarily that simple, because public and private >>>>>>>>>> deps >>>>>>>>>> >>>>>>>>> might >>>>>>>>> >>>>>>>>>> interact in higher-order ways ("public" is contagious) >>>>>>>>>> >>>>>>>>>> Shading is an implementation detail of expressing these >>>>>>>>>> characteristics. >>>>>>>>>> >>>>>>>>> We >>>>>>>>> >>>>>>>>>> use shading selectively because of its downsides I mentioned >>>>>>>>>> above. >>>>>>>>>> >>>>>>>>>> But what about this idea: Introduce shaded deps as a single >>>>>>>>>> separate >>>>>>>>>> artifact. >>>>>>>>>> >>>>>>>>>> - sdks/java/private-deps: bundled uber jar with relocated >>>>>>>>>> versions of >>>>>>>>>> everything we want to shade >>>>>>>>>> >>>>>>>>>> - sdks/java/core and sdks/java/harness: no relocation or >>>>>>>>>> bundling - >>>>>>>>>> depends on `beam-sdks-java-private-deps` and imports like >>>>>>>>>> `org.apache.beam.sdk.private.com.google.common` directly (this >>>>>>>>>> is what >>>>>>>>>> they >>>>>>>>>> are rewritten to >>>>>>>>>> >>>>>>>>>> Some benefits >>>>>>>>>> >>>>>>>>>> - much faster builds of other modules >>>>>>>>>> - only one shaded uber jar >>>>>>>>>> - rare/no rebuilds of the uber jar >>>>>>>>>> - can use maven enforcer to forbid imports like >>>>>>>>>> com.google.common >>>>>>>>>> - configuration all in one place >>>>>>>>>> - no automated rewriting of our real code, which has led to >>>>>>>>>> some major >>>>>>>>>> confusion >>>>>>>>>> - easy to implement incrementally >>>>>>>>>> >>>>>>>>>> Downsides: >>>>>>>>>> >>>>>>>>>> - plenty of effort work to get there >>>>>>>>>> - unclear how many different such deps modules we need; sharing >>>>>>>>>> them >>>>>>>>>> >>>>>>>>> could >>>>>>>>> >>>>>>>>>> get weird >>>>>>>>>> - if we hit a roadblock, we will have committed a lot of time >>>>>>>>>> >>>>>>>>>> Just something I was musing as I spent another evening waiting >>>>>>>>>> for slow >>>>>>>>>> builds to try to confirm changes to brittle poms. >>>>>>>>>> >>>>>>>>>> Kenn >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>> -- >>>>>> Jean-Baptiste Onofré >>>>>> jbono...@apache.org >>>>>> http://blog.nanthrax.net >>>>>> Talend - http://www.talend.com >>>>>> >>>>> >>>>> >>>> >>> >>> >> >