That is an even better idea. A lot of guava constructs have been superseded by stuff that was added to Java 8.
On Wed, Jan 31, 2018 at 9:56 PM, Romain Manni-Bucau <rmannibu...@gmail.com> wrote: > Why not dropping guava for all beam codebase? With java 8 it is quite easy > to do it and avoid a bunch of conflicts. Did it in 2 projects with quite a > good result. > > Le 1 févr. 2018 06:50, "Lukasz Cwik" <lc...@google.com> a écrit : > >> Make sure to include the guava version in the artifact name so that we >> can have multiple vendored versions. >> >> On Wed, Jan 31, 2018 at 9:16 PM, Kenneth Knowles <k...@google.com> wrote: >> >>> I didn't have time for this, but it just bit me. We definitely have >>> Guava on the API surface of runner support code in ways that get >>> incompatibly shaded. I will probably start "1a" by making a shaded library >>> org.apache.beam:vendored-guava and starting to use it. It sounds like there >>> is generally unanimous support for that much, anyhow. >>> >>> Kenn >>> >>> On Wed, Dec 13, 2017 at 8:31 AM, Aljoscha Krettek <aljos...@apache.org> >>> wrote: >>> >>>> Thanks Ismaël for bringing up this discussion again! >>>> >>>> I would be in favour of 1) and more specifically of 1a) >>>> >>>> Aljoscha >>>> >>>> >>>> On 12. Dec 2017, at 18:56, Lukasz Cwik <lc...@google.com> wrote: >>>> >>>> You can always run tests on post shaded artifacts instead of the >>>> compiled classes, it just requires us to change our maven surefire / gradle >>>> test configurations but it is true that most IDEs would behave better with >>>> a dependency jar unless you delegate all the build/test actions to the >>>> build system and then it won't matter. >>>> >>>> On Mon, Dec 11, 2017 at 9:05 PM, Kenneth Knowles <k...@google.com> >>>> wrote: >>>> >>>>> There's also, with additional overhead, >>>>> >>>>> 1a) A relocated and shipped package for each thing we want to >>>>> relocate. I think this has also been tried outside Beam... >>>>> >>>>> Pros: >>>>> * all the pros of 1) plus no bloat beyond what is necessary >>>>> Cons: >>>>> * abandons whitelist approach for public deps, reverting to blacklist >>>>> approach for trouble things like guava, so a bit less principled >>>>> >>>>> For both 1) and 1a) I would add: >>>>> >>>>> Pros: >>>>> * clearly readable dependency since code will `import >>>>> org.apache.beam.private.guava21` and IDEs will understand it is a >>>>> distinct lilbrary >>>>> * can run tests on unpackaged classes, as long as deps are shaded or >>>>> provided as jars >>>>> * no mysterious action at a distance from inherited configuration >>>>> Cons: >>>>> * need to adjust imports >>>>> >>>>> Kenn >>>>> >>>>> On Mon, Dec 11, 2017 at 9:57 AM, Lukasz Cwik <lc...@google.com> wrote: >>>>> >>>>>> I would suggest that either we use: >>>>>> 1) A common deps package containing shaded dependencies allows for >>>>>> Pros >>>>>> * doesn't require the user to build an uber jar >>>>>> Risks >>>>>> * dependencies package will keep growing even if something is or >>>>>> isn't needed by all of Apache Beam leading to a large jar anyways >>>>>> negating >>>>>> any space savings >>>>>> >>>>>> 2) Shade within each module to a common location like >>>>>> org.apache.beam.relocated.guava.... >>>>>> Pros >>>>>> * you only get the shaded dependencies of the things that are required >>>>>> * its one less dependency for users to manage >>>>>> Risks >>>>>> * requires an uber jar to be built to get the space savings (either >>>>>> by a user or a distribution of Apache Beam) otherwise we negate any space >>>>>> savings. >>>>>> >>>>>> If we either use a common relocation scheme or a dependencies jar >>>>>> then each relocation should specifically contain the version number of >>>>>> the >>>>>> package because we would like to allow for us to be using two different >>>>>> versions of the same library. >>>>>> >>>>>> For the common deps package approach, should we check in code where >>>>>> the imports contain the relocated location (e.g. import >>>>>> org.apache.beam.guava.20.0.com.google.common.collect.ImmutableList)? >>>>>> >>>>>> >>>>>> On Mon, Dec 11, 2017 at 8:47 AM, Jean-Baptiste Onofré < >>>>>> j...@nanthrax.net> wrote: >>>>>> >>>>>>> Thanks for bringing that back. >>>>>>> >>>>>>> Indeed guava is shaded in different uber-jar. Maybe we can have a >>>>>>> common deps module that we include once (but the user will have to >>>>>>> explicitly define the dep) ? >>>>>>> >>>>>>> Basically, what do you propose for protobuf (unfortunately, I don't >>>>>>> see an obvious) ? >>>>>>> >>>>>>> Regards >>>>>>> JB >>>>>>> >>>>>>> >>>>>>> On 12/11/2017 05:35 PM, Ismaël Mejía wrote: >>>>>>> >>>>>>>> Hello, I wanted to bring back this subject because I think we should >>>>>>>> take action on this and at least first have a shaded version of >>>>>>>> guava. >>>>>>>> I was playing with a toy project and I did the procedure we use to >>>>>>>> submit jars to a Hadoop cluster via Flink/Spark which involves >>>>>>>> creating an uber jar and I realized that the size of the jar was way >>>>>>>> bigger than I expected, and the fact that we shade guava in every >>>>>>>> module contributes to this. I found guava shaded on: >>>>>>>> >>>>>>>> sdks/java/core >>>>>>>> runners/core-construction-java >>>>>>>> runners/core-java >>>>>>>> model/job-management >>>>>>>> runners/spark >>>>>>>> sdks/java/io/hadoop-file-system >>>>>>>> sdks/java/io/kafka >>>>>>>> >>>>>>>> This means at least 6 times more of the size it should which counts >>>>>>>> in >>>>>>>> around 15MB more (2.4MB*6 deps) of extra weight that we can simply >>>>>>>> reduce by using a shaded version of the library. >>>>>>>> >>>>>>>> I add this point to the previous ones mentioned during the >>>>>>>> discussion >>>>>>>> because this goes against the end user experience and affects us all >>>>>>>> (devs/users). >>>>>>>> >>>>>>>> Another question is if we should shade (and how) protocol buffers >>>>>>>> because now with the portability work we are exposing it closer to >>>>>>>> the >>>>>>>> end users. I say this because I also found an issue while running a >>>>>>>> job on YARN with the spark runner because hadoop-common includes >>>>>>>> protobuf-java 2 and I had to explicitly provide protocol-buffers 3 >>>>>>>> as >>>>>>>> a dependency to be able to use triggers (note the Spark runner >>>>>>>> translates them using some method from runners/core-java). Since >>>>>>>> hadoop-common is provided in the cluster with the older version of >>>>>>>> protobuf, I am afraid that this will bite us in the future. >>>>>>>> >>>>>>>> Ismaël >>>>>>>> >>>>>>>> ps. There is already a JIRA for that shading for protobuf on >>>>>>>> hadoop-common but this is not coming until version 3 is out. >>>>>>>> https://issues.apache.org/jira/browse/HADOOP-13136 >>>>>>>> >>>>>>>> ps2. Extra curious situation is to see that the dataflow-runner ends >>>>>>>> up having guava shaded twice via its shaded version on >>>>>>>> core-construction-java. >>>>>>>> >>>>>>>> ps3. Of course this message means a de-facto +1 at least to do it >>>>>>>> for >>>>>>>> guava and evaluate it for other libs. >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Oct 17, 2017 at 7:29 PM, Lukasz Cwik < >>>>>>>> lc...@google.com.invalid> wrote: >>>>>>>> >>>>>>>>> An issue to call out is how to deal with our generated code (.avro >>>>>>>>> and >>>>>>>>> .proto) as I don't believe those plugins allow you to generate >>>>>>>>> code using a >>>>>>>>> shaded package prefix on imports. >>>>>>>>> >>>>>>>>> On Tue, Oct 17, 2017 at 10:28 AM, Thomas Groh < >>>>>>>>> tg...@google.com.invalid> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> +1 to the goal. I'm hugely in favor of not doing the same shading >>>>>>>>>> work >>>>>>>>>> every time for dependencies we know we'll use. >>>>>>>>>> >>>>>>>>>> This also means that if we end up pulling in transitive >>>>>>>>>> dependencies we >>>>>>>>>> don't want in any particular module we can avoid having to adjust >>>>>>>>>> our >>>>>>>>>> repackaging strategy for that module - which I have run into >>>>>>>>>> face-first in >>>>>>>>>> the past. >>>>>>>>>> >>>>>>>>>> On Tue, Oct 17, 2017 at 9:48 AM, Kenneth Knowles < >>>>>>>>>> k...@google.com.invalid> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Hi all, >>>>>>>>>>> >>>>>>>>>>> Shading is a big part of how we keep our dependencies sane in >>>>>>>>>>> Beam. But >>>>>>>>>>> downsides: shading is super slow, causes massive jar bloat, and >>>>>>>>>>> kind of >>>>>>>>>>> hard to get right because artifacts and namespaces are not >>>>>>>>>>> 1-to-1. >>>>>>>>>>> >>>>>>>>>>> I know that some communities distribute their own shaded >>>>>>>>>>> distributions of >>>>>>>>>>> dependencies. I had a thought about doing something similar that >>>>>>>>>>> I wanted >>>>>>>>>>> to throw out there for people to poke holes in. >>>>>>>>>>> >>>>>>>>>>> To set the scene, here is how I view shading: >>>>>>>>>>> >>>>>>>>>>> - A module has public dependencies and private dependencies. >>>>>>>>>>> - Public deps are used for data interchange; users must share >>>>>>>>>>> these >>>>>>>>>>> >>>>>>>>>> deps. >>>>>>>>>> >>>>>>>>>>> - Private deps are just functionality and can be hidden (in >>>>>>>>>>> our case, >>>>>>>>>>> relocated + bundled) >>>>>>>>>>> - It isn't necessarily that simple, because public and private >>>>>>>>>>> deps >>>>>>>>>>> >>>>>>>>>> might >>>>>>>>>> >>>>>>>>>>> interact in higher-order ways ("public" is contagious) >>>>>>>>>>> >>>>>>>>>>> Shading is an implementation detail of expressing these >>>>>>>>>>> characteristics. >>>>>>>>>>> >>>>>>>>>> We >>>>>>>>>> >>>>>>>>>>> use shading selectively because of its downsides I mentioned >>>>>>>>>>> above. >>>>>>>>>>> >>>>>>>>>>> But what about this idea: Introduce shaded deps as a single >>>>>>>>>>> separate >>>>>>>>>>> artifact. >>>>>>>>>>> >>>>>>>>>>> - sdks/java/private-deps: bundled uber jar with relocated >>>>>>>>>>> versions of >>>>>>>>>>> everything we want to shade >>>>>>>>>>> >>>>>>>>>>> - sdks/java/core and sdks/java/harness: no relocation or >>>>>>>>>>> bundling - >>>>>>>>>>> depends on `beam-sdks-java-private-deps` and imports like >>>>>>>>>>> `org.apache.beam.sdk.private.com.google.common` directly (this >>>>>>>>>>> is what >>>>>>>>>>> they >>>>>>>>>>> are rewritten to >>>>>>>>>>> >>>>>>>>>>> Some benefits >>>>>>>>>>> >>>>>>>>>>> - much faster builds of other modules >>>>>>>>>>> - only one shaded uber jar >>>>>>>>>>> - rare/no rebuilds of the uber jar >>>>>>>>>>> - can use maven enforcer to forbid imports like >>>>>>>>>>> com.google.common >>>>>>>>>>> - configuration all in one place >>>>>>>>>>> - no automated rewriting of our real code, which has led to >>>>>>>>>>> some major >>>>>>>>>>> confusion >>>>>>>>>>> - easy to implement incrementally >>>>>>>>>>> >>>>>>>>>>> Downsides: >>>>>>>>>>> >>>>>>>>>>> - plenty of effort work to get there >>>>>>>>>>> - unclear how many different such deps modules we need; >>>>>>>>>>> sharing them >>>>>>>>>>> >>>>>>>>>> could >>>>>>>>>> >>>>>>>>>>> get weird >>>>>>>>>>> - if we hit a roadblock, we will have committed a lot of time >>>>>>>>>>> >>>>>>>>>>> Just something I was musing as I spent another evening waiting >>>>>>>>>>> for slow >>>>>>>>>>> builds to try to confirm changes to brittle poms. >>>>>>>>>>> >>>>>>>>>>> Kenn >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>> -- >>>>>>> Jean-Baptiste Onofré >>>>>>> jbono...@apache.org >>>>>>> http://blog.nanthrax.net >>>>>>> Talend - http://www.talend.com >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> >>> >>