Why not dropping guava for all beam codebase? With java 8 it is quite easy
to do it and avoid a bunch of conflicts. Did it in 2 projects with quite a
good result.

Le 1 févr. 2018 06:50, "Lukasz Cwik" <lc...@google.com> a écrit :

> Make sure to include the guava version in the artifact name so that we can
> have multiple vendored versions.
>
> On Wed, Jan 31, 2018 at 9:16 PM, Kenneth Knowles <k...@google.com> wrote:
>
>> I didn't have time for this, but it just bit me. We definitely have Guava
>> on the API surface of runner support code in ways that get incompatibly
>> shaded. I will probably start "1a" by making a shaded library
>> org.apache.beam:vendored-guava and starting to use it. It sounds like there
>> is generally unanimous support for that much, anyhow.
>>
>> Kenn
>>
>> On Wed, Dec 13, 2017 at 8:31 AM, Aljoscha Krettek <aljos...@apache.org>
>> wrote:
>>
>>> Thanks Ismaël for bringing up this discussion again!
>>>
>>> I would be in favour of 1) and more specifically of 1a)
>>>
>>> Aljoscha
>>>
>>>
>>> On 12. Dec 2017, at 18:56, Lukasz Cwik <lc...@google.com> wrote:
>>>
>>> You can always run tests on post shaded artifacts instead of the
>>> compiled classes, it just requires us to change our maven surefire / gradle
>>> test configurations but it is true that most IDEs would behave better with
>>> a dependency jar unless you delegate all the build/test actions to the
>>> build system and then it won't matter.
>>>
>>> On Mon, Dec 11, 2017 at 9:05 PM, Kenneth Knowles <k...@google.com> wrote:
>>>
>>>> There's also, with additional overhead,
>>>>
>>>> 1a) A relocated and shipped package for each thing we want to relocate.
>>>> I think this has also been tried outside Beam...
>>>>
>>>> Pros:
>>>> * all the pros of 1) plus no bloat beyond what is necessary
>>>> Cons:
>>>> * abandons whitelist approach for public deps, reverting to blacklist
>>>> approach for trouble things like guava, so a bit less principled
>>>>
>>>> For both 1) and 1a) I would add:
>>>>
>>>> Pros:
>>>> * clearly readable dependency since code will `import
>>>> org.apache.beam.private.guava21` and IDEs will understand it is a
>>>> distinct lilbrary
>>>> * can run tests on unpackaged classes, as long as deps are shaded or
>>>> provided as jars
>>>> * no mysterious action at a distance from inherited configuration
>>>> Cons:
>>>> * need to adjust imports
>>>>
>>>> Kenn
>>>>
>>>> On Mon, Dec 11, 2017 at 9:57 AM, Lukasz Cwik <lc...@google.com> wrote:
>>>>
>>>>> I would suggest that either we use:
>>>>> 1) A common deps package containing shaded dependencies allows for
>>>>> Pros
>>>>> * doesn't require the user to build an uber jar
>>>>> Risks
>>>>> * dependencies package will keep growing even if something is or isn't
>>>>> needed by all of Apache Beam leading to a large jar anyways negating any
>>>>> space savings
>>>>>
>>>>> 2) Shade within each module to a common location like
>>>>> org.apache.beam.relocated.guava....
>>>>> Pros
>>>>> * you only get the shaded dependencies of the things that are required
>>>>> * its one less dependency for users to manage
>>>>> Risks
>>>>> * requires an uber jar to be built to get the space savings (either by
>>>>> a user or a distribution of Apache Beam) otherwise we negate any space
>>>>> savings.
>>>>>
>>>>> If we either use a common relocation scheme or a dependencies jar then
>>>>> each relocation should specifically contain the version number of the
>>>>> package because we would like to allow for us to be using two different
>>>>> versions of the same library.
>>>>>
>>>>> For the common deps package approach, should we check in code where
>>>>> the imports contain the relocated location (e.g. import
>>>>> org.apache.beam.guava.20.0.com.google.common.collect.ImmutableList)?
>>>>>
>>>>>
>>>>> On Mon, Dec 11, 2017 at 8:47 AM, Jean-Baptiste Onofré <j...@nanthrax.net
>>>>> > wrote:
>>>>>
>>>>>> Thanks for bringing that back.
>>>>>>
>>>>>> Indeed guava is shaded in different uber-jar. Maybe we can have a
>>>>>> common deps module that we include once (but the user will have to
>>>>>> explicitly define the dep) ?
>>>>>>
>>>>>> Basically, what do you propose for protobuf (unfortunately, I don't
>>>>>> see an obvious) ?
>>>>>>
>>>>>> Regards
>>>>>> JB
>>>>>>
>>>>>>
>>>>>> On 12/11/2017 05:35 PM, Ismaël Mejía wrote:
>>>>>>
>>>>>>> Hello, I wanted to bring back this subject because I think we should
>>>>>>> take action on this and at least first have a shaded version of
>>>>>>> guava.
>>>>>>> I was playing with a toy project and I did the procedure we use to
>>>>>>> submit jars to a Hadoop cluster via Flink/Spark which involves
>>>>>>> creating an uber jar and I realized that the size of the jar was way
>>>>>>> bigger than I expected, and the fact that we shade guava in every
>>>>>>> module contributes to this. I found guava shaded on:
>>>>>>>
>>>>>>> sdks/java/core
>>>>>>> runners/core-construction-java
>>>>>>> runners/core-java
>>>>>>> model/job-management
>>>>>>> runners/spark
>>>>>>> sdks/java/io/hadoop-file-system
>>>>>>> sdks/java/io/kafka
>>>>>>>
>>>>>>> This means at least 6 times more of the size it should which counts
>>>>>>> in
>>>>>>> around 15MB more (2.4MB*6 deps) of extra weight that we can simply
>>>>>>> reduce by using a shaded version of the library.
>>>>>>>
>>>>>>> I add this point to the previous ones mentioned during the discussion
>>>>>>> because this goes against the end user experience and affects us all
>>>>>>> (devs/users).
>>>>>>>
>>>>>>> Another question is if we should shade (and how) protocol buffers
>>>>>>> because now with the portability work we are exposing it closer to
>>>>>>> the
>>>>>>> end users. I say this because I also found an issue while running a
>>>>>>> job on YARN with the spark runner because hadoop-common includes
>>>>>>> protobuf-java 2 and I had to explicitly provide protocol-buffers 3 as
>>>>>>> a dependency to be able to use triggers (note the Spark runner
>>>>>>> translates them using some method from runners/core-java). Since
>>>>>>> hadoop-common is provided in the cluster with the older version of
>>>>>>> protobuf, I am afraid that this will bite us in the future.
>>>>>>>
>>>>>>> Ismaël
>>>>>>>
>>>>>>> ps. There is already a JIRA for that shading for protobuf on
>>>>>>> hadoop-common but this is not coming until version 3 is out.
>>>>>>> https://issues.apache.org/jira/browse/HADOOP-13136
>>>>>>>
>>>>>>> ps2. Extra curious situation is to see that the dataflow-runner ends
>>>>>>> up having guava shaded twice via its shaded version on
>>>>>>> core-construction-java.
>>>>>>>
>>>>>>> ps3. Of course this message means a de-facto +1 at least to do it for
>>>>>>> guava and evaluate it for other libs.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 17, 2017 at 7:29 PM, Lukasz Cwik <
>>>>>>> lc...@google.com.invalid> wrote:
>>>>>>>
>>>>>>>> An issue to call out is how to deal with our generated code (.avro
>>>>>>>> and
>>>>>>>> .proto) as I don't believe those plugins allow you to generate code
>>>>>>>> using a
>>>>>>>> shaded package prefix on imports.
>>>>>>>>
>>>>>>>> On Tue, Oct 17, 2017 at 10:28 AM, Thomas Groh <
>>>>>>>> tg...@google.com.invalid>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> +1 to the goal. I'm hugely in favor of not doing the same shading
>>>>>>>>> work
>>>>>>>>> every time for dependencies we know we'll use.
>>>>>>>>>
>>>>>>>>> This also means that if we end up pulling in transitive
>>>>>>>>> dependencies we
>>>>>>>>> don't want in any particular module we can avoid having to adjust
>>>>>>>>> our
>>>>>>>>> repackaging strategy for that module - which I have run into
>>>>>>>>> face-first in
>>>>>>>>> the past.
>>>>>>>>>
>>>>>>>>> On Tue, Oct 17, 2017 at 9:48 AM, Kenneth Knowles <
>>>>>>>>> k...@google.com.invalid>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> Shading is a big part of how we keep our dependencies sane in
>>>>>>>>>> Beam. But
>>>>>>>>>> downsides: shading is super slow, causes massive jar bloat, and
>>>>>>>>>> kind of
>>>>>>>>>> hard to get right because artifacts and namespaces are not 1-to-1.
>>>>>>>>>>
>>>>>>>>>> I know that some communities distribute their own shaded
>>>>>>>>>> distributions of
>>>>>>>>>> dependencies. I had a thought about doing something similar that
>>>>>>>>>> I wanted
>>>>>>>>>> to throw out there for people to poke holes in.
>>>>>>>>>>
>>>>>>>>>> To set the scene, here is how I view shading:
>>>>>>>>>>
>>>>>>>>>>   - A module has public dependencies and private dependencies.
>>>>>>>>>>   - Public deps are used for data interchange; users must share
>>>>>>>>>> these
>>>>>>>>>>
>>>>>>>>> deps.
>>>>>>>>>
>>>>>>>>>>   - Private deps are just functionality and can be hidden (in our
>>>>>>>>>> case,
>>>>>>>>>> relocated + bundled)
>>>>>>>>>>   - It isn't necessarily that simple, because public and private
>>>>>>>>>> deps
>>>>>>>>>>
>>>>>>>>> might
>>>>>>>>>
>>>>>>>>>> interact in higher-order ways ("public" is contagious)
>>>>>>>>>>
>>>>>>>>>> Shading is an implementation detail of expressing these
>>>>>>>>>> characteristics.
>>>>>>>>>>
>>>>>>>>> We
>>>>>>>>>
>>>>>>>>>> use shading selectively because of its downsides I mentioned
>>>>>>>>>> above.
>>>>>>>>>>
>>>>>>>>>> But what about this idea: Introduce shaded deps as a single
>>>>>>>>>> separate
>>>>>>>>>> artifact.
>>>>>>>>>>
>>>>>>>>>>   - sdks/java/private-deps: bundled uber jar with relocated
>>>>>>>>>> versions of
>>>>>>>>>> everything we want to shade
>>>>>>>>>>
>>>>>>>>>>   - sdks/java/core and sdks/java/harness: no relocation or
>>>>>>>>>> bundling -
>>>>>>>>>> depends on `beam-sdks-java-private-deps` and imports like
>>>>>>>>>> `org.apache.beam.sdk.private.com.google.common` directly (this
>>>>>>>>>> is what
>>>>>>>>>> they
>>>>>>>>>> are rewritten to
>>>>>>>>>>
>>>>>>>>>> Some benefits
>>>>>>>>>>
>>>>>>>>>>   - much faster builds of other modules
>>>>>>>>>>   - only one shaded uber jar
>>>>>>>>>>   - rare/no rebuilds of the uber jar
>>>>>>>>>>   - can use maven enforcer to forbid imports like
>>>>>>>>>> com.google.common
>>>>>>>>>>   - configuration all in one place
>>>>>>>>>>   - no automated rewriting of our real code, which has led to
>>>>>>>>>> some major
>>>>>>>>>> confusion
>>>>>>>>>>   - easy to implement incrementally
>>>>>>>>>>
>>>>>>>>>> Downsides:
>>>>>>>>>>
>>>>>>>>>>   - plenty of effort work to get there
>>>>>>>>>>   - unclear how many different such deps modules we need; sharing
>>>>>>>>>> them
>>>>>>>>>>
>>>>>>>>> could
>>>>>>>>>
>>>>>>>>>> get weird
>>>>>>>>>>   - if we hit a roadblock, we will have committed a lot of time
>>>>>>>>>>
>>>>>>>>>> Just something I was musing as I spent another evening waiting
>>>>>>>>>> for slow
>>>>>>>>>> builds to try to confirm changes to brittle poms.
>>>>>>>>>>
>>>>>>>>>> Kenn
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>> --
>>>>>> Jean-Baptiste Onofré
>>>>>> jbono...@apache.org
>>>>>> http://blog.nanthrax.net
>>>>>> Talend - http://www.talend.com
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>

Reply via email to