+1, it was on my TODO for a while waiting the Java8 update.

Regards
JB

On 02/01/2018 06:56 AM, Romain Manni-Bucau wrote:
> Why not dropping guava for all beam codebase? With java 8 it is quite easy to 
> do
> it and avoid a bunch of conflicts. Did it in 2 projects with quite a good 
> result.
> 
> Le 1 févr. 2018 06:50, "Lukasz Cwik" <lc...@google.com
> <mailto:lc...@google.com>> a écrit :
> 
>     Make sure to include the guava version in the artifact name so that we can
>     have multiple vendored versions.
> 
>     On Wed, Jan 31, 2018 at 9:16 PM, Kenneth Knowles <k...@google.com
>     <mailto:k...@google.com>> wrote:
> 
>         I didn't have time for this, but it just bit me. We definitely have
>         Guava on the API surface of runner support code in ways that get
>         incompatibly shaded. I will probably start "1a" by making a shaded
>         library org.apache.beam:vendored-guava and starting to use it. It 
> sounds
>         like there is generally unanimous support for that much, anyhow.
> 
>         Kenn
> 
>         On Wed, Dec 13, 2017 at 8:31 AM, Aljoscha Krettek <aljos...@apache.org
>         <mailto:aljos...@apache.org>> wrote:
> 
>             Thanks Ismaël for bringing up this discussion again!
> 
>             I would be in favour of 1) and more specifically of 1a)
> 
>             Aljoscha
> 
> 
>>             On 12. Dec 2017, at 18:56, Lukasz Cwik <lc...@google.com
>>             <mailto:lc...@google.com>> wrote:
>>
>>             You can always run tests on post shaded artifacts instead of the
>>             compiled classes, it just requires us to change our maven 
>> surefire
>>             / gradle test configurations but it is true that most IDEs would
>>             behave better with a dependency jar unless you delegate all the
>>             build/test actions to the build system and then it won't matter.
>>
>>             On Mon, Dec 11, 2017 at 9:05 PM, Kenneth Knowles <k...@google.com
>>             <mailto:k...@google.com>> wrote:
>>
>>                 There's also, with additional overhead,
>>
>>                 1a) A relocated and shipped package for each thing we want to
>>                 relocate. I think this has also been tried outside Beam...
>>
>>                 Pros:
>>                 * all the pros of 1) plus no bloat beyond what is necessary
>>                 Cons:
>>                 * abandons whitelist approach for public deps, reverting to
>>                 blacklist approach for trouble things like guava, so a bit
>>                 less principled
>>
>>                 For both 1) and 1a) I would add:
>>
>>                 Pros:
>>                 * clearly readable dependency since code will `import
>>                 org.apache.beam.private.guava21` and IDEs will understand it
>>                 is a distinct lilbrary
>>                 * can run tests on unpackaged classes, as long as deps are
>>                 shaded or provided as jars
>>                 * no mysterious action at a distance from inherited 
>> configuration
>>                 Cons:
>>                 * need to adjust imports
>>
>>                 Kenn
>>
>>                 On Mon, Dec 11, 2017 at 9:57 AM, Lukasz Cwik 
>> <lc...@google.com
>>                 <mailto:lc...@google.com>> wrote:
>>
>>                     I would suggest that either we use:
>>                     1) A common deps package containing shaded dependencies
>>                     allows for 
>>                     Pros
>>                     * doesn't require the user to build an uber jar
>>                     Risks
>>                     * dependencies package will keep growing even if 
>> something
>>                     is or isn't needed by all of Apache Beam leading to a
>>                     large jar anyways negating any space savings
>>
>>                     2) Shade within each module to a common location like
>>                     org.apache.beam.relocated.guava....
>>                     Pros
>>                     * you only get the shaded dependencies of the things that
>>                     are required
>>                     * its one less dependency for users to manage
>>                     Risks
>>                     * requires an uber jar to be built to get the space
>>                     savings (either by a user or a distribution of Apache
>>                     Beam) otherwise we negate any space savings.
>>
>>                     If we either use a common relocation scheme or a
>>                     dependencies jar then each relocation should specifically
>>                     contain the version number of the package because we 
>> would
>>                     like to allow for us to be using two different versions 
>> of
>>                     the same library.
>>
>>                     For the common deps package approach, should we check in
>>                     code where the imports contain the relocated location
>>                     (e.g. import org.apache.beam.guava.20.0.com
>>                     
>> <http://org.apache.beam.guava.20.0.com/>.google.common.collect.ImmutableList)?
>>
>>
>>                     On Mon, Dec 11, 2017 at 8:47 AM, Jean-Baptiste Onofré
>>                     <j...@nanthrax.net <mailto:j...@nanthrax.net>> wrote:
>>
>>                         Thanks for bringing that back.
>>
>>                         Indeed guava is shaded in different uber-jar. Maybe 
>> we
>>                         can have a common deps module that we include once
>>                         (but the user will have to explicitly define the 
>> dep) ?
>>
>>                         Basically, what do you propose for protobuf
>>                         (unfortunately, I don't see an obvious) ?
>>
>>                         Regards
>>                         JB
>>
>>
>>                         On 12/11/2017 05:35 PM, Ismaël Mejía wrote:
>>
>>                             Hello, I wanted to bring back this subject 
>> because
>>                             I think we should
>>                             take action on this and at least first have a
>>                             shaded version of guava.
>>                             I was playing with a toy project and I did the
>>                             procedure we use to
>>                             submit jars to a Hadoop cluster via Flink/Spark
>>                             which involves
>>                             creating an uber jar and I realized that the size
>>                             of the jar was way
>>                             bigger than I expected, and the fact that we 
>> shade
>>                             guava in every
>>                             module contributes to this. I found guava shaded 
>> on:
>>
>>                             sdks/java/core
>>                             runners/core-construction-java
>>                             runners/core-java
>>                             model/job-management
>>                             runners/spark
>>                             sdks/java/io/hadoop-file-system
>>                             sdks/java/io/kafka
>>
>>                             This means at least 6 times more of the size it
>>                             should which counts in
>>                             around 15MB more (2.4MB*6 deps) of extra weight
>>                             that we can simply
>>                             reduce by using a shaded version of the library.
>>
>>                             I add this point to the previous ones mentioned
>>                             during the discussion
>>                             because this goes against the end user experience
>>                             and affects us all
>>                             (devs/users).
>>
>>                             Another question is if we should shade (and how)
>>                             protocol buffers
>>                             because now with the portability work we are
>>                             exposing it closer to the
>>                             end users. I say this because I also found an
>>                             issue while running a
>>                             job on YARN with the spark runner because
>>                             hadoop-common includes
>>                             protobuf-java 2 and I had to explicitly provide
>>                             protocol-buffers 3 as
>>                             a dependency to be able to use triggers (note the
>>                             Spark runner
>>                             translates them using some method from
>>                             runners/core-java). Since
>>                             hadoop-common is provided in the cluster with the
>>                             older version of
>>                             protobuf, I am afraid that this will bite us in
>>                             the future.
>>
>>                             Ismaël
>>
>>                             ps. There is already a JIRA for that shading for
>>                             protobuf on
>>                             hadoop-common but this is not coming until 
>> version
>>                             3 is out.
>>                             
>> https://issues.apache.org/jira/browse/HADOOP-13136
>>                             
>> <https://issues.apache.org/jira/browse/HADOOP-13136>
>>
>>                             ps2. Extra curious situation is to see that the
>>                             dataflow-runner ends
>>                             up having guava shaded twice via its shaded 
>> version on
>>                             core-construction-java.
>>
>>                             ps3. Of course this message means a de-facto +1 
>> at
>>                             least to do it for
>>                             guava and evaluate it for other libs.
>>
>>
>>                             On Tue, Oct 17, 2017 at 7:29 PM, Lukasz Cwik
>>                             <lc...@google.com.invalid
>>                             <mailto:lc...@google.com.invalid>> wrote:
>>
>>                                 An issue to call out is how to deal with our
>>                                 generated code (.avro and
>>                                 .proto) as I don't believe those plugins 
>> allow
>>                                 you to generate code using a
>>                                 shaded package prefix on imports.
>>
>>                                 On Tue, Oct 17, 2017 at 10:28 AM, Thomas Groh
>>                                 <tg...@google.com.invalid
>>                                 <mailto:tg...@google.com.invalid>>
>>                                 wrote:
>>
>>                                     +1 to the goal. I'm hugely in favor of 
>> not
>>                                     doing the same shading work
>>                                     every time for dependencies we know 
>> we'll use.
>>
>>                                     This also means that if we end up pulling
>>                                     in transitive dependencies we
>>                                     don't want in any particular module we 
>> can
>>                                     avoid having to adjust our
>>                                     repackaging strategy for that module -
>>                                     which I have run into face-first in
>>                                     the past.
>>
>>                                     On Tue, Oct 17, 2017 at 9:48 AM, Kenneth
>>                                     Knowles <k...@google.com.invalid
>>                                     <mailto:k...@google.com.invalid>>
>>                                     wrote:
>>
>>                                         Hi all,
>>
>>                                         Shading is a big part of how we keep
>>                                         our dependencies sane in Beam. But
>>                                         downsides: shading is super slow,
>>                                         causes massive jar bloat, and kind of
>>                                         hard to get right because artifacts
>>                                         and namespaces are not 1-to-1.
>>
>>                                         I know that some communities
>>                                         distribute their own shaded
>>                                         distributions of
>>                                         dependencies. I had a thought about
>>                                         doing something similar that I wanted
>>                                         to throw out there for people to poke
>>                                         holes in.
>>
>>                                         To set the scene, here is how I view
>>                                         shading:
>>
>>                                           - A module has public dependencies
>>                                         and private dependencies.
>>                                           - Public deps are used for data
>>                                         interchange; users must share these
>>
>>                                     deps.
>>
>>                                           - Private deps are just
>>                                         functionality and can be hidden (in
>>                                         our case,
>>                                         relocated + bundled)
>>                                           - It isn't necessarily that simple,
>>                                         because public and private deps
>>
>>                                     might
>>
>>                                         interact in higher-order ways
>>                                         ("public" is contagious)
>>
>>                                         Shading is an implementation detail 
>> of
>>                                         expressing these characteristics.
>>
>>                                     We
>>
>>                                         use shading selectively because of 
>> its
>>                                         downsides I mentioned above.
>>
>>                                         But what about this idea: Introduce
>>                                         shaded deps as a single separate
>>                                         artifact.
>>
>>                                           - sdks/java/private-deps: bundled
>>                                         uber jar with relocated versions of
>>                                         everything we want to shade
>>
>>                                           - sdks/java/core and
>>                                         sdks/java/harness: no relocation or
>>                                         bundling -
>>                                         depends on
>>                                         `beam-sdks-java-private-deps` and
>>                                         imports like
>>                                         
>> `org.apache.beam.sdk.private.com.google.common`
>>                                         directly (this is what
>>                                         they
>>                                         are rewritten to
>>
>>                                         Some benefits
>>
>>                                           - much faster builds of other 
>> modules
>>                                           - only one shaded uber jar
>>                                           - rare/no rebuilds of the uber jar
>>                                           - can use maven enforcer to forbid
>>                                         imports like com.google.common
>>                                           - configuration all in one place
>>                                           - no automated rewriting of our 
>> real
>>                                         code, which has led to some major
>>                                         confusion
>>                                           - easy to implement incrementally
>>
>>                                         Downsides:
>>
>>                                           - plenty of effort work to get 
>> there
>>                                           - unclear how many different such
>>                                         deps modules we need; sharing them
>>
>>                                     could
>>
>>                                         get weird
>>                                           - if we hit a roadblock, we will
>>                                         have committed a lot of time
>>
>>                                         Just something I was musing as I 
>> spent
>>                                         another evening waiting for slow
>>                                         builds to try to confirm changes to
>>                                         brittle poms.
>>
>>                                         Kenn
>>
>>
>>
>>                         -- 
>>                         Jean-Baptiste Onofré
>>                         jbono...@apache.org <mailto:jbono...@apache.org>
>>                         http://blog.nanthrax.net <http://blog.nanthrax.net/>
>>                         Talend - http://www.talend.com 
>> <http://www.talend.com/>
>>
>>
>>
>>
> 
> 
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Reply via email to