a map of list is fine and not a challenge we'll face long I hope ;)

Romain Manni-Bucau
@rmannibucau <https://twitter.com/rmannibucau> |  Blog
<https://rmannibucau.metawerx.net/> | Old Blog
<http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> |
LinkedIn <https://www.linkedin.com/in/rmannibucau>

2018-02-01 9:40 GMT+01:00 Reuven Lax <re...@google.com>:

> Not sure we'll be able to replace them all. Things like guava Table and
> Multimap don't have great replacements in Java8.
>
> On Wed, Jan 31, 2018 at 10:11 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
>> +1, it was on my TODO for a while waiting the Java8 update.
>>
>> Regards
>> JB
>>
>> On 02/01/2018 06:56 AM, Romain Manni-Bucau wrote:
>> > Why not dropping guava for all beam codebase? With java 8 it is quite
>> easy to do
>> > it and avoid a bunch of conflicts. Did it in 2 projects with quite a
>> good result.
>> >
>> > Le 1 févr. 2018 06:50, "Lukasz Cwik" <lc...@google.com
>> > <mailto:lc...@google.com>> a écrit :
>> >
>> >     Make sure to include the guava version in the artifact name so that
>> we can
>> >     have multiple vendored versions.
>> >
>> >     On Wed, Jan 31, 2018 at 9:16 PM, Kenneth Knowles <k...@google.com
>> >     <mailto:k...@google.com>> wrote:
>> >
>> >         I didn't have time for this, but it just bit me. We definitely
>> have
>> >         Guava on the API surface of runner support code in ways that get
>> >         incompatibly shaded. I will probably start "1a" by making a
>> shaded
>> >         library org.apache.beam:vendored-guava and starting to use it.
>> It sounds
>> >         like there is generally unanimous support for that much, anyhow.
>> >
>> >         Kenn
>> >
>> >         On Wed, Dec 13, 2017 at 8:31 AM, Aljoscha Krettek <
>> aljos...@apache.org
>> >         <mailto:aljos...@apache.org>> wrote:
>> >
>> >             Thanks Ismaël for bringing up this discussion again!
>> >
>> >             I would be in favour of 1) and more specifically of 1a)
>> >
>> >             Aljoscha
>> >
>> >
>> >>             On 12. Dec 2017, at 18:56, Lukasz Cwik <lc...@google.com
>> >>             <mailto:lc...@google.com>> wrote:
>> >>
>> >>             You can always run tests on post shaded artifacts instead
>> of the
>> >>             compiled classes, it just requires us to change our maven
>> surefire
>> >>             / gradle test configurations but it is true that most IDEs
>> would
>> >>             behave better with a dependency jar unless you delegate
>> all the
>> >>             build/test actions to the build system and then it won't
>> matter.
>> >>
>> >>             On Mon, Dec 11, 2017 at 9:05 PM, Kenneth Knowles <
>> k...@google.com
>> >>             <mailto:k...@google.com>> wrote:
>> >>
>> >>                 There's also, with additional overhead,
>> >>
>> >>                 1a) A relocated and shipped package for each thing we
>> want to
>> >>                 relocate. I think this has also been tried outside
>> Beam...
>> >>
>> >>                 Pros:
>> >>                 * all the pros of 1) plus no bloat beyond what is
>> necessary
>> >>                 Cons:
>> >>                 * abandons whitelist approach for public deps,
>> reverting to
>> >>                 blacklist approach for trouble things like guava, so a
>> bit
>> >>                 less principled
>> >>
>> >>                 For both 1) and 1a) I would add:
>> >>
>> >>                 Pros:
>> >>                 * clearly readable dependency since code will `import
>> >>                 org.apache.beam.private.guava21` and IDEs will
>> understand it
>> >>                 is a distinct lilbrary
>> >>                 * can run tests on unpackaged classes, as long as deps
>> are
>> >>                 shaded or provided as jars
>> >>                 * no mysterious action at a distance from inherited
>> configuration
>> >>                 Cons:
>> >>                 * need to adjust imports
>> >>
>> >>                 Kenn
>> >>
>> >>                 On Mon, Dec 11, 2017 at 9:57 AM, Lukasz Cwik <
>> lc...@google.com
>> >>                 <mailto:lc...@google.com>> wrote:
>> >>
>> >>                     I would suggest that either we use:
>> >>                     1) A common deps package containing shaded
>> dependencies
>> >>                     allows for
>> >>                     Pros
>> >>                     * doesn't require the user to build an uber jar
>> >>                     Risks
>> >>                     * dependencies package will keep growing even if
>> something
>> >>                     is or isn't needed by all of Apache Beam leading
>> to a
>> >>                     large jar anyways negating any space savings
>> >>
>> >>                     2) Shade within each module to a common location
>> like
>> >>                     org.apache.beam.relocated.guava....
>> >>                     Pros
>> >>                     * you only get the shaded dependencies of the
>> things that
>> >>                     are required
>> >>                     * its one less dependency for users to manage
>> >>                     Risks
>> >>                     * requires an uber jar to be built to get the space
>> >>                     savings (either by a user or a distribution of
>> Apache
>> >>                     Beam) otherwise we negate any space savings.
>> >>
>> >>                     If we either use a common relocation scheme or a
>> >>                     dependencies jar then each relocation should
>> specifically
>> >>                     contain the version number of the package because
>> we would
>> >>                     like to allow for us to be using two different
>> versions of
>> >>                     the same library.
>> >>
>> >>                     For the common deps package approach, should we
>> check in
>> >>                     code where the imports contain the relocated
>> location
>> >>                     (e.g. import org.apache.beam.guava.20.0.com
>> >>                     <http://org.apache.beam.guava.20.0.com/>.google.
>> common.collect.ImmutableList)?
>> >>
>> >>
>> >>                     On Mon, Dec 11, 2017 at 8:47 AM, Jean-Baptiste
>> Onofré
>> >>                     <j...@nanthrax.net <mailto:j...@nanthrax.net>> wrote:
>> >>
>> >>                         Thanks for bringing that back.
>> >>
>> >>                         Indeed guava is shaded in different uber-jar.
>> Maybe we
>> >>                         can have a common deps module that we include
>> once
>> >>                         (but the user will have to explicitly define
>> the dep) ?
>> >>
>> >>                         Basically, what do you propose for protobuf
>> >>                         (unfortunately, I don't see an obvious) ?
>> >>
>> >>                         Regards
>> >>                         JB
>> >>
>> >>
>> >>                         On 12/11/2017 05:35 PM, Ismaël Mejía wrote:
>> >>
>> >>                             Hello, I wanted to bring back this subject
>> because
>> >>                             I think we should
>> >>                             take action on this and at least first
>> have a
>> >>                             shaded version of guava.
>> >>                             I was playing with a toy project and I did
>> the
>> >>                             procedure we use to
>> >>                             submit jars to a Hadoop cluster via
>> Flink/Spark
>> >>                             which involves
>> >>                             creating an uber jar and I realized that
>> the size
>> >>                             of the jar was way
>> >>                             bigger than I expected, and the fact that
>> we shade
>> >>                             guava in every
>> >>                             module contributes to this. I found guava
>> shaded on:
>> >>
>> >>                             sdks/java/core
>> >>                             runners/core-construction-java
>> >>                             runners/core-java
>> >>                             model/job-management
>> >>                             runners/spark
>> >>                             sdks/java/io/hadoop-file-system
>> >>                             sdks/java/io/kafka
>> >>
>> >>                             This means at least 6 times more of the
>> size it
>> >>                             should which counts in
>> >>                             around 15MB more (2.4MB*6 deps) of extra
>> weight
>> >>                             that we can simply
>> >>                             reduce by using a shaded version of the
>> library.
>> >>
>> >>                             I add this point to the previous ones
>> mentioned
>> >>                             during the discussion
>> >>                             because this goes against the end user
>> experience
>> >>                             and affects us all
>> >>                             (devs/users).
>> >>
>> >>                             Another question is if we should shade
>> (and how)
>> >>                             protocol buffers
>> >>                             because now with the portability work we
>> are
>> >>                             exposing it closer to the
>> >>                             end users. I say this because I also found
>> an
>> >>                             issue while running a
>> >>                             job on YARN with the spark runner because
>> >>                             hadoop-common includes
>> >>                             protobuf-java 2 and I had to explicitly
>> provide
>> >>                             protocol-buffers 3 as
>> >>                             a dependency to be able to use triggers
>> (note the
>> >>                             Spark runner
>> >>                             translates them using some method from
>> >>                             runners/core-java). Since
>> >>                             hadoop-common is provided in the cluster
>> with the
>> >>                             older version of
>> >>                             protobuf, I am afraid that this will bite
>> us in
>> >>                             the future.
>> >>
>> >>                             Ismaël
>> >>
>> >>                             ps. There is already a JIRA for that
>> shading for
>> >>                             protobuf on
>> >>                             hadoop-common but this is not coming until
>> version
>> >>                             3 is out.
>> >>                             https://issues.apache.org/jir
>> a/browse/HADOOP-13136
>> >>                             <https://issues.apache.org/ji
>> ra/browse/HADOOP-13136>
>> >>
>> >>                             ps2. Extra curious situation is to see
>> that the
>> >>                             dataflow-runner ends
>> >>                             up having guava shaded twice via its
>> shaded version on
>> >>                             core-construction-java.
>> >>
>> >>                             ps3. Of course this message means a
>> de-facto +1 at
>> >>                             least to do it for
>> >>                             guava and evaluate it for other libs.
>> >>
>> >>
>> >>                             On Tue, Oct 17, 2017 at 7:29 PM, Lukasz
>> Cwik
>> >>                             <lc...@google.com.invalid
>> >>                             <mailto:lc...@google.com.invalid>> wrote:
>> >>
>> >>                                 An issue to call out is how to deal
>> with our
>> >>                                 generated code (.avro and
>> >>                                 .proto) as I don't believe those
>> plugins allow
>> >>                                 you to generate code using a
>> >>                                 shaded package prefix on imports.
>> >>
>> >>                                 On Tue, Oct 17, 2017 at 10:28 AM,
>> Thomas Groh
>> >>                                 <tg...@google.com.invalid
>> >>                                 <mailto:tg...@google.com.invalid>>
>> >>                                 wrote:
>> >>
>> >>                                     +1 to the goal. I'm hugely in
>> favor of not
>> >>                                     doing the same shading work
>> >>                                     every time for dependencies we
>> know we'll use.
>> >>
>> >>                                     This also means that if we end up
>> pulling
>> >>                                     in transitive dependencies we
>> >>                                     don't want in any particular
>> module we can
>> >>                                     avoid having to adjust our
>> >>                                     repackaging strategy for that
>> module -
>> >>                                     which I have run into face-first in
>> >>                                     the past.
>> >>
>> >>                                     On Tue, Oct 17, 2017 at 9:48 AM,
>> Kenneth
>> >>                                     Knowles <k...@google.com.invalid
>> >>                                     <mailto:k...@google.com.invalid>>
>> >>                                     wrote:
>> >>
>> >>                                         Hi all,
>> >>
>> >>                                         Shading is a big part of how
>> we keep
>> >>                                         our dependencies sane in Beam.
>> But
>> >>                                         downsides: shading is super
>> slow,
>> >>                                         causes massive jar bloat, and
>> kind of
>> >>                                         hard to get right because
>> artifacts
>> >>                                         and namespaces are not 1-to-1.
>> >>
>> >>                                         I know that some communities
>> >>                                         distribute their own shaded
>> >>                                         distributions of
>> >>                                         dependencies. I had a thought
>> about
>> >>                                         doing something similar that I
>> wanted
>> >>                                         to throw out there for people
>> to poke
>> >>                                         holes in.
>> >>
>> >>                                         To set the scene, here is how
>> I view
>> >>                                         shading:
>> >>
>> >>                                           - A module has public
>> dependencies
>> >>                                         and private dependencies.
>> >>                                           - Public deps are used for
>> data
>> >>                                         interchange; users must share
>> these
>> >>
>> >>                                     deps.
>> >>
>> >>                                           - Private deps are just
>> >>                                         functionality and can be
>> hidden (in
>> >>                                         our case,
>> >>                                         relocated + bundled)
>> >>                                           - It isn't necessarily that
>> simple,
>> >>                                         because public and private deps
>> >>
>> >>                                     might
>> >>
>> >>                                         interact in higher-order ways
>> >>                                         ("public" is contagious)
>> >>
>> >>                                         Shading is an implementation
>> detail of
>> >>                                         expressing these
>> characteristics.
>> >>
>> >>                                     We
>> >>
>> >>                                         use shading selectively
>> because of its
>> >>                                         downsides I mentioned above.
>> >>
>> >>                                         But what about this idea:
>> Introduce
>> >>                                         shaded deps as a single
>> separate
>> >>                                         artifact.
>> >>
>> >>                                           - sdks/java/private-deps:
>> bundled
>> >>                                         uber jar with relocated
>> versions of
>> >>                                         everything we want to shade
>> >>
>> >>                                           - sdks/java/core and
>> >>                                         sdks/java/harness: no
>> relocation or
>> >>                                         bundling -
>> >>                                         depends on
>> >>                                         `beam-sdks-java-private-deps`
>> and
>> >>                                         imports like
>> >>                                         `org.apache.beam.sdk.private.
>> com.google.common`
>> >>                                         directly (this is what
>> >>                                         they
>> >>                                         are rewritten to
>> >>
>> >>                                         Some benefits
>> >>
>> >>                                           - much faster builds of
>> other modules
>> >>                                           - only one shaded uber jar
>> >>                                           - rare/no rebuilds of the
>> uber jar
>> >>                                           - can use maven enforcer to
>> forbid
>> >>                                         imports like com.google.common
>> >>                                           - configuration all in one
>> place
>> >>                                           - no automated rewriting of
>> our real
>> >>                                         code, which has led to some
>> major
>> >>                                         confusion
>> >>                                           - easy to implement
>> incrementally
>> >>
>> >>                                         Downsides:
>> >>
>> >>                                           - plenty of effort work to
>> get there
>> >>                                           - unclear how many different
>> such
>> >>                                         deps modules we need; sharing
>> them
>> >>
>> >>                                     could
>> >>
>> >>                                         get weird
>> >>                                           - if we hit a roadblock, we
>> will
>> >>                                         have committed a lot of time
>> >>
>> >>                                         Just something I was musing as
>> I spent
>> >>                                         another evening waiting for
>> slow
>> >>                                         builds to try to confirm
>> changes to
>> >>                                         brittle poms.
>> >>
>> >>                                         Kenn
>> >>
>> >>
>> >>
>> >>                         --
>> >>                         Jean-Baptiste Onofré
>> >>                         jbono...@apache.org <mailto:
>> jbono...@apache.org>
>> >>                         http://blog.nanthrax.net <
>> http://blog.nanthrax.net/>
>> >>                         Talend - http://www.talend.com <
>> http://www.talend.com/>
>> >>
>> >>
>> >>
>> >>
>> >
>> >
>> >
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>
>

Reply via email to