2018-02-02 15:37 GMT+01:00 Kenneth Knowles <k...@google.com>:

> Another couple:
>
>  - User-facing TypeDescriptor is a very thin wrapper on Guava's TypeToken
>

Technically reflect Type is enough


>  - ImmutableList and friends and their builders are very widely used and
> IMO still add a lot for readability and preventing someone coming along and
> adding mistakes to a codebase
>

Sugar but not required. When you compare the cost of a shade versus of
duplicating the parts we need there is no real match IMHO.


>
> So considering it all, I would keep a vendored Guava (but also move off
> where we can, and also have our own improvements). I hope it will be a
> near-empty build file to generate it so not a maintenance burden.
>
> Kenn
>
> On Thu, Feb 1, 2018 at 8:44 PM, Kenneth Knowles <k...@google.com> wrote:
>
>> Nice. This sounds like a great idea (or two?) and goes along with what I
>> just started for futures.
>>
>> Guava: filed https://issues.apache.org/jira/browse/BEAM-3606 and
>> assigned to Ismaël :-) and converted my futures thing to a subtask.
>>
>> Specific things for our micro guava:
>>
>>  - checkArgumentNotNull can throw the right exception
>>  - our own Optional because Java's is not Serializable
>>  - futures combinators since many are missing, especially Java's don't do
>> exceptions right
>>
>> Protobuf: didn't file an issue because I'm not sure
>>
>> I was wondering if pre-shading works. We really need to get rid of it
>> from public APIs in a 100% reliable way. It is also a problem for Dataflow.
>> I was wondering if one approach is to pre-shade gcpio-protobuf-java,
>> gcpio-protobuf-java-util, gcpio-grpc-java, etc. Anything that needs to take
>> a Message object. (and do the same for beam-model-protobuf-java since the
>> model bits have to depend on each other but nothing else).
>>
>> Kenn
>>
>> On Thu, Feb 1, 2018 at 1:56 AM, Ismaël Mejía <ieme...@gmail.com> wrote:
>>
>>> Huge +1 to get rid of Guava!
>>>
>>> This solves annoying dependency issues for some IOs and allow us to
>>> get rid of the shading that makes current jars bigger than they should
>>> be.
>>>
>>> We can create our own 'micro guava' package with some classes for
>>> things that are hard to migrate, or that we  prefer to still have like
>>> the check* methods for example. Given the size of the task we should
>>> probably divide it into subtasks, more important is to get rid of it
>>> for 'sdks/java/core'. We can then attack other areas afterwards.
>>>
>>> Other important idea would be to get rid of Protobuf in public APIs
>>> like GCPIO and to better shade it from leaking into the runners. An
>>> unexpected side effect of this is a leak of netty via gRPC/protobuf
>>> that is byting us for the Spark runner, but well that's worth a
>>> different discussion.
>>>
>>>
>>> On Thu, Feb 1, 2018 at 10:08 AM, Romain Manni-Bucau
>>> <rmannibu...@gmail.com> wrote:
>>> > a map of list is fine and not a challenge we'll face long I hope ;)
>>> >
>>> >
>>> > Romain Manni-Bucau
>>> > @rmannibucau |  Blog | Old Blog | Github | LinkedIn
>>> >
>>> > 2018-02-01 9:40 GMT+01:00 Reuven Lax <re...@google.com>:
>>> >>
>>> >> Not sure we'll be able to replace them all. Things like guava Table
>>> and
>>> >> Multimap don't have great replacements in Java8.
>>> >>
>>> >> On Wed, Jan 31, 2018 at 10:11 PM, Jean-Baptiste Onofré <
>>> j...@nanthrax.net>
>>> >> wrote:
>>> >>>
>>> >>> +1, it was on my TODO for a while waiting the Java8 update.
>>> >>>
>>> >>> Regards
>>> >>> JB
>>> >>>
>>> >>> On 02/01/2018 06:56 AM, Romain Manni-Bucau wrote:
>>> >>> > Why not dropping guava for all beam codebase? With java 8 it is
>>> quite
>>> >>> > easy to do
>>> >>> > it and avoid a bunch of conflicts. Did it in 2 projects with quite
>>> a
>>> >>> > good result.
>>> >>> >
>>> >>> > Le 1 févr. 2018 06:50, "Lukasz Cwik" <lc...@google.com
>>> >>> > <mailto:lc...@google.com>> a écrit :
>>> >>> >
>>> >>> >     Make sure to include the guava version in the artifact name so
>>> that
>>> >>> > we can
>>> >>> >     have multiple vendored versions.
>>> >>> >
>>> >>> >     On Wed, Jan 31, 2018 at 9:16 PM, Kenneth Knowles <
>>> k...@google.com
>>> >>> >     <mailto:k...@google.com>> wrote:
>>> >>> >
>>> >>> >         I didn't have time for this, but it just bit me. We
>>> definitely
>>> >>> > have
>>> >>> >         Guava on the API surface of runner support code in ways
>>> that
>>> >>> > get
>>> >>> >         incompatibly shaded. I will probably start "1a" by making a
>>> >>> > shaded
>>> >>> >         library org.apache.beam:vendored-guava and starting to use
>>> it.
>>> >>> > It sounds
>>> >>> >         like there is generally unanimous support for that much,
>>> >>> > anyhow.
>>> >>> >
>>> >>> >         Kenn
>>> >>> >
>>> >>> >         On Wed, Dec 13, 2017 at 8:31 AM, Aljoscha Krettek
>>> >>> > <aljos...@apache.org
>>> >>> >         <mailto:aljos...@apache.org>> wrote:
>>> >>> >
>>> >>> >             Thanks Ismaël for bringing up this discussion again!
>>> >>> >
>>> >>> >             I would be in favour of 1) and more specifically of 1a)
>>> >>> >
>>> >>> >             Aljoscha
>>> >>> >
>>> >>> >
>>> >>> >>             On 12. Dec 2017, at 18:56, Lukasz Cwik <
>>> lc...@google.com
>>> >>> >>             <mailto:lc...@google.com>> wrote:
>>> >>> >>
>>> >>> >>             You can always run tests on post shaded artifacts
>>> instead
>>> >>> >> of the
>>> >>> >>             compiled classes, it just requires us to change our
>>> maven
>>> >>> >> surefire
>>> >>> >>             / gradle test configurations but it is true that most
>>> IDEs
>>> >>> >> would
>>> >>> >>             behave better with a dependency jar unless you
>>> delegate
>>> >>> >> all the
>>> >>> >>             build/test actions to the build system and then it
>>> won't
>>> >>> >> matter.
>>> >>> >>
>>> >>> >>             On Mon, Dec 11, 2017 at 9:05 PM, Kenneth Knowles
>>> >>> >> <k...@google.com
>>> >>> >>             <mailto:k...@google.com>> wrote:
>>> >>> >>
>>> >>> >>                 There's also, with additional overhead,
>>> >>> >>
>>> >>> >>                 1a) A relocated and shipped package for each
>>> thing we
>>> >>> >> want to
>>> >>> >>                 relocate. I think this has also been tried outside
>>> >>> >> Beam...
>>> >>> >>
>>> >>> >>                 Pros:
>>> >>> >>                 * all the pros of 1) plus no bloat beyond what is
>>> >>> >> necessary
>>> >>> >>                 Cons:
>>> >>> >>                 * abandons whitelist approach for public deps,
>>> >>> >> reverting to
>>> >>> >>                 blacklist approach for trouble things like guava,
>>> so a
>>> >>> >> bit
>>> >>> >>                 less principled
>>> >>> >>
>>> >>> >>                 For both 1) and 1a) I would add:
>>> >>> >>
>>> >>> >>                 Pros:
>>> >>> >>                 * clearly readable dependency since code will
>>> `import
>>> >>> >>                 org.apache.beam.private.guava21` and IDEs will
>>> >>> >> understand it
>>> >>> >>                 is a distinct lilbrary
>>> >>> >>                 * can run tests on unpackaged classes, as long as
>>> deps
>>> >>> >> are
>>> >>> >>                 shaded or provided as jars
>>> >>> >>                 * no mysterious action at a distance from
>>> inherited
>>> >>> >> configuration
>>> >>> >>                 Cons:
>>> >>> >>                 * need to adjust imports
>>> >>> >>
>>> >>> >>                 Kenn
>>> >>> >>
>>> >>> >>                 On Mon, Dec 11, 2017 at 9:57 AM, Lukasz Cwik
>>> >>> >> <lc...@google.com
>>> >>> >>                 <mailto:lc...@google.com>> wrote:
>>> >>> >>
>>> >>> >>                     I would suggest that either we use:
>>> >>> >>                     1) A common deps package containing shaded
>>> >>> >> dependencies
>>> >>> >>                     allows for
>>> >>> >>                     Pros
>>> >>> >>                     * doesn't require the user to build an uber
>>> jar
>>> >>> >>                     Risks
>>> >>> >>                     * dependencies package will keep growing even
>>> if
>>> >>> >> something
>>> >>> >>                     is or isn't needed by all of Apache Beam
>>> leading
>>> >>> >> to a
>>> >>> >>                     large jar anyways negating any space savings
>>> >>> >>
>>> >>> >>                     2) Shade within each module to a common
>>> location
>>> >>> >> like
>>> >>> >>                     org.apache.beam.relocated.guava....
>>> >>> >>                     Pros
>>> >>> >>                     * you only get the shaded dependencies of the
>>> >>> >> things that
>>> >>> >>                     are required
>>> >>> >>                     * its one less dependency for users to manage
>>> >>> >>                     Risks
>>> >>> >>                     * requires an uber jar to be built to get the
>>> >>> >> space
>>> >>> >>                     savings (either by a user or a distribution of
>>> >>> >> Apache
>>> >>> >>                     Beam) otherwise we negate any space savings.
>>> >>> >>
>>> >>> >>                     If we either use a common relocation scheme
>>> or a
>>> >>> >>                     dependencies jar then each relocation should
>>> >>> >> specifically
>>> >>> >>                     contain the version number of the package
>>> because
>>> >>> >> we would
>>> >>> >>                     like to allow for us to be using two different
>>> >>> >> versions of
>>> >>> >>                     the same library.
>>> >>> >>
>>> >>> >>                     For the common deps package approach, should
>>> we
>>> >>> >> check in
>>> >>> >>                     code where the imports contain the relocated
>>> >>> >> location
>>> >>> >>                     (e.g. import org.apache.beam.guava.20.0.com
>>> >>> >>
>>> >>> >> <http://org.apache.beam.guava.20.0.com/>.google.common.colle
>>> ct.ImmutableList)?
>>> >>> >>
>>> >>> >>
>>> >>> >>                     On Mon, Dec 11, 2017 at 8:47 AM, Jean-Baptiste
>>> >>> >> Onofré
>>> >>> >>                     <j...@nanthrax.net <mailto:j...@nanthrax.net>>
>>> wrote:
>>> >>> >>
>>> >>> >>                         Thanks for bringing that back.
>>> >>> >>
>>> >>> >>                         Indeed guava is shaded in different
>>> uber-jar.
>>> >>> >> Maybe we
>>> >>> >>                         can have a common deps module that we
>>> include
>>> >>> >> once
>>> >>> >>                         (but the user will have to explicitly
>>> define
>>> >>> >> the dep) ?
>>> >>> >>
>>> >>> >>                         Basically, what do you propose for
>>> protobuf
>>> >>> >>                         (unfortunately, I don't see an obvious) ?
>>> >>> >>
>>> >>> >>                         Regards
>>> >>> >>                         JB
>>> >>> >>
>>> >>> >>
>>> >>> >>                         On 12/11/2017 05:35 PM, Ismaël Mejía
>>> wrote:
>>> >>> >>
>>> >>> >>                             Hello, I wanted to bring back this
>>> subject
>>> >>> >> because
>>> >>> >>                             I think we should
>>> >>> >>                             take action on this and at least first
>>> >>> >> have a
>>> >>> >>                             shaded version of guava.
>>> >>> >>                             I was playing with a toy project and
>>> I did
>>> >>> >> the
>>> >>> >>                             procedure we use to
>>> >>> >>                             submit jars to a Hadoop cluster via
>>> >>> >> Flink/Spark
>>> >>> >>                             which involves
>>> >>> >>                             creating an uber jar and I realized
>>> that
>>> >>> >> the size
>>> >>> >>                             of the jar was way
>>> >>> >>                             bigger than I expected, and the fact
>>> that
>>> >>> >> we shade
>>> >>> >>                             guava in every
>>> >>> >>                             module contributes to this. I found
>>> guava
>>> >>> >> shaded on:
>>> >>> >>
>>> >>> >>                             sdks/java/core
>>> >>> >>                             runners/core-construction-java
>>> >>> >>                             runners/core-java
>>> >>> >>                             model/job-management
>>> >>> >>                             runners/spark
>>> >>> >>                             sdks/java/io/hadoop-file-system
>>> >>> >>                             sdks/java/io/kafka
>>> >>> >>
>>> >>> >>                             This means at least 6 times more of
>>> the
>>> >>> >> size it
>>> >>> >>                             should which counts in
>>> >>> >>                             around 15MB more (2.4MB*6 deps) of
>>> extra
>>> >>> >> weight
>>> >>> >>                             that we can simply
>>> >>> >>                             reduce by using a shaded version of
>>> the
>>> >>> >> library.
>>> >>> >>
>>> >>> >>                             I add this point to the previous ones
>>> >>> >> mentioned
>>> >>> >>                             during the discussion
>>> >>> >>                             because this goes against the end user
>>> >>> >> experience
>>> >>> >>                             and affects us all
>>> >>> >>                             (devs/users).
>>> >>> >>
>>> >>> >>                             Another question is if we should shade
>>> >>> >> (and how)
>>> >>> >>                             protocol buffers
>>> >>> >>                             because now with the portability work
>>> we
>>> >>> >> are
>>> >>> >>                             exposing it closer to the
>>> >>> >>                             end users. I say this because I also
>>> found
>>> >>> >> an
>>> >>> >>                             issue while running a
>>> >>> >>                             job on YARN with the spark runner
>>> because
>>> >>> >>                             hadoop-common includes
>>> >>> >>                             protobuf-java 2 and I had to
>>> explicitly
>>> >>> >> provide
>>> >>> >>                             protocol-buffers 3 as
>>> >>> >>                             a dependency to be able to use
>>> triggers
>>> >>> >> (note the
>>> >>> >>                             Spark runner
>>> >>> >>                             translates them using some method from
>>> >>> >>                             runners/core-java). Since
>>> >>> >>                             hadoop-common is provided in the
>>> cluster
>>> >>> >> with the
>>> >>> >>                             older version of
>>> >>> >>                             protobuf, I am afraid that this will
>>> bite
>>> >>> >> us in
>>> >>> >>                             the future.
>>> >>> >>
>>> >>> >>                             Ismaël
>>> >>> >>
>>> >>> >>                             ps. There is already a JIRA for that
>>> >>> >> shading for
>>> >>> >>                             protobuf on
>>> >>> >>                             hadoop-common but this is not coming
>>> until
>>> >>> >> version
>>> >>> >>                             3 is out.
>>> >>> >>
>>> >>> >> https://issues.apache.org/jira/browse/HADOOP-13136
>>> >>> >>
>>> >>> >> <https://issues.apache.org/jira/browse/HADOOP-13136>
>>> >>> >>
>>> >>> >>                             ps2. Extra curious situation is to see
>>> >>> >> that the
>>> >>> >>                             dataflow-runner ends
>>> >>> >>                             up having guava shaded twice via its
>>> >>> >> shaded version on
>>> >>> >>                             core-construction-java.
>>> >>> >>
>>> >>> >>                             ps3. Of course this message means a
>>> >>> >> de-facto +1 at
>>> >>> >>                             least to do it for
>>> >>> >>                             guava and evaluate it for other libs.
>>> >>> >>
>>> >>> >>
>>> >>> >>                             On Tue, Oct 17, 2017 at 7:29 PM,
>>> Lukasz
>>> >>> >> Cwik
>>> >>> >>                             <lc...@google.com.invalid
>>> >>> >>                             <mailto:lc...@google.com.invalid>>
>>> wrote:
>>> >>> >>
>>> >>> >>                                 An issue to call out is how to
>>> deal
>>> >>> >> with our
>>> >>> >>                                 generated code (.avro and
>>> >>> >>                                 .proto) as I don't believe those
>>> >>> >> plugins allow
>>> >>> >>                                 you to generate code using a
>>> >>> >>                                 shaded package prefix on imports.
>>> >>> >>
>>> >>> >>                                 On Tue, Oct 17, 2017 at 10:28 AM,
>>> >>> >> Thomas Groh
>>> >>> >>                                 <tg...@google.com.invalid
>>> >>> >>                                 <mailto:tg...@google.com.invalid
>>> >>
>>> >>> >>                                 wrote:
>>> >>> >>
>>> >>> >>                                     +1 to the goal. I'm hugely in
>>> >>> >> favor of not
>>> >>> >>                                     doing the same shading work
>>> >>> >>                                     every time for dependencies we
>>> >>> >> know we'll use.
>>> >>> >>
>>> >>> >>                                     This also means that if we
>>> end up
>>> >>> >> pulling
>>> >>> >>                                     in transitive dependencies we
>>> >>> >>                                     don't want in any particular
>>> >>> >> module we can
>>> >>> >>                                     avoid having to adjust our
>>> >>> >>                                     repackaging strategy for that
>>> >>> >> module -
>>> >>> >>                                     which I have run into
>>> face-first
>>> >>> >> in
>>> >>> >>                                     the past.
>>> >>> >>
>>> >>> >>                                     On Tue, Oct 17, 2017 at 9:48
>>> AM,
>>> >>> >> Kenneth
>>> >>> >>                                     Knowles
>>> <k...@google.com.invalid
>>> >>> >>                                     <mailto:k...@google.com.invali
>>> d>>
>>> >>> >>                                     wrote:
>>> >>> >>
>>> >>> >>                                         Hi all,
>>> >>> >>
>>> >>> >>                                         Shading is a big part of
>>> how
>>> >>> >> we keep
>>> >>> >>                                         our dependencies sane in
>>> Beam.
>>> >>> >> But
>>> >>> >>                                         downsides: shading is
>>> super
>>> >>> >> slow,
>>> >>> >>                                         causes massive jar bloat,
>>> and
>>> >>> >> kind of
>>> >>> >>                                         hard to get right because
>>> >>> >> artifacts
>>> >>> >>                                         and namespaces are not
>>> 1-to-1.
>>> >>> >>
>>> >>> >>                                         I know that some
>>> communities
>>> >>> >>                                         distribute their own
>>> shaded
>>> >>> >>                                         distributions of
>>> >>> >>                                         dependencies. I had a
>>> thought
>>> >>> >> about
>>> >>> >>                                         doing something similar
>>> that I
>>> >>> >> wanted
>>> >>> >>                                         to throw out there for
>>> people
>>> >>> >> to poke
>>> >>> >>                                         holes in.
>>> >>> >>
>>> >>> >>                                         To set the scene, here is
>>> how
>>> >>> >> I view
>>> >>> >>                                         shading:
>>> >>> >>
>>> >>> >>                                           - A module has public
>>> >>> >> dependencies
>>> >>> >>                                         and private dependencies.
>>> >>> >>                                           - Public deps are used
>>> for
>>> >>> >> data
>>> >>> >>                                         interchange; users must
>>> share
>>> >>> >> these
>>> >>> >>
>>> >>> >>                                     deps.
>>> >>> >>
>>> >>> >>                                           - Private deps are just
>>> >>> >>                                         functionality and can be
>>> >>> >> hidden (in
>>> >>> >>                                         our case,
>>> >>> >>                                         relocated + bundled)
>>> >>> >>                                           - It isn't necessarily
>>> that
>>> >>> >> simple,
>>> >>> >>                                         because public and private
>>> >>> >> deps
>>> >>> >>
>>> >>> >>                                     might
>>> >>> >>
>>> >>> >>                                         interact in higher-order
>>> ways
>>> >>> >>                                         ("public" is contagious)
>>> >>> >>
>>> >>> >>                                         Shading is an
>>> implementation
>>> >>> >> detail of
>>> >>> >>                                         expressing these
>>> >>> >> characteristics.
>>> >>> >>
>>> >>> >>                                     We
>>> >>> >>
>>> >>> >>                                         use shading selectively
>>> >>> >> because of its
>>> >>> >>                                         downsides I mentioned
>>> above.
>>> >>> >>
>>> >>> >>                                         But what about this idea:
>>> >>> >> Introduce
>>> >>> >>                                         shaded deps as a single
>>> >>> >> separate
>>> >>> >>                                         artifact.
>>> >>> >>
>>> >>> >>                                           -
>>> sdks/java/private-deps:
>>> >>> >> bundled
>>> >>> >>                                         uber jar with relocated
>>> >>> >> versions of
>>> >>> >>                                         everything we want to
>>> shade
>>> >>> >>
>>> >>> >>                                           - sdks/java/core and
>>> >>> >>                                         sdks/java/harness: no
>>> >>> >> relocation or
>>> >>> >>                                         bundling -
>>> >>> >>                                         depends on
>>> >>> >>
>>>  `beam-sdks-java-private-deps`
>>> >>> >> and
>>> >>> >>                                         imports like
>>> >>> >>
>>> >>> >> `org.apache.beam.sdk.private.com.google.common`
>>> >>> >>                                         directly (this is what
>>> >>> >>                                         they
>>> >>> >>                                         are rewritten to
>>> >>> >>
>>> >>> >>                                         Some benefits
>>> >>> >>
>>> >>> >>                                           - much faster builds of
>>> >>> >> other modules
>>> >>> >>                                           - only one shaded uber
>>> jar
>>> >>> >>                                           - rare/no rebuilds of
>>> the
>>> >>> >> uber jar
>>> >>> >>                                           - can use maven
>>> enforcer to
>>> >>> >> forbid
>>> >>> >>                                         imports like
>>> com.google.common
>>> >>> >>                                           - configuration all in
>>> one
>>> >>> >> place
>>> >>> >>                                           - no automated
>>> rewriting of
>>> >>> >> our real
>>> >>> >>                                         code, which has led to
>>> some
>>> >>> >> major
>>> >>> >>                                         confusion
>>> >>> >>                                           - easy to implement
>>> >>> >> incrementally
>>> >>> >>
>>> >>> >>                                         Downsides:
>>> >>> >>
>>> >>> >>                                           - plenty of effort work
>>> to
>>> >>> >> get there
>>> >>> >>                                           - unclear how many
>>> different
>>> >>> >> such
>>> >>> >>                                         deps modules we need;
>>> sharing
>>> >>> >> them
>>> >>> >>
>>> >>> >>                                     could
>>> >>> >>
>>> >>> >>                                         get weird
>>> >>> >>                                           - if we hit a
>>> roadblock, we
>>> >>> >> will
>>> >>> >>                                         have committed a lot of
>>> time
>>> >>> >>
>>> >>> >>                                         Just something I was
>>> musing as
>>> >>> >> I spent
>>> >>> >>                                         another evening waiting
>>> for
>>> >>> >> slow
>>> >>> >>                                         builds to try to confirm
>>> >>> >> changes to
>>> >>> >>                                         brittle poms.
>>> >>> >>
>>> >>> >>                                         Kenn
>>> >>> >>
>>> >>> >>
>>> >>> >>
>>> >>> >>                         --
>>> >>> >>                         Jean-Baptiste Onofré
>>> >>> >>                         jbono...@apache.org
>>> >>> >> <mailto:jbono...@apache.org>
>>> >>> >>                         http://blog.nanthrax.net
>>> >>> >> <http://blog.nanthrax.net/>
>>> >>> >>                         Talend - http://www.talend.com
>>> >>> >> <http://www.talend.com/>
>>> >>> >>
>>> >>> >>
>>> >>> >>
>>> >>> >>
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>>
>>> >>> --
>>> >>> Jean-Baptiste Onofré
>>> >>> jbono...@apache.org
>>> >>> http://blog.nanthrax.net
>>> >>> Talend - http://www.talend.com
>>> >>
>>> >>
>>> >
>>>
>>
>>
>

Reply via email to