I would prefer very explicit artifact names, to avoid confusion or
collision where the jar files end up in a flat namespace (like a /lib
folder).

Example:

groupId: org.apache.beam
artifactId beam-vendored-guava-20_0

Thanks,
Thomas






On Wed, Oct 24, 2018 at 11:31 AM Kenneth Knowles <k...@apache.org> wrote:

> OK. I just opened https://github.com/apache/beam/pull/6809 to push Guava
> through. I made some comments there, and also I agree with Luke that full
> version string makes sense. For this purpose it seems easy and fine to do a
> search/replace to swap 20.0 for 20.1, and compatibility between them should
> not be a concern.
>
> I have minor suggestions and clarifications:
>
>  - Is there value to `beam` in the artifactId? I would leave it off unless
> there's a special need
>  - Users should never use these and we make it extremely clear they are
> not supported for any reasons
>  - Use 0.x versions indicating no intention of semantic versioning
>
> Bringing my comments and Luke's together, here's the proposal:
>
>     groupId: org.apache.beam
>     artifactId: vendored-guava-20_0
>     namespace: org.apache.beam.vendored.guava.v20_0
>     version: 0.1
>
> Alternatively it could be
>
>     groupId: org.apache.beam-vendored
>     artifactid: guava-20_0
>     namespace: org.apache.beam.vendored.guava.v20_0
>     version: 0.1
>
> I like the latter but I haven't gone through the process of establishing a
> new groupId.
>
> And for now we do not publish source jars. A couple of TODOs to get the
> build in good shape (classifiers, jars, interaction with plugins)
>
> Kenn
>
>
> On Wed, Oct 24, 2018 at 10:13 AM Lukasz Cwik <lc...@google.com> wrote:
>
>> It looks like we are agreeing to make each vendored dependency self
>> contained and have all their own internal dependencies packaged. For
>> example, gRPC and all its transitive dependencies would use
>> org.apache.beam.vendored.grpc.vYYY and Calcite and all its transitive
>> dependencies would use org.apache.beam.vendored.calcite.vZZZ.
>>
>> I also wanted to circle back on this question I had earlier that didn't
>> have any follow-up:
>> Currently we are relocating code depending on the version string. If the
>> major version is >= 1, we use only the major version within the package
>> string and rely on semantic versioning provided by the dependency to not
>> break people. If the major version is 0, we assume the dependency is
>> unstable and use the full version as part of the package string during
>> relocation.
>>
>> The downside of using the full version string for relocated packages:
>> 1) Users will end up with multiple copies of dependencies that differ
>> only by the minor or patch version increasing the size of their application.
>> 2) Bumping up the version of a dependency now requires the import
>> statement in all java files to be updated (not too difficult with some
>> sed/grep skills)
>>
>> The upside of using the full version string in the relocated package:
>> 1) We don't have to worry about whether a dependency maintains semantic
>> versioning which means our users won't have to worry about that either.
>> 2) This increases the odds that a user will load multiple slightly
>> different versions of the same dependency which is known to be incompatible
>> in certain situations (e.g. Netty 4.1.25 can't be on the classpath with
>> Netty 4.1.28 even though they are both shaded due to issues of how JNI with
>> tcnative works).
>>
>> My preference would be to use the full version string for import
>> statements (so org.apache.beam.vendor.grpc.v1_13_1...) since this would
>> allow multiple copies to not conflict with each other since in my opinion
>> it is a lot more difficult to help a user debug a dependency issue then to
>> use string replacement during dependency upgrades to fix import statements.
>> Also I would suggest we name the artifacts in Maven as follows:
>> groupId: org.apache.beam
>> artifactId: beam-vendor-grpc-v1_13_1
>> version: 1.0.0 (first version and subsequent versions such as 1.0.1 are
>> only for patch upgrades that fix any shading issues we may have had when
>> producing the vendored jar)
>>
>>
>> On Wed, Oct 24, 2018 at 6:01 AM Maximilian Michels <m...@apache.org>
>> wrote:
>>
>>> Would also keep it simple and optimize for the JAR size only if
>>> necessary.
>>>
>>> On 24.10.18 00:06, Kenneth Knowles wrote:
>>> > I think it makes sense for each vendored dependency to be
>>> self-contained
>>> > as much as possible. It should keep it fairly simple. Things that
>>> cross
>>> > their API surface cannot be hidden, of course. Jar size is not a
>>> concern
>>> > IMO.
>>> >
>>> > Kenn
>>> >
>>> > On Tue, Oct 23, 2018 at 9:05 AM Lukasz Cwik <lc...@google.com
>>> > <mailto:lc...@google.com>> wrote:
>>> >
>>> >     How should we handle the transitive dependencies of the things we
>>> >     want to vendor?
>>> >
>>> >     For example we use gRPC which depends on Guava 20 and we also use
>>> >     Calcite which depends on Guava 19.
>>> >
>>> >     Should the vendored gRPC/Calcite/... be self-contained so it
>>> >     contains all its dependencies, hence vendored gRPC would contain
>>> >     Guava 20 and vendored Calcite would contain Guava 19 (both under
>>> >     different namespaces)?
>>> >     This leads to larger jars but less vendored dependencies to
>>> maintain.
>>> >
>>> >     Or should we produce a vendored library for those that we want to
>>> >     share, e.g. Guava 20 that could be reused across multiple vendored
>>> >     libraries?
>>> >     Makes the vendoring process slightly more complicated, more
>>> >     dependencies to maintain, smaller jars.
>>> >
>>> >     Or should we produce a vendored library for each dependency?
>>> >     Lots of vendoring needed, likely tooling required to be built to
>>> >     maintain this.
>>> >
>>> >
>>> >
>>> >
>>> >     On Tue, Oct 23, 2018 at 8:46 AM Kenneth Knowles <k...@google.com
>>> >     <mailto:k...@google.com>> wrote:
>>> >
>>> >         I actually created the subtasks by finding things shaded by at
>>> >         least one module. I think each one should definitely have an on
>>> >         list discussion that clarifies the target artifact, namespace,
>>> >         version, possible complications, etc.
>>> >
>>> >         My impression is that many many modules shade only Guava. So
>>> for
>>> >         build time and simplification that is a big win.
>>> >
>>> >         Kenn
>>> >
>>> >         On Tue, Oct 23, 2018, 08:16 Thomas Weise <t...@apache.org
>>> >         <mailto:t...@apache.org>> wrote:
>>> >
>>> >             +1 for separate artifacts
>>> >
>>> >             I would request that we explicitly discuss and agree which
>>> >             dependencies we vendor though.
>>> >
>>> >             Not everything listed in the JIRA subtasks is currently
>>> >             relocated.
>>> >
>>> >             Thomas
>>> >
>>> >
>>> >             On Tue, Oct 23, 2018 at 8:04 AM David Morávek
>>> >             <david.mora...@gmail.com <mailto:david.mora...@gmail.com>>
>>> >             wrote:
>>> >
>>> >                 +1 This should improve build times a lot. It would be
>>> >                 great if vendored deps could stay in the main
>>> repository.
>>> >
>>> >                 D.
>>> >
>>> >                 On Tue, Oct 23, 2018 at 12:21 PM Maximilian Michels
>>> >                 <m...@apache.org <mailto:m...@apache.org>> wrote:
>>> >
>>> >                     Looks great, Kenn!
>>> >
>>> >                      > Max: what is the story behind having a separate
>>> >                     flink-shaded repo? Did that make it easier to
>>> manage
>>> >                     in some way?
>>> >
>>> >                     Better separation of concerns, but I don't think
>>> >                     releasing the shaded
>>> >                     artifacts from the main repo is a problem. I'd even
>>> >                     prefer not to split
>>> >                     up the repo because it makes updates to the
>>> vendored
>>> >                     dependencies
>>> >                     slightly easier.
>>> >
>>> >                     On 23.10.18 03:25, Kenneth Knowles wrote:
>>> >                      > OK, I've filed
>>> >                     https://issues.apache.org/jira/browse/BEAM-5819 to
>>> >                      > collect sub-tasks. This has enough upsides
>>> >                     throughout lots of areas of
>>> >                      > the project that even though it is not glamorous
>>> >                     it seems pretty
>>> >                      > valuable to start on immediately. And I want to
>>> >                     find out if there's a
>>> >                      > pitfall lurking.
>>> >                      >
>>> >                      > Max: what is the story behind having a separate
>>> >                     flink-shaded repo? Did
>>> >                      > that make it easier to manage in some way?
>>> >                      >
>>> >                      > Kenn
>>> >                      >
>>> >                      > On Mon, Oct 22, 2018 at 2:55 AM Maximilian
>>> >                     Michels <m...@apache.org <mailto:m...@apache.org>
>>> >                      > <mailto:m...@apache.org <mailto:m...@apache.org
>>> >>>
>>> >                     wrote:
>>> >                      >
>>> >                      >     +1 for publishing vendored Jars
>>> >                     independently. It will improve build
>>> >                      >     time and ease IntelliJ integration.
>>> >                      >
>>> >                      >     Flink also publishes shaded dependencies
>>> >                     separately:
>>> >                      >
>>> >                      >     - https://github.com/apache/flink-shaded
>>> >                      >     -
>>> >                     https://issues.apache.org/jira/browse/FLINK-6529
>>> >                      >
>>> >                      >     AFAIK their main motivation was to get rid
>>> of
>>> >                     duplicate shaded classes
>>> >                      >     on the classpath. We don't appear to have
>>> >                     that problem because we
>>> >                      >     already have a separate "vendor" project.
>>> >                      >
>>> >                      >      >  - With shading, it is hard (impossible?)
>>> >                     to step into dependency
>>> >                      >     code in IntelliJ's debugger, because the
>>> >                     actual symbol at runtime
>>> >                      >     does not match what is in the external jars
>>> >                      >
>>> >                      >     This would be solved by releasing the
>>> sources
>>> >                     of the shaded jars.
>>> >                      >      From a
>>> >                      >     legal perspective, this could be problematic
>>> >                     as alluded to here:
>>> >                      >
>>> https://github.com/apache/flink-shaded/issues/25
>>> >                      >
>>> >                      >     -Max
>>> >                      >
>>> >                      >     On 20.10.18 01:11, Lukasz Cwik wrote:
>>> >                      >      > I have tried several times to improve the
>>> >                     build system and intellij
>>> >                      >      > integration and each attempt ended with
>>> >                     little progress when dealing
>>> >                      >      > with vendored code. My latest attempt has
>>> >                     been the most promising
>>> >                      >     where
>>> >                      >      > I take the vendored classes/jars and
>>> >                     decompile them generating the
>>> >                      >      > source that Intellij can then use. I have
>>> >                     a branch[1] that
>>> >                      >     demonstrates
>>> >                      >      > the idea. It works pretty well (and up
>>> >                     until a change where we
>>> >                      >     started
>>> >                      >      > vendoring gRPC, was impractical to do.
>>> >                     Instructions to try it out
>>> >                      >     are:
>>> >                      >      >
>>> >                      >      > // Clean up any remnants of prior
>>> >                     builds/intellij projects
>>> >                      >      > git clean -fdx
>>> >                      >      > // Generated the source for
>>> >                     vendored/shaded modules
>>> >                      >      > ./gradlew decompile
>>> >                      >      >
>>> >                      >      > // Remove the "generated" Java sources
>>> for
>>> >                     protos so they don't
>>> >                      >     conflict with the decompiled sources.
>>> >                      >      > rm -rf
>>> >                     model/pipeline/build/generated/source/proto
>>> >                      >      > rm -rf
>>> >                     model/job-management/build/generated/source/proto
>>> >                      >      > rm -rf
>>> >                     model/fn-execution/build/generated/source/proto
>>> >                      >      > // Import the project into Intellij, most
>>> >                     code completion now
>>> >                      >     works still some issues with a few classes.
>>> >                      >      > // Note that the Java decompiler doesn't
>>> >                     generate valid source so
>>> >                      >     still need to delegate to Gradle for
>>> >                     build/run/test actions
>>> >                      >      > // Other decompilers may do a
>>> better/worse
>>> >                     job but haven't tried
>>> >                      >     them.
>>> >                      >      >
>>> >                      >      >
>>> >                      >      > The problems that I face are that the
>>> >                     generated Java source from the
>>> >                      >      > protos and the decompiled source from the
>>> >                     compiled version of that
>>> >                      >      > source post shading are both being
>>> >                     imported as content roots and
>>> >                      >     then
>>> >                      >      > conflict. Also, the CFR decompiler isn't
>>> >                     producing valid source, if
>>> >                      >      > people could try others and report their
>>> >                     mileage, we may find one
>>> >                      >     that
>>> >                      >      > works and then we would be able to use
>>> >                     intellij to build/run our
>>> >                      >     code
>>> >                      >      > and not need to delegate all our
>>> >                     build/run/test actions to Gradle.
>>> >                      >      >
>>> >                      >      > After all these attempts I have done,
>>> >                     vendoring the dependencies
>>> >                      >     outside
>>> >                      >      > of the project seems like a sane approach
>>> >                     and unless someone
>>> >                      >     wants to
>>> >                      >      > take a stab at the best progress I have
>>> >                     made above, I would go
>>> >                      >     with what
>>> >                      >      > Kenn is suggesting even though it will
>>> >                     mean that we will need to
>>> >                      >     perform
>>> >                      >      > releases every time we want to change the
>>> >                     version of one of our
>>> >                      >     vendored
>>> >                      >      > dependencies.
>>> >                      >      >
>>> >                      >      > 1:
>>> >
>>> https://github.com/lukecwik/incubator-beam/tree/intellij
>>> >                      >      >
>>> >                      >      >
>>> >                      >      > On Fri, Oct 19, 2018 at 10:43 AM Kenneth
>>> >                     Knowles <k...@apache.org <mailto:k...@apache.org>
>>> >                      >     <mailto:k...@apache.org <mailto:
>>> k...@apache.org>>
>>> >                      >      > <mailto:k...@apache.org
>>> >                     <mailto:k...@apache.org> <mailto:k...@apache.org
>>> >                     <mailto:k...@apache.org>>>> wrote:
>>> >                      >      >
>>> >                      >      >     Another reason to push on this is to
>>> >                     get build times down.
>>> >                      >     Once only
>>> >                      >      >     generated proto classes use the
>>> shadow
>>> >                     plugin we'll cut the build
>>> >                      >      >     time in ~half? And there is no reason
>>> >                     to constantly re-vendor.
>>> >                      >      >
>>> >                      >      >     Kenn
>>> >                      >      >
>>> >                      >      >     On Fri, Oct 19, 2018 at 10:39 AM
>>> >                     Kenneth Knowles
>>> >                      >     <k...@google.com <mailto:k...@google.com>
>>> >                     <mailto:k...@google.com <mailto:k...@google.com>>
>>> >                      >      >     <mailto:k...@google.com
>>> >                     <mailto:k...@google.com> <mailto:k...@google.com
>>> >                     <mailto:k...@google.com>>>> wrote:
>>> >                      >      >
>>> >                      >      >         Hi all,
>>> >                      >      >
>>> >                      >      >         A while ago we had pretty good
>>> >                     consensus that we should
>>> >                      >     vendor
>>> >                      >      >         ("pre-shade") specific
>>> >                     dependencies, and there's start on it
>>> >                      >      >         here:
>>> >                     https://github.com/apache/beam/tree/master/vendor
>>> >                      >      >
>>> >                      >      >         IntelliJ notes:
>>> >                      >      >
>>> >                      >      >           - With shading, it is hard
>>> >                     (impossible?) to step into
>>> >                      >      >         dependency code in IntelliJ's
>>> >                     debugger, because the actual
>>> >                      >      >         symbol at runtime does not match
>>> >                     what is in the external jars
>>> >                      >      >
>>> >                      >      >
>>> >                      >      > Intellij can step through the classes if
>>> >                     they were published
>>> >                      >     outside the
>>> >                      >      > project since it can decompile them. The
>>> >                     source won't be legible.
>>> >                      >      > Decompiling the source as in my example
>>> >                     effectively shows the
>>> >                      >     same issue.
>>> >                      >      >
>>> >                      >      >           - With vendoring, if the
>>> >                     vendored dependencies are part
>>> >                      >     of the
>>> >                      >      >         project then IntelliJ gets
>>> >                     confused because it operates on
>>> >                      >      >         source, not the produced jars
>>> >                      >      >
>>> >                      >      >
>>> >                      >      > Yes, I tried several ways to get intellij
>>> >                     to ignore the source
>>> >                      >     and use
>>> >                      >      > the output jars but no luck.
>>> >                      >      >
>>> >                      >      >         The second one has a quick fix
>>> for
>>> >                     most cases*: don't
>>> >                      >     make the
>>> >                      >      >         vendored dep a subproject, but
>>> >                     just separately build and
>>> >                      >     publish
>>> >                      >      >         it. Since a vendored dependency
>>> >                     should change much more
>>> >                      >      >         infrequently (or if we bake the
>>> >                     version into the name,
>>> >                      >     ~never)
>>> >                      >      >         this means we publish once and
>>> >                     save headache and build
>>> >                      >     time forever.
>>> >                      >      >
>>> >                      >      >         WDYT? Have I overlooked
>>> something?
>>> >                     How about we set up
>>> >                      >     vendored
>>> >                      >      >         versions of guava, protobuf,
>>> grpc,
>>> >                     and publish them. We don't
>>> >                      >      >         have to actually start using them
>>> >                     yet, and can do it
>>> >                      >     incrementally.
>>> >                      >      >
>>> >                      >      >
>>> >                      >      > Currently we are relocating code
>>> depending
>>> >                     on the version string.
>>> >                      >     If the
>>> >                      >      > major version is >= 1, we use only the
>>> >                     major version within the
>>> >                      >     package
>>> >                      >      > string and rely on semantic versioning
>>> >                     provided by the dependency
>>> >                      >     to not
>>> >                      >      > break people. If the major version is 0,
>>> >                     we assume the dependency is
>>> >                      >      > unstable and use the full version as part
>>> >                     of the package string
>>> >                      >     during
>>> >                      >      > relocation.
>>> >                      >      >
>>> >                      >      > The downside of using the full version
>>> >                     string for relocated packages:
>>> >                      >      > 1) Users will end up with multiple copies
>>> >                     of dependencies that
>>> >                      >     differ
>>> >                      >      > only by the minor or patch version
>>> >                     increasing the size of their
>>> >                      >     application.
>>> >                      >      > 2) Bumping up the version of a dependency
>>> >                     now requires the import
>>> >                      >      > statement in all java files to be updated
>>> >                     (not too difficult with
>>> >                      >     some
>>> >                      >      > sed/grep skills)
>>> >                      >      >
>>> >                      >      > The upside of using the full version
>>> >                     string in the relocated package:
>>> >                      >      > 1) We don't have to worry about whether a
>>> >                     dependency maintains
>>> >                      >     semantic
>>> >                      >      > versioning which means our users won't
>>> >                     have to worry about that
>>> >                      >     either.
>>> >                      >      > 2) This increases the odds that a user
>>> >                     will load multiple slightly
>>> >                      >      > different versions of the same dependency
>>> >                     which is known to be
>>> >                      >      > incompatible in certain situations (e.g.
>>> >                     Netty 4.1.25 can't be on
>>> >                      >     the
>>> >                      >      > classpath with Netty 4.1.28 even though
>>> >                     they are both shaded due to
>>> >                      >      > issues of how JNI with tcnative works).
>>> >                      >      >
>>> >                      >      >
>>> >                      >      >         (side note: what do other
>>> projects
>>> >                     like Flink do?)
>>> >                      >      >
>>> >                      >      >         Kenn
>>> >                      >      >
>>> >                      >      >         *for generated proto classes,
>>> they
>>> >                     need to be altered after
>>> >                      >      >         being generated so shading
>>> happens
>>> >                     there, but actually only
>>> >                      >      >         relocation and the shared
>>> vendored
>>> >                     dep should work
>>> >                      >     elsewhere in
>>> >                      >      >         the project
>>> >                      >      >
>>> >                      >      >
>>> >                      >      > We could publish the protos and treat
>>> them
>>> >                     as "external"
>>> >                      >     dependencies
>>> >                      >      > within the Java projects which would also
>>> >                     remove this pain point.
>>> >                      >
>>> >
>>>
>>

Reply via email to