I would prefer very explicit artifact names, to avoid confusion or collision where the jar files end up in a flat namespace (like a /lib folder).
Example: groupId: org.apache.beam artifactId beam-vendored-guava-20_0 Thanks, Thomas On Wed, Oct 24, 2018 at 11:31 AM Kenneth Knowles <k...@apache.org> wrote: > OK. I just opened https://github.com/apache/beam/pull/6809 to push Guava > through. I made some comments there, and also I agree with Luke that full > version string makes sense. For this purpose it seems easy and fine to do a > search/replace to swap 20.0 for 20.1, and compatibility between them should > not be a concern. > > I have minor suggestions and clarifications: > > - Is there value to `beam` in the artifactId? I would leave it off unless > there's a special need > - Users should never use these and we make it extremely clear they are > not supported for any reasons > - Use 0.x versions indicating no intention of semantic versioning > > Bringing my comments and Luke's together, here's the proposal: > > groupId: org.apache.beam > artifactId: vendored-guava-20_0 > namespace: org.apache.beam.vendored.guava.v20_0 > version: 0.1 > > Alternatively it could be > > groupId: org.apache.beam-vendored > artifactid: guava-20_0 > namespace: org.apache.beam.vendored.guava.v20_0 > version: 0.1 > > I like the latter but I haven't gone through the process of establishing a > new groupId. > > And for now we do not publish source jars. A couple of TODOs to get the > build in good shape (classifiers, jars, interaction with plugins) > > Kenn > > > On Wed, Oct 24, 2018 at 10:13 AM Lukasz Cwik <lc...@google.com> wrote: > >> It looks like we are agreeing to make each vendored dependency self >> contained and have all their own internal dependencies packaged. For >> example, gRPC and all its transitive dependencies would use >> org.apache.beam.vendored.grpc.vYYY and Calcite and all its transitive >> dependencies would use org.apache.beam.vendored.calcite.vZZZ. >> >> I also wanted to circle back on this question I had earlier that didn't >> have any follow-up: >> Currently we are relocating code depending on the version string. If the >> major version is >= 1, we use only the major version within the package >> string and rely on semantic versioning provided by the dependency to not >> break people. If the major version is 0, we assume the dependency is >> unstable and use the full version as part of the package string during >> relocation. >> >> The downside of using the full version string for relocated packages: >> 1) Users will end up with multiple copies of dependencies that differ >> only by the minor or patch version increasing the size of their application. >> 2) Bumping up the version of a dependency now requires the import >> statement in all java files to be updated (not too difficult with some >> sed/grep skills) >> >> The upside of using the full version string in the relocated package: >> 1) We don't have to worry about whether a dependency maintains semantic >> versioning which means our users won't have to worry about that either. >> 2) This increases the odds that a user will load multiple slightly >> different versions of the same dependency which is known to be incompatible >> in certain situations (e.g. Netty 4.1.25 can't be on the classpath with >> Netty 4.1.28 even though they are both shaded due to issues of how JNI with >> tcnative works). >> >> My preference would be to use the full version string for import >> statements (so org.apache.beam.vendor.grpc.v1_13_1...) since this would >> allow multiple copies to not conflict with each other since in my opinion >> it is a lot more difficult to help a user debug a dependency issue then to >> use string replacement during dependency upgrades to fix import statements. >> Also I would suggest we name the artifacts in Maven as follows: >> groupId: org.apache.beam >> artifactId: beam-vendor-grpc-v1_13_1 >> version: 1.0.0 (first version and subsequent versions such as 1.0.1 are >> only for patch upgrades that fix any shading issues we may have had when >> producing the vendored jar) >> >> >> On Wed, Oct 24, 2018 at 6:01 AM Maximilian Michels <m...@apache.org> >> wrote: >> >>> Would also keep it simple and optimize for the JAR size only if >>> necessary. >>> >>> On 24.10.18 00:06, Kenneth Knowles wrote: >>> > I think it makes sense for each vendored dependency to be >>> self-contained >>> > as much as possible. It should keep it fairly simple. Things that >>> cross >>> > their API surface cannot be hidden, of course. Jar size is not a >>> concern >>> > IMO. >>> > >>> > Kenn >>> > >>> > On Tue, Oct 23, 2018 at 9:05 AM Lukasz Cwik <lc...@google.com >>> > <mailto:lc...@google.com>> wrote: >>> > >>> > How should we handle the transitive dependencies of the things we >>> > want to vendor? >>> > >>> > For example we use gRPC which depends on Guava 20 and we also use >>> > Calcite which depends on Guava 19. >>> > >>> > Should the vendored gRPC/Calcite/... be self-contained so it >>> > contains all its dependencies, hence vendored gRPC would contain >>> > Guava 20 and vendored Calcite would contain Guava 19 (both under >>> > different namespaces)? >>> > This leads to larger jars but less vendored dependencies to >>> maintain. >>> > >>> > Or should we produce a vendored library for those that we want to >>> > share, e.g. Guava 20 that could be reused across multiple vendored >>> > libraries? >>> > Makes the vendoring process slightly more complicated, more >>> > dependencies to maintain, smaller jars. >>> > >>> > Or should we produce a vendored library for each dependency? >>> > Lots of vendoring needed, likely tooling required to be built to >>> > maintain this. >>> > >>> > >>> > >>> > >>> > On Tue, Oct 23, 2018 at 8:46 AM Kenneth Knowles <k...@google.com >>> > <mailto:k...@google.com>> wrote: >>> > >>> > I actually created the subtasks by finding things shaded by at >>> > least one module. I think each one should definitely have an on >>> > list discussion that clarifies the target artifact, namespace, >>> > version, possible complications, etc. >>> > >>> > My impression is that many many modules shade only Guava. So >>> for >>> > build time and simplification that is a big win. >>> > >>> > Kenn >>> > >>> > On Tue, Oct 23, 2018, 08:16 Thomas Weise <t...@apache.org >>> > <mailto:t...@apache.org>> wrote: >>> > >>> > +1 for separate artifacts >>> > >>> > I would request that we explicitly discuss and agree which >>> > dependencies we vendor though. >>> > >>> > Not everything listed in the JIRA subtasks is currently >>> > relocated. >>> > >>> > Thomas >>> > >>> > >>> > On Tue, Oct 23, 2018 at 8:04 AM David Morávek >>> > <david.mora...@gmail.com <mailto:david.mora...@gmail.com>> >>> > wrote: >>> > >>> > +1 This should improve build times a lot. It would be >>> > great if vendored deps could stay in the main >>> repository. >>> > >>> > D. >>> > >>> > On Tue, Oct 23, 2018 at 12:21 PM Maximilian Michels >>> > <m...@apache.org <mailto:m...@apache.org>> wrote: >>> > >>> > Looks great, Kenn! >>> > >>> > > Max: what is the story behind having a separate >>> > flink-shaded repo? Did that make it easier to >>> manage >>> > in some way? >>> > >>> > Better separation of concerns, but I don't think >>> > releasing the shaded >>> > artifacts from the main repo is a problem. I'd even >>> > prefer not to split >>> > up the repo because it makes updates to the >>> vendored >>> > dependencies >>> > slightly easier. >>> > >>> > On 23.10.18 03:25, Kenneth Knowles wrote: >>> > > OK, I've filed >>> > https://issues.apache.org/jira/browse/BEAM-5819 to >>> > > collect sub-tasks. This has enough upsides >>> > throughout lots of areas of >>> > > the project that even though it is not glamorous >>> > it seems pretty >>> > > valuable to start on immediately. And I want to >>> > find out if there's a >>> > > pitfall lurking. >>> > > >>> > > Max: what is the story behind having a separate >>> > flink-shaded repo? Did >>> > > that make it easier to manage in some way? >>> > > >>> > > Kenn >>> > > >>> > > On Mon, Oct 22, 2018 at 2:55 AM Maximilian >>> > Michels <m...@apache.org <mailto:m...@apache.org> >>> > > <mailto:m...@apache.org <mailto:m...@apache.org >>> >>> >>> > wrote: >>> > > >>> > > +1 for publishing vendored Jars >>> > independently. It will improve build >>> > > time and ease IntelliJ integration. >>> > > >>> > > Flink also publishes shaded dependencies >>> > separately: >>> > > >>> > > - https://github.com/apache/flink-shaded >>> > > - >>> > https://issues.apache.org/jira/browse/FLINK-6529 >>> > > >>> > > AFAIK their main motivation was to get rid >>> of >>> > duplicate shaded classes >>> > > on the classpath. We don't appear to have >>> > that problem because we >>> > > already have a separate "vendor" project. >>> > > >>> > > > - With shading, it is hard (impossible?) >>> > to step into dependency >>> > > code in IntelliJ's debugger, because the >>> > actual symbol at runtime >>> > > does not match what is in the external jars >>> > > >>> > > This would be solved by releasing the >>> sources >>> > of the shaded jars. >>> > > From a >>> > > legal perspective, this could be problematic >>> > as alluded to here: >>> > > >>> https://github.com/apache/flink-shaded/issues/25 >>> > > >>> > > -Max >>> > > >>> > > On 20.10.18 01:11, Lukasz Cwik wrote: >>> > > > I have tried several times to improve the >>> > build system and intellij >>> > > > integration and each attempt ended with >>> > little progress when dealing >>> > > > with vendored code. My latest attempt has >>> > been the most promising >>> > > where >>> > > > I take the vendored classes/jars and >>> > decompile them generating the >>> > > > source that Intellij can then use. I have >>> > a branch[1] that >>> > > demonstrates >>> > > > the idea. It works pretty well (and up >>> > until a change where we >>> > > started >>> > > > vendoring gRPC, was impractical to do. >>> > Instructions to try it out >>> > > are: >>> > > > >>> > > > // Clean up any remnants of prior >>> > builds/intellij projects >>> > > > git clean -fdx >>> > > > // Generated the source for >>> > vendored/shaded modules >>> > > > ./gradlew decompile >>> > > > >>> > > > // Remove the "generated" Java sources >>> for >>> > protos so they don't >>> > > conflict with the decompiled sources. >>> > > > rm -rf >>> > model/pipeline/build/generated/source/proto >>> > > > rm -rf >>> > model/job-management/build/generated/source/proto >>> > > > rm -rf >>> > model/fn-execution/build/generated/source/proto >>> > > > // Import the project into Intellij, most >>> > code completion now >>> > > works still some issues with a few classes. >>> > > > // Note that the Java decompiler doesn't >>> > generate valid source so >>> > > still need to delegate to Gradle for >>> > build/run/test actions >>> > > > // Other decompilers may do a >>> better/worse >>> > job but haven't tried >>> > > them. >>> > > > >>> > > > >>> > > > The problems that I face are that the >>> > generated Java source from the >>> > > > protos and the decompiled source from the >>> > compiled version of that >>> > > > source post shading are both being >>> > imported as content roots and >>> > > then >>> > > > conflict. Also, the CFR decompiler isn't >>> > producing valid source, if >>> > > > people could try others and report their >>> > mileage, we may find one >>> > > that >>> > > > works and then we would be able to use >>> > intellij to build/run our >>> > > code >>> > > > and not need to delegate all our >>> > build/run/test actions to Gradle. >>> > > > >>> > > > After all these attempts I have done, >>> > vendoring the dependencies >>> > > outside >>> > > > of the project seems like a sane approach >>> > and unless someone >>> > > wants to >>> > > > take a stab at the best progress I have >>> > made above, I would go >>> > > with what >>> > > > Kenn is suggesting even though it will >>> > mean that we will need to >>> > > perform >>> > > > releases every time we want to change the >>> > version of one of our >>> > > vendored >>> > > > dependencies. >>> > > > >>> > > > 1: >>> > >>> https://github.com/lukecwik/incubator-beam/tree/intellij >>> > > > >>> > > > >>> > > > On Fri, Oct 19, 2018 at 10:43 AM Kenneth >>> > Knowles <k...@apache.org <mailto:k...@apache.org> >>> > > <mailto:k...@apache.org <mailto: >>> k...@apache.org>> >>> > > > <mailto:k...@apache.org >>> > <mailto:k...@apache.org> <mailto:k...@apache.org >>> > <mailto:k...@apache.org>>>> wrote: >>> > > > >>> > > > Another reason to push on this is to >>> > get build times down. >>> > > Once only >>> > > > generated proto classes use the >>> shadow >>> > plugin we'll cut the build >>> > > > time in ~half? And there is no reason >>> > to constantly re-vendor. >>> > > > >>> > > > Kenn >>> > > > >>> > > > On Fri, Oct 19, 2018 at 10:39 AM >>> > Kenneth Knowles >>> > > <k...@google.com <mailto:k...@google.com> >>> > <mailto:k...@google.com <mailto:k...@google.com>> >>> > > > <mailto:k...@google.com >>> > <mailto:k...@google.com> <mailto:k...@google.com >>> > <mailto:k...@google.com>>>> wrote: >>> > > > >>> > > > Hi all, >>> > > > >>> > > > A while ago we had pretty good >>> > consensus that we should >>> > > vendor >>> > > > ("pre-shade") specific >>> > dependencies, and there's start on it >>> > > > here: >>> > https://github.com/apache/beam/tree/master/vendor >>> > > > >>> > > > IntelliJ notes: >>> > > > >>> > > > - With shading, it is hard >>> > (impossible?) to step into >>> > > > dependency code in IntelliJ's >>> > debugger, because the actual >>> > > > symbol at runtime does not match >>> > what is in the external jars >>> > > > >>> > > > >>> > > > Intellij can step through the classes if >>> > they were published >>> > > outside the >>> > > > project since it can decompile them. The >>> > source won't be legible. >>> > > > Decompiling the source as in my example >>> > effectively shows the >>> > > same issue. >>> > > > >>> > > > - With vendoring, if the >>> > vendored dependencies are part >>> > > of the >>> > > > project then IntelliJ gets >>> > confused because it operates on >>> > > > source, not the produced jars >>> > > > >>> > > > >>> > > > Yes, I tried several ways to get intellij >>> > to ignore the source >>> > > and use >>> > > > the output jars but no luck. >>> > > > >>> > > > The second one has a quick fix >>> for >>> > most cases*: don't >>> > > make the >>> > > > vendored dep a subproject, but >>> > just separately build and >>> > > publish >>> > > > it. Since a vendored dependency >>> > should change much more >>> > > > infrequently (or if we bake the >>> > version into the name, >>> > > ~never) >>> > > > this means we publish once and >>> > save headache and build >>> > > time forever. >>> > > > >>> > > > WDYT? Have I overlooked >>> something? >>> > How about we set up >>> > > vendored >>> > > > versions of guava, protobuf, >>> grpc, >>> > and publish them. We don't >>> > > > have to actually start using them >>> > yet, and can do it >>> > > incrementally. >>> > > > >>> > > > >>> > > > Currently we are relocating code >>> depending >>> > on the version string. >>> > > If the >>> > > > major version is >= 1, we use only the >>> > major version within the >>> > > package >>> > > > string and rely on semantic versioning >>> > provided by the dependency >>> > > to not >>> > > > break people. If the major version is 0, >>> > we assume the dependency is >>> > > > unstable and use the full version as part >>> > of the package string >>> > > during >>> > > > relocation. >>> > > > >>> > > > The downside of using the full version >>> > string for relocated packages: >>> > > > 1) Users will end up with multiple copies >>> > of dependencies that >>> > > differ >>> > > > only by the minor or patch version >>> > increasing the size of their >>> > > application. >>> > > > 2) Bumping up the version of a dependency >>> > now requires the import >>> > > > statement in all java files to be updated >>> > (not too difficult with >>> > > some >>> > > > sed/grep skills) >>> > > > >>> > > > The upside of using the full version >>> > string in the relocated package: >>> > > > 1) We don't have to worry about whether a >>> > dependency maintains >>> > > semantic >>> > > > versioning which means our users won't >>> > have to worry about that >>> > > either. >>> > > > 2) This increases the odds that a user >>> > will load multiple slightly >>> > > > different versions of the same dependency >>> > which is known to be >>> > > > incompatible in certain situations (e.g. >>> > Netty 4.1.25 can't be on >>> > > the >>> > > > classpath with Netty 4.1.28 even though >>> > they are both shaded due to >>> > > > issues of how JNI with tcnative works). >>> > > > >>> > > > >>> > > > (side note: what do other >>> projects >>> > like Flink do?) >>> > > > >>> > > > Kenn >>> > > > >>> > > > *for generated proto classes, >>> they >>> > need to be altered after >>> > > > being generated so shading >>> happens >>> > there, but actually only >>> > > > relocation and the shared >>> vendored >>> > dep should work >>> > > elsewhere in >>> > > > the project >>> > > > >>> > > > >>> > > > We could publish the protos and treat >>> them >>> > as "external" >>> > > dependencies >>> > > > within the Java projects which would also >>> > remove this pain point. >>> > > >>> > >>> >>