Okay, I'll request a repo for the shading. On Wed, Jun 21, 2017 at 1:38 PM, Chesnay Schepler <ches...@apache.org> wrote:
> I like your suggestion Robert. A lot actually. > > Having the actual dependency version (i.e 18 for guava) in the version > should improve clarity a lot. > > Originally i intended to release 1 artifact per Flink version, with the > normal versioning scheme > that we use. But given that the shaded dependencies aren't changed often > (even rarely might be a stretch), > and aren't actually coupled to the Flink release cycle this doesn't make a > lot of sense. > > Having separate repos looks like a reasonable separation of concerns. The > release for Flink itself > would work just like it does now; we don't have to modify any scripts or > do extra steps. > > Since the build, release and development process are separate (since > flink-shaded isn't part of Flink build > process, has a separate release process and changes to it will /never > /require immediate changes to Flink) > it seems like a very good candidate to move it into a separate repo. > > > On 21.06.2017 11:26, Robert Metzger wrote: > >> Its not completely clear to me how we want to version the shaded >> dependencies, and where we are putting them. >> >> One concern are the official apache release rules. If we want to release >> something to maven central, we need to do a proper vote over a source >> archive. >> I would propose to create a new repository "flink-shaded.git" that >> contains >> the following maven module structure: >> - flink-shaded: 1 >> - flink-shaded-asm: 1-5.2 >> - flink-shaded-guava: 1-18.0 >> - ... >> >> The number indicates the version (for ASM, I've just guessed). >> The version for the parent "flink-shaded" needs to be updated on each >> parent pom change (new module added, new / changed plugins, ...) >> >> We could create a separate release script in this repository that creates >> the flink-shaded-src.zip from the code and deploys the artifacts to the >> maven staging area. >> >> The advantage of a separate repo would be that we don't need to maintain >> separate maven projects in the same git repo. >> Also, the src archives for the release vote can be created from the repo >> content (without much filtering). >> >> >> On Tue, Jun 20, 2017 at 9:44 PM, Stephan Ewen <se...@apache.org> wrote: >> >> I like this approach. >>> >>> Two additional things can be mention here: >>> >>> - We need to deploy these artifacts independently and not as part of >>> the >>> build. That is a manual step once per "bump" in the dependency of that >>> library. >>> >>> - We reduce the shading complexity of the original build and should >>> thus >>> also speed up build times :-) >>> >>> Stephan >>> >>> >>> >>> >>> On Tue, Jun 20, 2017 at 1:15 PM, Chesnay Schepler <ches...@apache.org> >>> wrote: >>> >>> I would like to start working on this. >>>> >>>> I've looked into adding a flink-shaded-guava module. Working against the >>>> shaded namespaces seems >>>> to work without problems from the IDE, and we could forbid un-shaded >>>> usages with checkstyle. >>>> >>>> So for the list of dependencies that we want to shade we currently got: >>>> >>>> * asm >>>> * guava >>>> * netty >>>> * hadoop >>>> * curator >>>> >>>> I've had a chat with Stephan Ewan and he brought up kryo + chill as >>>> well. >>>> >>>> The nice thing is that we can do this incrementally, one dependency at a >>>> time. As such i would propose >>>> to go through the whole process for guava and see what problems arise. >>>> >>>> This would include adding a flink-shaded module and a child >>>> flink-shaded-guava module to the flink repository >>>> that are not part of the build process, replacing all usages of guava in >>>> Flink, adding the >>>> checkstyle rule (optional) and deploying the artifact to maven central. >>>> >>>> >>>> On 11.05.2017 10:54, Stephan Ewen wrote: >>>> >>>> @Ufuk - I have never set up artifact deployment in Maven, could need >>>>> >>>> some >>> >>>> help there. >>>>> >>>>> Regarding shading Netty, I agree, would be good to do that as well... >>>>> >>>>> On Thu, May 11, 2017 at 10:52 AM, Ufuk Celebi <u...@apache.org> wrote: >>>>> >>>>> The advantages you've listed sound really compelling to me. >>>>> >>>>>> - Do you have time to implement these changes or do we need a >>>>>> >>>>> volunteer? >>> >>>> ;) >>>>>> >>>>>> - I assume that republishing the artifacts as you propose doesn't have >>>>>> any new legal implications since we already publish them with our >>>>>> JARs, right? >>>>>> >>>>>> - We might think about adding Netty to the list of shaded artifacts >>>>>> since some dependency conflicts were reported recently. Would have to >>>>>> double check the reported issues before doing that though. ;-) >>>>>> >>>>>> – Ufuk >>>>>> >>>>>> >>>>>> On Wed, May 10, 2017 at 8:45 PM, Stephan Ewen <se...@apache.org> >>>>>> >>>>> wrote: >>> >>>> @chesnay: I used ASM as an example in the proposal. Maybe I did not >>>>>>> >>>>>> say >>> >>>> that clearly. >>>>>>> >>>>>>> If we like that approach, we should deal with the other libraries (at >>>>>>> >>>>>>> least >>>>>> >>>>>> the frequently used ones) in the same way. >>>>>>> >>>>>>> >>>>>>> I would imagine to have a project layout like that: >>>>>>> >>>>>>> flink-shaded-deps >>>>>>> - flink-shaded-asm >>>>>>> - flink-shaded-guava >>>>>>> - flink-shaded-curator >>>>>>> - flink-shaded-hadoop >>>>>>> >>>>>>> >>>>>>> "flink-shaded-deps" would not be built every time (and not be >>>>>>> released >>>>>>> every time), but only when needed. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, May 10, 2017 at 7:28 PM, Chesnay Schepler < >>>>>>> ches...@apache.org >>>>>>> wrote: >>>>>>> >>>>>>> I like the idea, thank you for bringing it up. >>>>>>> >>>>>>>> Given that the raised problems aren't really ASM specific would it >>>>>>>> >>>>>>> make >>> >>>> sense to create one flink-shaded module that contains all frequently >>>>>>>> >>>>>>>> shaded >>>>>>> libraries? (or maybe even all shaded dependencies by core modules) >>>>>>> The >>>>>>> >>>>>>>> proposal limits the scope of this to ASM and i was wondering why. >>>>>>>> >>>>>>>> I also remember that there was a discussion recently about why we >>>>>>>> >>>>>>> shade >>> >>>> things at all, and the idea of working against the shaded namespaces >>>>>>>> was >>>>>>>> brought up. Back then i was expressing doubts as to whether IDE's >>>>>>>> >>>>>>> would >>> >>>> properly support this; what's the state on that? >>>>>>>> >>>>>>>> On 10.05.2017 18:18, Stephan Ewen wrote: >>>>>>>> >>>>>>>> Hi! >>>>>>>> >>>>>>>>> This is a discussion about altering the way we handle dependencies >>>>>>>>> >>>>>>>> and >>> >>>> shading in Flink. >>>>>>>>> I ran into quite a view problems trying to adjust / fix some >>>>>>>>> shading >>>>>>>>> issues >>>>>>>>> during release validation. >>>>>>>>> >>>>>>>>> The issue is tracked under: https://issues.apache.org/jira >>>>>>>>> /browse/FLINK-6529 >>>>>>>>> Bring this discussion thread up because it is a bigger issue >>>>>>>>> >>>>>>>>> *Problem* >>>>>>>>> >>>>>>>>> Currently, Flink shades dependencies like ASM and Guava into all >>>>>>>>> >>>>>>>> jars >>> >>>> of >>>>>>>> >>>>>>> projects that reference it and relocate the classes. >>>>>>> >>>>>>>> There are some drawbacks to that approach, let's discuss them at the >>>>>>>>> example of ASM: >>>>>>>>> >>>>>>>>> - The ASM classes are for example in flink-core, flink-java, >>>>>>>>> flink-scala, >>>>>>>>> flink-runtime, etc. >>>>>>>>> >>>>>>>>> - Users that reference these dependencies have the classes >>>>>>>>> multiple >>>>>>>>> times >>>>>>>>> in the classpath. That is unclean (works, through, because the >>>>>>>>> >>>>>>>> classes >>> >>>> are >>>>>>>> >>>>>>> identical). The same happens when building the final dist. jar. >>>>>>> >>>>>>>> - Some of these dependencies require to include license files >>>>>>>>> in >>>>>>>>> the >>>>>>>>> shaded jar. It is hard to impossible to build a good automatic >>>>>>>>> solution >>>>>>>>> for >>>>>>>>> that, partly due to Maven's very poor cross-project path support >>>>>>>>> >>>>>>>>> - Most importantly: Scala does not support shading really >>>>>>>>> well. >>>>>>>>> >>>>>>>>> Scala >>>>>>>> >>>>>>> classes have references to classes in more places than just the class >>>>>>> >>>>>>>> names >>>>>>>>> (apparently for Scala reflect support). Referencing a Scala project >>>>>>>>> >>>>>>>>> with >>>>>>>> >>>>>>> shaded ASM still requires to add a reference to unshaded ASM (at >>>>>>> least >>>>>>> >>>>>>>> as >>>>>>>> >>>>>>> a >>>>>>> >>>>>>>> compile dependency). >>>>>>>>> >>>>>>>>> *Proposal* >>>>>>>>> >>>>>>>>> I propose that we build and deploy a asm-flink-shaded version of >>>>>>>>> ASM >>>>>>>>> >>>>>>>>> and >>>>>>>> >>>>>>> directly program against the relocated namespaces. Since we never use >>>>>>> >>>>>>>> classes that we relocate in public interfaces, Flink users will >>>>>>>>> >>>>>>>> never >>> >>>> see >>>>>>>> >>>>>>> the relocated class names. Internally, it does not hurt to use them. >>>>>>> >>>>>>>> - Proper maven dependency management, no hidden (shaded) >>>>>>>>> >>>>>>>>> dependencies >>>>>>>> >>>>>>> - One copy of each class for shaded dependencies >>>>>>> >>>>>>>> - Proper Scala interoperability >>>>>>>>> >>>>>>>>> - Natural License management (license is part of deployed >>>>>>>>> asm-flink-shaded jar) >>>>>>>>> >>>>>>>>> >>>>>>>>> Happy to hear thoughts! >>>>>>>>> >>>>>>>>> Stephan >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >