+1 on including dependency version. -------- Original message --------From: Stephan Ewen <se...@apache.org> Date: 6/26/17 5:01 AM (GMT-08:00) To: dev@flink.apache.org Subject: Re: [DISCUSS] Changing Flink's shading model Looks good, thanks Chesnay!.
How about including the dependency version names in the module names, like "flink-shaded-guava-18.0"? On Mon, Jun 26, 2017 at 11:49 AM, Chesnay Schepler <ches...@apache.org> wrote: > The new repo was created and is accessible here: > https://github.com/apache/flink-shaded > > I've already opened a PR to add a shaded guava module. > > Once the shaded-guava module is merged I would like to do a first release > of flink-shaded, > only containing guava. I already have a branch with all the changes > necessary to > integrate this dependency into Flink. > > The alternative would be to first create shaded modules for all > dependencies and make a > single release, but I feel that would delay things quite a bit. > > > > On 21.06.2017 17:00, Robert Metzger wrote: > >> Okay, I'll request a repo for the shading. >> >> On Wed, Jun 21, 2017 at 1:38 PM, Chesnay Schepler <ches...@apache.org> >> wrote: >> >> I like your suggestion Robert. A lot actually. >>> >>> Having the actual dependency version (i.e 18 for guava) in the version >>> should improve clarity a lot. >>> >>> Originally i intended to release 1 artifact per Flink version, with the >>> normal versioning scheme >>> that we use. But given that the shaded dependencies aren't changed often >>> (even rarely might be a stretch), >>> and aren't actually coupled to the Flink release cycle this doesn't make >>> a >>> lot of sense. >>> >>> Having separate repos looks like a reasonable separation of concerns. The >>> release for Flink itself >>> would work just like it does now; we don't have to modify any scripts or >>> do extra steps. >>> >>> Since the build, release and development process are separate (since >>> flink-shaded isn't part of Flink build >>> process, has a separate release process and changes to it will /never >>> /require immediate changes to Flink) >>> it seems like a very good candidate to move it into a separate repo. >>> >>> >>> On 21.06.2017 11:26, Robert Metzger wrote: >>> >>> Its not completely clear to me how we want to version the shaded >>>> dependencies, and where we are putting them. >>>> >>>> One concern are the official apache release rules. If we want to release >>>> something to maven central, we need to do a proper vote over a source >>>> archive. >>>> I would propose to create a new repository "flink-shaded.git" that >>>> contains >>>> the following maven module structure: >>>> - flink-shaded: 1 >>>> - flink-shaded-asm: 1-5.2 >>>> - flink-shaded-guava: 1-18.0 >>>> - ... >>>> >>>> The number indicates the version (for ASM, I've just guessed). >>>> The version for the parent "flink-shaded" needs to be updated on each >>>> parent pom change (new module added, new / changed plugins, ...) >>>> >>>> We could create a separate release script in this repository that >>>> creates >>>> the flink-shaded-src.zip from the code and deploys the artifacts to the >>>> maven staging area. >>>> >>>> The advantage of a separate repo would be that we don't need to maintain >>>> separate maven projects in the same git repo. >>>> Also, the src archives for the release vote can be created from the repo >>>> content (without much filtering). >>>> >>>> >>>> On Tue, Jun 20, 2017 at 9:44 PM, Stephan Ewen <se...@apache.org> wrote: >>>> >>>> I like this approach. >>>> >>>>> Two additional things can be mention here: >>>>> >>>>> - We need to deploy these artifacts independently and not as part >>>>> of >>>>> the >>>>> build. That is a manual step once per "bump" in the dependency of that >>>>> library. >>>>> >>>>> - We reduce the shading complexity of the original build and should >>>>> thus >>>>> also speed up build times :-) >>>>> >>>>> Stephan >>>>> >>>>> >>>>> >>>>> >>>>> On Tue, Jun 20, 2017 at 1:15 PM, Chesnay Schepler <ches...@apache.org> >>>>> wrote: >>>>> >>>>> I would like to start working on this. >>>>> >>>>>> I've looked into adding a flink-shaded-guava module. Working against >>>>>> the >>>>>> shaded namespaces seems >>>>>> to work without problems from the IDE, and we could forbid un-shaded >>>>>> usages with checkstyle. >>>>>> >>>>>> So for the list of dependencies that we want to shade we currently >>>>>> got: >>>>>> >>>>>> * asm >>>>>> * guava >>>>>> * netty >>>>>> * hadoop >>>>>> * curator >>>>>> >>>>>> I've had a chat with Stephan Ewan and he brought up kryo + chill as >>>>>> well. >>>>>> >>>>>> The nice thing is that we can do this incrementally, one dependency >>>>>> at a >>>>>> time. As such i would propose >>>>>> to go through the whole process for guava and see what problems arise. >>>>>> >>>>>> This would include adding a flink-shaded module and a child >>>>>> flink-shaded-guava module to the flink repository >>>>>> that are not part of the build process, replacing all usages of guava >>>>>> in >>>>>> Flink, adding the >>>>>> checkstyle rule (optional) and deploying the artifact to maven >>>>>> central. >>>>>> >>>>>> >>>>>> On 11.05.2017 10:54, Stephan Ewen wrote: >>>>>> >>>>>> @Ufuk - I have never set up artifact deployment in Maven, could need >>>>>> some >>>>>> help there. >>>>>> >>>>>>> Regarding shading Netty, I agree, would be good to do that as well... >>>>>>> >>>>>>> On Thu, May 11, 2017 at 10:52 AM, Ufuk Celebi <u...@apache.org> >>>>>>> wrote: >>>>>>> >>>>>>> The advantages you've listed sound really compelling to me. >>>>>>> >>>>>>> - Do you have time to implement these changes or do we need a >>>>>>>> >>>>>>>> volunteer? >>>>>>> >>>>>> ;) >>>>>> >>>>>>> - I assume that republishing the artifacts as you propose doesn't >>>>>>>> have >>>>>>>> any new legal implications since we already publish them with our >>>>>>>> JARs, right? >>>>>>>> >>>>>>>> - We might think about adding Netty to the list of shaded artifacts >>>>>>>> since some dependency conflicts were reported recently. Would have >>>>>>>> to >>>>>>>> double check the reported issues before doing that though. ;-) >>>>>>>> >>>>>>>> – Ufuk >>>>>>>> >>>>>>>> >>>>>>>> On Wed, May 10, 2017 at 8:45 PM, Stephan Ewen <se...@apache.org> >>>>>>>> >>>>>>>> wrote: >>>>>>> >>>>>> @chesnay: I used ASM as an example in the proposal. Maybe I did not >>>>>> >>>>>>> say >>>>>>>> >>>>>>> that clearly. >>>>>> >>>>>>> If we like that approach, we should deal with the other libraries (at >>>>>>>>> >>>>>>>>> least >>>>>>>>> >>>>>>>> the frequently used ones) in the same way. >>>>>>>> >>>>>>>>> >>>>>>>>> I would imagine to have a project layout like that: >>>>>>>>> >>>>>>>>> flink-shaded-deps >>>>>>>>> - flink-shaded-asm >>>>>>>>> - flink-shaded-guava >>>>>>>>> - flink-shaded-curator >>>>>>>>> - flink-shaded-hadoop >>>>>>>>> >>>>>>>>> >>>>>>>>> "flink-shaded-deps" would not be built every time (and not be >>>>>>>>> released >>>>>>>>> every time), but only when needed. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, May 10, 2017 at 7:28 PM, Chesnay Schepler < >>>>>>>>> ches...@apache.org >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> I like the idea, thank you for bringing it up. >>>>>>>>> >>>>>>>>> Given that the raised problems aren't really ASM specific would it >>>>>>>>>> >>>>>>>>>> make >>>>>>>>> >>>>>>>> sense to create one flink-shaded module that contains all frequently >>>>>> >>>>>>> shaded >>>>>>>>>> >>>>>>>>> libraries? (or maybe even all shaded dependencies by core modules) >>>>>>>>> The >>>>>>>>> >>>>>>>>> proposal limits the scope of this to ASM and i was wondering why. >>>>>>>>>> >>>>>>>>>> I also remember that there was a discussion recently about why we >>>>>>>>>> >>>>>>>>>> shade >>>>>>>>> >>>>>>>> things at all, and the idea of working against the shaded namespaces >>>>>> >>>>>>> was >>>>>>>>>> brought up. Back then i was expressing doubts as to whether IDE's >>>>>>>>>> >>>>>>>>>> would >>>>>>>>> >>>>>>>> properly support this; what's the state on that? >>>>>> >>>>>>> On 10.05.2017 18:18, Stephan Ewen wrote: >>>>>>>>>> >>>>>>>>>> Hi! >>>>>>>>>> >>>>>>>>>> This is a discussion about altering the way we handle dependencies >>>>>>>>>>> >>>>>>>>>>> and >>>>>>>>>> >>>>>>>>> shading in Flink. >>>>>> >>>>>>> I ran into quite a view problems trying to adjust / fix some >>>>>>>>>>> shading >>>>>>>>>>> issues >>>>>>>>>>> during release validation. >>>>>>>>>>> >>>>>>>>>>> The issue is tracked under: https://issues.apache.org/jira >>>>>>>>>>> /browse/FLINK-6529 >>>>>>>>>>> Bring this discussion thread up because it is a bigger issue >>>>>>>>>>> >>>>>>>>>>> *Problem* >>>>>>>>>>> >>>>>>>>>>> Currently, Flink shades dependencies like ASM and Guava into all >>>>>>>>>>> >>>>>>>>>>> jars >>>>>>>>>> >>>>>>>>> of >>>>>> >>>>>>> projects that reference it and relocate the classes. >>>>>>>>> >>>>>>>>> There are some drawbacks to that approach, let's discuss them at >>>>>>>>>> the >>>>>>>>>> >>>>>>>>>>> example of ASM: >>>>>>>>>>> >>>>>>>>>>> - The ASM classes are for example in flink-core, >>>>>>>>>>> flink-java, >>>>>>>>>>> flink-scala, >>>>>>>>>>> flink-runtime, etc. >>>>>>>>>>> >>>>>>>>>>> - Users that reference these dependencies have the classes >>>>>>>>>>> multiple >>>>>>>>>>> times >>>>>>>>>>> in the classpath. That is unclean (works, through, because the >>>>>>>>>>> >>>>>>>>>>> classes >>>>>>>>>> >>>>>>>>> are >>>>>> >>>>>>> identical). The same happens when building the final dist. jar. >>>>>>>>> >>>>>>>>> - Some of these dependencies require to include license files >>>>>>>>>> >>>>>>>>>>> in >>>>>>>>>>> the >>>>>>>>>>> shaded jar. It is hard to impossible to build a good automatic >>>>>>>>>>> solution >>>>>>>>>>> for >>>>>>>>>>> that, partly due to Maven's very poor cross-project path support >>>>>>>>>>> >>>>>>>>>>> - Most importantly: Scala does not support shading really >>>>>>>>>>> well. >>>>>>>>>>> >>>>>>>>>>> Scala >>>>>>>>>>> >>>>>>>>>> classes have references to classes in more places than just the >>>>>>>>> class >>>>>>>>> >>>>>>>>> names >>>>>>>>>> >>>>>>>>>>> (apparently for Scala reflect support). Referencing a Scala >>>>>>>>>>> project >>>>>>>>>>> >>>>>>>>>>> with >>>>>>>>>>> >>>>>>>>>> shaded ASM still requires to add a reference to unshaded ASM (at >>>>>>>>> least >>>>>>>>> >>>>>>>>> as >>>>>>>>>> >>>>>>>>>> a >>>>>>>>> >>>>>>>>> compile dependency). >>>>>>>>>> >>>>>>>>>>> *Proposal* >>>>>>>>>>> >>>>>>>>>>> I propose that we build and deploy a asm-flink-shaded version of >>>>>>>>>>> ASM >>>>>>>>>>> >>>>>>>>>>> and >>>>>>>>>>> >>>>>>>>>> directly program against the relocated namespaces. Since we never >>>>>>>>> use >>>>>>>>> >>>>>>>>> classes that we relocate in public interfaces, Flink users will >>>>>>>>>> never >>>>>>>>>> >>>>>>>>> see >>>>>> >>>>>>> the relocated class names. Internally, it does not hurt to use them. >>>>>>>>> >>>>>>>>> - Proper maven dependency management, no hidden (shaded) >>>>>>>>>> >>>>>>>>>>> dependencies >>>>>>>>>>> >>>>>>>>>> - One copy of each class for shaded dependencies >>>>>>>>> >>>>>>>>> - Proper Scala interoperability >>>>>>>>>> >>>>>>>>>>> - Natural License management (license is part of deployed >>>>>>>>>>> asm-flink-shaded jar) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Happy to hear thoughts! >>>>>>>>>>> >>>>>>>>>>> Stephan >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >