+1, it was on my TODO for a while waiting the Java8 update. Regards JB
On 02/01/2018 06:56 AM, Romain Manni-Bucau wrote: > Why not dropping guava for all beam codebase? With java 8 it is quite easy to > do > it and avoid a bunch of conflicts. Did it in 2 projects with quite a good > result. > > Le 1 févr. 2018 06:50, "Lukasz Cwik" <lc...@google.com > <mailto:lc...@google.com>> a écrit : > > Make sure to include the guava version in the artifact name so that we can > have multiple vendored versions. > > On Wed, Jan 31, 2018 at 9:16 PM, Kenneth Knowles <k...@google.com > <mailto:k...@google.com>> wrote: > > I didn't have time for this, but it just bit me. We definitely have > Guava on the API surface of runner support code in ways that get > incompatibly shaded. I will probably start "1a" by making a shaded > library org.apache.beam:vendored-guava and starting to use it. It > sounds > like there is generally unanimous support for that much, anyhow. > > Kenn > > On Wed, Dec 13, 2017 at 8:31 AM, Aljoscha Krettek <aljos...@apache.org > <mailto:aljos...@apache.org>> wrote: > > Thanks Ismaël for bringing up this discussion again! > > I would be in favour of 1) and more specifically of 1a) > > Aljoscha > > >> On 12. Dec 2017, at 18:56, Lukasz Cwik <lc...@google.com >> <mailto:lc...@google.com>> wrote: >> >> You can always run tests on post shaded artifacts instead of the >> compiled classes, it just requires us to change our maven >> surefire >> / gradle test configurations but it is true that most IDEs would >> behave better with a dependency jar unless you delegate all the >> build/test actions to the build system and then it won't matter. >> >> On Mon, Dec 11, 2017 at 9:05 PM, Kenneth Knowles <k...@google.com >> <mailto:k...@google.com>> wrote: >> >> There's also, with additional overhead, >> >> 1a) A relocated and shipped package for each thing we want to >> relocate. I think this has also been tried outside Beam... >> >> Pros: >> * all the pros of 1) plus no bloat beyond what is necessary >> Cons: >> * abandons whitelist approach for public deps, reverting to >> blacklist approach for trouble things like guava, so a bit >> less principled >> >> For both 1) and 1a) I would add: >> >> Pros: >> * clearly readable dependency since code will `import >> org.apache.beam.private.guava21` and IDEs will understand it >> is a distinct lilbrary >> * can run tests on unpackaged classes, as long as deps are >> shaded or provided as jars >> * no mysterious action at a distance from inherited >> configuration >> Cons: >> * need to adjust imports >> >> Kenn >> >> On Mon, Dec 11, 2017 at 9:57 AM, Lukasz Cwik >> <lc...@google.com >> <mailto:lc...@google.com>> wrote: >> >> I would suggest that either we use: >> 1) A common deps package containing shaded dependencies >> allows for >> Pros >> * doesn't require the user to build an uber jar >> Risks >> * dependencies package will keep growing even if >> something >> is or isn't needed by all of Apache Beam leading to a >> large jar anyways negating any space savings >> >> 2) Shade within each module to a common location like >> org.apache.beam.relocated.guava.... >> Pros >> * you only get the shaded dependencies of the things that >> are required >> * its one less dependency for users to manage >> Risks >> * requires an uber jar to be built to get the space >> savings (either by a user or a distribution of Apache >> Beam) otherwise we negate any space savings. >> >> If we either use a common relocation scheme or a >> dependencies jar then each relocation should specifically >> contain the version number of the package because we >> would >> like to allow for us to be using two different versions >> of >> the same library. >> >> For the common deps package approach, should we check in >> code where the imports contain the relocated location >> (e.g. import org.apache.beam.guava.20.0.com >> >> <http://org.apache.beam.guava.20.0.com/>.google.common.collect.ImmutableList)? >> >> >> On Mon, Dec 11, 2017 at 8:47 AM, Jean-Baptiste Onofré >> <j...@nanthrax.net <mailto:j...@nanthrax.net>> wrote: >> >> Thanks for bringing that back. >> >> Indeed guava is shaded in different uber-jar. Maybe >> we >> can have a common deps module that we include once >> (but the user will have to explicitly define the >> dep) ? >> >> Basically, what do you propose for protobuf >> (unfortunately, I don't see an obvious) ? >> >> Regards >> JB >> >> >> On 12/11/2017 05:35 PM, Ismaël Mejía wrote: >> >> Hello, I wanted to bring back this subject >> because >> I think we should >> take action on this and at least first have a >> shaded version of guava. >> I was playing with a toy project and I did the >> procedure we use to >> submit jars to a Hadoop cluster via Flink/Spark >> which involves >> creating an uber jar and I realized that the size >> of the jar was way >> bigger than I expected, and the fact that we >> shade >> guava in every >> module contributes to this. I found guava shaded >> on: >> >> sdks/java/core >> runners/core-construction-java >> runners/core-java >> model/job-management >> runners/spark >> sdks/java/io/hadoop-file-system >> sdks/java/io/kafka >> >> This means at least 6 times more of the size it >> should which counts in >> around 15MB more (2.4MB*6 deps) of extra weight >> that we can simply >> reduce by using a shaded version of the library. >> >> I add this point to the previous ones mentioned >> during the discussion >> because this goes against the end user experience >> and affects us all >> (devs/users). >> >> Another question is if we should shade (and how) >> protocol buffers >> because now with the portability work we are >> exposing it closer to the >> end users. I say this because I also found an >> issue while running a >> job on YARN with the spark runner because >> hadoop-common includes >> protobuf-java 2 and I had to explicitly provide >> protocol-buffers 3 as >> a dependency to be able to use triggers (note the >> Spark runner >> translates them using some method from >> runners/core-java). Since >> hadoop-common is provided in the cluster with the >> older version of >> protobuf, I am afraid that this will bite us in >> the future. >> >> Ismaël >> >> ps. There is already a JIRA for that shading for >> protobuf on >> hadoop-common but this is not coming until >> version >> 3 is out. >> >> https://issues.apache.org/jira/browse/HADOOP-13136 >> >> <https://issues.apache.org/jira/browse/HADOOP-13136> >> >> ps2. Extra curious situation is to see that the >> dataflow-runner ends >> up having guava shaded twice via its shaded >> version on >> core-construction-java. >> >> ps3. Of course this message means a de-facto +1 >> at >> least to do it for >> guava and evaluate it for other libs. >> >> >> On Tue, Oct 17, 2017 at 7:29 PM, Lukasz Cwik >> <lc...@google.com.invalid >> <mailto:lc...@google.com.invalid>> wrote: >> >> An issue to call out is how to deal with our >> generated code (.avro and >> .proto) as I don't believe those plugins >> allow >> you to generate code using a >> shaded package prefix on imports. >> >> On Tue, Oct 17, 2017 at 10:28 AM, Thomas Groh >> <tg...@google.com.invalid >> <mailto:tg...@google.com.invalid>> >> wrote: >> >> +1 to the goal. I'm hugely in favor of >> not >> doing the same shading work >> every time for dependencies we know >> we'll use. >> >> This also means that if we end up pulling >> in transitive dependencies we >> don't want in any particular module we >> can >> avoid having to adjust our >> repackaging strategy for that module - >> which I have run into face-first in >> the past. >> >> On Tue, Oct 17, 2017 at 9:48 AM, Kenneth >> Knowles <k...@google.com.invalid >> <mailto:k...@google.com.invalid>> >> wrote: >> >> Hi all, >> >> Shading is a big part of how we keep >> our dependencies sane in Beam. But >> downsides: shading is super slow, >> causes massive jar bloat, and kind of >> hard to get right because artifacts >> and namespaces are not 1-to-1. >> >> I know that some communities >> distribute their own shaded >> distributions of >> dependencies. I had a thought about >> doing something similar that I wanted >> to throw out there for people to poke >> holes in. >> >> To set the scene, here is how I view >> shading: >> >> - A module has public dependencies >> and private dependencies. >> - Public deps are used for data >> interchange; users must share these >> >> deps. >> >> - Private deps are just >> functionality and can be hidden (in >> our case, >> relocated + bundled) >> - It isn't necessarily that simple, >> because public and private deps >> >> might >> >> interact in higher-order ways >> ("public" is contagious) >> >> Shading is an implementation detail >> of >> expressing these characteristics. >> >> We >> >> use shading selectively because of >> its >> downsides I mentioned above. >> >> But what about this idea: Introduce >> shaded deps as a single separate >> artifact. >> >> - sdks/java/private-deps: bundled >> uber jar with relocated versions of >> everything we want to shade >> >> - sdks/java/core and >> sdks/java/harness: no relocation or >> bundling - >> depends on >> `beam-sdks-java-private-deps` and >> imports like >> >> `org.apache.beam.sdk.private.com.google.common` >> directly (this is what >> they >> are rewritten to >> >> Some benefits >> >> - much faster builds of other >> modules >> - only one shaded uber jar >> - rare/no rebuilds of the uber jar >> - can use maven enforcer to forbid >> imports like com.google.common >> - configuration all in one place >> - no automated rewriting of our >> real >> code, which has led to some major >> confusion >> - easy to implement incrementally >> >> Downsides: >> >> - plenty of effort work to get >> there >> - unclear how many different such >> deps modules we need; sharing them >> >> could >> >> get weird >> - if we hit a roadblock, we will >> have committed a lot of time >> >> Just something I was musing as I >> spent >> another evening waiting for slow >> builds to try to confirm changes to >> brittle poms. >> >> Kenn >> >> >> >> -- >> Jean-Baptiste Onofré >> jbono...@apache.org <mailto:jbono...@apache.org> >> http://blog.nanthrax.net <http://blog.nanthrax.net/> >> Talend - http://www.talend.com >> <http://www.talend.com/> >> >> >> >> > > > -- Jean-Baptiste Onofré jbono...@apache.org http://blog.nanthrax.net Talend - http://www.talend.com