a map of list is fine and not a challenge we'll face long I hope ;)
Romain Manni-Bucau @rmannibucau <https://twitter.com/rmannibucau> | Blog <https://rmannibucau.metawerx.net/> | Old Blog <http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> | LinkedIn <https://www.linkedin.com/in/rmannibucau> 2018-02-01 9:40 GMT+01:00 Reuven Lax <re...@google.com>: > Not sure we'll be able to replace them all. Things like guava Table and > Multimap don't have great replacements in Java8. > > On Wed, Jan 31, 2018 at 10:11 PM, Jean-Baptiste Onofré <j...@nanthrax.net> > wrote: > >> +1, it was on my TODO for a while waiting the Java8 update. >> >> Regards >> JB >> >> On 02/01/2018 06:56 AM, Romain Manni-Bucau wrote: >> > Why not dropping guava for all beam codebase? With java 8 it is quite >> easy to do >> > it and avoid a bunch of conflicts. Did it in 2 projects with quite a >> good result. >> > >> > Le 1 févr. 2018 06:50, "Lukasz Cwik" <lc...@google.com >> > <mailto:lc...@google.com>> a écrit : >> > >> > Make sure to include the guava version in the artifact name so that >> we can >> > have multiple vendored versions. >> > >> > On Wed, Jan 31, 2018 at 9:16 PM, Kenneth Knowles <k...@google.com >> > <mailto:k...@google.com>> wrote: >> > >> > I didn't have time for this, but it just bit me. We definitely >> have >> > Guava on the API surface of runner support code in ways that get >> > incompatibly shaded. I will probably start "1a" by making a >> shaded >> > library org.apache.beam:vendored-guava and starting to use it. >> It sounds >> > like there is generally unanimous support for that much, anyhow. >> > >> > Kenn >> > >> > On Wed, Dec 13, 2017 at 8:31 AM, Aljoscha Krettek < >> aljos...@apache.org >> > <mailto:aljos...@apache.org>> wrote: >> > >> > Thanks Ismaël for bringing up this discussion again! >> > >> > I would be in favour of 1) and more specifically of 1a) >> > >> > Aljoscha >> > >> > >> >> On 12. Dec 2017, at 18:56, Lukasz Cwik <lc...@google.com >> >> <mailto:lc...@google.com>> wrote: >> >> >> >> You can always run tests on post shaded artifacts instead >> of the >> >> compiled classes, it just requires us to change our maven >> surefire >> >> / gradle test configurations but it is true that most IDEs >> would >> >> behave better with a dependency jar unless you delegate >> all the >> >> build/test actions to the build system and then it won't >> matter. >> >> >> >> On Mon, Dec 11, 2017 at 9:05 PM, Kenneth Knowles < >> k...@google.com >> >> <mailto:k...@google.com>> wrote: >> >> >> >> There's also, with additional overhead, >> >> >> >> 1a) A relocated and shipped package for each thing we >> want to >> >> relocate. I think this has also been tried outside >> Beam... >> >> >> >> Pros: >> >> * all the pros of 1) plus no bloat beyond what is >> necessary >> >> Cons: >> >> * abandons whitelist approach for public deps, >> reverting to >> >> blacklist approach for trouble things like guava, so a >> bit >> >> less principled >> >> >> >> For both 1) and 1a) I would add: >> >> >> >> Pros: >> >> * clearly readable dependency since code will `import >> >> org.apache.beam.private.guava21` and IDEs will >> understand it >> >> is a distinct lilbrary >> >> * can run tests on unpackaged classes, as long as deps >> are >> >> shaded or provided as jars >> >> * no mysterious action at a distance from inherited >> configuration >> >> Cons: >> >> * need to adjust imports >> >> >> >> Kenn >> >> >> >> On Mon, Dec 11, 2017 at 9:57 AM, Lukasz Cwik < >> lc...@google.com >> >> <mailto:lc...@google.com>> wrote: >> >> >> >> I would suggest that either we use: >> >> 1) A common deps package containing shaded >> dependencies >> >> allows for >> >> Pros >> >> * doesn't require the user to build an uber jar >> >> Risks >> >> * dependencies package will keep growing even if >> something >> >> is or isn't needed by all of Apache Beam leading >> to a >> >> large jar anyways negating any space savings >> >> >> >> 2) Shade within each module to a common location >> like >> >> org.apache.beam.relocated.guava.... >> >> Pros >> >> * you only get the shaded dependencies of the >> things that >> >> are required >> >> * its one less dependency for users to manage >> >> Risks >> >> * requires an uber jar to be built to get the space >> >> savings (either by a user or a distribution of >> Apache >> >> Beam) otherwise we negate any space savings. >> >> >> >> If we either use a common relocation scheme or a >> >> dependencies jar then each relocation should >> specifically >> >> contain the version number of the package because >> we would >> >> like to allow for us to be using two different >> versions of >> >> the same library. >> >> >> >> For the common deps package approach, should we >> check in >> >> code where the imports contain the relocated >> location >> >> (e.g. import org.apache.beam.guava.20.0.com >> >> <http://org.apache.beam.guava.20.0.com/>.google. >> common.collect.ImmutableList)? >> >> >> >> >> >> On Mon, Dec 11, 2017 at 8:47 AM, Jean-Baptiste >> Onofré >> >> <j...@nanthrax.net <mailto:j...@nanthrax.net>> wrote: >> >> >> >> Thanks for bringing that back. >> >> >> >> Indeed guava is shaded in different uber-jar. >> Maybe we >> >> can have a common deps module that we include >> once >> >> (but the user will have to explicitly define >> the dep) ? >> >> >> >> Basically, what do you propose for protobuf >> >> (unfortunately, I don't see an obvious) ? >> >> >> >> Regards >> >> JB >> >> >> >> >> >> On 12/11/2017 05:35 PM, Ismaël Mejía wrote: >> >> >> >> Hello, I wanted to bring back this subject >> because >> >> I think we should >> >> take action on this and at least first >> have a >> >> shaded version of guava. >> >> I was playing with a toy project and I did >> the >> >> procedure we use to >> >> submit jars to a Hadoop cluster via >> Flink/Spark >> >> which involves >> >> creating an uber jar and I realized that >> the size >> >> of the jar was way >> >> bigger than I expected, and the fact that >> we shade >> >> guava in every >> >> module contributes to this. I found guava >> shaded on: >> >> >> >> sdks/java/core >> >> runners/core-construction-java >> >> runners/core-java >> >> model/job-management >> >> runners/spark >> >> sdks/java/io/hadoop-file-system >> >> sdks/java/io/kafka >> >> >> >> This means at least 6 times more of the >> size it >> >> should which counts in >> >> around 15MB more (2.4MB*6 deps) of extra >> weight >> >> that we can simply >> >> reduce by using a shaded version of the >> library. >> >> >> >> I add this point to the previous ones >> mentioned >> >> during the discussion >> >> because this goes against the end user >> experience >> >> and affects us all >> >> (devs/users). >> >> >> >> Another question is if we should shade >> (and how) >> >> protocol buffers >> >> because now with the portability work we >> are >> >> exposing it closer to the >> >> end users. I say this because I also found >> an >> >> issue while running a >> >> job on YARN with the spark runner because >> >> hadoop-common includes >> >> protobuf-java 2 and I had to explicitly >> provide >> >> protocol-buffers 3 as >> >> a dependency to be able to use triggers >> (note the >> >> Spark runner >> >> translates them using some method from >> >> runners/core-java). Since >> >> hadoop-common is provided in the cluster >> with the >> >> older version of >> >> protobuf, I am afraid that this will bite >> us in >> >> the future. >> >> >> >> Ismaël >> >> >> >> ps. There is already a JIRA for that >> shading for >> >> protobuf on >> >> hadoop-common but this is not coming until >> version >> >> 3 is out. >> >> https://issues.apache.org/jir >> a/browse/HADOOP-13136 >> >> <https://issues.apache.org/ji >> ra/browse/HADOOP-13136> >> >> >> >> ps2. Extra curious situation is to see >> that the >> >> dataflow-runner ends >> >> up having guava shaded twice via its >> shaded version on >> >> core-construction-java. >> >> >> >> ps3. Of course this message means a >> de-facto +1 at >> >> least to do it for >> >> guava and evaluate it for other libs. >> >> >> >> >> >> On Tue, Oct 17, 2017 at 7:29 PM, Lukasz >> Cwik >> >> <lc...@google.com.invalid >> >> <mailto:lc...@google.com.invalid>> wrote: >> >> >> >> An issue to call out is how to deal >> with our >> >> generated code (.avro and >> >> .proto) as I don't believe those >> plugins allow >> >> you to generate code using a >> >> shaded package prefix on imports. >> >> >> >> On Tue, Oct 17, 2017 at 10:28 AM, >> Thomas Groh >> >> <tg...@google.com.invalid >> >> <mailto:tg...@google.com.invalid>> >> >> wrote: >> >> >> >> +1 to the goal. I'm hugely in >> favor of not >> >> doing the same shading work >> >> every time for dependencies we >> know we'll use. >> >> >> >> This also means that if we end up >> pulling >> >> in transitive dependencies we >> >> don't want in any particular >> module we can >> >> avoid having to adjust our >> >> repackaging strategy for that >> module - >> >> which I have run into face-first in >> >> the past. >> >> >> >> On Tue, Oct 17, 2017 at 9:48 AM, >> Kenneth >> >> Knowles <k...@google.com.invalid >> >> <mailto:k...@google.com.invalid>> >> >> wrote: >> >> >> >> Hi all, >> >> >> >> Shading is a big part of how >> we keep >> >> our dependencies sane in Beam. >> But >> >> downsides: shading is super >> slow, >> >> causes massive jar bloat, and >> kind of >> >> hard to get right because >> artifacts >> >> and namespaces are not 1-to-1. >> >> >> >> I know that some communities >> >> distribute their own shaded >> >> distributions of >> >> dependencies. I had a thought >> about >> >> doing something similar that I >> wanted >> >> to throw out there for people >> to poke >> >> holes in. >> >> >> >> To set the scene, here is how >> I view >> >> shading: >> >> >> >> - A module has public >> dependencies >> >> and private dependencies. >> >> - Public deps are used for >> data >> >> interchange; users must share >> these >> >> >> >> deps. >> >> >> >> - Private deps are just >> >> functionality and can be >> hidden (in >> >> our case, >> >> relocated + bundled) >> >> - It isn't necessarily that >> simple, >> >> because public and private deps >> >> >> >> might >> >> >> >> interact in higher-order ways >> >> ("public" is contagious) >> >> >> >> Shading is an implementation >> detail of >> >> expressing these >> characteristics. >> >> >> >> We >> >> >> >> use shading selectively >> because of its >> >> downsides I mentioned above. >> >> >> >> But what about this idea: >> Introduce >> >> shaded deps as a single >> separate >> >> artifact. >> >> >> >> - sdks/java/private-deps: >> bundled >> >> uber jar with relocated >> versions of >> >> everything we want to shade >> >> >> >> - sdks/java/core and >> >> sdks/java/harness: no >> relocation or >> >> bundling - >> >> depends on >> >> `beam-sdks-java-private-deps` >> and >> >> imports like >> >> `org.apache.beam.sdk.private. >> com.google.common` >> >> directly (this is what >> >> they >> >> are rewritten to >> >> >> >> Some benefits >> >> >> >> - much faster builds of >> other modules >> >> - only one shaded uber jar >> >> - rare/no rebuilds of the >> uber jar >> >> - can use maven enforcer to >> forbid >> >> imports like com.google.common >> >> - configuration all in one >> place >> >> - no automated rewriting of >> our real >> >> code, which has led to some >> major >> >> confusion >> >> - easy to implement >> incrementally >> >> >> >> Downsides: >> >> >> >> - plenty of effort work to >> get there >> >> - unclear how many different >> such >> >> deps modules we need; sharing >> them >> >> >> >> could >> >> >> >> get weird >> >> - if we hit a roadblock, we >> will >> >> have committed a lot of time >> >> >> >> Just something I was musing as >> I spent >> >> another evening waiting for >> slow >> >> builds to try to confirm >> changes to >> >> brittle poms. >> >> >> >> Kenn >> >> >> >> >> >> >> >> -- >> >> Jean-Baptiste Onofré >> >> jbono...@apache.org <mailto: >> jbono...@apache.org> >> >> http://blog.nanthrax.net < >> http://blog.nanthrax.net/> >> >> Talend - http://www.talend.com < >> http://www.talend.com/> >> >> >> >> >> >> >> >> >> > >> > >> > >> >> -- >> Jean-Baptiste Onofré >> jbono...@apache.org >> http://blog.nanthrax.net >> Talend - http://www.talend.com >> > >