Another couple: - User-facing TypeDescriptor is a very thin wrapper on Guava's TypeToken - ImmutableList and friends and their builders are very widely used and IMO still add a lot for readability and preventing someone coming along and adding mistakes to a codebase
So considering it all, I would keep a vendored Guava (but also move off where we can, and also have our own improvements). I hope it will be a near-empty build file to generate it so not a maintenance burden. Kenn On Thu, Feb 1, 2018 at 8:44 PM, Kenneth Knowles <[email protected]> wrote: > Nice. This sounds like a great idea (or two?) and goes along with what I > just started for futures. > > Guava: filed https://issues.apache.org/jira/browse/BEAM-3606 and assigned > to Ismaël :-) and converted my futures thing to a subtask. > > Specific things for our micro guava: > > - checkArgumentNotNull can throw the right exception > - our own Optional because Java's is not Serializable > - futures combinators since many are missing, especially Java's don't do > exceptions right > > Protobuf: didn't file an issue because I'm not sure > > I was wondering if pre-shading works. We really need to get rid of it from > public APIs in a 100% reliable way. It is also a problem for Dataflow. I > was wondering if one approach is to pre-shade gcpio-protobuf-java, > gcpio-protobuf-java-util, gcpio-grpc-java, etc. Anything that needs to take > a Message object. (and do the same for beam-model-protobuf-java since the > model bits have to depend on each other but nothing else). > > Kenn > > On Thu, Feb 1, 2018 at 1:56 AM, Ismaël Mejía <[email protected]> wrote: > >> Huge +1 to get rid of Guava! >> >> This solves annoying dependency issues for some IOs and allow us to >> get rid of the shading that makes current jars bigger than they should >> be. >> >> We can create our own 'micro guava' package with some classes for >> things that are hard to migrate, or that we prefer to still have like >> the check* methods for example. Given the size of the task we should >> probably divide it into subtasks, more important is to get rid of it >> for 'sdks/java/core'. We can then attack other areas afterwards. >> >> Other important idea would be to get rid of Protobuf in public APIs >> like GCPIO and to better shade it from leaking into the runners. An >> unexpected side effect of this is a leak of netty via gRPC/protobuf >> that is byting us for the Spark runner, but well that's worth a >> different discussion. >> >> >> On Thu, Feb 1, 2018 at 10:08 AM, Romain Manni-Bucau >> <[email protected]> wrote: >> > a map of list is fine and not a challenge we'll face long I hope ;) >> > >> > >> > Romain Manni-Bucau >> > @rmannibucau | Blog | Old Blog | Github | LinkedIn >> > >> > 2018-02-01 9:40 GMT+01:00 Reuven Lax <[email protected]>: >> >> >> >> Not sure we'll be able to replace them all. Things like guava Table and >> >> Multimap don't have great replacements in Java8. >> >> >> >> On Wed, Jan 31, 2018 at 10:11 PM, Jean-Baptiste Onofré < >> [email protected]> >> >> wrote: >> >>> >> >>> +1, it was on my TODO for a while waiting the Java8 update. >> >>> >> >>> Regards >> >>> JB >> >>> >> >>> On 02/01/2018 06:56 AM, Romain Manni-Bucau wrote: >> >>> > Why not dropping guava for all beam codebase? With java 8 it is >> quite >> >>> > easy to do >> >>> > it and avoid a bunch of conflicts. Did it in 2 projects with quite a >> >>> > good result. >> >>> > >> >>> > Le 1 févr. 2018 06:50, "Lukasz Cwik" <[email protected] >> >>> > <mailto:[email protected]>> a écrit : >> >>> > >> >>> > Make sure to include the guava version in the artifact name so >> that >> >>> > we can >> >>> > have multiple vendored versions. >> >>> > >> >>> > On Wed, Jan 31, 2018 at 9:16 PM, Kenneth Knowles < >> [email protected] >> >>> > <mailto:[email protected]>> wrote: >> >>> > >> >>> > I didn't have time for this, but it just bit me. We >> definitely >> >>> > have >> >>> > Guava on the API surface of runner support code in ways that >> >>> > get >> >>> > incompatibly shaded. I will probably start "1a" by making a >> >>> > shaded >> >>> > library org.apache.beam:vendored-guava and starting to use >> it. >> >>> > It sounds >> >>> > like there is generally unanimous support for that much, >> >>> > anyhow. >> >>> > >> >>> > Kenn >> >>> > >> >>> > On Wed, Dec 13, 2017 at 8:31 AM, Aljoscha Krettek >> >>> > <[email protected] >> >>> > <mailto:[email protected]>> wrote: >> >>> > >> >>> > Thanks Ismaël for bringing up this discussion again! >> >>> > >> >>> > I would be in favour of 1) and more specifically of 1a) >> >>> > >> >>> > Aljoscha >> >>> > >> >>> > >> >>> >> On 12. Dec 2017, at 18:56, Lukasz Cwik < >> [email protected] >> >>> >> <mailto:[email protected]>> wrote: >> >>> >> >> >>> >> You can always run tests on post shaded artifacts >> instead >> >>> >> of the >> >>> >> compiled classes, it just requires us to change our >> maven >> >>> >> surefire >> >>> >> / gradle test configurations but it is true that most >> IDEs >> >>> >> would >> >>> >> behave better with a dependency jar unless you delegate >> >>> >> all the >> >>> >> build/test actions to the build system and then it >> won't >> >>> >> matter. >> >>> >> >> >>> >> On Mon, Dec 11, 2017 at 9:05 PM, Kenneth Knowles >> >>> >> <[email protected] >> >>> >> <mailto:[email protected]>> wrote: >> >>> >> >> >>> >> There's also, with additional overhead, >> >>> >> >> >>> >> 1a) A relocated and shipped package for each thing >> we >> >>> >> want to >> >>> >> relocate. I think this has also been tried outside >> >>> >> Beam... >> >>> >> >> >>> >> Pros: >> >>> >> * all the pros of 1) plus no bloat beyond what is >> >>> >> necessary >> >>> >> Cons: >> >>> >> * abandons whitelist approach for public deps, >> >>> >> reverting to >> >>> >> blacklist approach for trouble things like guava, >> so a >> >>> >> bit >> >>> >> less principled >> >>> >> >> >>> >> For both 1) and 1a) I would add: >> >>> >> >> >>> >> Pros: >> >>> >> * clearly readable dependency since code will >> `import >> >>> >> org.apache.beam.private.guava21` and IDEs will >> >>> >> understand it >> >>> >> is a distinct lilbrary >> >>> >> * can run tests on unpackaged classes, as long as >> deps >> >>> >> are >> >>> >> shaded or provided as jars >> >>> >> * no mysterious action at a distance from inherited >> >>> >> configuration >> >>> >> Cons: >> >>> >> * need to adjust imports >> >>> >> >> >>> >> Kenn >> >>> >> >> >>> >> On Mon, Dec 11, 2017 at 9:57 AM, Lukasz Cwik >> >>> >> <[email protected] >> >>> >> <mailto:[email protected]>> wrote: >> >>> >> >> >>> >> I would suggest that either we use: >> >>> >> 1) A common deps package containing shaded >> >>> >> dependencies >> >>> >> allows for >> >>> >> Pros >> >>> >> * doesn't require the user to build an uber jar >> >>> >> Risks >> >>> >> * dependencies package will keep growing even >> if >> >>> >> something >> >>> >> is or isn't needed by all of Apache Beam >> leading >> >>> >> to a >> >>> >> large jar anyways negating any space savings >> >>> >> >> >>> >> 2) Shade within each module to a common >> location >> >>> >> like >> >>> >> org.apache.beam.relocated.guava.... >> >>> >> Pros >> >>> >> * you only get the shaded dependencies of the >> >>> >> things that >> >>> >> are required >> >>> >> * its one less dependency for users to manage >> >>> >> Risks >> >>> >> * requires an uber jar to be built to get the >> >>> >> space >> >>> >> savings (either by a user or a distribution of >> >>> >> Apache >> >>> >> Beam) otherwise we negate any space savings. >> >>> >> >> >>> >> If we either use a common relocation scheme or >> a >> >>> >> dependencies jar then each relocation should >> >>> >> specifically >> >>> >> contain the version number of the package >> because >> >>> >> we would >> >>> >> like to allow for us to be using two different >> >>> >> versions of >> >>> >> the same library. >> >>> >> >> >>> >> For the common deps package approach, should we >> >>> >> check in >> >>> >> code where the imports contain the relocated >> >>> >> location >> >>> >> (e.g. import org.apache.beam.guava.20.0.com >> >>> >> >> >>> >> <http://org.apache.beam.guava.20.0.com/>.google.common.colle >> ct.ImmutableList)? >> >>> >> >> >>> >> >> >>> >> On Mon, Dec 11, 2017 at 8:47 AM, Jean-Baptiste >> >>> >> Onofré >> >>> >> <[email protected] <mailto:[email protected]>> >> wrote: >> >>> >> >> >>> >> Thanks for bringing that back. >> >>> >> >> >>> >> Indeed guava is shaded in different >> uber-jar. >> >>> >> Maybe we >> >>> >> can have a common deps module that we >> include >> >>> >> once >> >>> >> (but the user will have to explicitly >> define >> >>> >> the dep) ? >> >>> >> >> >>> >> Basically, what do you propose for protobuf >> >>> >> (unfortunately, I don't see an obvious) ? >> >>> >> >> >>> >> Regards >> >>> >> JB >> >>> >> >> >>> >> >> >>> >> On 12/11/2017 05:35 PM, Ismaël Mejía wrote: >> >>> >> >> >>> >> Hello, I wanted to bring back this >> subject >> >>> >> because >> >>> >> I think we should >> >>> >> take action on this and at least first >> >>> >> have a >> >>> >> shaded version of guava. >> >>> >> I was playing with a toy project and I >> did >> >>> >> the >> >>> >> procedure we use to >> >>> >> submit jars to a Hadoop cluster via >> >>> >> Flink/Spark >> >>> >> which involves >> >>> >> creating an uber jar and I realized >> that >> >>> >> the size >> >>> >> of the jar was way >> >>> >> bigger than I expected, and the fact >> that >> >>> >> we shade >> >>> >> guava in every >> >>> >> module contributes to this. I found >> guava >> >>> >> shaded on: >> >>> >> >> >>> >> sdks/java/core >> >>> >> runners/core-construction-java >> >>> >> runners/core-java >> >>> >> model/job-management >> >>> >> runners/spark >> >>> >> sdks/java/io/hadoop-file-system >> >>> >> sdks/java/io/kafka >> >>> >> >> >>> >> This means at least 6 times more of the >> >>> >> size it >> >>> >> should which counts in >> >>> >> around 15MB more (2.4MB*6 deps) of >> extra >> >>> >> weight >> >>> >> that we can simply >> >>> >> reduce by using a shaded version of the >> >>> >> library. >> >>> >> >> >>> >> I add this point to the previous ones >> >>> >> mentioned >> >>> >> during the discussion >> >>> >> because this goes against the end user >> >>> >> experience >> >>> >> and affects us all >> >>> >> (devs/users). >> >>> >> >> >>> >> Another question is if we should shade >> >>> >> (and how) >> >>> >> protocol buffers >> >>> >> because now with the portability work >> we >> >>> >> are >> >>> >> exposing it closer to the >> >>> >> end users. I say this because I also >> found >> >>> >> an >> >>> >> issue while running a >> >>> >> job on YARN with the spark runner >> because >> >>> >> hadoop-common includes >> >>> >> protobuf-java 2 and I had to explicitly >> >>> >> provide >> >>> >> protocol-buffers 3 as >> >>> >> a dependency to be able to use triggers >> >>> >> (note the >> >>> >> Spark runner >> >>> >> translates them using some method from >> >>> >> runners/core-java). Since >> >>> >> hadoop-common is provided in the >> cluster >> >>> >> with the >> >>> >> older version of >> >>> >> protobuf, I am afraid that this will >> bite >> >>> >> us in >> >>> >> the future. >> >>> >> >> >>> >> Ismaël >> >>> >> >> >>> >> ps. There is already a JIRA for that >> >>> >> shading for >> >>> >> protobuf on >> >>> >> hadoop-common but this is not coming >> until >> >>> >> version >> >>> >> 3 is out. >> >>> >> >> >>> >> https://issues.apache.org/jira/browse/HADOOP-13136 >> >>> >> >> >>> >> <https://issues.apache.org/jira/browse/HADOOP-13136> >> >>> >> >> >>> >> ps2. Extra curious situation is to see >> >>> >> that the >> >>> >> dataflow-runner ends >> >>> >> up having guava shaded twice via its >> >>> >> shaded version on >> >>> >> core-construction-java. >> >>> >> >> >>> >> ps3. Of course this message means a >> >>> >> de-facto +1 at >> >>> >> least to do it for >> >>> >> guava and evaluate it for other libs. >> >>> >> >> >>> >> >> >>> >> On Tue, Oct 17, 2017 at 7:29 PM, Lukasz >> >>> >> Cwik >> >>> >> <[email protected] >> >>> >> <mailto:[email protected]>> >> wrote: >> >>> >> >> >>> >> An issue to call out is how to deal >> >>> >> with our >> >>> >> generated code (.avro and >> >>> >> .proto) as I don't believe those >> >>> >> plugins allow >> >>> >> you to generate code using a >> >>> >> shaded package prefix on imports. >> >>> >> >> >>> >> On Tue, Oct 17, 2017 at 10:28 AM, >> >>> >> Thomas Groh >> >>> >> <[email protected] >> >>> >> <mailto:[email protected]>> >> >>> >> wrote: >> >>> >> >> >>> >> +1 to the goal. I'm hugely in >> >>> >> favor of not >> >>> >> doing the same shading work >> >>> >> every time for dependencies we >> >>> >> know we'll use. >> >>> >> >> >>> >> This also means that if we end >> up >> >>> >> pulling >> >>> >> in transitive dependencies we >> >>> >> don't want in any particular >> >>> >> module we can >> >>> >> avoid having to adjust our >> >>> >> repackaging strategy for that >> >>> >> module - >> >>> >> which I have run into >> face-first >> >>> >> in >> >>> >> the past. >> >>> >> >> >>> >> On Tue, Oct 17, 2017 at 9:48 >> AM, >> >>> >> Kenneth >> >>> >> Knowles <[email protected] >> >>> >> <mailto:[email protected] >> >> >> >>> >> wrote: >> >>> >> >> >>> >> Hi all, >> >>> >> >> >>> >> Shading is a big part of >> how >> >>> >> we keep >> >>> >> our dependencies sane in >> Beam. >> >>> >> But >> >>> >> downsides: shading is super >> >>> >> slow, >> >>> >> causes massive jar bloat, >> and >> >>> >> kind of >> >>> >> hard to get right because >> >>> >> artifacts >> >>> >> and namespaces are not >> 1-to-1. >> >>> >> >> >>> >> I know that some >> communities >> >>> >> distribute their own shaded >> >>> >> distributions of >> >>> >> dependencies. I had a >> thought >> >>> >> about >> >>> >> doing something similar >> that I >> >>> >> wanted >> >>> >> to throw out there for >> people >> >>> >> to poke >> >>> >> holes in. >> >>> >> >> >>> >> To set the scene, here is >> how >> >>> >> I view >> >>> >> shading: >> >>> >> >> >>> >> - A module has public >> >>> >> dependencies >> >>> >> and private dependencies. >> >>> >> - Public deps are used >> for >> >>> >> data >> >>> >> interchange; users must >> share >> >>> >> these >> >>> >> >> >>> >> deps. >> >>> >> >> >>> >> - Private deps are just >> >>> >> functionality and can be >> >>> >> hidden (in >> >>> >> our case, >> >>> >> relocated + bundled) >> >>> >> - It isn't necessarily >> that >> >>> >> simple, >> >>> >> because public and private >> >>> >> deps >> >>> >> >> >>> >> might >> >>> >> >> >>> >> interact in higher-order >> ways >> >>> >> ("public" is contagious) >> >>> >> >> >>> >> Shading is an >> implementation >> >>> >> detail of >> >>> >> expressing these >> >>> >> characteristics. >> >>> >> >> >>> >> We >> >>> >> >> >>> >> use shading selectively >> >>> >> because of its >> >>> >> downsides I mentioned >> above. >> >>> >> >> >>> >> But what about this idea: >> >>> >> Introduce >> >>> >> shaded deps as a single >> >>> >> separate >> >>> >> artifact. >> >>> >> >> >>> >> - sdks/java/private-deps: >> >>> >> bundled >> >>> >> uber jar with relocated >> >>> >> versions of >> >>> >> everything we want to shade >> >>> >> >> >>> >> - sdks/java/core and >> >>> >> sdks/java/harness: no >> >>> >> relocation or >> >>> >> bundling - >> >>> >> depends on >> >>> >> >> `beam-sdks-java-private-deps` >> >>> >> and >> >>> >> imports like >> >>> >> >> >>> >> `org.apache.beam.sdk.private.com.google.common` >> >>> >> directly (this is what >> >>> >> they >> >>> >> are rewritten to >> >>> >> >> >>> >> Some benefits >> >>> >> >> >>> >> - much faster builds of >> >>> >> other modules >> >>> >> - only one shaded uber >> jar >> >>> >> - rare/no rebuilds of the >> >>> >> uber jar >> >>> >> - can use maven enforcer >> to >> >>> >> forbid >> >>> >> imports like >> com.google.common >> >>> >> - configuration all in >> one >> >>> >> place >> >>> >> - no automated rewriting >> of >> >>> >> our real >> >>> >> code, which has led to some >> >>> >> major >> >>> >> confusion >> >>> >> - easy to implement >> >>> >> incrementally >> >>> >> >> >>> >> Downsides: >> >>> >> >> >>> >> - plenty of effort work >> to >> >>> >> get there >> >>> >> - unclear how many >> different >> >>> >> such >> >>> >> deps modules we need; >> sharing >> >>> >> them >> >>> >> >> >>> >> could >> >>> >> >> >>> >> get weird >> >>> >> - if we hit a roadblock, >> we >> >>> >> will >> >>> >> have committed a lot of >> time >> >>> >> >> >>> >> Just something I was >> musing as >> >>> >> I spent >> >>> >> another evening waiting for >> >>> >> slow >> >>> >> builds to try to confirm >> >>> >> changes to >> >>> >> brittle poms. >> >>> >> >> >>> >> Kenn >> >>> >> >> >>> >> >> >>> >> >> >>> >> -- >> >>> >> Jean-Baptiste Onofré >> >>> >> [email protected] >> >>> >> <mailto:[email protected]> >> >>> >> http://blog.nanthrax.net >> >>> >> <http://blog.nanthrax.net/> >> >>> >> Talend - http://www.talend.com >> >>> >> <http://www.talend.com/> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >> >>> > >> >>> > >> >>> > >> >>> >> >>> -- >> >>> Jean-Baptiste Onofré >> >>> [email protected] >> >>> http://blog.nanthrax.net >> >>> Talend - http://www.talend.com >> >> >> >> >> > >> > >
