On Fri, Feb 2, 2018 at 6:41 AM, Romain Manni-Bucau <[email protected]> wrote:
> > > 2018-02-02 15:37 GMT+01:00 Kenneth Knowles <[email protected]>: > >> Another couple: >> >> - User-facing TypeDescriptor is a very thin wrapper on Guava's TypeToken >> > > Technically reflect Type is enough > If you want to try to remove TypeToken from underneath TypeDescriptor, I have no objections as long as you expand the test suite to cover all the functionality where we trust TypeToken's tests. Good luck :-) Kenn > > >> - ImmutableList and friends and their builders are very widely used and >> IMO still add a lot for readability and preventing someone coming along and >> adding mistakes to a codebase >> > > Sugar but not required. When you compare the cost of a shade versus of > duplicating the parts we need there is no real match IMHO. > > >> >> So considering it all, I would keep a vendored Guava (but also move off >> where we can, and also have our own improvements). I hope it will be a >> near-empty build file to generate it so not a maintenance burden. >> >> Kenn >> >> On Thu, Feb 1, 2018 at 8:44 PM, Kenneth Knowles <[email protected]> wrote: >> >>> Nice. This sounds like a great idea (or two?) and goes along with what I >>> just started for futures. >>> >>> Guava: filed https://issues.apache.org/jira/browse/BEAM-3606 and >>> assigned to Ismaël :-) and converted my futures thing to a subtask. >>> >>> Specific things for our micro guava: >>> >>> - checkArgumentNotNull can throw the right exception >>> - our own Optional because Java's is not Serializable >>> - futures combinators since many are missing, especially Java's don't >>> do exceptions right >>> >>> Protobuf: didn't file an issue because I'm not sure >>> >>> I was wondering if pre-shading works. We really need to get rid of it >>> from public APIs in a 100% reliable way. It is also a problem for Dataflow. >>> I was wondering if one approach is to pre-shade gcpio-protobuf-java, >>> gcpio-protobuf-java-util, gcpio-grpc-java, etc. Anything that needs to take >>> a Message object. (and do the same for beam-model-protobuf-java since the >>> model bits have to depend on each other but nothing else). >>> >>> Kenn >>> >>> On Thu, Feb 1, 2018 at 1:56 AM, Ismaël Mejía <[email protected]> wrote: >>> >>>> Huge +1 to get rid of Guava! >>>> >>>> This solves annoying dependency issues for some IOs and allow us to >>>> get rid of the shading that makes current jars bigger than they should >>>> be. >>>> >>>> We can create our own 'micro guava' package with some classes for >>>> things that are hard to migrate, or that we prefer to still have like >>>> the check* methods for example. Given the size of the task we should >>>> probably divide it into subtasks, more important is to get rid of it >>>> for 'sdks/java/core'. We can then attack other areas afterwards. >>>> >>>> Other important idea would be to get rid of Protobuf in public APIs >>>> like GCPIO and to better shade it from leaking into the runners. An >>>> unexpected side effect of this is a leak of netty via gRPC/protobuf >>>> that is byting us for the Spark runner, but well that's worth a >>>> different discussion. >>>> >>>> >>>> On Thu, Feb 1, 2018 at 10:08 AM, Romain Manni-Bucau >>>> <[email protected]> wrote: >>>> > a map of list is fine and not a challenge we'll face long I hope ;) >>>> > >>>> > >>>> > Romain Manni-Bucau >>>> > @rmannibucau | Blog | Old Blog | Github | LinkedIn >>>> > >>>> > 2018-02-01 9:40 GMT+01:00 Reuven Lax <[email protected]>: >>>> >> >>>> >> Not sure we'll be able to replace them all. Things like guava Table >>>> and >>>> >> Multimap don't have great replacements in Java8. >>>> >> >>>> >> On Wed, Jan 31, 2018 at 10:11 PM, Jean-Baptiste Onofré < >>>> [email protected]> >>>> >> wrote: >>>> >>> >>>> >>> +1, it was on my TODO for a while waiting the Java8 update. >>>> >>> >>>> >>> Regards >>>> >>> JB >>>> >>> >>>> >>> On 02/01/2018 06:56 AM, Romain Manni-Bucau wrote: >>>> >>> > Why not dropping guava for all beam codebase? With java 8 it is >>>> quite >>>> >>> > easy to do >>>> >>> > it and avoid a bunch of conflicts. Did it in 2 projects with >>>> quite a >>>> >>> > good result. >>>> >>> > >>>> >>> > Le 1 févr. 2018 06:50, "Lukasz Cwik" <[email protected] >>>> >>> > <mailto:[email protected]>> a écrit : >>>> >>> > >>>> >>> > Make sure to include the guava version in the artifact name >>>> so that >>>> >>> > we can >>>> >>> > have multiple vendored versions. >>>> >>> > >>>> >>> > On Wed, Jan 31, 2018 at 9:16 PM, Kenneth Knowles < >>>> [email protected] >>>> >>> > <mailto:[email protected]>> wrote: >>>> >>> > >>>> >>> > I didn't have time for this, but it just bit me. We >>>> definitely >>>> >>> > have >>>> >>> > Guava on the API surface of runner support code in ways >>>> that >>>> >>> > get >>>> >>> > incompatibly shaded. I will probably start "1a" by making >>>> a >>>> >>> > shaded >>>> >>> > library org.apache.beam:vendored-guava and starting to >>>> use it. >>>> >>> > It sounds >>>> >>> > like there is generally unanimous support for that much, >>>> >>> > anyhow. >>>> >>> > >>>> >>> > Kenn >>>> >>> > >>>> >>> > On Wed, Dec 13, 2017 at 8:31 AM, Aljoscha Krettek >>>> >>> > <[email protected] >>>> >>> > <mailto:[email protected]>> wrote: >>>> >>> > >>>> >>> > Thanks Ismaël for bringing up this discussion again! >>>> >>> > >>>> >>> > I would be in favour of 1) and more specifically of >>>> 1a) >>>> >>> > >>>> >>> > Aljoscha >>>> >>> > >>>> >>> > >>>> >>> >> On 12. Dec 2017, at 18:56, Lukasz Cwik < >>>> [email protected] >>>> >>> >> <mailto:[email protected]>> wrote: >>>> >>> >> >>>> >>> >> You can always run tests on post shaded artifacts >>>> instead >>>> >>> >> of the >>>> >>> >> compiled classes, it just requires us to change our >>>> maven >>>> >>> >> surefire >>>> >>> >> / gradle test configurations but it is true that >>>> most IDEs >>>> >>> >> would >>>> >>> >> behave better with a dependency jar unless you >>>> delegate >>>> >>> >> all the >>>> >>> >> build/test actions to the build system and then it >>>> won't >>>> >>> >> matter. >>>> >>> >> >>>> >>> >> On Mon, Dec 11, 2017 at 9:05 PM, Kenneth Knowles >>>> >>> >> <[email protected] >>>> >>> >> <mailto:[email protected]>> wrote: >>>> >>> >> >>>> >>> >> There's also, with additional overhead, >>>> >>> >> >>>> >>> >> 1a) A relocated and shipped package for each >>>> thing we >>>> >>> >> want to >>>> >>> >> relocate. I think this has also been tried >>>> outside >>>> >>> >> Beam... >>>> >>> >> >>>> >>> >> Pros: >>>> >>> >> * all the pros of 1) plus no bloat beyond what is >>>> >>> >> necessary >>>> >>> >> Cons: >>>> >>> >> * abandons whitelist approach for public deps, >>>> >>> >> reverting to >>>> >>> >> blacklist approach for trouble things like >>>> guava, so a >>>> >>> >> bit >>>> >>> >> less principled >>>> >>> >> >>>> >>> >> For both 1) and 1a) I would add: >>>> >>> >> >>>> >>> >> Pros: >>>> >>> >> * clearly readable dependency since code will >>>> `import >>>> >>> >> org.apache.beam.private.guava21` and IDEs will >>>> >>> >> understand it >>>> >>> >> is a distinct lilbrary >>>> >>> >> * can run tests on unpackaged classes, as long >>>> as deps >>>> >>> >> are >>>> >>> >> shaded or provided as jars >>>> >>> >> * no mysterious action at a distance from >>>> inherited >>>> >>> >> configuration >>>> >>> >> Cons: >>>> >>> >> * need to adjust imports >>>> >>> >> >>>> >>> >> Kenn >>>> >>> >> >>>> >>> >> On Mon, Dec 11, 2017 at 9:57 AM, Lukasz Cwik >>>> >>> >> <[email protected] >>>> >>> >> <mailto:[email protected]>> wrote: >>>> >>> >> >>>> >>> >> I would suggest that either we use: >>>> >>> >> 1) A common deps package containing shaded >>>> >>> >> dependencies >>>> >>> >> allows for >>>> >>> >> Pros >>>> >>> >> * doesn't require the user to build an uber >>>> jar >>>> >>> >> Risks >>>> >>> >> * dependencies package will keep growing >>>> even if >>>> >>> >> something >>>> >>> >> is or isn't needed by all of Apache Beam >>>> leading >>>> >>> >> to a >>>> >>> >> large jar anyways negating any space savings >>>> >>> >> >>>> >>> >> 2) Shade within each module to a common >>>> location >>>> >>> >> like >>>> >>> >> org.apache.beam.relocated.guava.... >>>> >>> >> Pros >>>> >>> >> * you only get the shaded dependencies of the >>>> >>> >> things that >>>> >>> >> are required >>>> >>> >> * its one less dependency for users to manage >>>> >>> >> Risks >>>> >>> >> * requires an uber jar to be built to get the >>>> >>> >> space >>>> >>> >> savings (either by a user or a distribution >>>> of >>>> >>> >> Apache >>>> >>> >> Beam) otherwise we negate any space savings. >>>> >>> >> >>>> >>> >> If we either use a common relocation scheme >>>> or a >>>> >>> >> dependencies jar then each relocation should >>>> >>> >> specifically >>>> >>> >> contain the version number of the package >>>> because >>>> >>> >> we would >>>> >>> >> like to allow for us to be using two >>>> different >>>> >>> >> versions of >>>> >>> >> the same library. >>>> >>> >> >>>> >>> >> For the common deps package approach, should >>>> we >>>> >>> >> check in >>>> >>> >> code where the imports contain the relocated >>>> >>> >> location >>>> >>> >> (e.g. import org.apache.beam.guava.20.0.com >>>> >>> >> >>>> >>> >> <http://org.apache.beam.guava.20.0.com/>.google.common.colle >>>> ct.ImmutableList)? >>>> >>> >> >>>> >>> >> >>>> >>> >> On Mon, Dec 11, 2017 at 8:47 AM, >>>> Jean-Baptiste >>>> >>> >> Onofré >>>> >>> >> <[email protected] <mailto:[email protected]>> >>>> wrote: >>>> >>> >> >>>> >>> >> Thanks for bringing that back. >>>> >>> >> >>>> >>> >> Indeed guava is shaded in different >>>> uber-jar. >>>> >>> >> Maybe we >>>> >>> >> can have a common deps module that we >>>> include >>>> >>> >> once >>>> >>> >> (but the user will have to explicitly >>>> define >>>> >>> >> the dep) ? >>>> >>> >> >>>> >>> >> Basically, what do you propose for >>>> protobuf >>>> >>> >> (unfortunately, I don't see an obvious) ? >>>> >>> >> >>>> >>> >> Regards >>>> >>> >> JB >>>> >>> >> >>>> >>> >> >>>> >>> >> On 12/11/2017 05:35 PM, Ismaël Mejía >>>> wrote: >>>> >>> >> >>>> >>> >> Hello, I wanted to bring back this >>>> subject >>>> >>> >> because >>>> >>> >> I think we should >>>> >>> >> take action on this and at least >>>> first >>>> >>> >> have a >>>> >>> >> shaded version of guava. >>>> >>> >> I was playing with a toy project and >>>> I did >>>> >>> >> the >>>> >>> >> procedure we use to >>>> >>> >> submit jars to a Hadoop cluster via >>>> >>> >> Flink/Spark >>>> >>> >> which involves >>>> >>> >> creating an uber jar and I realized >>>> that >>>> >>> >> the size >>>> >>> >> of the jar was way >>>> >>> >> bigger than I expected, and the fact >>>> that >>>> >>> >> we shade >>>> >>> >> guava in every >>>> >>> >> module contributes to this. I found >>>> guava >>>> >>> >> shaded on: >>>> >>> >> >>>> >>> >> sdks/java/core >>>> >>> >> runners/core-construction-java >>>> >>> >> runners/core-java >>>> >>> >> model/job-management >>>> >>> >> runners/spark >>>> >>> >> sdks/java/io/hadoop-file-system >>>> >>> >> sdks/java/io/kafka >>>> >>> >> >>>> >>> >> This means at least 6 times more of >>>> the >>>> >>> >> size it >>>> >>> >> should which counts in >>>> >>> >> around 15MB more (2.4MB*6 deps) of >>>> extra >>>> >>> >> weight >>>> >>> >> that we can simply >>>> >>> >> reduce by using a shaded version of >>>> the >>>> >>> >> library. >>>> >>> >> >>>> >>> >> I add this point to the previous ones >>>> >>> >> mentioned >>>> >>> >> during the discussion >>>> >>> >> because this goes against the end >>>> user >>>> >>> >> experience >>>> >>> >> and affects us all >>>> >>> >> (devs/users). >>>> >>> >> >>>> >>> >> Another question is if we should >>>> shade >>>> >>> >> (and how) >>>> >>> >> protocol buffers >>>> >>> >> because now with the portability >>>> work we >>>> >>> >> are >>>> >>> >> exposing it closer to the >>>> >>> >> end users. I say this because I also >>>> found >>>> >>> >> an >>>> >>> >> issue while running a >>>> >>> >> job on YARN with the spark runner >>>> because >>>> >>> >> hadoop-common includes >>>> >>> >> protobuf-java 2 and I had to >>>> explicitly >>>> >>> >> provide >>>> >>> >> protocol-buffers 3 as >>>> >>> >> a dependency to be able to use >>>> triggers >>>> >>> >> (note the >>>> >>> >> Spark runner >>>> >>> >> translates them using some method >>>> from >>>> >>> >> runners/core-java). Since >>>> >>> >> hadoop-common is provided in the >>>> cluster >>>> >>> >> with the >>>> >>> >> older version of >>>> >>> >> protobuf, I am afraid that this will >>>> bite >>>> >>> >> us in >>>> >>> >> the future. >>>> >>> >> >>>> >>> >> Ismaël >>>> >>> >> >>>> >>> >> ps. There is already a JIRA for that >>>> >>> >> shading for >>>> >>> >> protobuf on >>>> >>> >> hadoop-common but this is not coming >>>> until >>>> >>> >> version >>>> >>> >> 3 is out. >>>> >>> >> >>>> >>> >> https://issues.apache.org/jira/browse/HADOOP-13136 >>>> >>> >> >>>> >>> >> <https://issues.apache.org/jira/browse/HADOOP-13136> >>>> >>> >> >>>> >>> >> ps2. Extra curious situation is to >>>> see >>>> >>> >> that the >>>> >>> >> dataflow-runner ends >>>> >>> >> up having guava shaded twice via its >>>> >>> >> shaded version on >>>> >>> >> core-construction-java. >>>> >>> >> >>>> >>> >> ps3. Of course this message means a >>>> >>> >> de-facto +1 at >>>> >>> >> least to do it for >>>> >>> >> guava and evaluate it for other libs. >>>> >>> >> >>>> >>> >> >>>> >>> >> On Tue, Oct 17, 2017 at 7:29 PM, >>>> Lukasz >>>> >>> >> Cwik >>>> >>> >> <[email protected] >>>> >>> >> <mailto:[email protected]>> >>>> wrote: >>>> >>> >> >>>> >>> >> An issue to call out is how to >>>> deal >>>> >>> >> with our >>>> >>> >> generated code (.avro and >>>> >>> >> .proto) as I don't believe those >>>> >>> >> plugins allow >>>> >>> >> you to generate code using a >>>> >>> >> shaded package prefix on imports. >>>> >>> >> >>>> >>> >> On Tue, Oct 17, 2017 at 10:28 AM, >>>> >>> >> Thomas Groh >>>> >>> >> <[email protected] >>>> >>> >> <mailto:[email protected] >>>> >> >>>> >>> >> wrote: >>>> >>> >> >>>> >>> >> +1 to the goal. I'm hugely in >>>> >>> >> favor of not >>>> >>> >> doing the same shading work >>>> >>> >> every time for dependencies >>>> we >>>> >>> >> know we'll use. >>>> >>> >> >>>> >>> >> This also means that if we >>>> end up >>>> >>> >> pulling >>>> >>> >> in transitive dependencies we >>>> >>> >> don't want in any particular >>>> >>> >> module we can >>>> >>> >> avoid having to adjust our >>>> >>> >> repackaging strategy for that >>>> >>> >> module - >>>> >>> >> which I have run into >>>> face-first >>>> >>> >> in >>>> >>> >> the past. >>>> >>> >> >>>> >>> >> On Tue, Oct 17, 2017 at 9:48 >>>> AM, >>>> >>> >> Kenneth >>>> >>> >> Knowles >>>> <[email protected] >>>> >>> >> <mailto: >>>> [email protected]>> >>>> >>> >> wrote: >>>> >>> >> >>>> >>> >> Hi all, >>>> >>> >> >>>> >>> >> Shading is a big part of >>>> how >>>> >>> >> we keep >>>> >>> >> our dependencies sane in >>>> Beam. >>>> >>> >> But >>>> >>> >> downsides: shading is >>>> super >>>> >>> >> slow, >>>> >>> >> causes massive jar >>>> bloat, and >>>> >>> >> kind of >>>> >>> >> hard to get right because >>>> >>> >> artifacts >>>> >>> >> and namespaces are not >>>> 1-to-1. >>>> >>> >> >>>> >>> >> I know that some >>>> communities >>>> >>> >> distribute their own >>>> shaded >>>> >>> >> distributions of >>>> >>> >> dependencies. I had a >>>> thought >>>> >>> >> about >>>> >>> >> doing something similar >>>> that I >>>> >>> >> wanted >>>> >>> >> to throw out there for >>>> people >>>> >>> >> to poke >>>> >>> >> holes in. >>>> >>> >> >>>> >>> >> To set the scene, here >>>> is how >>>> >>> >> I view >>>> >>> >> shading: >>>> >>> >> >>>> >>> >> - A module has public >>>> >>> >> dependencies >>>> >>> >> and private dependencies. >>>> >>> >> - Public deps are used >>>> for >>>> >>> >> data >>>> >>> >> interchange; users must >>>> share >>>> >>> >> these >>>> >>> >> >>>> >>> >> deps. >>>> >>> >> >>>> >>> >> - Private deps are just >>>> >>> >> functionality and can be >>>> >>> >> hidden (in >>>> >>> >> our case, >>>> >>> >> relocated + bundled) >>>> >>> >> - It isn't necessarily >>>> that >>>> >>> >> simple, >>>> >>> >> because public and >>>> private >>>> >>> >> deps >>>> >>> >> >>>> >>> >> might >>>> >>> >> >>>> >>> >> interact in higher-order >>>> ways >>>> >>> >> ("public" is contagious) >>>> >>> >> >>>> >>> >> Shading is an >>>> implementation >>>> >>> >> detail of >>>> >>> >> expressing these >>>> >>> >> characteristics. >>>> >>> >> >>>> >>> >> We >>>> >>> >> >>>> >>> >> use shading selectively >>>> >>> >> because of its >>>> >>> >> downsides I mentioned >>>> above. >>>> >>> >> >>>> >>> >> But what about this idea: >>>> >>> >> Introduce >>>> >>> >> shaded deps as a single >>>> >>> >> separate >>>> >>> >> artifact. >>>> >>> >> >>>> >>> >> - >>>> sdks/java/private-deps: >>>> >>> >> bundled >>>> >>> >> uber jar with relocated >>>> >>> >> versions of >>>> >>> >> everything we want to >>>> shade >>>> >>> >> >>>> >>> >> - sdks/java/core and >>>> >>> >> sdks/java/harness: no >>>> >>> >> relocation or >>>> >>> >> bundling - >>>> >>> >> depends on >>>> >>> >> >>>> `beam-sdks-java-private-deps` >>>> >>> >> and >>>> >>> >> imports like >>>> >>> >> >>>> >>> >> `org.apache.beam.sdk.private.com.google.common` >>>> >>> >> directly (this is what >>>> >>> >> they >>>> >>> >> are rewritten to >>>> >>> >> >>>> >>> >> Some benefits >>>> >>> >> >>>> >>> >> - much faster builds of >>>> >>> >> other modules >>>> >>> >> - only one shaded uber >>>> jar >>>> >>> >> - rare/no rebuilds of >>>> the >>>> >>> >> uber jar >>>> >>> >> - can use maven >>>> enforcer to >>>> >>> >> forbid >>>> >>> >> imports like >>>> com.google.common >>>> >>> >> - configuration all in >>>> one >>>> >>> >> place >>>> >>> >> - no automated >>>> rewriting of >>>> >>> >> our real >>>> >>> >> code, which has led to >>>> some >>>> >>> >> major >>>> >>> >> confusion >>>> >>> >> - easy to implement >>>> >>> >> incrementally >>>> >>> >> >>>> >>> >> Downsides: >>>> >>> >> >>>> >>> >> - plenty of effort >>>> work to >>>> >>> >> get there >>>> >>> >> - unclear how many >>>> different >>>> >>> >> such >>>> >>> >> deps modules we need; >>>> sharing >>>> >>> >> them >>>> >>> >> >>>> >>> >> could >>>> >>> >> >>>> >>> >> get weird >>>> >>> >> - if we hit a >>>> roadblock, we >>>> >>> >> will >>>> >>> >> have committed a lot of >>>> time >>>> >>> >> >>>> >>> >> Just something I was >>>> musing as >>>> >>> >> I spent >>>> >>> >> another evening waiting >>>> for >>>> >>> >> slow >>>> >>> >> builds to try to confirm >>>> >>> >> changes to >>>> >>> >> brittle poms. >>>> >>> >> >>>> >>> >> Kenn >>>> >>> >> >>>> >>> >> >>>> >>> >> >>>> >>> >> -- >>>> >>> >> Jean-Baptiste Onofré >>>> >>> >> [email protected] >>>> >>> >> <mailto:[email protected]> >>>> >>> >> http://blog.nanthrax.net >>>> >>> >> <http://blog.nanthrax.net/> >>>> >>> >> Talend - http://www.talend.com >>>> >>> >> <http://www.talend.com/> >>>> >>> >> >>>> >>> >> >>>> >>> >> >>>> >>> >> >>>> >>> > >>>> >>> > >>>> >>> > >>>> >>> >>>> >>> -- >>>> >>> Jean-Baptiste Onofré >>>> >>> [email protected] >>>> >>> http://blog.nanthrax.net >>>> >>> Talend - http://www.talend.com >>>> >> >>>> >> >>>> > >>>> >>> >>> >> >
