Apologies to Danny, i haven't read your considered reply yet, as i need to address Kerry's question on priorities immediately.
I feel it's entirely critical to have before native streaming, because it serves users more in the long term. The cost of waiting is that we can't establish the convention of registering things with the SDK to get performance boosts and instead ask people to fiddle with their code again a few months down the road. This leads to proliferation of code that will be unable to set the precedent of efficiency for organizations who aren't able to keep up with the latest and greatest in Beam. The cost of waiting is having users burn through CPU unnecessarily contributing to the ongoing boiling of the planet. The cost of waiting is making users spend more on said CPU and memory, when they could have faster completing jobs more cheaply now, rather than later. That does more for all Go SDK users now than streaming would. Setting conventions, and having a good experience with the SDK is something we should be doing sooner rather than later, so we aren't constantly changing things on our users with New Ways To Do Old Things, vs unlocking abilities to do things in another SDK, that have Options in the other SDKs already. On Thu, Mar 17, 2022, 8:01 AM Kerry Donny-Clark <[email protected]> wrote: > My thoughts: > What are the benefits of doing this now/the costs of waiting? > Are generics something that gives us more value than streaming features? > It makes sense to me to punt this work until the SDK is more developed, so > that we can observe the consequences of streaming instead of imagining > them. > I'm the least expert Go developer on this thread, so please let me know if > I'm missing something obvious. I really appreciate the clear back and forth > :) > > On Thu, Mar 17, 2022 at 8:36 AM Danny McCormick <[email protected]> > wrote: > >> Robert and I have been catching up offline, I did still want to respond >> here for the broader audience, both to clarify some of the questions Robert >> brought up and to share some of the progress we've made towards getting on >> the same page. Structurally, I'll walk through the questions/assertions >> brought up above to establish common ground and where we have seen the >> world differently. Some of that has been covered in our offline >> conversation, some has not. At the end I'll try to give a brief summary of >> what we've talked about offline. Robert, please feel free to correct me if >> I've misrepresented any of that conversation! >> >> *Robert's tl;dr* >> >> > I'd also like to emphasize that we gain 99.99999...% of the code >> generator benefit by covering the ProcessElement calls. >> > No other method is called similarly often by the system (on every >> element). Perfect is the enemy of Good Enough. >> >> I think this is actually fundamentally where we diverged (more on that >> below) - I've been treating coverage of all structural DoFn methods as a >> requirement for a code generator replacement, Robert has not. More on that >> below. >> >> *1. Problem: Users have to determine which Registration function to use.* >> >> *1.a)* I think that perhaps there has been a breakdown of terminology >> here. When I say on the order of 1000s of DoFn variants, I'm not just >> talking about DoFn1x0 to DoFn7x4. If that is all we need to worry about I >> agree that's not a problem. I'm referring to the section of my doc where I >> discuss the need for a proliferation of interfaces to support each >> structural DoFn method such that a user can pass in a DoFn of the exact >> right interface. See >> https://docs.google.com/document/d/1imYbBeu2FNJkwPNm6E9GEJkjpHnHscvFoKAE6AISvFA/edit#heading=h.e24tsm6ev6f0 >> where I discuss the problem - specifically, it culminates in "To use >> this function, the user would be required to know which of the many DoFn >> variants they should using...". >> >> It's fine if you disagree with that (and I think you have brought up some >> valid points about ways to reduce the complexity by composing interfaces), >> but I'm heavily relying on shared understanding of that doc for >> contextualizing this conversation. >> >> *1.b) *Sorry that was confusing, you are correct that I've been abusing >> the term cast when really what I've meant in most cases is a user forcing >> type inference (e.g. `foo.(Bar)`) >> >> *1.c) *Agreed if there are ~30 types of DoFn reasonably named, I think >> this becomes exceedingly inconvenient if there is indeed an explosion of >> types into the hundreds or thousands. I don't think we disagree here, this >> point mostly reduces to (1.a) - please correct me if I'm wrong. >> >> *1.d) *That's true, we could also have it print out arbitrary >> registration code though - I don't think that's a particularly compelling >> user experience either way. >> >> *2. Feasibility.* >> >> *2.a) *To be honest, I just flat out disagree here - like I said above, >> your example doesn't touch the actual user facing function which is the >> hard part of the problem IMO. I think it starts to show a path towards >> simplifying the code generator (which I don't think we have any >> disagreement on the feasibility of), but I'd appreciate some justification >> for your statement that it demonstrates feasibility in most cases. >> >> As an aside, I would say that if you think you had a basically working >> example (or could come up with one very quickly), I would've appreciated >> you sharing that as a resource associated with your 1 pager (not >> necessarily presented as *the solution*, but as a possible approach). >> >> *2.b) and 2.c) *I won't touch this too much here because I think this >> has been explored a lot more in our offline conversation (summarized >> below). At a high level, I agree, it's just a matter of which tradeoffs we >> want to make. As mentioned above, I've been treating coverage of all >> structural DoFn methods as a requirement for a code generator replacement >> (largely in service of "It should be performance neutral" from the >> original one-pager, but also because I generally think that's the right >> approach). >> >> *3. Justifications for the work* >> >> I don't have much to add here, other than asking that this kind of >> context be included in future one-pagers. Context like this is both really >> valuable to me as a newer contributor and really hard to find on my own if >> it isn't written down. Thanks for sharing it now! >> >> With this context, I'm generally on board with at least using generics to >> clean up some of these issues with the generator. I don't foresee the same >> set of problems I've described in my doc keeping us from simplifying the >> generator. >> >> >> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ >> >> *Summary of offline conversation* >> >> As alluded to above, I believe the fundamental difference in how Robert >> and I have been seeing this problem is one of requirements: I've been >> treating coverage of all structural DoFn methods as a requirement for a >> code generator replacement, Robert has not. If you remove that requirement >> and just focus on the easier ones to optimize (including ProcessElement, >> all the other non-sdf functions, and a subset of the sdf functions), the >> main problems presented in my doc go away. >> >> A huge piece of Robert's reasoning is looking at the frequency with which >> these functions will be invoked. ProcessElement gets invoked way more often >> than the rest of the structural DoFn methods, so we should focus on >> optimizing for that and not worry as much about the rest. >> >> I currently think that it is a good idea to continue to leave the door >> open for optimizing all the structural DoFn methods. The one I'm most >> worried about is watermark estimation, which doesn't actually exist yet in >> the Go Sdk - I think its likely that it will require invocation of at least >> 1 structural DoFn function at the element or bundle level, and I don't want >> to box ourselves out of being able to support performant watermark >> estimation (and thus fully performant full streaming support). Generally, >> the questions I'm interested in are: >> >> 1) Can we answer definitively at what level we will need structural DoFn >> invocation for watermark estimation 2) If its at the bundle/element level >> does that preclude us from a generics approach that doesn't fully optimize >> for that 3) If its unclear, should we prioritize watermark estimation >> design (and maybe implementation) ahead of generics/code generator >> >> While Robert has been framing it as: >> >> 4) Does a feature, like Watermark estimation, require us to do something >> different with the generic code generator such that we need to put off >> working on it now? >> >> I'm also still not 100% convinced that optimizing only at the >> bundle/element level is the right approach, though I'm probably moving in >> that direction. >> >> I would love input from others here if anyone has thoughts or context >> that is missing from this conversation! Otherwise, I'm sure Robert and I >> will continue our offline conversations and drive towards a solution. >> >> Thanks, >> Danny >> >> On Wed, Mar 16, 2022 at 2:18 PM Robert Burke <[email protected]> wrote: >> >>> Great questions! >>> >>> I think, ultimately, I was too coy in my one pager, and that's leading >>> to the disconnect we're having. I go into much more details here. >>> >>> *tl;dr;* >>> >>> I didn't emphasize the issues with the current generator in my doc. >>> Naming is a *huge* problem, and a generic implementation of the >>> generated code avoids most of it, and a total replacement avoids all of it. >>> >>> I'd also like to emphasize that we gain 99.99999...% of the code >>> generator benefit by covering the ProcessElement calls. >>> No other method is called similarly often by the system (on every >>> element). Perfect is the enemy of Good Enough. >>> >>> It's certainly good enough if more users are able to benefit from it, >>> rather than those with infrastructure, or the rigor to constantly update >>> their generated code. Users need to register their DoFns to make things >>> work anyway, they might as well get a performance boost out of it. >>> >>> *Longer version* >>> >>> I find endlessly inlined points and counterpoints to be very hard to >>> read, so I'm enumerating them so they can be referred to. >>> >>> *0.* Lets avoid discussing using Generics for KVs, Iterators, Emitters. >>> etc. That convenience is orthogonal to the code generator concerns. They >>> are related only by generics, and the code generator is a big enough topic >>> by itself. We can discuss it elsewhere. We gain nothing by conflating the >>> two and scope creeping to include them at this time. >>> * 0.a )* Unless there's a compelling argument to address them now, that >>> I'm missing, of course. >>> >>> *1. Problem: Users have to determine which Registration function to use.* >>> >>> *1.a)* I think "thousands" is overstating the issue. For the >>> "supported" variants for which we do call time optimization, it's from 1x0 >>> to 7x4. So about 28-30 Registrations for structural DoFns, and an >>> equivalent set for functional DoFns. A big number, but certainly not >>> thousands. >>> >>> Recall the generics largely are unaware of which specific types they >>> actually need to deal with. It's not clear to me where the explosion is >>> coming from, considering the other life cycle methods can be in related, >>> but independent interfaces the user doesn't need to be aware of. >>> >>> The code generator (and most of the rest of the invoker framework) is >>> unaware of whether a parameter is a context, or a window, or a value, or an >>> iterator, or an emitter. It just type asserts, and >>> >>> *1.b)* You keep mentioning "casting". Per my example, there's no >>> "casting", just selecting the right registration function, and specifying >>> types. Please clarify if that's what you're referring to, as Go doesn't >>> have "casting" as a concept. There are type conversions[0] for most values >>> of identical memory layout, and type assertions[1] for interfaces. >>> >>> *1.c) *I think users are capable of picking the right registration >>> method and filling in the types, if they are provide with clear examples in >>> the SDK, and documentation. Users will go through a fair amount in order to >>> squeeze out some performance. >>> >>> *1.d) *If all else fails, with a generic method, we can have the SDK >>> litterally print out the registration code for the users. >>> >>> *2. Feasibility.* >>> >>> *2.a)* I feel my example demonstrates the feasibility for *most >>> cases*. >>> >>> *2.b)*There are certainly areas that are currently lacking. This is >>> also true of the code generator as is. It doesn't support map side inputs >>> in any form right now. Either we move to generics and stop asking the >>> question, or we need to fix it for every complex function type we accept as >>> a parameter. >>> >>> This is a naming problem that is avoided entirely with a generic >>> implementation. >>> >>> *2.c)* Even if we can't or don't initially support SplittableDoFns >>> (which are not currently supported properly anyway), or CombineFns (which >>> are), there will always be a "just one more" problem. That's OK. Most DoFns >>> are much simpler, and can and should get the performance benefits. Perfect >>> is the enemy of Good Enough. >>> >>> *3. Justifications for the work* >>> >>> *3.a) *I'm only aware of one user for the code generator at present, >>> and that's presently for Google internal use. >>> >>> It relies on Bazel rules that users need to specify instead of the >>> usual "go_library" or "go_binary" rules by default. Already this is an >>> encumbrance. Static analysis is never straightforward, so users must still >>> manually list their dofns in those BUILD files, outside of their code. The >>> main benefit of this approach is users can be ignorant of the .shims.go >>> file we're adding to their package. >>> >>> Through this usage, issues with the code generator have been discovered: >>> >>> * Separate from the code, and not type checked -> Their IDEs can't >>> assist with typos and similar, but with an in-code generic solution this is >>> resolved. A work around exists by having the static analysis attempt to >>> figure things out, but writing clear error messages in this case is >>> non-obvious, and generally separated from the code, which would need to be >>> corrected in the separate file. >>> * It's not possible to use the code generator for both package code, and >>> test code, as there are naming collisions around iterators and emitters, as >>> they generate in separate shim files in the same package. >>> * The code generator needs to determine and provide imports for the >>> shims files for the various user types. This is not entirely trivial, and >>> also runs into name collisions for the short package names. >>> * See newer iterator/side input types issue above. >>> >>> These could all be fixed in the code generator, but in the end, we'd >>> largely be replicating what the compiler could be doing for us instead, if >>> the users were calling functions in their own code files. >>> >>> These problems are not exclusive to Googles internal approach to using >>> the generator either. >>> >>> *3.b) *The Code generator by itself isn't great to use for open source. >>> >>> * Requires an explicit `go generate` call by the user, which generates a >>> file they should largely not touch, until it causes a problem with >>> compilation. >>> * It's presently loosely versioned, so we can't enforce that users >>> re-generate when they update their code, even if it benefits them. >>> * The errors and code it generates are inscrutable. >>> >>> *3.c)* Benefits of moving to generic generation from static generation >>> >>> Assuming we determine users won't want to write the registration calls >>> themselves, then we still need to fix the above issues. I posit that >>> generating a file that contains generic function calls is much better than >>> what we currently have, which uses arbitrary unicode symbols to avoid >>> collisions, and similar. >>> >>> Should we move to generating calls to generic methods, we avoid all >>> naming problems, except for the import short name issue. >>> >>> This alone is worth it in the avoided edge cases. >>> >>> Robert Burke >>> Beam Go Busybody >>> >>> [0] https://go.dev/doc/effective_go#conversions >>> [1] https://go.dev/doc/effective_go#interface_conversions >>> >>> >>> On Wed, 16 Mar 2022 at 07:00, Danny McCormick <[email protected]> >>> wrote: >>> >>>> Hey Robert, thanks for the thoughts - I replied to your thoughts in the >>>> doc, but will add some thoughts here as well: >>>> >>>> > I agree that it's going to be considered inconvenient for users >>>> to manually specify types at this stage. >>>> > I think it's a substantial improvement over the status quo, simply >>>> because it removes a step in the iteration cycle when a change needs to be >>>> made. >>>> > It has the compiler do the work for us mostly. >>>> >>>> I think this is understating the problem significantly. The issue as >>>> described in the doc isn't just asking users to specify types in their >>>> registration call, it's that they would need to find the correct interface >>>> to cast their function to from many generated DoFn variants (on the order >>>> of thousands) - then they would need to specify types, which is the easy >>>> part. If we're able to reduce the problem to users specifying types then I >>>> wholeheartedly agree we should move forward with that approach. >>>> >>>> > I don't think the doc went far enough. See a working sketch of my >>>> thoughts here: https://go.dev/play/p/Bwf4eSavNxt?v=gotip >>>> >>>> and >>>> >>>> > The doc covers it's explored options very well, but it leaves out the >>>> possibility of getting halfway there: >>>> https://go.dev/play/p/Bwf4eSavNxt?v=gotip >>>> >>>> It's not 100% clear to me what you're getting at here, but I do think >>>> that this falls well short of the target of this investigation and fails to >>>> address what is actually the sticking point explained in the doc (the user >>>> experience of trying to find the right function from DoFn variants) - this >>>> doesn't touch the actual user facing registration function which is the >>>> hard part IMO. If your point is that we might be able to leverage generics >>>> to simplify/improve our code generator, I mention that below - if not, >>>> could you help me understand what you're suggesting? >>>> >>>> > And finally, even if inference is ultimately the sticking point: We >>>> should still rewrite the code generator to generate generic code for us, >>>> since it moves the harder parts (de-duplication, name collisions in types >>>> if used in both package code, and test code) into the Go compiler, rather >>>> than in our code. >>>> >>>> I don't necessarily disagree - just simplifying/improving the code >>>> generator with generics (along with generic KVs and other user facing >>>> improvements) was out of the scope of the doc, which was targeted >>>> specifically at whether we can fully replace the code generator. I >>>> intentionally targeted that because it was the clearest value add for users >>>> and definitely seemed to justify the investment - it is not as immediately >>>> clear to me how important replacing pieces of the code generator with >>>> generics is and whether we should prioritize that work immediately, save it >>>> for later, or wait and see if the type inference blocker is something the >>>> Go team decides to pick up. >>>> >>>> I think that's a discussion worth having, but it only matters if >>>> replacing the code generator is not actually feasible, so until we have >>>> consensus on that I'd prefer to hold off on going too far down that path. >>>> >>>> Thanks, >>>> Danny >>>> >>>> On Tue, Mar 15, 2022 at 8:31 PM Robert Burke <[email protected]> >>>> wrote: >>>> >>>>> The problem with being on vacation for a week is so many cool things >>>>> come up while you're gone that you can't get ahead of... >>>>> >>>>> Thank you Danny for the initial write up! I hope the exercise helped >>>>> you understand more about the SDK and Go, but I think there's more to do >>>>> here. >>>>> >>>>> *tl;dr; * >>>>> 1. I don't think the doc went far enough. See a working sketch of my >>>>> thoughts here: https://go.dev/play/p/Bwf4eSavNxt?v=gotip >>>>> 2. Agreed that supporting generic KVs should still be possible, but >>>>> it's an orthogonal investigation to the code generator (and its generated >>>>> code) question. >>>>> I won't discuss that further, as I have a different one-pager in the >>>>> works on that topic, and that's a very very different thread. >>>>> >>>>> *On the Code Generator* >>>>> Overall, the goal is to totally do-away with the current approach of >>>>> pre-generating. But I don't think a lack of inference is an obstacle to >>>>> that goal. >>>>> >>>>> I agree that it's going to be considered inconvenient for users >>>>> to manually specify types at this stage. >>>>> I think it's a substantial improvement over the status quo, simply >>>>> because it removes a step in the iteration cycle when a change needs to be >>>>> made. >>>>> It has the compiler do the work for us mostly. >>>>> >>>>> The doc covers it's explored options very well, but it leaves out the >>>>> possibility of getting halfway there: >>>>> https://go.dev/play/p/Bwf4eSavNxt?v=gotip >>>>> >>>>> To cover the varieties of expected arities, we should generate the SDK >>>>> side registration code in order to avoid missing cases, and ensure even >>>>> coverage. >>>>> >>>>> And finally, even if inference is ultimately the sticking point: We >>>>> should still rewrite the code generator to generate generic code for us, >>>>> since it moves the harder parts (de-duplication, name collisions in types >>>>> if used in both package code, and test code) into the Go compiler, rather >>>>> than in our code. >>>>> >>>>> And who knows? Perhaps in Go 1.19 or 1.20, the inferences Danny >>>>> proposed become possible, and we can clean things up further. >>>>> >>>>> Robert Burke, >>>>> Beam Go Busybody >>>>> >>>>> On Fri, 11 Mar 2022 at 06:03, Danny McCormick < >>>>> [email protected]> wrote: >>>>> >>>>>> > Can generics be used to at least get rid of the oddities of KV >>>>>> handling? >>>>>> >>>>>> Good question - my honest answer is that I don't know: this >>>>>> doc/investigation was specifically focused on replacing the code >>>>>> generator >>>>>> and generics not being useful in this context does not mean they can't be >>>>>> useful elsewhere. >>>>>> >>>>>> My speculative answer (without having a full familiarity with how we >>>>>> currently handle KVs or the associated problems) is that I don't *think >>>>>> *the core blocker keeping us from replacing the code generator >>>>>> applies here. The big issue we're facing here is that we can't infer >>>>>> generic interface types based on the parameters of method functions of a >>>>>> struct. In the case of KV, we would be inferring the interface types >>>>>> based >>>>>> on the types of member variables of the struct, which Go should be able >>>>>> to >>>>>> do. >>>>>> >>>>>> All that to say, in no way should this investigation preclude us from >>>>>> looking for other ways to apply generics to make our lives easier >>>>>> (whether >>>>>> that be in KV handling or improving other experiences). >>>>>> >>>>>> On Thu, Mar 10, 2022 at 6:55 PM Robert Bradshaw <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> Thanks for the writeup. That's too bad. Can generics be used to at >>>>>>> least get rid of the oddities of KV handling? >>>>>>> >>>>>>> On Thu, Mar 10, 2022 at 3:40 PM Luke Cwik <[email protected]> wrote: >>>>>>> >>>>>>>> Well written doc and unfortunate that Generics seem to not work out >>>>>>>> for us. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Mar 10, 2022 at 11:03 AM Jack McCluskey < >>>>>>>> [email protected]> wrote: >>>>>>>> >>>>>>>>> Well, looks like Go generics weren't quite the magic bullet we >>>>>>>>> were hoping for in their current state. We'll get that code generator >>>>>>>>> ripped out one day, but not today. >>>>>>>>> >>>>>>>>> On Thu, Mar 10, 2022 at 12:44 PM Danny McCormick < >>>>>>>>> [email protected]> wrote: >>>>>>>>> >>>>>>>>>> Hey folks, >>>>>>>>>> >>>>>>>>>> Right now, the Go Sdk relies on users to generate code as a >>>>>>>>>> pre-compile step to avoid some overhead of excessive runtime >>>>>>>>>> reflection. >>>>>>>>>> This is suboptimal because it relies on users to regenerate code >>>>>>>>>> every time >>>>>>>>>> they create a DoFn, causing friction to users and slowness when they >>>>>>>>>> don't >>>>>>>>>> remember to regenerate (among other reasons). @lostluck wrote a one >>>>>>>>>> pager >>>>>>>>>> around replacing this code generator with generics - after spending >>>>>>>>>> pieces >>>>>>>>>> of the last couple days digging in my takeaway is that it's not >>>>>>>>>> doable in a >>>>>>>>>> way that benefits the Sdk at this time and I wanted to share my >>>>>>>>>> findings. I >>>>>>>>>> wrote them up here - >>>>>>>>>> https://docs.google.com/document/d/1imYbBeu2FNJkwPNm6E9GEJkjpHnHscvFoKAE6AISvFA/edit?usp=sharing >>>>>>>>>> - >>>>>>>>>> please take a look if you're interested, and definitely let me know >>>>>>>>>> if I'm >>>>>>>>>> missing anything! >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Danny >>>>>>>>>> >>>>>>>>>
