Great questions! I think, ultimately, I was too coy in my one pager, and that's leading to the disconnect we're having. I go into much more details here.
*tl;dr;* I didn't emphasize the issues with the current generator in my doc. Naming is a *huge* problem, and a generic implementation of the generated code avoids most of it, and a total replacement avoids all of it. I'd also like to emphasize that we gain 99.99999...% of the code generator benefit by covering the ProcessElement calls. No other method is called similarly often by the system (on every element). Perfect is the enemy of Good Enough. It's certainly good enough if more users are able to benefit from it, rather than those with infrastructure, or the rigor to constantly update their generated code. Users need to register their DoFns to make things work anyway, they might as well get a performance boost out of it. *Longer version* I find endlessly inlined points and counterpoints to be very hard to read, so I'm enumerating them so they can be referred to. *0.* Lets avoid discussing using Generics for KVs, Iterators, Emitters. etc. That convenience is orthogonal to the code generator concerns. They are related only by generics, and the code generator is a big enough topic by itself. We can discuss it elsewhere. We gain nothing by conflating the two and scope creeping to include them at this time. * 0.a )* Unless there's a compelling argument to address them now, that I'm missing, of course. *1. Problem: Users have to determine which Registration function to use.* *1.a)* I think "thousands" is overstating the issue. For the "supported" variants for which we do call time optimization, it's from 1x0 to 7x4. So about 28-30 Registrations for structural DoFns, and an equivalent set for functional DoFns. A big number, but certainly not thousands. Recall the generics largely are unaware of which specific types they actually need to deal with. It's not clear to me where the explosion is coming from, considering the other life cycle methods can be in related, but independent interfaces the user doesn't need to be aware of. The code generator (and most of the rest of the invoker framework) is unaware of whether a parameter is a context, or a window, or a value, or an iterator, or an emitter. It just type asserts, and *1.b)* You keep mentioning "casting". Per my example, there's no "casting", just selecting the right registration function, and specifying types. Please clarify if that's what you're referring to, as Go doesn't have "casting" as a concept. There are type conversions[0] for most values of identical memory layout, and type assertions[1] for interfaces. *1.c) *I think users are capable of picking the right registration method and filling in the types, if they are provide with clear examples in the SDK, and documentation. Users will go through a fair amount in order to squeeze out some performance. *1.d) *If all else fails, with a generic method, we can have the SDK litterally print out the registration code for the users. *2. Feasibility.* *2.a)* I feel my example demonstrates the feasibility for *most cases*. *2.b)*There are certainly areas that are currently lacking. This is also true of the code generator as is. It doesn't support map side inputs in any form right now. Either we move to generics and stop asking the question, or we need to fix it for every complex function type we accept as a parameter. This is a naming problem that is avoided entirely with a generic implementation. *2.c)* Even if we can't or don't initially support SplittableDoFns (which are not currently supported properly anyway), or CombineFns (which are), there will always be a "just one more" problem. That's OK. Most DoFns are much simpler, and can and should get the performance benefits. Perfect is the enemy of Good Enough. *3. Justifications for the work* *3.a) *I'm only aware of one user for the code generator at present, and that's presently for Google internal use. It relies on Bazel rules that users need to specify instead of the usual "go_library" or "go_binary" rules by default. Already this is an encumbrance. Static analysis is never straightforward, so users must still manually list their dofns in those BUILD files, outside of their code. The main benefit of this approach is users can be ignorant of the .shims.go file we're adding to their package. Through this usage, issues with the code generator have been discovered: * Separate from the code, and not type checked -> Their IDEs can't assist with typos and similar, but with an in-code generic solution this is resolved. A work around exists by having the static analysis attempt to figure things out, but writing clear error messages in this case is non-obvious, and generally separated from the code, which would need to be corrected in the separate file. * It's not possible to use the code generator for both package code, and test code, as there are naming collisions around iterators and emitters, as they generate in separate shim files in the same package. * The code generator needs to determine and provide imports for the shims files for the various user types. This is not entirely trivial, and also runs into name collisions for the short package names. * See newer iterator/side input types issue above. These could all be fixed in the code generator, but in the end, we'd largely be replicating what the compiler could be doing for us instead, if the users were calling functions in their own code files. These problems are not exclusive to Googles internal approach to using the generator either. *3.b) *The Code generator by itself isn't great to use for open source. * Requires an explicit `go generate` call by the user, which generates a file they should largely not touch, until it causes a problem with compilation. * It's presently loosely versioned, so we can't enforce that users re-generate when they update their code, even if it benefits them. * The errors and code it generates are inscrutable. *3.c)* Benefits of moving to generic generation from static generation Assuming we determine users won't want to write the registration calls themselves, then we still need to fix the above issues. I posit that generating a file that contains generic function calls is much better than what we currently have, which uses arbitrary unicode symbols to avoid collisions, and similar. Should we move to generating calls to generic methods, we avoid all naming problems, except for the import short name issue. This alone is worth it in the avoided edge cases. Robert Burke Beam Go Busybody [0] https://go.dev/doc/effective_go#conversions [1] https://go.dev/doc/effective_go#interface_conversions On Wed, 16 Mar 2022 at 07:00, Danny McCormick <[email protected]> wrote: > Hey Robert, thanks for the thoughts - I replied to your thoughts in the > doc, but will add some thoughts here as well: > > > I agree that it's going to be considered inconvenient for users > to manually specify types at this stage. > > I think it's a substantial improvement over the status quo, simply > because it removes a step in the iteration cycle when a change needs to be > made. > > It has the compiler do the work for us mostly. > > I think this is understating the problem significantly. The issue as > described in the doc isn't just asking users to specify types in their > registration call, it's that they would need to find the correct interface > to cast their function to from many generated DoFn variants (on the order > of thousands) - then they would need to specify types, which is the easy > part. If we're able to reduce the problem to users specifying types then I > wholeheartedly agree we should move forward with that approach. > > > I don't think the doc went far enough. See a working sketch of my > thoughts here: https://go.dev/play/p/Bwf4eSavNxt?v=gotip > > and > > > The doc covers it's explored options very well, but it leaves out the > possibility of getting halfway there: > https://go.dev/play/p/Bwf4eSavNxt?v=gotip > > It's not 100% clear to me what you're getting at here, but I do think that > this falls well short of the target of this investigation and fails to > address what is actually the sticking point explained in the doc (the user > experience of trying to find the right function from DoFn variants) - this > doesn't touch the actual user facing registration function which is the > hard part IMO. If your point is that we might be able to leverage generics > to simplify/improve our code generator, I mention that below - if not, > could you help me understand what you're suggesting? > > > And finally, even if inference is ultimately the sticking point: We > should still rewrite the code generator to generate generic code for us, > since it moves the harder parts (de-duplication, name collisions in types > if used in both package code, and test code) into the Go compiler, rather > than in our code. > > I don't necessarily disagree - just simplifying/improving the code > generator with generics (along with generic KVs and other user facing > improvements) was out of the scope of the doc, which was targeted > specifically at whether we can fully replace the code generator. I > intentionally targeted that because it was the clearest value add for users > and definitely seemed to justify the investment - it is not as immediately > clear to me how important replacing pieces of the code generator with > generics is and whether we should prioritize that work immediately, save it > for later, or wait and see if the type inference blocker is something the > Go team decides to pick up. > > I think that's a discussion worth having, but it only matters if replacing > the code generator is not actually feasible, so until we have consensus on > that I'd prefer to hold off on going too far down that path. > > Thanks, > Danny > > On Tue, Mar 15, 2022 at 8:31 PM Robert Burke <[email protected]> wrote: > >> The problem with being on vacation for a week is so many cool things come >> up while you're gone that you can't get ahead of... >> >> Thank you Danny for the initial write up! I hope the exercise helped you >> understand more about the SDK and Go, but I think there's more to do here. >> >> *tl;dr; * >> 1. I don't think the doc went far enough. See a working sketch of my >> thoughts here: https://go.dev/play/p/Bwf4eSavNxt?v=gotip >> 2. Agreed that supporting generic KVs should still be possible, but it's >> an orthogonal investigation to the code generator (and its generated code) >> question. >> I won't discuss that further, as I have a different one-pager in the >> works on that topic, and that's a very very different thread. >> >> *On the Code Generator* >> Overall, the goal is to totally do-away with the current approach of >> pre-generating. But I don't think a lack of inference is an obstacle to >> that goal. >> >> I agree that it's going to be considered inconvenient for users >> to manually specify types at this stage. >> I think it's a substantial improvement over the status quo, simply >> because it removes a step in the iteration cycle when a change needs to be >> made. >> It has the compiler do the work for us mostly. >> >> The doc covers it's explored options very well, but it leaves out the >> possibility of getting halfway there: >> https://go.dev/play/p/Bwf4eSavNxt?v=gotip >> >> To cover the varieties of expected arities, we should generate the SDK >> side registration code in order to avoid missing cases, and ensure even >> coverage. >> >> And finally, even if inference is ultimately the sticking point: We >> should still rewrite the code generator to generate generic code for us, >> since it moves the harder parts (de-duplication, name collisions in types >> if used in both package code, and test code) into the Go compiler, rather >> than in our code. >> >> And who knows? Perhaps in Go 1.19 or 1.20, the inferences Danny proposed >> become possible, and we can clean things up further. >> >> Robert Burke, >> Beam Go Busybody >> >> On Fri, 11 Mar 2022 at 06:03, Danny McCormick <[email protected]> >> wrote: >> >>> > Can generics be used to at least get rid of the oddities of KV >>> handling? >>> >>> Good question - my honest answer is that I don't know: this >>> doc/investigation was specifically focused on replacing the code generator >>> and generics not being useful in this context does not mean they can't be >>> useful elsewhere. >>> >>> My speculative answer (without having a full familiarity with how we >>> currently handle KVs or the associated problems) is that I don't *think >>> *the core blocker keeping us from replacing the code generator applies >>> here. The big issue we're facing here is that we can't infer generic >>> interface types based on the parameters of method functions of a struct. In >>> the case of KV, we would be inferring the interface types based on the >>> types of member variables of the struct, which Go should be able to do. >>> >>> All that to say, in no way should this investigation preclude us from >>> looking for other ways to apply generics to make our lives easier (whether >>> that be in KV handling or improving other experiences). >>> >>> On Thu, Mar 10, 2022 at 6:55 PM Robert Bradshaw <[email protected]> >>> wrote: >>> >>>> Thanks for the writeup. That's too bad. Can generics be used to at >>>> least get rid of the oddities of KV handling? >>>> >>>> On Thu, Mar 10, 2022 at 3:40 PM Luke Cwik <[email protected]> wrote: >>>> >>>>> Well written doc and unfortunate that Generics seem to not work out >>>>> for us. >>>>> >>>>> >>>>> >>>>> On Thu, Mar 10, 2022 at 11:03 AM Jack McCluskey < >>>>> [email protected]> wrote: >>>>> >>>>>> Well, looks like Go generics weren't quite the magic bullet we were >>>>>> hoping for in their current state. We'll get that code generator ripped >>>>>> out >>>>>> one day, but not today. >>>>>> >>>>>> On Thu, Mar 10, 2022 at 12:44 PM Danny McCormick < >>>>>> [email protected]> wrote: >>>>>> >>>>>>> Hey folks, >>>>>>> >>>>>>> Right now, the Go Sdk relies on users to generate code as a >>>>>>> pre-compile step to avoid some overhead of excessive runtime reflection. >>>>>>> This is suboptimal because it relies on users to regenerate code every >>>>>>> time >>>>>>> they create a DoFn, causing friction to users and slowness when they >>>>>>> don't >>>>>>> remember to regenerate (among other reasons). @lostluck wrote a one >>>>>>> pager >>>>>>> around replacing this code generator with generics - after spending >>>>>>> pieces >>>>>>> of the last couple days digging in my takeaway is that it's not doable >>>>>>> in a >>>>>>> way that benefits the Sdk at this time and I wanted to share my >>>>>>> findings. I >>>>>>> wrote them up here - >>>>>>> https://docs.google.com/document/d/1imYbBeu2FNJkwPNm6E9GEJkjpHnHscvFoKAE6AISvFA/edit?usp=sharing >>>>>>> - >>>>>>> please take a look if you're interested, and definitely let me know if >>>>>>> I'm >>>>>>> missing anything! >>>>>>> >>>>>>> Thanks, >>>>>>> Danny >>>>>>> >>>>>>
