+1 very well summarized Danny!

Robert B
Beam Go Busybody

PS. Apologies to the list for my emphatic but overly abrasive reply up
thread. I wrote brashly, and misunderstood Kerry's earnest question.


On Mon, Mar 28, 2022, 8:47 AM Ahmet Altay <al...@google.com> wrote:

> Thank you Danny for sharing a summary for the rest of us on the mailing
> list. Thank you all for the due diligence and the discussion. I appreciate
> it :)
>
> On Mon, Mar 28, 2022 at 6:49 AM Danny McCormick <dannymccorm...@google.com>
> wrote:
>
>> To close the loop here, Robert and I dug in offline, and Robert
>> ultimately convinced me that we can relax the completeness requirement from
>> my doc and not require coverage of all structural DoFn methods in order to
>> replace the code generator. Ultimately, the most important things that
>> changed my mind were:
>>
>> 1) Better understanding the current uses and challenges with the code
>> generator (this includes both the current uses and probably more
>> importantly understanding more of the challenges around things like naming
>> collisions). Robert summarized these challenges nicely in his 3rd point
>> earlier in the thread. Ultimately, if the generator is really hard to use
>> and nobody (or few people) in the OSS community are benefiting from it as a
>> result, then completeness matters less than making it usable.
>> 2) Robert's suggestion of interface composition in the doc helps reduce
>> the complexity for *most *cases, meaning that even if we can't hit all
>> structural DoFn methods, we can get closer than I initially thought.
>> 3) Our up-thread conversation about the frequency of ProcessElement calls
>> vs other calls. While I'm still worried about watermark estimation and some
>> other future sdf methods, we can get most of the benefits of a generics
>> approach without worrying about those methods.
>>
>> From the beginning we've agreed that if we relax the completeness
>> requirement then most of the technical challenges outlined in my doc go
>> away and this work becomes feasible. I've updated the doc to lead with a
>> disclaimer that we've agreed to relax this requirement, but I would mostly
>> like to keep it as is other than that - it is still technically solid, it
>> provides a useful outline of why we can't guarantee complete coverage of
>> all structural DoFn methods, and it gives us a good starting point to reach
>> full coverage if the Go team eventually decides to take on
>> https://github.com/golang/go/issues/41176.
>>
>> I'm still planning on taking on watermark estimation before this work
>> since it could have an impact on how we think about the final design. After
>> that, I will probably come back to this and put together another small doc
>> outlining the updated approach with the relaxed requirement (assuming
>> nobody gets to it first 🙂).
>>
>> Thanks,
>> Danny
>>
>> On Thu, Mar 17, 2022 at 11:38 AM Robert Burke <rob...@frantil.com> wrote:
>>
>>> Apologies to Danny, i haven't read your considered reply yet, as i need
>>> to address Kerry's question on priorities immediately.
>>>
>>>
>>>
>>> I feel it's entirely critical to have before native streaming, because
>>> it serves users more in the long term.
>>>
>>> The cost of waiting is that we can't establish the convention of
>>> registering things with the SDK to get performance boosts and instead ask
>>> people to fiddle with their code again a few months down the road. This
>>> leads to proliferation of code that will be unable to set the precedent of
>>> efficiency for organizations who aren't able to keep up with the latest and
>>> greatest in Beam.
>>>
>>> The cost of waiting is having users burn through CPU unnecessarily
>>> contributing to the ongoing boiling of the planet.
>>>
>>> The cost of waiting is making users spend more on said CPU and memory,
>>> when they could have faster completing jobs more cheaply now, rather than
>>> later. That does more for all Go SDK users now than streaming would.
>>>
>>> Setting conventions, and having a good experience with the SDK is
>>> something we should be doing sooner rather than later, so we aren't
>>> constantly changing things on our users with New Ways To Do Old Things, vs
>>> unlocking abilities to do things in another SDK, that have Options in the
>>> other SDKs already.
>>>
>>>
>>> On Thu, Mar 17, 2022, 8:01 AM Kerry Donny-Clark <kerr...@google.com>
>>> wrote:
>>>
>>>> My thoughts:
>>>> What are the benefits of doing this now/the costs of waiting?
>>>> Are generics something that gives us more value than streaming
>>>> features? It makes sense to me to punt this work until the SDK is more
>>>> developed, so that we can observe the consequences of streaming instead of
>>>> imagining them.
>>>> I'm the least expert Go developer on this thread, so please let me know
>>>> if I'm missing something obvious. I really appreciate the clear back and
>>>> forth :)
>>>>
>>>> On Thu, Mar 17, 2022 at 8:36 AM Danny McCormick <
>>>> dannymccorm...@google.com> wrote:
>>>>
>>>>> Robert and I have been catching up offline, I did still want to
>>>>> respond here for the broader audience, both to clarify some of the
>>>>> questions Robert brought up and to share some of the progress we've made
>>>>> towards getting on the same page. Structurally, I'll walk through the
>>>>> questions/assertions brought up above to establish common ground and
>>>>> where we have seen the world differently. Some of that has been covered in
>>>>> our offline conversation, some has not. At the end I'll try to give a 
>>>>> brief
>>>>> summary of what we've talked about offline. Robert, please feel free to
>>>>> correct me if I've misrepresented any of that conversation!
>>>>>
>>>>> *Robert's tl;dr*
>>>>>
>>>>> > I'd also like to emphasize that we gain 99.99999...% of the code
>>>>> generator benefit by covering the ProcessElement calls.
>>>>> > No other method is called similarly often by the system (on every
>>>>> element). Perfect is the enemy of Good Enough.
>>>>>
>>>>> I think this is actually fundamentally where we diverged (more on that
>>>>> below) - I've been treating coverage of all structural DoFn methods as a
>>>>> requirement for a code generator replacement, Robert has not. More on that
>>>>> below.
>>>>>
>>>>> *1. Problem: Users have to determine which Registration function to
>>>>> use.*
>>>>>
>>>>> *1.a)* I think that perhaps there has been a breakdown of terminology
>>>>> here. When I say on the order of 1000s of DoFn variants, I'm not just
>>>>> talking about DoFn1x0 to DoFn7x4. If that is all we need to worry about I
>>>>> agree that's not a problem. I'm referring to the section of my doc where I
>>>>> discuss the need for a proliferation of interfaces to support each
>>>>> structural DoFn method such that a user can pass in a DoFn of the exact
>>>>> right interface. See
>>>>> https://docs.google.com/document/d/1imYbBeu2FNJkwPNm6E9GEJkjpHnHscvFoKAE6AISvFA/edit#heading=h.e24tsm6ev6f0
>>>>> where I discuss the problem - specifically, it culminates in "To use
>>>>> this function, the user would be required to know which of the many
>>>>> DoFn variants they should using...".
>>>>>
>>>>> It's fine if you disagree with that (and I think you have brought up
>>>>> some valid points about ways to reduce the complexity by composing
>>>>> interfaces), but I'm heavily relying on shared understanding of that doc
>>>>> for contextualizing this conversation.
>>>>>
>>>>> *1.b) *Sorry that was confusing, you are correct that I've been
>>>>> abusing the term cast when really what I've meant in most cases is a user
>>>>> forcing type inference (e.g. `foo.(Bar)`)
>>>>>
>>>>> *1.c) *Agreed if there are ~30 types of DoFn reasonably named, I
>>>>> think this becomes exceedingly inconvenient if there is indeed an 
>>>>> explosion
>>>>> of types into the hundreds or thousands. I don't think we disagree here,
>>>>> this point mostly reduces to (1.a) - please correct me if I'm wrong.
>>>>>
>>>>> *1.d) *That's true, we could also have it print out arbitrary
>>>>> registration code though - I don't think that's a particularly compelling
>>>>> user experience either way.
>>>>>
>>>>> *2. Feasibility.*
>>>>>
>>>>> *2.a) *To be honest, I just flat out disagree here - like I said
>>>>> above, your example doesn't touch the actual user facing function which is
>>>>> the hard part of the problem IMO. I think it starts to show a path towards
>>>>> simplifying the code generator (which I don't think we have any
>>>>> disagreement on the feasibility of), but I'd appreciate some justification
>>>>> for your statement that it demonstrates feasibility in most cases.
>>>>>
>>>>> As an aside, I would say that if you think you had a basically working
>>>>> example (or could come up with one very quickly), I would've appreciated
>>>>> you sharing that as a resource associated with your 1 pager (not
>>>>> necessarily presented as *the solution*, but as a possible approach).
>>>>>
>>>>> *2.b) and 2.c) *I won't touch this too much here because I think this
>>>>> has been explored a lot more in our offline conversation (summarized
>>>>> below). At a high level, I agree, it's just a matter of which tradeoffs we
>>>>> want to make. As mentioned above, I've been treating coverage of all
>>>>> structural DoFn methods as a requirement for a code generator replacement
>>>>> (largely in service of "It should be performance neutral" from the
>>>>> original one-pager, but also because I generally think that's the right
>>>>> approach).
>>>>>
>>>>> *3. Justifications for the work*
>>>>>
>>>>> I don't have much to add here, other than asking that this kind of
>>>>> context be included in future one-pagers. Context like this is both really
>>>>> valuable to me as a newer contributor and really hard to find on my own if
>>>>> it isn't written down. Thanks for sharing it now!
>>>>>
>>>>> With this context, I'm generally on board with at least using generics
>>>>> to clean up some of these issues with the generator. I don't foresee the
>>>>> same set of problems I've described in my doc keeping us from simplifying
>>>>> the generator.
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>>
>>>>> *Summary of offline conversation*
>>>>>
>>>>> As alluded to above, I believe the fundamental difference in how
>>>>> Robert and I have been seeing this problem is one of requirements: I've
>>>>> been treating coverage of all structural DoFn methods as a requirement for
>>>>> a code generator replacement, Robert has not. If you remove that
>>>>> requirement and just focus on the easier ones to optimize (including
>>>>> ProcessElement, all the other non-sdf functions, and a subset of the sdf
>>>>> functions), the main problems presented in my doc go away.
>>>>>
>>>>> A huge piece of Robert's reasoning is looking at the frequency with
>>>>> which these functions will be invoked. ProcessElement gets invoked way 
>>>>> more
>>>>> often than the rest of the structural DoFn methods, so we should focus on
>>>>> optimizing for that and not worry as much about the rest.
>>>>>
>>>>> I currently think that it is a good idea to continue to leave the door
>>>>> open for optimizing all the structural DoFn methods. The one I'm most
>>>>> worried about is watermark estimation, which doesn't actually exist yet in
>>>>> the Go Sdk - I think its likely that it will require invocation of at 
>>>>> least
>>>>> 1 structural DoFn function at the element or bundle level, and I don't 
>>>>> want
>>>>> to box ourselves out of being able to support performant watermark
>>>>> estimation (and thus fully performant full streaming support). Generally,
>>>>> the questions I'm interested in are:
>>>>>
>>>>> 1) Can we answer definitively at what level we will need structural
>>>>> DoFn invocation for watermark estimation 2) If its at the bundle/element
>>>>> level does that preclude us from a generics approach that doesn't fully
>>>>> optimize for that 3) If its unclear, should we prioritize watermark
>>>>> estimation design (and maybe implementation) ahead of generics/code
>>>>> generator
>>>>>
>>>>> While Robert has been framing it as:
>>>>>
>>>>> 4) Does a feature, like Watermark estimation, require us to do
>>>>> something different with the generic code generator such that we need to
>>>>> put off working on it now?
>>>>>
>>>>> I'm also still not 100% convinced that optimizing only at the
>>>>> bundle/element level is the right approach, though I'm probably moving in
>>>>> that direction.
>>>>>
>>>>> I would love input from others here if anyone has thoughts or context
>>>>> that is missing from this conversation! Otherwise, I'm sure Robert and I
>>>>> will continue our offline conversations and drive towards a solution.
>>>>>
>>>>> Thanks,
>>>>> Danny
>>>>>
>>>>> On Wed, Mar 16, 2022 at 2:18 PM Robert Burke <rob...@frantil.com>
>>>>> wrote:
>>>>>
>>>>>> Great questions!
>>>>>>
>>>>>> I think, ultimately, I was too coy in my one pager, and that's
>>>>>> leading to the disconnect we're having. I go into much more details here.
>>>>>>
>>>>>> *tl;dr;*
>>>>>>
>>>>>> I didn't emphasize the issues with the current generator in my doc.
>>>>>> Naming is a *huge* problem, and a generic implementation of the
>>>>>> generated code avoids most of it, and a total replacement avoids all of 
>>>>>> it.
>>>>>>
>>>>>> I'd also like to emphasize that we gain 99.99999...% of the code
>>>>>> generator benefit by covering the ProcessElement calls.
>>>>>> No other method is called similarly often by the system (on every
>>>>>> element). Perfect is the enemy of Good Enough.
>>>>>>
>>>>>> It's certainly good enough if more users are able to benefit from it,
>>>>>> rather than those with infrastructure, or the rigor to constantly update
>>>>>> their generated code.  Users need to register their DoFns to make things
>>>>>> work anyway, they might as well get a performance boost out of it.
>>>>>>
>>>>>> *Longer version*
>>>>>>
>>>>>> I find endlessly inlined points and counterpoints to be very hard to
>>>>>> read, so I'm enumerating them so they can be referred to.
>>>>>>
>>>>>> *0.* Lets avoid discussing using Generics for KVs, Iterators,
>>>>>> Emitters. etc. That convenience is orthogonal to the code generator
>>>>>> concerns. They are related only by generics, and the code generator is a
>>>>>> big enough topic by itself.  We can discuss it elsewhere. We gain nothing
>>>>>> by conflating the two and scope creeping to include them at this time.
>>>>>> * 0.a )* Unless there's a compelling argument to address them now,
>>>>>> that I'm missing, of course.
>>>>>>
>>>>>> *1. Problem: Users have to determine which Registration function to
>>>>>> use.*
>>>>>>
>>>>>> *1.a)* I think "thousands" is overstating the issue. For the
>>>>>> "supported" variants for which we do call time optimization, it's from 
>>>>>> 1x0
>>>>>> to 7x4. So about 28-30 Registrations for structural DoFns, and an
>>>>>> equivalent set for functional DoFns. A big number, but certainly not
>>>>>> thousands.
>>>>>>
>>>>>> Recall the generics largely are unaware of which specific types they
>>>>>> actually need to deal with.  It's not clear to me where the explosion is
>>>>>> coming from, considering the other life cycle methods can be in related,
>>>>>> but independent interfaces the user doesn't need to be aware of.
>>>>>>
>>>>>> The code generator (and most of the rest of the invoker framework) is
>>>>>> unaware of whether a parameter is a context, or a window, or a value, or 
>>>>>> an
>>>>>> iterator, or an emitter. It just type asserts, and
>>>>>>
>>>>>> *1.b)* You keep mentioning "casting". Per my example, there's no
>>>>>> "casting", just selecting the right registration function, and specifying
>>>>>> types. Please clarify if that's what you're referring to, as Go doesn't
>>>>>> have "casting" as a concept. There are type conversions[0] for most 
>>>>>> values
>>>>>> of identical memory layout, and type assertions[1] for interfaces.
>>>>>>
>>>>>> *1.c) *I think users are capable of picking the right registration
>>>>>> method and filling in the types, if they are provide with clear examples 
>>>>>> in
>>>>>> the SDK, and documentation. Users will go through a fair amount in order 
>>>>>> to
>>>>>> squeeze out some performance.
>>>>>>
>>>>>> *1.d) *If all else fails, with a generic method, we can have the SDK
>>>>>> litterally print out the registration code for the users.
>>>>>>
>>>>>> *2. Feasibility.*
>>>>>>
>>>>>> *2.a)* I feel my example demonstrates the feasibility for *most
>>>>>> cases*.
>>>>>>
>>>>>> *2.b)*There are certainly areas that are currently lacking. This is
>>>>>> also true of the code generator as is. It doesn't support map side inputs
>>>>>> in any form right now. Either we move to generics and stop asking the
>>>>>> question, or we need to fix it for every complex function type we accept 
>>>>>> as
>>>>>> a parameter.
>>>>>>
>>>>>> This is a naming problem that is avoided entirely with a generic
>>>>>> implementation.
>>>>>>
>>>>>> *2.c)* Even if we can't or don't initially support SplittableDoFns
>>>>>> (which are not currently supported properly anyway), or CombineFns (which
>>>>>> are), there will always be a "just one more" problem. That's OK. Most 
>>>>>> DoFns
>>>>>> are much simpler, and can and should get the performance benefits. 
>>>>>> Perfect
>>>>>> is the enemy of Good Enough.
>>>>>>
>>>>>> *3. Justifications for the work*
>>>>>>
>>>>>> *3.a) *I'm only aware of one user for the code generator at present,
>>>>>> and that's presently for Google internal use.
>>>>>>
>>>>>>  It relies on Bazel rules that users need to specify instead of the
>>>>>> usual "go_library" or "go_binary" rules by default. Already this is an
>>>>>> encumbrance.  Static analysis is never straightforward, so users must 
>>>>>> still
>>>>>> manually list their dofns in those BUILD files, outside of their code. 
>>>>>> The
>>>>>> main benefit of this approach is users can be ignorant of the .shims.go
>>>>>> file we're adding to their package.
>>>>>>
>>>>>> Through this usage, issues with the code generator have been
>>>>>> discovered:
>>>>>>
>>>>>> * Separate from the code, and not type checked -> Their IDEs can't
>>>>>> assist with typos and similar, but with an in-code generic solution this 
>>>>>> is
>>>>>> resolved.  A work around exists by having the static analysis attempt to
>>>>>> figure things out, but writing clear error messages in this case is
>>>>>> non-obvious, and generally separated from the code, which would need to 
>>>>>> be
>>>>>> corrected in the separate file.
>>>>>> * It's not possible to use the code generator for both package code,
>>>>>> and test code, as there are naming collisions around iterators and
>>>>>> emitters, as they generate in separate shim files in the same package.
>>>>>> * The code generator needs to determine and provide imports for the
>>>>>> shims files for the various user types. This is not entirely trivial, and
>>>>>> also runs into name collisions for the short package names.
>>>>>> * See newer iterator/side input types issue above.
>>>>>>
>>>>>> These could all be fixed in the code generator, but in the end, we'd
>>>>>> largely be replicating what the compiler could be doing for us instead, 
>>>>>> if
>>>>>> the users were calling functions in their own code files.
>>>>>>
>>>>>> These problems are not exclusive to Googles internal approach to
>>>>>> using the generator either.
>>>>>>
>>>>>> *3.b) *The Code generator by itself isn't great to use for open
>>>>>> source.
>>>>>>
>>>>>> * Requires an explicit `go generate` call by the user, which
>>>>>> generates a file they should largely not touch, until it causes a problem
>>>>>> with compilation.
>>>>>> * It's presently loosely versioned, so we can't enforce that users
>>>>>> re-generate when they update their code, even if it benefits them.
>>>>>> * The errors and code it generates are inscrutable.
>>>>>>
>>>>>> *3.c)* Benefits of moving to generic generation from static
>>>>>> generation
>>>>>>
>>>>>> Assuming we determine users won't want to write the registration
>>>>>> calls themselves, then we still need to fix the above issues. I posit 
>>>>>> that
>>>>>> generating a file that contains generic function calls is much better 
>>>>>> than
>>>>>> what we currently have, which uses arbitrary unicode symbols to avoid
>>>>>> collisions, and similar.
>>>>>>
>>>>>> Should we move to generating calls to generic methods, we avoid all
>>>>>> naming problems, except for the import short name issue.
>>>>>>
>>>>>> This alone is worth it in the avoided edge cases.
>>>>>>
>>>>>> Robert Burke
>>>>>> Beam Go Busybody
>>>>>>
>>>>>> [0] https://go.dev/doc/effective_go#conversions
>>>>>> [1] https://go.dev/doc/effective_go#interface_conversions
>>>>>>
>>>>>>
>>>>>> On Wed, 16 Mar 2022 at 07:00, Danny McCormick <
>>>>>> dannymccorm...@google.com> wrote:
>>>>>>
>>>>>>> Hey Robert, thanks for the thoughts - I replied to your thoughts in
>>>>>>> the doc, but will add some thoughts here as well:
>>>>>>>
>>>>>>> > I agree that it's going to be considered inconvenient for users
>>>>>>> to manually specify types at this stage.
>>>>>>> > I think it's a substantial improvement over the status quo, simply
>>>>>>> because it removes a step in the iteration cycle when a change needs to 
>>>>>>> be
>>>>>>> made.
>>>>>>> > It has the compiler do the work for us mostly.
>>>>>>>
>>>>>>> I think this is understating the problem significantly. The issue as
>>>>>>> described in the doc isn't just asking users to specify types in their
>>>>>>> registration call, it's that they would need to find the correct 
>>>>>>> interface
>>>>>>> to cast their function to from many generated DoFn variants (on the 
>>>>>>> order
>>>>>>> of thousands) - then they would need to specify types, which is the easy
>>>>>>> part. If we're able to reduce the problem to users specifying types 
>>>>>>> then I
>>>>>>> wholeheartedly agree we should move forward with that approach.
>>>>>>>
>>>>>>> > I don't think the doc went far enough. See a working sketch of my
>>>>>>> thoughts here: https://go.dev/play/p/Bwf4eSavNxt?v=gotip
>>>>>>>
>>>>>>> and
>>>>>>>
>>>>>>> > The doc covers it's explored options very well, but it leaves out
>>>>>>> the possibility of getting halfway there:
>>>>>>> https://go.dev/play/p/Bwf4eSavNxt?v=gotip
>>>>>>>
>>>>>>> It's not 100% clear to me what you're getting at here, but I do
>>>>>>> think that this falls well short of the target of this investigation and
>>>>>>> fails to address what is actually the sticking point explained in the 
>>>>>>> doc
>>>>>>> (the user experience of trying to find the right function from DoFn
>>>>>>> variants) - this doesn't touch the actual user facing registration 
>>>>>>> function
>>>>>>> which is the hard part IMO. If your point is that we might be able to
>>>>>>> leverage generics to simplify/improve our code generator, I mention that
>>>>>>> below - if not, could you help me understand what you're suggesting?
>>>>>>>
>>>>>>> > And finally, even if inference is ultimately the sticking point:
>>>>>>> We should still rewrite the code generator to generate generic code for 
>>>>>>> us,
>>>>>>> since it moves the harder parts (de-duplication, name collisions in 
>>>>>>> types
>>>>>>> if used in both package code, and test code) into the Go compiler, 
>>>>>>> rather
>>>>>>> than in our code.
>>>>>>>
>>>>>>> I don't necessarily disagree - just simplifying/improving the code
>>>>>>> generator with generics (along with generic KVs and other user facing
>>>>>>> improvements) was out of the scope of the doc, which was targeted
>>>>>>> specifically at whether we can fully replace the code generator. I
>>>>>>> intentionally targeted that because it was the clearest value add for 
>>>>>>> users
>>>>>>> and definitely seemed to justify the investment - it is not as 
>>>>>>> immediately
>>>>>>> clear to me how important replacing pieces of the code generator with
>>>>>>> generics is and whether we should prioritize that work immediately, 
>>>>>>> save it
>>>>>>> for later, or wait and see if the type inference blocker is something 
>>>>>>> the
>>>>>>> Go team decides to pick up.
>>>>>>>
>>>>>>> I think that's a discussion worth having, but it only matters if
>>>>>>> replacing the code generator is not actually feasible, so until we have
>>>>>>> consensus on that I'd prefer to hold off on going too far down that 
>>>>>>> path.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Danny
>>>>>>>
>>>>>>> On Tue, Mar 15, 2022 at 8:31 PM Robert Burke <rob...@frantil.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> The problem with being on vacation for a week is so many cool
>>>>>>>> things come up while you're gone that you can't get ahead of...
>>>>>>>>
>>>>>>>> Thank you Danny for the initial write up! I hope the exercise
>>>>>>>> helped you understand more about the SDK and Go, but I think there's 
>>>>>>>> more
>>>>>>>> to do here.
>>>>>>>>
>>>>>>>> *tl;dr; *
>>>>>>>> 1. I don't think the doc went far enough. See a working sketch of
>>>>>>>> my thoughts here: https://go.dev/play/p/Bwf4eSavNxt?v=gotip
>>>>>>>> 2. Agreed that supporting generic KVs should still be possible, but
>>>>>>>> it's an orthogonal investigation to the code generator (and its 
>>>>>>>> generated
>>>>>>>> code) question.
>>>>>>>>  I won't discuss that further, as I have a different one-pager in
>>>>>>>> the works on that topic, and that's a very very different thread.
>>>>>>>>
>>>>>>>> *On the Code Generator*
>>>>>>>> Overall, the goal is to totally do-away with the current approach
>>>>>>>> of pre-generating. But I don't think a lack of inference is an 
>>>>>>>> obstacle to
>>>>>>>> that goal.
>>>>>>>>
>>>>>>>> I agree that it's going to be considered inconvenient for users
>>>>>>>> to manually specify types at this stage.
>>>>>>>> I think it's a substantial improvement over the status quo, simply
>>>>>>>> because it removes a step in the iteration cycle when a change needs 
>>>>>>>> to be
>>>>>>>> made.
>>>>>>>> It has the compiler do the work for us mostly.
>>>>>>>>
>>>>>>>> The doc covers it's explored options very well, but it leaves out
>>>>>>>> the possibility of getting halfway there:
>>>>>>>> https://go.dev/play/p/Bwf4eSavNxt?v=gotip
>>>>>>>>
>>>>>>>> To cover the varieties of expected arities, we should generate the
>>>>>>>> SDK side registration code in order to avoid missing cases, and ensure 
>>>>>>>> even
>>>>>>>> coverage.
>>>>>>>>
>>>>>>>> And finally, even if inference is ultimately the sticking point: We
>>>>>>>> should still rewrite the code generator to generate generic code for 
>>>>>>>> us,
>>>>>>>> since it moves the harder parts (de-duplication, name collisions in 
>>>>>>>> types
>>>>>>>> if used in both package code, and test code) into the Go compiler, 
>>>>>>>> rather
>>>>>>>> than in our code.
>>>>>>>>
>>>>>>>> And who knows? Perhaps in Go 1.19 or 1.20, the inferences Danny
>>>>>>>> proposed become possible, and we can clean things up further.
>>>>>>>>
>>>>>>>> Robert Burke,
>>>>>>>> Beam Go Busybody
>>>>>>>>
>>>>>>>> On Fri, 11 Mar 2022 at 06:03, Danny McCormick <
>>>>>>>> dannymccorm...@google.com> wrote:
>>>>>>>>
>>>>>>>>> > Can generics be used to at least get rid of the oddities of KV
>>>>>>>>> handling?
>>>>>>>>>
>>>>>>>>> Good question - my honest answer is that I don't know: this
>>>>>>>>> doc/investigation was specifically focused on replacing the code 
>>>>>>>>> generator
>>>>>>>>> and generics not being useful in this context does not mean they 
>>>>>>>>> can't be
>>>>>>>>> useful elsewhere.
>>>>>>>>>
>>>>>>>>> My speculative answer (without having a full familiarity with how
>>>>>>>>> we currently handle KVs or the associated problems) is that I don't 
>>>>>>>>> *think
>>>>>>>>> *the core blocker keeping us from replacing the code generator
>>>>>>>>> applies here. The big issue we're facing here is that we can't infer
>>>>>>>>> generic interface types based on the parameters of method functions 
>>>>>>>>> of a
>>>>>>>>> struct. In the case of KV, we would be inferring the interface types 
>>>>>>>>> based
>>>>>>>>> on the types of member variables of the struct, which Go should be 
>>>>>>>>> able to
>>>>>>>>> do.
>>>>>>>>>
>>>>>>>>> All that to say, in no way should this investigation preclude us
>>>>>>>>> from looking for other ways to apply generics to make our lives easier
>>>>>>>>> (whether that be in KV handling or improving other experiences).
>>>>>>>>>
>>>>>>>>> On Thu, Mar 10, 2022 at 6:55 PM Robert Bradshaw <
>>>>>>>>> rober...@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for the writeup. That's too bad. Can generics be used to
>>>>>>>>>> at least get rid of the oddities of KV handling?
>>>>>>>>>>
>>>>>>>>>> On Thu, Mar 10, 2022 at 3:40 PM Luke Cwik <lc...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Well written doc and unfortunate that Generics seem to not work
>>>>>>>>>>> out for us.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Mar 10, 2022 at 11:03 AM Jack McCluskey <
>>>>>>>>>>> jrmcclus...@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Well, looks like Go generics weren't quite the magic bullet we
>>>>>>>>>>>> were hoping for in their current state. We'll get that code 
>>>>>>>>>>>> generator
>>>>>>>>>>>> ripped out one day, but not today.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Mar 10, 2022 at 12:44 PM Danny McCormick <
>>>>>>>>>>>> dannymccorm...@google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hey folks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Right now, the Go Sdk relies on users to generate code as a
>>>>>>>>>>>>> pre-compile step to avoid some overhead of excessive runtime 
>>>>>>>>>>>>> reflection.
>>>>>>>>>>>>> This is suboptimal because it relies on users to regenerate code 
>>>>>>>>>>>>> every time
>>>>>>>>>>>>> they create a DoFn, causing friction to users and slowness when 
>>>>>>>>>>>>> they don't
>>>>>>>>>>>>> remember to regenerate (among other reasons). @lostluck wrote a 
>>>>>>>>>>>>> one pager
>>>>>>>>>>>>> around replacing this code generator with generics - after 
>>>>>>>>>>>>> spending pieces
>>>>>>>>>>>>> of the last couple days digging in my takeaway is that it's not 
>>>>>>>>>>>>> doable in a
>>>>>>>>>>>>> way that benefits the Sdk at this time and I wanted to share my 
>>>>>>>>>>>>> findings. I
>>>>>>>>>>>>> wrote them up here -
>>>>>>>>>>>>> https://docs.google.com/document/d/1imYbBeu2FNJkwPNm6E9GEJkjpHnHscvFoKAE6AISvFA/edit?usp=sharing
>>>>>>>>>>>>>  -
>>>>>>>>>>>>> please take a look if you're interested, and definitely let me 
>>>>>>>>>>>>> know if I'm
>>>>>>>>>>>>> missing anything!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Danny
>>>>>>>>>>>>>
>>>>>>>>>>>>

Reply via email to