hjtran commented on issue #36214: URL: https://github.com/apache/beam/issues/36214#issuecomment-3321040104
> 2. Enforcing that you only persist GBK'd data (this is how most runners work/checkpoint already AFAIK).

The Schrodinger SeamRunner doesn't work this way. Many of the SeamRunner stage boundaries come just from GBKs, but many others arise from incompatible environments as well. I'm not sure how other runners handle stage boundaries - do they just pipe data streams directly between workers?

> An alternative would be to eventually offer both and let the runner choose the mode that works for them.

If we had an API that specifies how secrets are determined as a PipelineOption, then the only-GBK-persisting runners could replace GBKs with GBEKs, and other runners could use the secret everywhere they persist PCollections.

A main concern for me here is the addition of a new transform that blurs the SDK/runner boundary. When onboarding new developers onto Beam, the biggest hurdle I face is introducing `Reshuffle`, since it breaks the promise of "As a pipeline author, you don't have to worry about the _how_ of execution", and GBEK may be another transform like `Reshuffle`.
