hjtran commented on issue #36214:
URL: https://github.com/apache/beam/issues/36214#issuecomment-3321040104

   > 2. Enforcing that you only persist GBK'd data (this is how most runners 
work/checkpoint already AFAIK).
   
   The Schrodinger SeamRunner doesn't work this way. Many of the SeamRunner 
stage boundaries are just from GBKs but there are many stage boundaries that 
arise from incompatible environments as well. Not sure how other runners handle 
stage boundaries - do they just pipe together data streams directly between 
workers?
   
   > An alternative would be to eventually offer both and let the runner choose 
the mode that works for them.
   
   If we had an API that specifies how secrets are determined as a 
PipelineOption, then runners that persist only GBK'd data could replace GBKs 
with GBEKs, and other runners could use the secret everywhere they persist 
PCollections.
   
   I think a main concern here for me is the addition of a new transform that 
blurs the SDK/runner boundary. When onboarding new developers onto Beam, the 
biggest hurdle I face is introducing `Reshuffle` since it breaks the promise of 
"As a pipeline author, you don't have to worry about the _how_ of execution" 
and GBEK may be another transform like `Reshuffle`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
