scwhittle commented on PR #36631:
URL: https://github.com/apache/beam/pull/36631#issuecomment-3526736892

   > Thanks, had a few comments. Would you mind updating the PR description 
noting which scenario this PR is supposed to fix?
   
   Sorry I clarified what this PR does in the description and how our 
expectations of when setup/teardown are called were incorrect.  So this fixes 
some missing abort calls, but those calls ended up being no-ops for 
ParDoOperation (maybe needed for other operation types though).  This PR adds 
testing validating reuse of dofns.
   
   > 
   > Reading [#18592 
(comment)](https://github.com/apache/beam/issues/18592#issuecomment-3428871902) 
it seems this fixed teardown on setup exception on Dataflow streaming runner. 
Would it also apply to batch? However I tested this PR with my old test case 
for #31381, currently the behavior appears remain the same as HEAD. In 
particular, I have 3 fused DoFn's, and throw in DoFn2's setup.
   > 
   > For batch, only DoFn2's teardown is invoked. However I also notice 
previously setup DoFn not re-setup as well. e.g.
   > 
   > ```
   > @Setup3 null, 743 at 277fbd5a
   > @Setup2 null, 510 at 67875efd
   > @Teardown2 510 at 67875efd
   > Failure processing work item 
google.com:clouddfe;2025-11-12_09_44_38-12709838531006866747;5913875962390955537:
 Uncaught exception occurred during work unit execution. This will be retried.
   > Finished processing stage s01 with 1 errors in 0.87 seconds
   > Starting MapTask stage s01
   > @Setup2 null, 399 at 145c86d
   > @Setup null, 309 at 23ddbaee
   > @StartBundle3 743 at 277fbd5a
   > ...
   > ```
   > 
   > (job id `2025-11-12_09_44_38-12709838531006866747`)
   > 
   > for streaming the runner creates 20 DoFn instances at one time, one 
crashed at setup and called its teardown, but did not re-setup a DoFn2 (I still 
get 20 counts for DoFn2 setup) (job id 
`2025-11-12_11_32_33-12848924584850197035`)
   
   Hopefully this is clarified by the updated PR description. This is actually 
the expected and good behavior of the code. We don't need to throw away DoFn2 
in the case DoFn had an error setup. The dofns are independent at this point 
and we haven't interacted in such a way with DoFn2 instance that we can't reuse 
it. We are still respecting the lifecycle of DoFn2 and instead of 
teardown/resetup, we just cache and reuse the Dofn2 instance when we are 
recreating a new MapTaskExecutor.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to