scwhittle commented on PR #36631: URL: https://github.com/apache/beam/pull/36631#issuecomment-3526736892
> Thanks, had a few comments. Would you mind updating the PR description noting which scenario this PR is supposed to fix? Sorry I clarified what this PR does in the description and how our expectations of when setup/teardown are called were incorrect. So this fixes some missing abort calls, but those calls ended up being no-ops for ParDoOperation (maybe needed for other operation types though). This PR adds testing validating reuse of dofns. > > Reading [#18592 (comment)](https://github.com/apache/beam/issues/18592#issuecomment-3428871902) it seems this fixed teardown on setup exception on Dataflow streaming runner. Would it also apply to batch? However I tested this PR with my old test case for #31381, currently the behavior appears remain the same as HEAD. In particular, I have 3 fused DoFn's, and throw in DoFn2's setup. > > For batch, only DoFn2's teardown is invoked. However I also notice previously setup DoFn not re-setup as well. e.g. > > ``` > @Setup3 null, 743 at 277fbd5a > @Setup2 null, 510 at 67875efd > @Teardown2 510 at 67875efd > Failure processing work item google.com:clouddfe;2025-11-12_09_44_38-12709838531006866747;5913875962390955537: Uncaught exception occurred during work unit execution. This will be retried. > Finished processing stage s01 with 1 errors in 0.87 seconds > Starting MapTask stage s01 > @Setup2 null, 399 at 145c86d > @Setup null, 309 at 23ddbaee > @StartBundle3 743 at 277fbd5a > ... > ``` > > (job id `2025-11-12_09_44_38-12709838531006866747`) > > for streaming the runner creates 20 DoFn instances at one time, one crashed at setup and called its teardown, but did not re-setup a DoFn2 (I still get 20 counts for DoFn2 setup) (job id `2025-11-12_11_32_33-12848924584850197035`) Hopefully this is clarified by the updated PR description. This is actually the expected and good behavior of the code. We don't need to throw away DoFn2 in the case DoFn had an error setup. The dofns are independent at this point and we haven't interacted in such a way with DoFn2 instance that we can't reuse it. We are still respecting the lifecycle of DoFn2 and instead of teardown/resetup, we just cache and reuse the Dofn2 instance when we are recreating a new MapTaskExecutor. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
