Github user rdblue commented on the issue: https://github.com/apache/spark/pull/17540

@srowen, agreed. Closely related, but not the same code paths. The question is: when should `withNewExecutionId` get called? I'm running the test suite now, and this patch causes test failures when `withNewExecutionId` is called twice: once in `DataFrameWriter` and once in `InsertIntoHadoopFsRelationCommand`.

It looks like calls to `withNewExecutionId` have been scattered around the codebase (e.g. in `InsertIntoHadoopFsRelationCommand` and other execution nodes) to fix this problem for certain operations, so we should decide where it belongs and fix the tests around that. The reason I added it to `DataFrameWriter` is that it is already called in `Dataset` actions, and it makes sense to call it once from the place where an action is started. I think it makes the most sense to call it in action methods, like `Dataset#collect` or `DataFrameWriter#insertInto`, to minimize the number of places where we need to add it. I don't think this is a concern that should be addressed by the execution plan.
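To make the double-call failure mode concrete, here is a minimal, self-contained sketch of how a nesting guard like this behaves. The names and the exception are illustrative only, not Spark's actual `SQLExecution` implementation:

```scala
import java.util.concurrent.atomic.AtomicLong

// Simplified model: an execution id is stored in a thread-local for the
// duration of `body`, and a nested call on the same thread fails fast.
object SQLExecutionSketch {
  private val nextExecutionId = new AtomicLong(0)
  private val executionId = new ThreadLocal[java.lang.Long]

  def withNewExecutionId[T](body: => T): T = {
    if (executionId.get != null) {
      // This is the failure described above: e.g. DataFrameWriter wraps
      // the action, then InsertIntoHadoopFsRelationCommand wraps it again.
      throw new IllegalStateException("execution id is already set")
    }
    executionId.set(nextExecutionId.getAndIncrement())
    try body finally executionId.remove()
  }
}
```

Under this model, wrapping once at the action entry point (`Dataset#collect`, `DataFrameWriter#insertInto`) is safe, while a second wrap inside an execution node trips the guard, which matches the test failures mentioned above.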