Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/17540
  
    @srowen, agreed. Closely related but not the same code paths. The question 
is: when should `withNewExecutionId` get called?
    
    I'm running the test suite now, and this patch causes test failures when 
`withNewExecutionId` is called twice: once in `DataFrameWriter` and once in 
`InsertIntoHadoopFsRelationCommand`. It looks like the call has been 
littered about the codebase (e.g. in `InsertIntoHadoopFsRelationCommand` and 
other execution nodes) to work around this problem for certain operations, so 
we should decide where the call belongs and fix the tests around that. The 
sketch below illustrates the failure mode.
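
    Here is a minimal sketch of why the nested call fails (my own 
illustration, not Spark's actual source; `ExecutionIdGuard` is a hypothetical 
stand-in for `SQLExecution`). As I understand it, `withNewExecutionId` tracks 
the current execution id per thread and rejects a nested call while one is 
already active:

    ```scala
    // Hypothetical stand-in for SQLExecution (names are illustrative):
    // the execution id lives in a thread-local, and a nested call while
    // one is already set is treated as an error.
    object ExecutionIdGuard {
      private val currentId = new ThreadLocal[Option[Long]] {
        override def initialValue(): Option[Long] = None
      }
      private val nextId = new java.util.concurrent.atomic.AtomicLong(0L)

      def withNewExecutionId[T](body: => T): T = currentId.get() match {
        case Some(id) =>
          // Someone upstream already started an execution on this thread.
          throw new IllegalArgumentException(s"execution id $id is already set")
        case None =>
          currentId.set(Some(nextId.incrementAndGet()))
          try body finally currentId.set(None)
      }
    }

    object NestedCallDemo extends App {
      ExecutionIdGuard.withNewExecutionId {     // outer: DataFrameWriter
        ExecutionIdGuard.withNewExecutionId {   // inner: the command re-wraps
          println("write runs here")            // never reached
        }
      } // the inner call throws IllegalArgumentException
    }
    ```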
    
    The reason I added it to `DataFrameWriter` is that `withNewExecutionId` 
is already called from `Dataset` actions, and it makes sense to call it once 
at the point where an action starts. I think it makes the most sense in 
action methods, like `Dataset#collect` or `DataFrameWriter#insertInto`, to 
minimize the number of places where we need to add it. I don't think this is 
a concern that should be addressed by the execution plan. A sketch of that 
convention follows.
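
    Assuming a guard like the one sketched above, the convention I'm 
proposing would look roughly like this (again with illustrative names, not a 
real API):

    ```scala
    // The public action method owns the execution id; the command it invokes
    // runs inside that id and never calls withNewExecutionId itself.
    object ActionOwnsExecutionId {
      def insertInto(runCommand: () => Unit): Unit =
        ExecutionIdGuard.withNewExecutionId {
          // InsertIntoHadoopFsRelationCommand-style work happens here,
          // without re-wrapping.
          runCommand()
        }
    }
    ```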

