Tom van Bussel created SPARK-52312:
--------------------------------------

             Summary: Caching AppendData plan causes data to be inserted twice
                 Key: SPARK-52312
                 URL: https://issues.apache.org/jira/browse/SPARK-52312
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 4.0.0
            Reporter: Tom van Bussel


We’ve identified an issue where a {{DataFrame}} created from an {{INSERT}} SQL 
statement and then cached will cause the {{INSERT}} to be executed twice. This 
happens because the logical plan for the {{INSERT}} ({{AppendData}}) 
doesn’t extend the {{IgnoreCachedData}} trait, so it isn’t ignored during 
caching as expected. As a result, the plan is cached and re-executed. We should 
fix this by ensuring that plans used by {{INSERT}} all extend the 
{{IgnoreCachedData}} trait.
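
The mechanism described above can be illustrated with a small toy model. This is a sketch only and is not Spark code: {{ToyCacheManager}}, {{AppendDataBuggy}}, {{AppendDataFixed}} and {{insert_then_cache}} are hypothetical names, and the model assumes that the {{INSERT}} runs once when the DataFrame is created and that caching re-runs any plan that lacks the marker. The report does not say at which point Spark re-executes the plan, so the toy simply does it at cache time.

```python
class LogicalPlan:
    """Toy stand-in for a logical plan node (not Spark's class)."""

    def __init__(self, name, table):
        self.name = name
        self.table = table

    def execute(self):
        # Executing an insert plan has a side effect: it appends one row.
        self.table.append(1)
        return []


class IgnoreCachedData:
    """Marker mixin: plans that carry it are skipped by the toy cache."""


class AppendDataBuggy(LogicalPlan):
    """Insert plan without the marker, as described in the report."""


class AppendDataFixed(LogicalPlan, IgnoreCachedData):
    """Insert plan with the marker, as the proposed fix suggests."""


class ToyCacheManager:
    def __init__(self):
        self.cached = {}

    def cache_query(self, plan):
        if isinstance(plan, IgnoreCachedData):
            return  # marker present: nothing is cached, nothing is re-run
        # Assumption: building the cache entry executes the plan again.
        self.cached[plan.name] = plan.execute()


def insert_then_cache(plan_cls):
    """Run an insert once, cache it, and return the resulting row count."""
    table = []
    plan = plan_cls("insert", table)
    plan.execute()  # assumed eager execution when the DataFrame is created
    ToyCacheManager().cache_query(plan)
    return len(table)
```

In this model, {{insert_then_cache(AppendDataBuggy)}} leaves two rows in the table, while {{insert_then_cache(AppendDataFixed)}} leaves one, which mirrors the duplicate insert and the intended behaviour after the fix.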



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

