rdblue opened a new pull request #28993:
URL: https://github.com/apache/spark/pull/28993


   ### What changes were proposed in this pull request?
   
   When converting an `INSERT OVERWRITE` query to a v2 overwrite plan, Spark 
attempts to detect when a dynamic overwrite and a static overwrite will produce 
the same result so it can use the static overwrite. When a table has hidden 
partitions, such as `days(ts)`, this detection is wrong: Spark treats the two 
overwrites as equivalent when they are not.
   
   This updates the analyzer rule `ResolveInsertInto` to always use a dynamic 
overwrite when the mode is dynamic, and static when the mode is static. This 
avoids the problem by not trying to determine whether the two plans are 
equivalent and always using the one that corresponds to the partition overwrite 
mode.
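
   As a rough illustration of the rule change, the sketch below models the plan 
choice in plain Scala. It is a simplified model, not the code in 
`ResolveInsertInto`: the types `Mode` and `Plan` and the functions 
`choosePlanBefore` and `choosePlanAfter` are hypothetical, and the "before" 
heuristic (counting only identity partition columns) is an assumption about how 
the earlier equivalence check behaved, not something stated in this description.

```scala
object OverwritePlanSketch {
  // Simplified stand-ins for the partition overwrite mode and the v2 overwrite plans.
  sealed trait Mode
  case object Static extends Mode
  case object Dynamic extends Mode

  sealed trait Plan
  // Static overwrite: delete rows matching the static partition values, then append.
  final case class OverwriteByExpression(staticPartitions: Map[String, String]) extends Plan
  // Dynamic overwrite: replace only the partitions that receive new rows.
  case object OverwritePartitionsDynamic extends Plan

  // ASSUMED model of the earlier heuristic: use the dynamic plan only when some
  // identity partition column has no static value. Hidden partitions such as
  // days(ts) are not identity columns, so this check never counts them.
  def choosePlanBefore(
      mode: Mode,
      identityPartitionCols: Seq[String],
      staticPartitions: Map[String, String]): Plan = {
    val dynamic = mode == Dynamic && identityPartitionCols.size > staticPartitions.size
    if (dynamic) OverwritePartitionsDynamic else OverwriteByExpression(staticPartitions)
  }

  // Behavior described in this PR: the plan follows the mode, with no equivalence check.
  def choosePlanAfter(mode: Mode, staticPartitions: Map[String, String]): Plan =
    mode match {
      case Dynamic => OverwritePartitionsDynamic
      case Static  => OverwriteByExpression(staticPartitions)
    }
}
```

   Under this model, a table partitioned only by `days(ts)` has no identity 
partition columns, so `choosePlanBefore(Dynamic, Seq.empty, Map.empty)` picks a 
static overwrite with no partition filter, while `choosePlanAfter(Dynamic, 
Map.empty)` picks the dynamic overwrite.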
   
   ### Why are the changes needed?
   
   This is a correctness bug. If a table has hidden partitions, all existing 
data in those partitions is dropped instead of only the changed partitions 
being overwritten dynamically.
   
   This only affects SQL commands (not `DataFrameWriter`) writing to tables 
that have hidden partitions.
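
   The difference between the two overwrites can be shown with a small in-memory 
simulation. This is a sketch under stated assumptions, not Spark code: `Row`, 
`days`, `staticOverwrite`, and `dynamicOverwrite` are hypothetical names, the 
table is a plain `Seq`, and the static overwrite is modeled with no static 
partition values, which is the case where it replaces every row.

```scala
import java.time.{Instant, LocalDate, ZoneOffset}

object HiddenPartitionOverwriteSketch {
  final case class Row(id: Long, ts: Instant)

  // Hidden partition transform: the partition value is derived from ts
  // and is not a column that a PARTITION clause can set directly.
  def days(ts: Instant): LocalDate = ts.atZone(ZoneOffset.UTC).toLocalDate

  // Static overwrite with no static partition values: every existing row
  // matches the delete condition, so only the incoming rows remain.
  def staticOverwrite(existing: Seq[Row], incoming: Seq[Row]): Seq[Row] = incoming

  // Dynamic overwrite: drop only the partitions that receive new rows, then append.
  def dynamicOverwrite(existing: Seq[Row], incoming: Seq[Row]): Seq[Row] = {
    val touched = incoming.map(r => days(r.ts)).toSet
    existing.filterNot(r => touched.contains(days(r.ts))) ++ incoming
  }
}
```

   With existing rows on 2020-07-01 and 2020-07-02 and one incoming row on 
2020-07-02, `dynamicOverwrite` keeps the 2020-07-01 row and replaces only the 
2020-07-02 partition, whereas `staticOverwrite` also drops the 2020-07-01 row.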
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, it fixes the correctness bug detailed above.
   
   ### How was this patch tested?
   
   * This updates the in-memory table to support a hidden partition transform, 
`days`, and adds a test case to `DataSourceV2SQLSuite` in which the table uses 
this hidden partition function. This test fails without the fix to 
`ResolveInsertInto`.
   * This updates the test case `InsertInto: overwrite - multiple static 
partitions - dynamic mode` in `InsertIntoTests`. The result of the SQL command 
is unchanged, but the SQL command will now use a dynamic overwrite so the test 
now uses `dynamicOverwriteTest`.

