[ https://issues.apache.org/jira/browse/SPARK-16032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343029#comment-15343029 ]
Wenchen Fan edited comment on SPARK-16032 at 6/22/16 1:15 AM:
--------------------------------------------------------------

I think it doesn't make sense to use `partitionBy` with `insertInto`, as we cannot map `DataFrameWriter.insertInto` to SQL INSERT, for 2 reasons:
1. `DataFrameWriter` doesn't support static partitions.
2. `DataFrameWriter` specifies the partition columns of the data to insert, not of the table to be inserted into.

It is also already (mostly) broken in 1.6. According to the test cases at https://gist.github.com/cloud-fan/14ada3f2b3225b5db52ccaa12aacfbd4 , the only case that seems reasonable in 1.6 is when the data to insert has the same schema as the table to be inserted into and `partitionBy` specifies the correct partition columns. But I think it's worth breaking that case to make the overall semantics clearer.

Maybe we are wrong. It would be good if we could come up with clean semantics to explain the behavior of `DataFrame.insertInto`, but after spending a lot of time on it, we failed, and that's why we want to make these changes and rush them into 2.0.


was (Author: cloud_fan):
I think it's nonsense to use `partitionBy` with `insertInto`, as we cannot map `DataFrameWriter.insertInto` to SQL INSERT, for 2 reasons:
1. `DataFrameWriter` doesn't support static partitions.
2. `DataFrameWriter` specifies the partition columns of the data to insert, not of the table to be inserted into.

It is also already (mostly) broken in 1.6. According to the test cases at https://gist.github.com/cloud-fan/14ada3f2b3225b5db52ccaa12aacfbd4 , the only case that seems reasonable in 1.6 is when the data to insert has the same schema as the table to be inserted into and `partitionBy` specifies the correct partition columns. But I think it's worth breaking that case to make the overall semantics clearer.

Maybe we are wrong. It would be good if we could come up with clean semantics to explain the behavior of `DataFrame.insertInto`, but after spending a lot of time on it, we failed, and that's why we want to make these changes and rush them into 2.0.

> Audit semantics of various insertion operations related to partitioned tables
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-16032
>                 URL: https://issues.apache.org/jira/browse/SPARK-16032
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Cheng Lian
>            Assignee: Wenchen Fan
>            Priority: Critical
>         Attachments: [SPARK-16032] Spark SQL table insertion auditing - Google Docs.pdf
>
>
> We found that the semantics of various insertion operations related to
> partitioned tables can be inconsistent. This is an umbrella ticket for all
> related tickets.