[jira] [Commented] (SPARK-18544) Append with df.saveAsTable writes data to wrong location

2016-11-22 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-18544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15688471#comment-15688471 ]

Apache Spark commented on SPARK-18544:
--

User 'ericl' has created a pull request for this issue:
https://github.com/apache/spark/pull/15983

> Append with df.saveAsTable writes data to wrong location
> 
>
> Key: SPARK-18544
> URL: https://issues.apache.org/jira/browse/SPARK-18544
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Eric Liang
>Priority: Blocker
>
> When using saveAsTable in append mode, data will be written to the wrong
> location for non-managed Datasource tables. The following example illustrates
> this.
> It seems DataFrameWriter somehow passes the wrong table path to
> InsertIntoHadoopFsRelation. Also, we should probably remove the repair table
> call at the end of saveAsTable in DataFrameWriter; it shouldn't be needed in
> either the Hive or the Datasource case.
> {code}
> scala> spark.sqlContext.range(100).selectExpr("id", "id as A", "id as B").write.partitionBy("A", "B").mode("overwrite").parquet("/tmp/test")
> scala> sql("create table test (id long, A int, B int) USING parquet OPTIONS (path '/tmp/test') PARTITIONED BY (A, B)")
> scala> sql("msck repair table test")
> scala> sql("select * from test where A = 1").count
> res6: Long = 1
> scala> spark.sqlContext.range(10).selectExpr("id", "id as A", "id as B").write.partitionBy("A", "B").mode("append").saveAsTable("test")
> scala> sql("select * from test where A = 1").count
> res8: Long = 1
> {code}
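
For reference, a minimal sketch of checking where the appended files actually land after the repro above, assuming spark-shell with the default spark.sql.warehouse.dir (a local spark-warehouse directory under the working directory); the helper and the warehouse path are assumptions, not taken from the report:

{code}
// Sketch only: list the declared external table path and the assumed default
// warehouse location, to see where the appended partitions for A=1 ended up.
def ls(dir: String): Unit = {
  val files = Option(new java.io.File(dir).listFiles).getOrElse(Array.empty[java.io.File])
  files.foreach(f => println(f.getPath))
}

ls("/tmp/test")            // declared table path from the OPTIONS clause above
ls("spark-warehouse/test") // assumed default managed location for table `test`
{code}

If the appended partition directories do not show up under the declared /tmp/test path, that would explain why the count for A = 1 stays at 1 after the append.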






[jira] [Commented] (SPARK-18544) Append with df.saveAsTable writes data to wrong location

2016-11-22 Thread Eric Liang (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-18544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15687884#comment-15687884 ]

Eric Liang commented on SPARK-18544:


cc [~yhuai] [~cloud_fan] I'll try to look at this today but might not get to it.



