[ https://issues.apache.org/jira/browse/SPARK-31751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17141317#comment-17141317 ]
Apache Spark commented on SPARK-31751:
--------------------------------------

User 'TJX2014' has created a pull request for this issue:
https://github.com/apache/spark/pull/28882

> spark serde property path overwrites table property location
> -------------------------------------------------------------
>
> Key: SPARK-31751
> URL: https://issues.apache.org/jira/browse/SPARK-31751
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.1, 2.4.5
> Reporter: Nithin
> Priority: Major
>
> This is an issue that has caused us many data errors.
>
> 1) Using Spark (with Hive context enabled):
> {code}
> df = spark.createDataFrame([{"a": "x", "b": "y", "c": "3"}])
> df.write.format("orc").option("compression", "ZLIB").mode("overwrite").saveAsTable("test_spark")
> {code}
>
> 2) From Hive:
> {code}
> alter table test_spark rename to test_spark2
> {code}
>
> 3) From spark-sql on the command line (note: not pyspark or spark-shell):
> {code}
> select * from test_spark2
> {code}
>
> This gives the output:
> {code}
> NULL NULL NULL
> Time taken: 0.334 seconds, Fetched 1 row(s)
> {code}
>
> The query returns NULL because the PySpark write API adds a serde property called "path" to the table's Hive metastore entry. When Hive renames the table, it does not understand this serde property and keeps it as it is. When spark-sql later reads the table, it honors the serde property first and attempts to read from the now non-existent HDFS location. If it raised an error, that would still be acceptable, but silently returning NULL causes applications to fail badly. Spark claims to support Hive tables, so it should respect the Hive metastore location property rather than its own serde property when reading a table. This cannot be classified as expected behaviour.
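For tables already affected, one workaround sketch is to point the stale "path" serde property back at the table's real metastore location, after which spark-sql reads the data again. This is untested illustration code, not part of the reported issue or the linked pull request: the table name comes from the reproduction above, and the "Location" row lookup assumes the DESCRIBE FORMATTED output layout of Spark 2.x (columns col_name, data_type, comment).

{code}
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("spark-31751-workaround")
         .enableHiveSupport()
         .getOrCreate())

# "Storage Properties" in this output still shows the pre-rename path
# (e.g. .../test_spark) that Spark wrote as a serde property.
spark.sql("DESCRIBE FORMATTED test_spark2").show(200, truncate=False)

# Read the table's actual location from the metastore.
location = (spark.sql("DESCRIBE FORMATTED test_spark2")
            .filter("col_name = 'Location'")
            .first()["data_type"])

# Overwrite the stale serde property so it matches the real location.
spark.sql("ALTER TABLE test_spark2 SET SERDEPROPERTIES ('path' = '%s')" % location)

# spark-sql and spark.sql() should now read from the correct HDFS directory.
spark.sql("SELECT * FROM test_spark2").show()
{code}

The same repair can be done from the Hive side with ALTER TABLE ... SET SERDEPROPERTIES, since Hive carries the property through the rename without interpreting it.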