[ https://issues.apache.org/jira/browse/SPARK-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16848811#comment-16848811 ]

fengtlyer commented on SPARK-27826:
-----------------------------------

Hi Hyukjin,

Our team thinks this is a compatibility issue.

We fully understand that this line of code would work if we used format("hive"); however, all of our operations should work with the "parquet" format (see the sketch below).

Why is the table not treated as parquet format when it was created as a parquet table by Impala?

If we create the table in Hue with an Impala SQL query containing "STORED AS PARQUET" and then check HDFS, the files end with ".parq", yet we cannot use "write.format("parquet").mode("append").saveAsTable()" to append to this table.

We think there is a compatibility issue here.
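
For concreteness, here is a minimal sketch of the two write paths discussed above (Scala; the CSV path and table name are taken from the error log quoted below, and the SparkSession setup is an assumption):

{code}
import org.apache.spark.sql.SparkSession

// Hive support is required for saveAsTable to resolve metastore tables.
val spark = SparkSession.builder()
  .appName("SPARK-27826-repro")
  .enableHiveSupport()
  .getOrCreate()

val df = spark.read
  .format("csv")
  .option("sep", ",")
  .option("header", "true")
  .load("hdfs:///temp1/test_paquettest.csv")

// Fails on a Hive/Impala-created table: the metastore records the table's
// provider as hive, so the explicit "parquet" format is rejected.
df.write.format("parquet").mode("append").saveAsTable("parquet_test_table")

// Works, per the discussion above: the "hive" format matches the provider
// recorded in the metastore, even though the underlying files are parquet.
df.write.format("hive").mode("append").saveAsTable("parquet_test_table")
{code}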

> saveAsTable() fails with a "HiveFileFormat" vs "ParquetFileFormat" format mismatch
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-27826
>                 URL: https://issues.apache.org/jira/browse/SPARK-27826
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0, 2.4.0
>         Environment: CDH 5.13.1 - Spark version 2.2.0.cloudera2
> CDH 6.1.1 - Spark version 2.4.0-cdh6.1.1
>            Reporter: fengtlyer
>            Priority: Minor
>
> Hi Spark Dev Team,
> We tested this several times and found that the bug reproduces on multiple 
> Spark versions.
> We tested on CDH 5.13.1 (Spark 2.2.0.cloudera2) and CDH 6.1.1 (Spark 
> 2.4.0-cdh6.1.1).
> Both show the same bug:
> 1. If a table is created by Impala or Hive in Hue, then 
> "write.format("parquet").mode("append").saveAsTable()" in Spark raises the 
> format error (see the error log below).
> 2. If the table is created by Hive/Impala in Hue, 
> "write.format("parquet").mode("overwrite").saveAsTable()" also does not work.
>  2.1 If the table is created by Hive/Impala in Hue and 
> "write.format("parquet").mode("overwrite").saveAsTable()" has been run, then 
> "write.format("parquet").mode("append").saveAsTable()" works.
> 3. If the table is created by Hive/Impala in Hue, "insertInto()" still works 
> (see the sketch after this list).
>  3.1 If the table is created by Hive/Impala in Hue and new records are 
> inserted with "insertInto()", then 
> "write.format("parquet").mode("append").saveAsTable()" fails with the same 
> format error.
> 4. If a parquet table is created and populated from the Hive shell, 
> "write.format("parquet").mode("append").saveAsTable()" can insert data, but 
> Spark only shows the rows inserted by Spark, and Hive only shows the rows 
> inserted by Hive.
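> A minimal sketch of scenario 3, for comparison (assumes a Hive-enabled 
> SparkSession "spark" and an existing Hive/Impala-created table 
> "parquet_test_table"; our reading is that insertInto() bypasses the 
> provider check that saveAsTable() performs):
> {code}
> val df = spark.read
>   .format("csv")
>   .option("sep", ",")
>   .option("header", "true")
>   .load("hdfs:///temp1/test_paquettest.csv")
>
> // Works: insertInto() writes into the existing table definition directly
> // (matching columns by position) and does not compare the declared source
> // format against the provider recorded in the metastore.
> df.write.mode("append").insertInto("parquet_test_table")
> {code}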
> =========================================================================== 
> Error Log 
> ===========================================================================
> {code}
> spark.read.format("csv")
>   .option("sep", ",")
>   .option("header", "true")
>   .load("hdfs:///temp1/test_paquettest.csv")
>   .write.format("parquet")
>   .mode("append")
>   .saveAsTable("parquet_test_table")
> {code}
> {code}
> org.apache.spark.sql.AnalysisException: The format of the existing table default.parquet_test_table is `HiveFileFormat`. It doesn't match the specified format `ParquetFileFormat`.;
> at org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:115)
> at org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:75)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
> at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256)
> at org.apache.spark.sql.execution.datasources.PreprocessTableCreation.apply(rules.scala:75)
> at org.apache.spark.sql.execution.datasources.PreprocessTableCreation.apply(rules.scala:71)
> at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85)
> at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82)
> at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
> at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
> at scala.collection.mutable.ArrayBuffer.foldLeft(ArrayBuffer.scala:48)
> at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82)
> at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74)
> at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69)
> at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:67)
> at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:50)
> at org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:73)
> at org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:72)
> at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:78)
> at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:78)
> at org.apache.spark.sql.execution.QueryExecution.completeString(QueryExecution.scala:220)
> at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:203)
> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
> at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:609)
> at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:419)
> at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:398)
> at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:354)
> {code}
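> The provider recorded in the metastore can also be inspected directly; a 
> minimal diagnostic sketch (spark.sessionState is an unstable developer API, 
> so this is for inspection only, not a fix):
> {code}
> import org.apache.spark.sql.catalyst.TableIdentifier
>
> // A Hive/Impala-created table typically reports Some("hive") here, while a
> // Spark-created parquet table reports Some("parquet"); that mismatch is
> // what PreprocessTableCreation rejects in the stack trace above.
> val meta = spark.sessionState.catalog
>   .getTableMetadata(TableIdentifier("parquet_test_table", Some("default")))
> println(meta.provider)
> {code}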


