[ https://issues.apache.org/jira/browse/SPARK-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493316#comment-17493316 ]
fengtlyer commented on SPARK-27826:
-----------------------------------

Hi Bandhu, sorry for the late reply. In the end, we decided to re-create all of the tables a single way (via Spark), and we have not hit this issue since.

> saveAsTable() on a table whose existing format is "HiveFileFormat" fails with a "ParquetFileFormat" mismatch
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-27826
>                 URL: https://issues.apache.org/jira/browse/SPARK-27826
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0, 2.4.0
>        Environment: CDH 5.13.1 - Spark version 2.2.0.cloudera2
> CDH 6.1.1 - Spark version 2.4.0-cdh6.1.1
>            Reporter: fengtlyer
>            Priority: Minor
>
> Hi Spark Dev Team,
> We tested a few times and found that this bug is reproducible across multiple Spark versions.
> We tested on CDH 5.13.1 (Spark 2.2.0.cloudera2) and CDH 6.1.1 (Spark 2.4.0-cdh6.1.1); both show the same behavior:
> 1. If a table was created by Impala or Hive (via HUE), then in Spark code, write.format("parquet").mode("append").saveAsTable() causes a format error (see the error log below).
> 2. For a table created by Hive/Impala in HUE, write.format("parquet").mode("overwrite").saveAsTable() also does not work.
> 2.1 However, after running write.format("parquet").mode("overwrite").saveAsTable() on such a table, a subsequent write.format("parquet").mode("append").saveAsTable() can work.
> 3. insertInto() on a table created by Hive/Impala in HUE still works.
> 3.1 After inserting some new records into such a table with insertInto(), a subsequent write.format("parquet").mode("append").saveAsTable() fails with the same format error.
> 4. If a parquet table is created and populated with some data from the Hive shell, write.format("parquet").mode("append").saveAsTable() can insert data, but Spark only shows the rows inserted by Spark, and Hive only shows the rows inserted by Hive.
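The repro steps above hinge on the difference between saveAsTable() and insertInto(). A minimal spark-shell sketch of the pattern, assuming a SparkSession with Hive support enabled; the table and file names mirror the {code} snippet in the ticket and are placeholders, and the behavioral comments describe Spark 2.x as observed in this report, not a definitive specification:

```scala
// Assumes a table created outside Spark, e.g. from the Hive shell:
//   CREATE TABLE parquet_test_table (id INT, name STRING) STORED AS PARQUET;

val df = spark.read
  .format("csv")
  .option("sep", ",")
  .option("header", "true")
  .load("hdfs:///temp1/test_paquettest.csv")   // placeholder path from the ticket

// Fails on a Hive-created table: saveAsTable compares the writer's source
// ("parquet" -> ParquetFileFormat) against the provider recorded in the
// metastore (Hive serde -> HiveFileFormat) and throws AnalysisException
// on mismatch.
// df.write.format("parquet").mode("append").saveAsTable("parquet_test_table")

// Works per step 3: insertInto resolves columns by position and writes
// through the table's own (Hive) format, so no provider check applies.
df.write.insertInto("parquet_test_table")
```

Note that insertInto matches columns by position, not by name, so the DataFrame's column order must line up with the table schema.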
> ===========================================================================
> Error Log
> ===========================================================================
> {code}
> spark.read.format("csv").option("sep",",").option("header","true").load("hdfs:///temp1/test_paquettest.csv").write.format("parquet").mode("append").saveAsTable("parquet_test_table")
> {code}
> {code}
> org.apache.spark.sql.AnalysisException: The format of the existing table default.parquet_test_table is `HiveFileFormat`. It doesn't match the specified format `ParquetFileFormat`.;
> at org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:115)
> at org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:75)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
> at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256)
> at org.apache.spark.sql.execution.datasources.PreprocessTableCreation.apply(rules.scala:75)
> at org.apache.spark.sql.execution.datasources.PreprocessTableCreation.apply(rules.scala:71)
> at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85)
> at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82)
> at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
> at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
> at scala.collection.mutable.ArrayBuffer.foldLeft(ArrayBuffer.scala:48)
> at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82)
> at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74)
> at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69)
> at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:67)
> at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:50)
> at org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:73)
> at org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:72)
> at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:78)
> at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:78)
> at org.apache.spark.sql.execution.QueryExecution.completeString(QueryExecution.scala:220)
> at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:203)
> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
> at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:609)
> at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:419)
> at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:398)
> at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:354)
> {code}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)