[ https://issues.apache.org/jira/browse/SPARK-27826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493316#comment-17493316 ]
fengtlyer commented on SPARK-27826:
-----------------------------------

Hi Bandhu, sorry for the late reply. In the end, we decided to re-create all of the tables a single way (via Spark), and we have not hit this issue since.

> saveAsTable() on a table whose existing format is "HiveFileFormat" fails with a "ParquetFileFormat" mismatch
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-27826
>                 URL: https://issues.apache.org/jira/browse/SPARK-27826
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0, 2.4.0
>        Environment: CDH 5.13.1 - Spark version 2.2.0.cloudera2
> CDH 6.1.1 - Spark version 2.4.0-cdh6.1.1
>            Reporter: fengtlyer
>            Priority: Minor
>
> Hi Spark Dev Team,
> We tested a few times and found that this bug is reproducible across multiple Spark versions.
> We tested on CDH 5.13.1 (Spark 2.2.0.cloudera2) and CDH 6.1.1 (Spark 2.4.0-cdh6.1.1); both show the same behavior:
> 1. If a table was created by Impala or Hive (via HUE), then in Spark code, write.format("parquet").mode("append").saveAsTable() causes a format error (see the error log below).
> 2. For a table created by Hive/Impala in HUE, write.format("parquet").mode("overwrite").saveAsTable() also does not work.
> 2.1 However, after running write.format("parquet").mode("overwrite").saveAsTable() on such a table, a subsequent write.format("parquet").mode("append").saveAsTable() can work.
> 3. insertInto() on a table created by Hive/Impala in HUE still works.
> 3.1 After inserting some new records into such a table with insertInto(), a subsequent write.format("parquet").mode("append").saveAsTable() fails with the same format error.
> 4. If a parquet table is created and populated with some data from the Hive shell, write.format("parquet").mode("append").saveAsTable() can insert data, but Spark only shows the rows inserted by Spark, and Hive only shows the rows inserted by Hive.
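The repro steps above hinge on the difference between saveAsTable() and insertInto(). A minimal spark-shell sketch of the pattern, assuming a SparkSession with Hive support enabled; the table and file names mirror the {code} snippet in the ticket and are placeholders, and the behavioral comments describe Spark 2.x as observed in this report, not a definitive specification:

```scala
// Assumes a table created outside Spark, e.g. from the Hive shell:
//   CREATE TABLE parquet_test_table (id INT, name STRING) STORED AS PARQUET;

val df = spark.read
  .format("csv")
  .option("sep", ",")
  .option("header", "true")
  .load("hdfs:///temp1/test_paquettest.csv")   // placeholder path from the ticket

// Fails on a Hive-created table: saveAsTable compares the writer's source
// ("parquet" -> ParquetFileFormat) against the provider recorded in the
// metastore (Hive serde -> HiveFileFormat) and throws AnalysisException
// on mismatch.
// df.write.format("parquet").mode("append").saveAsTable("parquet_test_table")

// Works per step 3: insertInto resolves columns by position and writes
// through the table's own (Hive) format, so no provider check applies.
df.write.insertInto("parquet_test_table")
```

Note that insertInto matches columns by position, not by name, so the DataFrame's column order must line up with the table schema.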
> ===========================================================================
> Error Log
> ===========================================================================
> {code}
> spark.read.format("csv").option("sep",",").option("header","true").load("hdfs:///temp1/test_paquettest.csv").write.format("parquet").mode("append").saveAsTable("parquet_test_table")
> {code}
> {code}
> org.apache.spark.sql.AnalysisException: The format of the existing table default.parquet_test_table is `HiveFileFormat`. It doesn't match the specified format `ParquetFileFormat`.;
> at org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:115)
> at org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:75)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
> at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256)
> at org.apache.spark.sql.execution.datasources.PreprocessTableCreation.apply(rules.scala:75)
> at org.apache.spark.sql.execution.datasources.PreprocessTableCreation.apply(rules.scala:71)
> at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85)
> at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82)
> at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
> at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
> at scala.collection.mutable.ArrayBuffer.foldLeft(ArrayBuffer.scala:48)
> at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82)
> at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74)
> at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69)
> at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:67)
> at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:50)
> at org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:73)
> at org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:72)
> at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:78)
> at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:78)
> at org.apache.spark.sql.execution.QueryExecution.completeString(QueryExecution.scala:220)
> at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:203)
> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
> at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:609)
> at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:419)
> at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:398)
> at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:354)
> {code}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)