[ https://issues.apache.org/jira/browse/SPARK-25271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16716560#comment-16716560 ]

ASF GitHub Bot commented on SPARK-25271:
----------------------------------------

viirya commented on a change in pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use data source if it is convertible
URL: https://github.com/apache/spark/pull/22514#discussion_r240512341
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala
 ##########
 @@ -97,9 +77,118 @@ case class CreateHiveTableAsSelectCommand(
     Seq.empty[Row]
   }
 
+  // Returns `DataWritingCommand` used to write data when the table exists.
+  def writingCommandForExistingTable(
+    catalog: SessionCatalog,
+    tableDesc: CatalogTable): DataWritingCommand
+
+  // Returns `DataWritingCommand` used to write data when the table doesn't exist.
+  def writingCommandForNewTable(
+    catalog: SessionCatalog,
+    tableDesc: CatalogTable): DataWritingCommand
+
   override def argString: String = {
     s"[Database:${tableDesc.database}, " +
     s"TableName: ${tableDesc.identifier.table}, " +
     s"InsertIntoHiveTable]"
   }
 }
+
+/**
+ * Create table and insert the query result into it.
+ *
+ * @param tableDesc the table description, which may contain serde, storage handler, etc.
+ * @param query the query whose result will be inserted into the new relation
+ * @param mode SaveMode
+ */
+case class CreateHiveTableAsSelectCommand(
+    tableDesc: CatalogTable,
+    query: LogicalPlan,
+    outputColumnNames: Seq[String],
+    mode: SaveMode)
+  extends CreateHiveTableAsSelectBase {
+
+  override def writingCommandForExistingTable(
+      catalog: SessionCatalog,
+      tableDesc: CatalogTable): DataWritingCommand = {
+    // For CTAS, there are no static partition values to insert.
+    val partition = tableDesc.partitionColumnNames.map(_ -> None).toMap
+    InsertIntoHiveTable(
+      tableDesc,
+      partition,
+      query,
+      overwrite = false,
 
 Review comment:
   ha, ok.
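
For context on the hooks in the diff above: a concrete subclass chooses the write path by implementing `writingCommandForExistingTable` and `writingCommandForNewTable`. Below is a minimal illustrative sketch of what a data-source-backed variant could look like; the class name `OptimizedCreateHiveTableAsSelectCommand` and the helper `buildDataSourceInsert` are assumptions for illustration, not the PR's exact code (the PR reaches Spark's native file-format writer by converting the Hive table through the metastore catalog).

{code:scala}
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.catalyst.catalog.{CatalogTable, SessionCatalog}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.command.DataWritingCommand

// Illustrative sketch only (assumed to sit alongside the base class above):
// a CTAS variant that writes through Spark's native data source path
// instead of Hive's SerDe writer.
case class OptimizedCreateHiveTableAsSelectCommand(
    tableDesc: CatalogTable,
    query: LogicalPlan,
    outputColumnNames: Seq[String],
    mode: SaveMode)
  extends CreateHiveTableAsSelectBase {

  override def writingCommandForExistingTable(
      catalog: SessionCatalog,
      tableDesc: CatalogTable): DataWritingCommand =
    // Mirrors the Hive path in the diff (`overwrite = false`): during CTAS
    // an existing table is appended to, never overwritten.
    buildDataSourceInsert(catalog, tableDesc, SaveMode.Append)

  override def writingCommandForNewTable(
      catalog: SessionCatalog,
      tableDesc: CatalogTable): DataWritingCommand =
    // The table was just created by this command and is empty, so
    // overwrite semantics are safe.
    buildDataSourceInsert(catalog, tableDesc, SaveMode.Overwrite)

  // Hypothetical helper: resolve the Hive table metadata into a
  // HadoopFsRelation and build the corresponding data source insert
  // command. Left unimplemented here because the conversion depends on
  // session-level metastore state.
  private def buildDataSourceInsert(
      catalog: SessionCatalog,
      tableDesc: CatalogTable,
      mode: SaveMode): DataWritingCommand = ???
}
{code}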


> Creating a parquet table with all columns null throws an exception
> -------------------------------------------------------------------
>
>                 Key: SPARK-25271
>                 URL: https://issues.apache.org/jira/browse/SPARK-25271
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.1
>            Reporter: Shivu Sondur
>            Priority: Critical
>         Attachments: image-2018-09-07-09-12-34-944.png, image-2018-09-07-09-29-33-370.png, image-2018-09-07-09-29-52-899.png, image-2018-09-07-09-32-43-892.png, image-2018-09-07-09-33-03-095.png
>
>
> {code:java}
>  1)cat /data/parquet.dat
> 1$abc2$pqr:3$xyz
> null{code}
>  
> {code:java}
> 2)spark.sql("create table vp_reader_temp (projects map<int, string>) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY ':' MAP KEYS TERMINATED BY '$'")
> {code}
> {code:java}
> 3)spark.sql("LOAD DATA LOCAL INPATH '/data/parquet.dat' INTO TABLE vp_reader_temp")
> {code}
> {code:java}
> 4)spark.sql("create table vp_reader STORED AS PARQUET as select * from vp_reader_temp")
> {code}
> *Result:* Throws an exception (works fine with Spark 2.2.1)
> {code:java}
> java.lang.RuntimeException: Parquet record is malformed: empty fields are illegal, the field should be ommited completely instead
>       at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:64)
>       at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59)
>       at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31)
>       at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:123)
>       at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:180)
>       at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:46)
>       at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:112)
>       at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:125)
>       at org.apache.spark.sql.hive.execution.HiveOutputWriter.write(HiveFileFormat.scala:149)
>       at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:406)
>       at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:283)
>       at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:281)
>       at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1438)
>       at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:286)
>       at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:211)
>       at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:210)
>       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>       at org.apache.spark.scheduler.Task.run(Task.scala:109)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:349)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.parquet.io.ParquetEncodingException: empty fields are illegal, the field should be ommited completely instead
>       at org.apache.parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.endField(MessageColumnIO.java:320)
>       at org.apache.parquet.io.RecordConsumerLoggingWrapper.endField(RecordConsumerLoggingWrapper.java:165)
>       at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeMap(DataWritableWriter.java:241)
>       at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeValue(DataWritableWriter.java:116)
>       at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeGroupFields(DataWritableWriter.java:89)
>       at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:60)
>       ... 21 more
> {code}
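
The root cause is visible at the bottom of the trace: Hive's `DataWritableWriter.writeMap` starts a Parquet field group even when the map value is null, and parquet-mr's `MessageColumnIO.endField` rejects a field into which no values were written, hence "empty fields are illegal". Spark's native Parquet writer handles null complex values, which is why routing the Hive CTAS through the data source write path (the subject of PR #22514) fixes this. On affected versions, one workaround sketch along the same lines is to declare the target as a data source table so the native writer is used; this assumes the `vp_reader_temp` table from the reproduction above:

{code:scala}
// Workaround sketch: a `USING PARQUET` CTAS creates a data source table,
// so the result is written by Spark's native Parquet writer, which
// tolerates the null map row in vp_reader_temp.
spark.sql("DROP TABLE IF EXISTS vp_reader")
spark.sql(
  """CREATE TABLE vp_reader
    |USING PARQUET
    |AS SELECT * FROM vp_reader_temp""".stripMargin)
{code}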


