[ https://issues.apache.org/jira/browse/SPARK-23271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16348217#comment-16348217 ]
Pavlo Z. commented on SPARK-23271:
----------------------------------

Yes, SPARK-15393 is similar and is linked to this issue in JIRA. The following code can be used as a workaround:

{code}
import scala.collection.JavaConverters._
import org.apache.spark.sql.Row

// Replace an empty DataFrame with a new empty one built from the same schema.
if (originalDF.rdd.isEmpty) {
  originalDF.sparkSession.createDataFrame(List.empty[Row].asJava, originalDF.schema)
} else {
  originalDF
}
{code}

> Parquet output contains only "_SUCCESS" file after empty DataFrame saving
> --------------------------------------------------------------------------
>
>                 Key: SPARK-23271
>                 URL: https://issues.apache.org/jira/browse/SPARK-23271
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Pavlo Z.
>            Priority: Minor
>         Attachments: parquet-empty-output.zip
>
>
> A subtle case: it is reproduced only when reading an empty CSV file without
> a header and with an explicitly assigned schema.
> Steps to reproduce (Scala):
> {code:java}
> import org.apache.spark.sql.types._
>
> val anySchema = StructType(StructField("anyName", StringType, nullable = false) :: Nil)
> val inputDF = spark.read.schema(anySchema).csv(inputFolderWithEmptyCSVFile)
> inputDF.write.parquet(outputFolderName)
> // The read below throws org.apache.spark.sql.AnalysisException:
> // Unable to infer schema for Parquet. It must be specified manually.;
> val actualDF = spark.read.parquet(outputFolderName)
> {code}
> *Actual:* Only the "_SUCCESS" file is in the output directory.
> *Expected:* At least one Parquet file containing the schema.
> A project that reproduces the issue is attached.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)