[ https://issues.apache.org/jira/browse/SPARK-23271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16345228#comment-16345228 ]
Pavlo Z. commented on SPARK-23271:
----------------------------------

In the example a schema is provided, and that schema has to be written to the output even though there is no data. If such an empty DataFrame is used instead of the CSV:
{code:java}
val inputDF = List.empty[String].toDF()
{code}
then the output directory contains a Parquet file, and the output can be read back without errors; to me that looks correct. The behaviour for the CSV source is different. This looks like a bug: writing an empty DataFrame should not behave differently depending on the source. A side-by-side sketch of both paths follows the quoted issue below.

> Parquet output contains only "_SUCCESS" file after empty DataFrame saving
> --------------------------------------------------------------------------
>
>                 Key: SPARK-23271
>                 URL: https://issues.apache.org/jira/browse/SPARK-23271
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Pavlo Z.
>            Priority: Minor
>         Attachments: parquet-empty-output.zip
>
>
> Sophisticated case, reproduced only if an empty CSV file without a header is read with an assigned schema.
> Steps to reproduce (Scala):
> {code:java}
> val anySchema = StructType(StructField("anyName", StringType, nullable = false) :: Nil)
> val inputDF = spark.read.schema(anySchema).csv(inputFolderWithEmptyCSVFile)
> inputDF.write.parquet(outputFolderName)
> // Exception: org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.;
> val actualDF = spark.read.parquet(outputFolderName)
> {code}
> *Actual:* Only the "_SUCCESS" file in the output directory.
> *Expected:* At least one Parquet file with the schema.
> A project to reproduce is attached.
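For comparison, here is a minimal side-by-side sketch of the two ways of producing an empty DataFrame, reusing anySchema, inputFolderWithEmptyCSVFile, and outputFolderName from the example above (the "_fromCsv" / "_fromLocal" suffixes are just placeholders). One hypothesis for the difference is that the scan of the empty CSV file yields zero partitions, so no write task runs and no part file is produced, while the local relation keeps a single partition and writes a Parquet file containing only the schema footer:
{code:java}
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import spark.implicits._

val anySchema = StructType(StructField("anyName", StringType, nullable = false) :: Nil)

// Empty DataFrame backed by an empty CSV file with an assigned schema.
val csvDF = spark.read.schema(anySchema).csv(inputFolderWithEmptyCSVFile)

// Empty DataFrame built from an empty local collection.
val localDF = List.empty[String].toDF()

// Compare how many partitions each plan produces; a scan of an empty file
// may end up with zero partitions, while the local relation keeps one.
println(s"csv partitions:   ${csvDF.rdd.getNumPartitions}")
println(s"local partitions: ${localDF.rdd.getNumPartitions}")

// Writing both shows the reported difference: only the local one leaves
// a Parquet file next to _SUCCESS.
csvDF.write.parquet(outputFolderName + "_fromCsv")
localDF.write.parquet(outputFolderName + "_fromLocal")
{code}
If the counts come out as 0 vs. 1, that would be consistent with the missing part file being caused by the writer having no task to run for the CSV-backed plan, rather than by anything Parquet-specific.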