[ https://issues.apache.org/jira/browse/SPARK-23271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-23271:
------------------------------------

    Assignee: Apache Spark

> Parquet output contains only "_SUCCESS" file after empty DataFrame saving
> --------------------------------------------------------------------------
>
>                 Key: SPARK-23271
>                 URL: https://issues.apache.org/jira/browse/SPARK-23271
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Pavlo Z.
>            Assignee: Apache Spark
>            Priority: Minor
>         Attachments: parquet-empty-output.zip
>
>
> Subtle case: it is reproduced only when an empty CSV file without a header
> is read with an explicitly assigned schema.
> Steps to reproduce (Scala):
> {code:java}
> import org.apache.spark.sql.types.{StringType, StructField, StructType}
>
> val anySchema = StructType(StructField("anyName", StringType, nullable = false) :: Nil)
> val inputDF = spark.read.schema(anySchema).csv(inputFolderWithEmptyCSVFile)
> inputDF.write.parquet(outputFolderName)
> // The next line throws org.apache.spark.sql.AnalysisException: Unable to infer schema
> // for Parquet. It must be specified manually.;
> val actualDF = spark.read.parquet(outputFolderName)
> {code}
> *Actual:* only a "_SUCCESS" file in the output directory.
> *Expected:* at least one Parquet file containing the schema.
> A project that reproduces the issue is attached.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
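A self-contained sketch of the reproduction in the report, assuming a local Spark 2.x session with spark-sql on the classpath. The object name `EmptyCsvToParquet`, the method `run`, the file name `empty.csv` and the temporary directories are hypothetical stand-ins for `inputFolderWithEmptyCSVFile` and `outputFolderName`. It lists what the write left behind. It then reads the output back with the schema passed explicitly, which is an assumption on my part rather than something stated in the report: supplying a schema should let Spark skip Parquet schema inference, so the AnalysisException should not occur.

```scala
import java.nio.file.{Files, Path}

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical standalone version of the reproduction; names and paths are illustrative.
object EmptyCsvToParquet {

  // Returns the names of the files left in the output directory and the
  // number of rows read back when the schema is supplied explicitly.
  def run(): (Seq[String], Long) = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-23271-repro")
      .getOrCreate()

    try {
      // Stand-in for inputFolderWithEmptyCSVFile: one zero-byte CSV file, no header.
      val inputDir: Path = Files.createTempDirectory("empty-csv-input")
      Files.createFile(inputDir.resolve("empty.csv"))

      // Stand-in for outputFolderName: the target itself must not exist yet,
      // because the default save mode refuses to write to an existing path.
      val outputDir: Path = Files.createTempDirectory("parquet-output").resolve("data")

      val anySchema = StructType(StructField("anyName", StringType, nullable = false) :: Nil)
      val inputDF = spark.read.schema(anySchema).csv(inputDir.toString)
      inputDF.write.parquet(outputDir.toString)

      // On an affected version this listing is expected to hold only "_SUCCESS"
      // (plus its ".crc" checksum on the local file system), with no part file.
      val outputFiles = outputDir.toFile.listFiles().map(_.getName).sorted.toSeq

      // Supplying the schema means Spark does not have to infer it from
      // Parquet footers (there are none), so this read should not throw.
      val rows = spark.read.schema(anySchema).parquet(outputDir.toString).count()

      (outputFiles, rows)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (files, rows) = run()
    println(s"Output files: ${files.mkString(", ")}")
    println(s"Rows read back with an explicit schema: $rows")
  }
}
```

Reading with `.schema(anySchema)` is only a workaround sketch for consumers of such output. It is not the fix tracked by this issue, which concerns the writer producing at least one Parquet file.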