[ https://issues.apache.org/jira/browse/SPARK-23271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354384#comment-16354384 ]
Dilip Biswal edited comment on SPARK-23271 at 2/6/18 7:23 PM:
--------------------------------------------------------------

Thank you [~smilegator]. I will try to create a PR that fixes this by repartitioning the RDD before setting up the write job. We can discuss in the PR whether it's the right approach to fix this issue.

was (Author: dkbiswal):
Thank you [~smilegator]. I will try to create a PR that fixes this by repartitioning the RDD before setting up the write job. We can discuss in the PR whether it's the right approach.

> Parquet output contains only "_SUCCESS" file after empty DataFrame saving
> --------------------------------------------------------------------------
>
>                 Key: SPARK-23271
>                 URL: https://issues.apache.org/jira/browse/SPARK-23271
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Pavlo Z.
>            Priority: Minor
>         Attachments: parquet-empty-output.zip
>
>
> This is a corner case: it reproduces only when reading an empty CSV file
> without a header and with an explicitly assigned schema.
> Steps to reproduce (Scala):
> {code:java}
> import org.apache.spark.sql.types.{StringType, StructField, StructType}
>
> val anySchema = StructType(StructField("anyName", StringType, nullable = false) :: Nil)
> val inputDF = spark.read.schema(anySchema).csv(inputFolderWithEmptyCSVFile)
> // Writes only "_SUCCESS"; no Parquet part file is produced.
> inputDF.write.parquet(outputFolderName)
> // Exception: org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.;
> val actualDF = spark.read.parquet(outputFolderName)
> {code}
> *Actual:* Only the "_SUCCESS" file is in the output directory.
> *Expected:* At least one Parquet file with the schema.
> A project that reproduces the issue is attached.
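
The repartitioning idea mentioned in the comment can be sketched on the user side as a possible workaround. This is only an illustration, not the change proposed for the PR. The helper name `ensureAtLeastOnePartition`, the object name, and the default output path are hypothetical. The sketch assumes the problem arises because the empty input yields a DataFrame with zero partitions, so no write task ever runs. Whether the single empty partition then produces a schema-only Parquet file may depend on the Spark version.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object EmptyParquetWriteSketch {
  // Hypothetical helper (not part of Spark): if the DataFrame has no partitions,
  // force a single empty partition so that at least one write task is scheduled.
  def ensureAtLeastOnePartition(df: DataFrame): DataFrame =
    if (df.rdd.getNumPartitions == 0) df.repartition(1) else df

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-23271-sketch")
      .getOrCreate()

    val anySchema = StructType(StructField("anyName", StringType, nullable = false) :: Nil)

    // Stand-in for the empty CSV read from the report (an assumption):
    // a DataFrame backed by an RDD that has zero partitions.
    val inputDF = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], anySchema)

    // Hypothetical output location; pass a different one as the first argument.
    val outputFolderName = args.headOption.getOrElse("/tmp/spark-23271-sketch-output")

    ensureAtLeastOnePartition(inputDF).write.mode("overwrite").parquet(outputFolderName)
    spark.stop()
  }
}
```

The check on `getNumPartitions` keeps the extra shuffle from `repartition(1)` away from non-empty inputs, which already have at least one partition.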