[jira] [Comment Edited] (SPARK-23271) Parquet output contains only "_SUCCESS" file after empty DataFrame saving

Dilip Biswal (JIRA) Tue, 06 Feb 2018 11:24:56 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-23271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354384#comment-16354384
 ]


Dilip Biswal edited comment on SPARK-23271 at 2/6/18 7:23 PM:
--------------------------------------------------------------

Thank you [~smilegator]. I will try to create a PR that fixes this by 
repartitioning the RDD before setting up the write job. 
We can discuss in the PR whether it's the right approach to fix this issue.
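
For illustration only, the repartition idea can be sketched at the user level 
as a workaround. This is not the change the PR would make inside Spark. It 
assumes the empty input yields a DataFrame with zero partitions, so that no 
write task runs and no Parquet file is produced; the helper name 
withAtLeastOnePartition is hypothetical.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper, not part of Spark. Assumption: the DataFrame read from
// the empty CSV file has zero partitions, so no task ever writes a file.
// Forcing a single partition makes at least one write task run.
def withAtLeastOnePartition(df: DataFrame): DataFrame =
  if (df.rdd.getNumPartitions == 0) df.repartition(1) else df

// Usage with the names from the reproduction in the issue description:
// withAtLeastOnePartition(inputDF).write.parquet(outputFolderName)
```

The check on getNumPartitions keeps non-empty DataFrames untouched, so the 
extra shuffle is only paid in the degenerate case.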


was (Author: dkbiswal):
Thank you [~smilegator]. I will try to create a PR to fix this by trying to 
repartition the RDD before setting up the write job. We can discuss whether its 
the right approach in the PR.

> Parquet output contains only "_SUCCESS" file after empty DataFrame saving 
> --------------------------------------------------------------------------
>
>                 Key: SPARK-23271
>                 URL: https://issues.apache.org/jira/browse/SPARK-23271
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Pavlo Z.
>            Priority: Minor
>         Attachments: parquet-empty-output.zip
>
>
> A subtle case, reproduced only when reading an empty CSV file without a 
> header and with an explicitly assigned schema.
> Steps to reproduce (Scala):
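> {code:java}
> import org.apache.spark.sql.types.{StringType, StructField, StructType}
>
> // Read an empty, header-less CSV file with an explicitly assigned schema.
> val anySchema = StructType(StructField("anyName", StringType, nullable = false) :: Nil)
> val inputDF = spark.read.schema(anySchema).csv(inputFolderWithEmptyCSVFile)
> // The write completes but leaves only a "_SUCCESS" file in the output folder.
> inputDF.write.parquet(outputFolderName)
> // Reading the output back throws org.apache.spark.sql.AnalysisException:
> // Unable to infer schema for Parquet. It must be specified manually.;
> val actualDF = spark.read.parquet(outputFolderName)
> {code}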
> *Actual:* The output directory contains only a "_SUCCESS" file.
> *Expected:* At least one Parquet file carrying the schema.
> A project that reproduces the issue is attached.



