[jira] [Commented] (SPARK-21852) Empty Parquet Files created as a result of spark jobs fail when read
[ https://issues.apache.org/jira/browse/SPARK-21852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350871#comment-16350871 ]

Ravi Chittilla commented on SPARK-21852:
----------------------------------------

+1

> Empty Parquet Files created as a result of spark jobs fail when read
>
> Key: SPARK-21852
> URL: https://issues.apache.org/jira/browse/SPARK-21852
> Project: Spark
> Issue Type: Bug
> Components: Input/Output
> Affects Versions: 2.2.0
> Reporter: Shivam Dalmia
> Priority: Major
>
> I have faced an issue intermittently with certain Spark jobs writing Parquet
> files: the jobs apparently succeed, but the written .parquet directory in HDFS
> is empty (without even the _SUCCESS and _metadata parts).
> Surprisingly, no errors are thrown by the Spark DataFrame writer.
> However, when attempting to read this output back, Spark throws the error:
> {{Unable to infer schema for Parquet. It must be specified manually}}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
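The read failure quoted above is what happens when schema inference finds no Parquet footers to read. One defensive option is to check the output directory for non-empty part files before handing it to the reader. A minimal sketch, assuming the output is on a locally visible filesystem and uses Spark's usual part-* file naming (on HDFS the equivalent listing would go through the Hadoop FileSystem API); the helper name `has_parquet_parts` is illustrative, not a Spark API:

```python
from pathlib import Path


def has_parquet_parts(path: str) -> bool:
    """Return True if `path` is a directory holding at least one non-empty
    part file. _SUCCESS and _metadata alone are not enough to read back:
    with no part files, spark.read.parquet fails with
    "Unable to infer schema for Parquet. It must be specified manually".
    """
    p = Path(path)
    if not p.is_dir():
        return False
    # Spark writes data files as part-*, e.g. part-00000-<uuid>.snappy.parquet.
    return any(f.stat().st_size > 0 for f in p.glob("part-*"))
```

Used as a guard, e.g. `if has_parquet_parts(out): df = spark.read.parquet(out)`, this turns the intermittent read-time error into an explicit "no data" branch.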
[ https://issues.apache.org/jira/browse/SPARK-21852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144650#comment-16144650 ]

Hyukjin Kwon commented on SPARK-21852:
--------------------------------------

I generally agree with Sean and am quite sure this is not an issue. However, I want to make sure before resolving it, as I have seen at least a few corner cases so far. BTW, I'd close the Parquet JIRA you opened; this does not look like a Parquet issue. I will resolve this one if no further details can be provided.
[ https://issues.apache.org/jira/browse/SPARK-21852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143987#comment-16143987 ]

Sean Owen commented on SPARK-21852:
-----------------------------------

The message isn't a problem if there is no data at all. You're saying it's due to some app-level failure, which explains it. There are no corrupt records here. I don't see a problem described here, and questions like this are better suited to the mailing list or StackOverflow anyway.
[ https://issues.apache.org/jira/browse/SPARK-21852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143975#comment-16143975 ]

Shivam Dalmia commented on SPARK-21852:
---------------------------------------

Of course, this happens because an empty directory is being created, which is an intermittent and often difficult-to-replicate scenario. So:
1. How and why are these empty Parquet directories being created? Any leads would be helpful.
2. Is there any way to have Spark check whether the directory is empty/corrupt before attempting to infer its schema?
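On question 2: Spark has no built-in pre-read emptiness check, but the AnalysisException it raises for an empty directory can be caught and turned into an explicit "no data" result. A sketch, assuming a live SparkSession passed in as `spark`; the wrapper name `read_parquet_or_none` is illustrative, and the ImportError fallback only exists so the sketch can be defined without pyspark on the path:

```python
try:
    # Real exception class raised by spark.read.parquet on an empty directory.
    from pyspark.sql.utils import AnalysisException
except ImportError:  # allow the sketch to load without pyspark installed
    AnalysisException = Exception


def read_parquet_or_none(spark, path):
    """Read `path` as Parquet, returning None instead of raising when the
    directory is empty and Spark cannot infer a schema."""
    try:
        return spark.read.parquet(path)
    except AnalysisException as e:
        if "Unable to infer schema" in str(e):
            return None  # empty output directory: treat as "no data"
        raise  # any other analysis error is still a real error
```

This answers the read side only; it does not explain why the writer produced an empty directory in the first place, which per the discussion above is an app-level failure rather than a Spark bug.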