[jira] [Commented] (SPARK-21852) Empty Parquet Files created as a result of spark jobs fail when read

2018-02-02 Thread Ravi Chittilla (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-21852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350871#comment-16350871 ]

Ravi Chittilla commented on SPARK-21852:


+1

> Empty Parquet Files created as a result of spark jobs fail when read
> 
>
> Key: SPARK-21852
> URL: https://issues.apache.org/jira/browse/SPARK-21852
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 2.2.0
>Reporter: Shivam Dalmia
>Priority: Major
>
> I have intermittently faced an issue with certain Spark jobs writing Parquet 
> files: the job apparently succeeds, but the written .parquet directory in 
> HDFS is empty (without even the _SUCCESS and _metadata parts). 
> Surprisingly, no errors are thrown from the Spark DataFrame writer.
> However, when attempting to read this written file, Spark throws the error:
> {{Unable to infer schema for Parquet. It must be specified manually}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21852) Empty Parquet Files created as a result of spark jobs fail when read

2017-08-28 Thread Hyukjin Kwon (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-21852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144650#comment-16144650 ]

Hyukjin Kwon commented on SPARK-21852:
--

I generally agree with Sean and am quite sure this is not an issue. However, I 
want to make sure before resolving this (as I have seen a few corner cases so 
far).

BTW, I'd close the Parquet JIRA you opened; this does not look like a Parquet 
issue. I will resolve this one if no further details can be provided.




[jira] [Commented] (SPARK-21852) Empty Parquet Files created as a result of spark jobs fail when read

2017-08-28 Thread Sean Owen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-21852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143987#comment-16143987 ]

Sean Owen commented on SPARK-21852:
---

The message isn't a problem if there is no data at all. You're saying it's due 
to some app-level failure, which explains it. There are no corrupt records 
here. I don't see a problem described here, and questions like this are better 
suited to the mailing list or StackOverflow anyway.




[jira] [Commented] (SPARK-21852) Empty Parquet Files created as a result of spark jobs fail when read

2017-08-28 Thread Shivam Dalmia (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-21852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143975#comment-16143975 ]

Shivam Dalmia commented on SPARK-21852:
---

Of course, this is because of the empty directory being created, which is an 
intermittent scenario that is often difficult to replicate.

So:
1.) How and why are these empty Parquet files being created? Any leads would 
be helpful.
2.) Is there any way to have Spark check whether the file is empty or corrupt 
before attempting to infer its schema?
