[ 
https://issues.apache.org/jira/browse/SPARK-26943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-26943.
-------------------------------
    Resolution: Cannot Reproduce

I don't think this is a bug, or at least, I can think of other reasons this 
happens.

Your transformation and/or data have some problem (see the error: the value 
"(N/A)" can't be parsed as a number). It doesn't come up in .count() because, 
for example, Spark can avoid actually parsing the data if you just want to 
know how many rows there are. Caching requires persisting the parsed 
representation in memory, so the data actually has to be parsed, and that's 
why the error comes up.
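
As a rough illustration of the difference, here is a plain-Python analogy. It 
is not Spark code and does not claim to show how Spark works internally; the 
names raw_rows, count_rows, materialize and parse_or_none are made up for this 
sketch, and the sample values are hypothetical apart from "(N/A)", which is 
taken from the error.

```python
# Plain-Python analogy (not Spark code): counting rows never touches the
# field values, while materializing typed values forces every one to be parsed.

raw_rows = ["1.5", "2", "(N/A)", "4.25"]  # hypothetical column values


def count_rows(rows):
    """Count records without parsing any field."""
    return sum(1 for _ in rows)


def materialize(rows):
    """Parse every field, as a store of typed values would have to."""
    return [float(value) for value in rows]


def parse_or_none(value):
    """Tolerant parser: map unparseable values such as "(N/A)" to None."""
    try:
        return float(value)
    except ValueError:
        return None


# Counting succeeds because nothing is parsed.
print(count_rows(raw_rows))

# Materializing fails on the bad value.
try:
    materialize(raw_rows)
except ValueError as err:
    print("materializing failed:", err)

# One possible fix: clean the values before they are parsed strictly.
cleaned = [parse_or_none(value) for value in raw_rows]
print(cleaned)
```

In the same spirit, handling the placeholder value in the transformation 
(for example by mapping it to null before the numeric conversion) should let 
the cached count go through, assuming "(N/A)" is the only bad value.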

> Weird behaviour with `.cache()`
> -------------------------------
>
>                 Key: SPARK-26943
>                 URL: https://issues.apache.org/jira/browse/SPARK-26943
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.1.0
>            Reporter: Will Uto
>            Priority: Major
>
>  
> {code:java}
> sdf.count()
> {code}
>  
> works fine. However:
>  
> {code:java}
> sdf = sdf.cache()
> sdf.count()
> {code}
> does not, and produces the error:
> {code:java}
> Py4JJavaError: An error occurred while calling o314.count.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 75 
> in stage 8.0 failed 4 times, most recent failure: Lost task 75.3 in stage 8.0 
> (TID 438, uat-datanode-02, executor 1): java.text.ParseException: Unparseable 
> number: "(N/A)"
>       at java.text.NumberFormat.parse(NumberFormat.java:350)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
