[jira] [Commented] (SPARK-23840) PySpark error when converting a DataFrame to rdd

Hyukjin Kwon (JIRA) Mon, 02 Apr 2018 06:43:42 -0700

    [ https://issues.apache.org/jira/browse/SPARK-23840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422509#comment-16422509 ]


Hyukjin Kwon commented on SPARK-23840:
--------------------------------------

It would be nicer if we could have the full error messages, since this is 
hard to reproduce and quite difficult to debug with only the information 
given. FYI, the execution path would be roughly 
Python --py4j--> Spark Driver --> Spark Executor --> Python worker, 
and I can't check every code path :(
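
For reference, a minimal sketch of one way to capture the complete error text 
in the `pyspark` shell so it can be pasted into the ticket. The helper name 
`capture_traceback` and the stand-in `failing_action` are hypothetical and 
only there for illustration; in the shell, the reporter's own expression 
would be passed in instead.

```python
import traceback


def capture_traceback(action):
    """Run action() and return (result, None), or (None, full traceback text)."""
    try:
        return action(), None
    except Exception:
        # format_exc() keeps every frame, not just the final message line.
        return None, traceback.format_exc()


def failing_action():
    # Hypothetical stand-in that raises the same kind of TypeError as the
    # report. In the pyspark shell this would instead be something like:
    #   lambda: spark.read.parquet(s3_input).rdd.take(99)
    return [x for x in 99]


result, tb_text = capture_traceback(failing_action)
print(tb_text)
```

When the failure happens on the driver or executor side, the printed text 
would typically also carry the JVM stack trace, which should help narrow 
down which hop of the path above is failing.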

> PySpark error when converting a DataFrame to rdd
> ------------------------------------------------
>
>                 Key: SPARK-23840
>                 URL: https://issues.apache.org/jira/browse/SPARK-23840
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.0
>            Reporter: Uri Goren
>            Priority: Major
>
> I am running code in the `pyspark` shell on an `emr` cluster, and 
> encountering an error I have never seen before...
> This line works:
>     spark.read.parquet(s3_input).take(99)
> While this line causes an exception:
>     spark.read.parquet(s3_input).rdd.take(99)
> With:
>     TypeError: 'int' object is not iterable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

