[jira] [Commented] (SPARK-23840) PySpark error when converting a DataFrame to rdd

2018-04-02, Hyukjin Kwon (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-23840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422509#comment-16422509 ]

Hyukjin Kwon commented on SPARK-23840:
--

It would be nicer if we could have the error messages, since it's hard to reproduce 
and quite difficult to debug given only that information. FYI, the execution 
path would be roughly Python --py4j--> Spark Driver --> Spark Executor --> Python 
worker, and I can't check every code path :(
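
To illustrate, here is the failing call annotated against that path (a rough sketch of where each step runs, not a verbatim trace of Spark internals; `spark` and `s3_input` are the names from the reporter's snippet):

{code:python}
df = spark.read.parquet(s3_input)  # Python shell --py4j--> Spark Driver: only builds a plan

df.take(99)       # runs in the JVM; rows come back to the Python driver process only

rdd = df.rdd      # from here on, rows are serialized by the JVM executors and
rdd.take(99)      # deserialized by a Python worker process (worker.py), which is
                  # where this TypeError appears to be raised
{code}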

> PySpark error when converting a DataFrame to rdd
> 
>
> Key: SPARK-23840
> URL: https://issues.apache.org/jira/browse/SPARK-23840
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.0
>Reporter: Uri Goren
>Priority: Major
>
> I am running code in the `pyspark` shell on an EMR cluster, and I am 
> encountering an error I have never seen before.
> This line works:
> spark.read.parquet(s3_input).take(99)
> while this line causes an exception:
> spark.read.parquet(s3_input).rdd.take(99)
> with:
> > TypeError: 'int' object is not iterable






[jira] [Commented] (SPARK-23840) PySpark error when converting a DataFrame to rdd

2018-04-02, Uri Goren (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-23840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422242#comment-16422242 ]

Uri Goren commented on SPARK-23840:
---

This error does not occur locally.

I checked why, and it turns out I was running pyspark 2.3 remotely and 
pyspark 2.2.1 locally.

After downgrading the pyspark version, everything works as expected.
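
For reference, a quick way to confirm which versions are in play from the pyspark shell (a minimal sketch; `spark` is the session the shell provides):

{code:python}
import pyspark

print(pyspark.__version__)  # version of the pyspark package on the client
print(spark.version)        # version of the Spark runtime behind the session
{code}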

The error message occurs in `worker.py`, in a `loads` function.

If you need, I can reproduce the exception with pyspark 2.3.0 on EMR 5.11 by 
using basic RDD operations on parquet data, roughly as sketched below.
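
A minimal self-contained sketch of that reproduction, under those assumptions (pyspark 2.3.0 on EMR 5.11; the S3 path below is a placeholder for any parquet data set):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

s3_input = "s3://some-bucket/some-prefix"  # placeholder parquet location

df = spark.read.parquet(s3_input)
df.take(99)      # works
df.rdd.take(99)  # reported to raise: TypeError: 'int' object is not iterable
{code}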

I can check with my employer whether I can share a cluster / data set with you 
if it's absolutely necessary.







[jira] [Commented] (SPARK-23840) PySpark error when converting a DataFrame to rdd

2018-04-01, Hyukjin Kwon (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-23840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16421912#comment-16421912 ]

Hyukjin Kwon commented on SPARK-23840:
--

Can you give me the full error messages? And does this work locally?
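
For completeness, one way to capture the full Python-side traceback from the pyspark shell (a sketch; `s3_input` is the reporter's placeholder, and the executor-side stack would additionally appear in the executor/YARN logs):

{code:python}
import traceback

try:
    spark.read.parquet(s3_input).rdd.take(99)
except Exception:
    traceback.print_exc()  # prints the full traceback, including the error the worker sent back
{code}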




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org