[jira] [Commented] (SPARK-23840) PySpark error when converting a DataFrame to rdd
[ https://issues.apache.org/jira/browse/SPARK-23840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422509#comment-16422509 ]

Hyukjin Kwon commented on SPARK-23840:
--------------------------------------

It would be helpful to have the full error messages, since the problem is hard to reproduce and difficult to debug from this information alone.

FYI, the execution path is roughly

    Python --py4j--> Spark Driver --> Spark Executor --> Python worker

and I can't check every code path :(

> PySpark error when converting a DataFrame to rdd
> ------------------------------------------------
>
> Key: SPARK-23840
> URL: https://issues.apache.org/jira/browse/SPARK-23840
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.3.0
> Reporter: Uri Goren
> Priority: Major
>
> I am running code in the `pyspark` shell on an `emr` cluster, and
> encountering an error I have never seen before...
> This line works:
> spark.read.parquet(s3_input).take(99)
> While this line causes an exception:
> spark.read.parquet(s3_input).rdd.take(99)
> With
>
> TypeError: 'int' object is not iterable

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23840) PySpark error when converting a DataFrame to rdd
[ https://issues.apache.org/jira/browse/SPARK-23840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422242#comment-16422242 ]

Uri Goren commented on SPARK-23840:
-----------------------------------

This error does not occur locally. I checked why: I was running pyspark 2.3 remotely and pyspark 2.2.1 locally. After downgrading the pyspark version, everything works as expected.

The error is raised in `worker.py`, in a `loads` function.

If needed, I can reproduce the exception with pyspark 2.3.0 on EMR 5.11 using basic rdd operations on parquet data. If it's absolutely necessary, I can check with my employer whether I can share a cluster / data set with you.
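Since the two environments in this comment turned out to run different pyspark releases, it can help to compare version strings before comparing behavior. The following is only a minimal sketch under stated assumptions: `major_minor` and `same_release_line` are hypothetical helper names, and treating "same major.minor" as the thing to check is an assumption, not a rule taken from Spark. In a live session the inputs would presumably come from `pyspark.__version__` and `spark.version`; here they are hard-coded to the versions mentioned in this thread.

```python
def major_minor(version):
    """Return (major, minor) as ints from a version string like '2.3.0'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def same_release_line(version_a, version_b):
    """True when both version strings share the same major.minor release."""
    return major_minor(version_a) == major_minor(version_b)


# Versions reported in this thread: 2.2.1 locally, 2.3.0 on the EMR cluster.
local_version = "2.2.1"
remote_version = "2.3.0"

if not same_release_line(local_version, remote_version):
    print("pyspark versions differ: local=%s remote=%s"
          % (local_version, remote_version))
```

Running a check like this in both shells makes a version difference visible up front, before time is spent comparing results between environments.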
[jira] [Commented] (SPARK-23840) PySpark error when converting a DataFrame to rdd
[ https://issues.apache.org/jira/browse/SPARK-23840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16421912#comment-16421912 ]

Hyukjin Kwon commented on SPARK-23840:
--------------------------------------

Can you give me the full error messages? Also, does this work locally?
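One way to collect the full error text requested here is to wrap the failing call and format the complete traceback instead of copying only the last line. This is a minimal sketch in plain Python: `capture_traceback` and `failing_call` are hypothetical names, and `failing_call` is only a local stand-in that raises the same kind of `TypeError`; it does not reproduce the actual Spark failure.

```python
import traceback


def capture_traceback(fn):
    """Run fn() and return the full traceback text if it raises, else None."""
    try:
        fn()
    except Exception:
        return traceback.format_exc()
    return None


def failing_call():
    # Stand-in for spark.read.parquet(s3_input).rdd.take(99) (assumption):
    # iterating over an int raises a TypeError with the same message.
    return list(99)


report = capture_traceback(failing_call)
print(report)
```

In the `pyspark` shell, the same wrapper could be given a function that runs the `.rdd.take(99)` line, and the returned text pasted into the issue.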