Re: coalesce on SchemaRDD in pyspark

2014-09-12 Thread Davies Liu
This is a bug; I have created an issue to track it: https://issues.apache.org/jira/browse/SPARK-3500 There is also a PR with a fix: https://github.com/apache/spark/pull/2369 Until the next bugfix release, you can work around it with: srdd = sqlCtx.jsonRDD(rdd) srdd2 =
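The workaround Davies sketches (truncated above by the archive) drops down to the underlying Java SchemaRDD, calls coalesce there, and rewraps the result as a Python SchemaRDD. A minimal sketch of that pattern, assuming the PySpark 1.1 internals: the private `_jschema_rdd` attribute, the `SchemaRDD(jschema_rdd, sql_ctx)` constructor, and the Java-side `coalesce` argument list are all assumptions based on the 1.1 codebase, and a running `sqlCtx` and `rdd` are presumed defined. This is a stopgap sketch, not the exact code from the original message.

```python
# Sketch of the workaround, assuming PySpark 1.1 internals (not runnable
# outside a Spark 1.1 session; sqlCtx and rdd are assumed to exist).
from pyspark.sql import SchemaRDD

srdd = sqlCtx.jsonRDD(rdd)

# SchemaRDD.coalesce is broken in 1.1.0 (SPARK-3500), so call coalesce on
# the wrapped Java SchemaRDD directly and rewrap the result by hand.
# The (numPartitions, shuffle, ordering) signature is an assumption.
jrdd = srdd._jschema_rdd.coalesce(1, False, None)
srdd2 = SchemaRDD(jrdd, sqlCtx)
```

Because this reaches into a private attribute (`_jschema_rdd`), it should only be used until the fix in the PR above lands in a bugfix release.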

Re: coalesce on SchemaRDD in pyspark

2014-09-12 Thread Brad Miller
Hi Davies, Thanks for the quick fix. I'm sorry to send out a bug report on release day - 1.1.0 really is a great release. I've been running the 1.1 branch for a while and there's definitely lots of good stuff. For the workaround, I think you may have meant: srdd2 =

Re: coalesce on SchemaRDD in pyspark

2014-09-12 Thread Davies Liu
On Fri, Sep 12, 2014 at 8:55 AM, Brad Miller bmill...@eecs.berkeley.edu wrote: Hi Davies, Thanks for the quick fix. I'm sorry to send out a bug report on release day - 1.1.0 really is a great release. I've been running the 1.1 branch for a while and there's definitely lots of good stuff.

coalesce on SchemaRDD in pyspark

2014-09-11 Thread Brad Miller
Hi All, I'm having some trouble with the coalesce and repartition functions for SchemaRDD objects in pyspark. When I run: sqlCtx.jsonRDD(sc.parallelize(['{"foo":"bar"}', '{"foo":"baz"}'])).coalesce(1) I get this error: Py4JError: An error occurred while calling o94.coalesce. Trace: