pyspark 1.6.1 `partitionBy` does not provide meaningful information for `join` to use

2016-07-29 Thread Sisyphuss
```python
import numpy as np

def id(x):
    return x

rdd = sc.parallelize(np.arange(1000))
rdd = rdd.map(lambda x: (x, 1))
rdd = rdd.partitionBy(8, id)
rdd = rdd.cache().setName('milestone')
rdd.join(rdd).collect()
```

The above code generates this DAG:
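For context on what `partitionBy(8, id)` does here: PySpark routes each key `k` to partition `partitionFunc(k) % numPartitions`. A minimal pure-Python sketch of that mapping (`assign_partition` is a hypothetical helper for illustration, not a PySpark API):

```python
def id(x):
    # identity partition function, as in the snippet above
    return x

def assign_partition(key, partition_func, num_partitions):
    # hypothetical helper mimicking how PySpark's partitionBy picks a
    # partition: apply the partition function, then take the modulus
    return partition_func(key) % num_partitions

# with the identity function and 8 partitions, keys 0..999 cycle 0,1,...,7
print(assign_partition(13, id, 8))  # partition 5
```

Since both sides of `rdd.join(rdd)` are the same cached, pre-partitioned RDD, one would expect the join to reuse this partitioning rather than trigger another shuffle, which is what the DAG question is about.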

Is it a bug?

2016-05-08 Thread Sisyphuss
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Is-it-a-bug-tp26898.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Python3 does not have Module 'UserString'

2016-02-12 Thread Sisyphuss
When trying the `reduceByKey` transformation on Python 3.4, I got the following error:

```
ImportError: No module named 'UserString'
```

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Python3-does-not-have-Module-UserString-tp26212.html
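A likely explanation (drawn from the Python standard library, not stated in the thread): Python 2's top-level `UserString` module was folded into `collections` in Python 3, so any code doing `import UserString` fails there and must use `collections.UserString` instead:

```python
# Python 2 shipped a top-level UserString module; in Python 3 the class
# lives in collections, so `import UserString` raises ImportError.
from collections import UserString

s = UserString("spark")
# UserString wraps a plain string and compares equal to it
assert s == "spark"
print(s.upper())  # SPARK
```

On the Spark side this would mean some Python-2-only import in the code path exercised by `reduceByKey` under Python 3.4.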