I wanted to share a Python implementation of RDDs: pysparkling.
http://trivial.io/post/120179819751/pysparkling-is-a-native-implementation-of-the
The benefit is that the same code you run in PySpark on large datasets can
also be applied in pysparkling to small datasets or single documents. When
On Fri, May 29, 2015 at 2:46 PM, Davies Liu dav...@databricks.com wrote:
There is another implementation of the RDD interface in Python, called
DPark [1]. Could you say a few words comparing the two?
[1] https://github.com/douban/dpark/
On Fri, May 29, 2015 at 8:29 AM, Sven Kreiss s...@svenkreiss.com