Python implementation of RDD interface

2015-05-29 Thread Sven Kreiss
I wanted to share a Python implementation of RDDs: pysparkling. http://trivial.io/post/120179819751/pysparkling-is-a-native-implementation-of-the The benefit is that the same code you run in PySpark on large datasets can also run in pysparkling on small datasets or single documents. When
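To illustrate the point about running the same code on either backend, here is a minimal sketch. `TinyRDD` and `word_lengths` are hypothetical names invented for this illustration. `TinyRDD` is a toy, eager, list-backed stand-in and is not the actual implementation of pysparkling or PySpark; it only mimics three method names (`map`, `filter`, `collect`) that PySpark's RDD API provides.

```python
class TinyRDD:
    """Toy, list-backed stand-in for an RDD (hypothetical; illustration only)."""

    def __init__(self, items):
        # Materialize the input so the object can be traversed repeatedly.
        self._items = list(items)

    def map(self, f):
        # Apply f to every element and wrap the results in a new TinyRDD.
        return TinyRDD(f(x) for x in self._items)

    def filter(self, f):
        # Keep only the elements for which f returns a truthy value.
        return TinyRDD(x for x in self._items if f(x))

    def collect(self):
        # Return the elements as a plain Python list.
        return list(self._items)


def word_lengths(rdd):
    """Pipeline written only against the RDD method names, so any object
    that exposes map/filter/collect can be passed in."""
    return (rdd.filter(lambda w: len(w) > 0)
               .map(lambda w: (w, len(w)))
               .collect())


# Run the pipeline on a tiny in-memory dataset.
result = word_lengths(TinyRDD(["spark", "", "rdd"]))
print(result)
```

Because `word_lengths` depends only on the method names, the same function could in principle be handed an RDD from a different backend, which is the portability the message describes.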

Re: Python implementation of RDD interface

2015-05-29 Thread Davies Liu
, May 29, 2015 at 2:46 PM Davies Liu dav...@databricks.com wrote: There is another implementation of the RDD interface in Python, called DPark [1]. Could you say a few words comparing the two? [1] https://github.com/douban/dpark/ On Fri, May 29, 2015 at 8:29 AM, Sven Kreiss s...@svenkreiss.com

Re: Python implementation of RDD interface

2015-05-29 Thread Sven Kreiss
, May 29, 2015 at 2:46 PM Davies Liu dav...@databricks.com wrote: There is another implementation of the RDD interface in Python, called DPark [1]. Could you say a few words comparing the two? [1] https://github.com/douban/dpark/ On Fri, May 29, 2015 at 8:29 AM, Sven Kreiss s...@svenkreiss.com

Re: Python implementation of RDD interface

2015-05-29 Thread Davies Liu
There is another implementation of the RDD interface in Python, called DPark [1]. Could you say a few words comparing the two? [1] https://github.com/douban/dpark/ On Fri, May 29, 2015 at 8:29 AM, Sven Kreiss s...@svenkreiss.com wrote: I wanted to share a Python implementation of RDDs