I wanted to share a Python implementation of RDDs: pysparkling.

http://trivial.io/post/120179819751/pysparkling-is-a-native-implementation-of-the

The benefit is that you can apply the same code that you use in PySpark on
large datasets in pysparkling on small datasets or single documents. When
running with pysparkling, there is no dependency on the Java Virtual
Machine or Hadoop.

Sven

Reply via email to