Hi guys,

A coworker at PayPal (Ohad Raviv) and I have prepared some Spark code to contribute to DataFu. I've pushed it to a side branch on DataFu's GitHub for review.
The branch can be found here:
https://github.com/apache/datafu/tree/spark-tmp

Here is the JIRA I opened for this contribution:
https://issues.apache.org/jira/browse/DATAFU-148

There is more PayPal content we'd like to contribute, but we're waiting for legal approval for the next batch.

Please take a look at the code and send us your comments and feedback.

Thanks,
Eyal (and Ohad)