I think DataFu needs to support Spark, including additions/UDFs for Spark, if it is to continue to thrive as a project. Pig has been abandoned by Hortonworks and some others, and I'm not sure it will continue to thrive in the future.
Personally, I work in PySpark, and I will try to come up with a list of five additions I wish Spark had that aren't likely to be accepted as direct additions to the API. In particular, I would like to see utils for joining nested/complex RDDs. I'll think of some others.

Can any Scala Spark users do the same for Scala Spark? Make up a list of five additions you would like to see DataFu make for Scala Spark.

If anyone has thoughts on DataFu and Spark, please let's hear them. How would Spark in DataFu work?

Thanks!
---
Russell Jurney @rjurney <http://twitter.com/rjurney> russell.jur...@gmail.com LI <http://linkedin.com/in/russelljurney> FB <http://facebook.com/jurney> datasyndrome.com