Hmmm... a lot of duplicated work. Sorry I didn't get my stuff into a more usable form for you, but I wasn't aware that anybody was even interested in it. I've got some stuff that I want to rework a little, and I'm still thinking through the best way to integrate with the new reducers code in Clojure, but I haven't had the right combination of time and motivation to finish off what I started and document it. At any rate, we should work on merging the two efforts, since I don't see any need for duplicate APIs.
Taking a quick first pass at it, I wasn't able to get your code and examples to work, but I'm curious what your reasoning is for using serializable.fn and avoiding clojure.core/fn or #(). I'm not sure that is strictly necessary. For example, the following works just fine with my API:

(require 'spark.api.clojure.core)
(wrappers!) ; one of the pieces I want to re-work, but allows functions like
            ; map to work with either Clojure collections or RDDs
(set-spark-context! "local[4]" "cljspark")

(def rdd (parallelize [1 2 3 4]))

;; plain anonymous fn
(def mrdd1 (map #(+ 2 %) rdd))
(def result1 (collect mrdd1))

;; anonymous fn closing over a top-level def
(def offset1 4)
(def mrdd2 (map #(+ offset1 %) rdd))
(def result2 (collect mrdd2))

;; anonymous fn closing over a local binding
(def mrdd3 (map (let [offset2 5] #(+ offset2 %)) rdd))
(def result3 (collect mrdd3))

That will result in result1, result2, and result3 being [3 4 5 6], [5 6 7 8], and [6 7 8 9] respectively, without any need for serializable-fn. (A short sketch of why plain closures serialize is at the bottom of this message.)

On Tuesday, January 22, 2013 6:55:53 AM UTC-8, Marc Limotte wrote:

> A Clojure API for the Spark project. I am aware that there is another
> Clojure Spark wrapper project which looks very interesting; this project
> has similar goals. Like that project, it is not absolutely complete, but
> it does have some documentation and examples, and it is usable and should
> be easy enough to extend as needed. This is the result of about three
> weeks of work. It handles many of the initial problems, like serializing
> anonymous functions, converting back and forth between Scala Tuples and
> Clojure seqs, and converting RDDs to PairRDDs.
>
> The project is available here:
>
> https://github.com/TheClimateCorporation/clj-spark
>
> Thanks to The Climate Corporation for allowing me to release it. At
> Climate, we do the majority of our Big Data work with Cascalog (on top of
> Cascading). I was looking into Spark for some of the benefits that it
> provides. I suspect we will explore Shark next, and may work it into our
> processes for some of our more ad hoc/exploratory queries.
>
> I think it would be interesting to see a Cascading planner on top of
> Spark, which would enable Cascalog queries (mostly) for free. I suspect
> that might be a superior way of using Clojure on Spark.
>
> Marc Limotte
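To illustrate the serialization point above: compiled Clojure fns extend clojure.lang.AFunction, which implements java.io.Serializable, so an ordinary #() closure can survive a plain Java serialization round trip, provided the values it closes over are themselves serializable and the fn's class is visible to the deserializing JVM (true for AOT-compiled code; REPL-defined fns depend on Clojure's DynamicClassLoader being reachable). Here's a minimal standalone sketch using only JDK streams; round-trip is just an illustrative helper, not part of either API:

(import '[java.io ByteArrayOutputStream ByteArrayInputStream
                  ObjectOutputStream ObjectInputStream])

;; Write a value out with Java serialization and read it back,
;; roughly what happens when a task is shipped to a worker.
(defn round-trip [x]
  (let [baos (ByteArrayOutputStream.)]
    (with-open [oos (ObjectOutputStream. baos)]
      (.writeObject oos x))
    (with-open [ois (ObjectInputStream.
                      (ByteArrayInputStream. (.toByteArray baos)))]
      (.readObject ois))))

;; A closure over a local survives the round trip; the compiled fn
;; class carries the captured value (a Long here) as a field.
(def f (let [offset 4]
         (round-trip #(+ offset %))))

(f 1) ;=> 5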