Hi Ray,

> This is partly prompted by the lack of activity on the Github repos.
Maybe you have higher standards here than I do... The last commits on Flambo and Sparkling were 3 and 2 months ago, respectively. That doesn't raise any alarm bells for me personally. Moreover, looking at the contributor graphs, I don't get the impression that the projects have ground to a halt:

https://github.com/gorillalabs/sparkling/graphs/contributors
https://github.com/yieldbot/flambo/graphs/contributors

I haven't used either, but I've heard good things from folks who've used Flambo. Last I heard, Flambo was a key component of Yieldbot's infrastructure, and they seem to be doing well, so I wouldn't expect the project to go away any time soon. I don't know as much about Sparkling, but it seems to have started as a fork of Flambo, so I'd imagine the APIs are at least somewhat similar. If one went defunct, you'd probably have a migration path to the other.

You may also want to take a look at Onyx: https://github.com/onyx-platform/onyx. It's written from the ground up in Clojure and is wonderfully designed, with a very data-centric (Clojuric) API. They have a very active Gitter chat room (https://gitter.im/onyx-platform/onyx), and the developers are very friendly and helpful folks.

You should know ahead of time that, in contrast with Spark and MapReduce, which are "batch-centric" technologies, Onyx is foundationally built on a streaming model, with support for typical batch processes built on top of this streaming base. IIRC, this is modeled after some of the Dataflow work Google has been doing. Because of the shifting economics around the cost of data transmission, this approach ends up being pretty competitive for batch workflows, while also offering a path toward more seamless streaming workflows should such a setup benefit you.

I haven't spent much time on the Clojurians Slack channel or any big data Google groups, but there is a Clojure Datascience site/chat room that I host which has at least some activity.
Most of the chatter there has been more on the side of statistics, machine learning, data viz and such, and less on big data per se, but we'd welcome you to join and broaden the discussion: https://gitter.im/metasoarous/clojure-datascience. There's actually been an uptick in activity there since the Conj, and I'd love to see that momentum continue.

Good luck,

Chris

On Wednesday, October 18, 2017 at 4:03:11 AM UTC-7, Ray Miller wrote:
>
> Hi,
>
> Here at Metail we have been using Clojure for medium-sized data processing
> on AWS EMR. We started out with Cascalog about 5 years ago, switched to
> Parkour 2 years ago, and are now considering a move to Spark.
>
> My question is: is Clojure still a good choice for medium/large data
> processing on EMR?
>
> This is partly prompted by the lack of activity on the Github repos. Are
> the Parkour, Flambo and Sparkling libraries rock solid, or simply not
> getting enough use to trigger bugs and feature requests?
>
> The #bigdata channel over on Clojurians slack is also suspiciously quiet,
> as are many of the Google groups.
>
> Ray.

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en