Re: Clojure for big data

Christopher Small Wed, 18 Oct 2017 22:54:12 -0700

Hi Ray

> This is partly prompted by the lack of activity on the Github repos.

Maybe you have higher standards here than I do... last commits on Flambo 
and Sparkling were 3 and 2 months ago, respectively. That doesn't raise any 
alarm bells for me personally. Moreover, looking at the contributor graphs, 
I don't particularly get the impression that the projects have ground to a 
halt:

https://github.com/gorillalabs/sparkling/graphs/contributors
https://github.com/yieldbot/flambo/graphs/contributors

I haven't used either, but I've heard good things from folks who've used 
Flambo. Last I heard, Flambo was a pretty key component of Yieldbot's 
infrastructure, and they seem to be doing well, so I wouldn't expect the 
project to go away any time soon. I don't know as much about Sparkling, but 
it seems to have actually started as a fork of Flambo, so I'd imagine the 
APIs are at least somewhat similar, and if one went defunct, you'd probably 
have a migration path towards the other.

You may also want to take a look at 
Onyx: https://github.com/onyx-platform/onyx. It's written from the ground 
up in Clojure, and is really wonderfully designed, with a very data-centric 
(Clojuric) API. They have a very active Gitter chat room 
(https://gitter.im/onyx-platform/onyx), and the developers are very 
friendly and helpful folks. You should know ahead of time that in contrast 
with Spark and MR, which are "batch centric" technologies, Onyx is 
foundationally built on a streaming model, with support for typical batch 
processes built on top of this streaming base. IIRC, this is modeled after 
some of the Dataflow work Google has been doing, and due to the shifting 
economics around the cost of data transmission, this approach ends up being 
pretty competitive for batch workflows, while also offering a path towards 
more seamless streaming workflows should such a setup benefit you.

I haven't spent much time on the Clojurians Slack channel, or any big 
data Google groups, but there is a Clojure Datascience site/chat room that 
I host which has at least some activity. Most of the chatter there has been 
more on the side of statistics, machine learning, data viz and such, and 
less specifically on big data per se, but we'd welcome you to join and 
broaden the discussion: https://gitter.im/metasoarous/clojure-datascience. 
There's actually been an uptick in activity there since the Conj, and I'd 
love to see that momentum continue.

Good luck

Chris

On Wednesday, October 18, 2017 at 4:03:11 AM UTC-7, Ray Miller wrote:
>
> Hi,
>
> Here at Metail we have been using Clojure for medium-sized data processing 
> on AWS EMR. We started out with Cascalog about 5 years ago, switched to 
> Parkour 2 years ago, and are now considering a move to Spark.
>
> My question is: is Clojure still a good choice for medium/large data 
> processing on EMR?
>
> This is partly prompted by the lack of activity on the Github repos. Are 
> the Parkour, Flambo and Sparkling libraries rock solid, or simply not 
> getting enough use to trigger bugs and feature requests? 
>
> The #bigdata channel over on Clojurians slack is also suspiciously quiet, 
> as are many of the Google groups.
>
> Ray.
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en