Great, thanks -- I hadn't looked too closely at Cascalog yet only because I don't currently have the rest of the Hadoop infrastructure. But adding that in isn't out of the question, so I'll definitely look at it more closely. And I may have underestimated the utility of Cascalog without Hadoop...
On Tue, Aug 21, 2012 at 4:38 AM, Sam Ritchie <sritchi...@gmail.com> wrote:
> Definitely +1 for Cascalog -- I maintain Cascalog, along with Nathan Marz.
> Here's the wiki:
>
> https://github.com/nathanmarz/cascalog/wiki
>
> Head on over to the cascalog-user
> <https://groups.google.com/forum/?fromgroups#!forum/cascalog-user>
> mailing list with any questions. Looking forward to seeing you there.
>
>
> On Mon, Aug 20, 2012 at 5:55 PM, ronen <nark...@gmail.com> wrote:
>
>> Terabyte size and a chain of dependent tasks might hint toward
>> Cascalog <https://github.com/nathanmarz/cascalog/wiki>; this assumes that
>> you're doing batch job processing (on top of Hadoop).
>>
>> If you need a more soft real-time, Datalog-based query, then I would check
>> Datomic <http://www.datomic.com/>, although from your description it
>> sounds less so.
>>
>> Ronen
>>
>> On Tuesday, August 21, 2012 3:14:23 AM UTC+3, Leif wrote:
>>>
>>> +1. I know of a couple of tools in Python for this purpose that are called
>>> "workflow management systems." It would be good to know if there is a
>>> robust one in Clojure.
>>>
>>> On Monday, August 20, 2012 12:18:54 AM UTC-4, matt hoffman wrote:
>>>>
>>>> I have a problem that I'm trying to figure out how to tackle. I'm new
>>>> to Clojure, but I'm interested, and perhaps this will be my excuse to give
>>>> it a try. Any of the following answers would help:
>>>> "What you're describing really sounds like X"
>>>> "You could think of that problem like this, instead"
>>>> "You may want to search for term 'Y'...it sounds related" (I imagine
>>>> I'm probably describing some well-established domain...I just don't know
>>>> the right terms to search for)
>>>>
>>>> So, the problem:
>>>> I have an app that is in production doing some fairly complex
>>>> calculations on large-ish (terabyte-range) amounts of data. The
>>>> calculations are expressed as chains of dependent tasks, where each task
>>>> can have a number of inputs and outputs.
>>>> But the code has become hard to
>>>> maintain, full of accidental complexity, and very difficult for newer
>>>> developers to understand. So, I'm trying to find the right abstractions to
>>>> put in place to keep things simple.
>>>> One of the sources of complexity is the intermingling of code involving
>>>> loading data, dividing up data to be executed in parallel, processing data,
>>>> persisting data, and handling the execution flow on an individual datum
>>>> (configuring pipelines of components, etc.). I'd like to keep the functions
>>>> pure and push the other concerns off to a framework -- and, ideally, not
>>>> have to write that framework.
>>>>
>>>> So I think my problem statement is this:
>>>> I'd like to be able to define functions that specify, somehow, what
>>>> input they want, and perhaps what output they produce. Then I'd like to
>>>> push the concern of how those inputs are calculated -- loaded from a db,
>>>> calculated from source data -- off on some other party.
>>>>
>>>> For example, if I define a function that requires "foo", and I call
>>>> that function without providing "foo", I'd like for _something_ to step in
>>>> and say, "Ok, you require foo. I have this function over here that produces
>>>> foo. Let me call that for you, then hand you the output." Or, instead
>>>> of a framework that transparently looks up and executes that function and
>>>> provides a Future for the result, perhaps I can explicitly build a
>>>> dependency graph up-front containing all the functions required to produce
>>>> the end result, and then execute them all in order... I think the effect is
>>>> the same.
>>>>
>>>> From a bit of searching I've done today, dataflow programming like
>>>> clojure.contrib.dataflow sounds like it might be close to what I'm looking
>>>> for, but I'd love to hear ideas. Am I describing something that already
>>>> exists? Would this actually be simpler than it seems using some clever
>>>> macros?
>>>> Are there some keywords I should search for to get started? Or
>>>> perhaps I'm coming at this problem wrong, and I should think about it a
>>>> different way...
>>>>
>>>> --
>> You received this message because you are subscribed to the Google
>> Groups "Clojure" group.
>> To post to this group, send email to clojure@googlegroups.com
>> Note that posts from new members are moderated - please be patient with
>> your first post.
>> To unsubscribe from this group, send email to
>> clojure+unsubscr...@googlegroups.com
>> For more options, visit this group at
>> http://groups.google.com/group/clojure?hl=en
>>
>
>
>
> --
> Sam Ritchie, Twitter Inc
> 703.662.1337
> @sritchie
>
> (Too brief? Here's why! http://emailcharter.org)
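
As a rough illustration of the dependency-graph idea in the original question (functions declare which inputs they need, and something else works out what to run and in what order), here is a minimal sketch. It's in Python rather than Clojure purely for brevity, and everything in it is made up for illustration: the TASKS registry, the needed and compute helpers, and the toy "raw"/"foo"/"total" functions. It is not how Cascalog or clojure.contrib.dataflow work internally.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical registry: output name -> (names of required inputs, pure function).
TASKS = {
    "raw": ((), lambda: [1, 2, 3, 4]),
    "foo": (("raw",), lambda raw: [x * 10 for x in raw]),
    "total": (("foo",), lambda foo: sum(foo)),
}


def needed(target, tasks, provided):
    """Build {name: set of dependencies} for everything the target needs.

    Names the caller already provided are treated as leaves, so their
    producers are never pulled into the graph.
    """
    graph = {}
    stack = [target]
    while stack:
        name = stack.pop()
        if name in graph:
            continue
        if name in provided:
            graph[name] = set()
            continue
        deps, _ = tasks[name]
        graph[name] = set(deps)
        stack.extend(deps)
    return graph


def compute(target, tasks, provided=None):
    """Run the producers the target depends on, in dependency order."""
    results = dict(provided or {})
    graph = needed(target, tasks, results)
    # static_order() yields each name only after all of its dependencies.
    for name in TopologicalSorter(graph).static_order():
        if name in results:
            continue  # supplied by the caller, nothing to run
        deps, fn = tasks[name]
        results[name] = fn(*(results[d] for d in deps))
    return results[target]
```

With this registry, compute("total", TASKS) runs "raw", then "foo", then "total" and returns 100, while compute("total", TASKS, {"foo": [1, 1]}) skips the producers of "foo" and returns 2. The functions themselves stay pure; loading, ordering, and caching are the resolver's concern, which is the separation the original question asks for.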