Great, thanks -- I hadn't looked too closely at Cascalog yet, only because I
don't currently have the rest of the Hadoop infrastructure. But adding that
in isn't out of the question, so I'll definitely look at it more closely.
And I may have underestimated the utility of Cascalog without Hadoop...



On Tue, Aug 21, 2012 at 4:38 AM, Sam Ritchie <sritchi...@gmail.com> wrote:

> Definitely +1 for Cascalog -- I maintain Cascalog, along with Nathan Marz.
> Here's the wiki:
>
> https://github.com/nathanmarz/cascalog/wiki
>
> Head on over to the cascalog-user
> <https://groups.google.com/forum/?fromgroups#!forum/cascalog-user> mailing
> list with any questions. Looking forward to seeing you there.
>
>
> On Mon, Aug 20, 2012 at 5:55 PM, ronen <nark...@gmail.com> wrote:
>
>> Terabyte-size data and a chain of dependent tasks might point toward
>> Cascalog<https://github.com/nathanmarz/cascalog/wiki>; this assumes that
>> you're doing batch job processing (on top of Hadoop).
>>
>> If you need more of a soft-real-time, Datalog-based query, then I would
>> check out Datomic <http://www.datomic.com/>, although from your description
>> it sounds less so.
>>
>> Ronen
>>
>> On Tuesday, August 21, 2012 3:14:23 AM UTC+3, Leif wrote:
>>>
>>> +1. I know of a couple of tools in Python for this purpose; they're called
>>> "workflow management systems." It would be good to know if there is a
>>> robust one in Clojure.
>>>
>>> On Monday, August 20, 2012 12:18:54 AM UTC-4, matt hoffman wrote:
>>>>
>>>> I have a problem that I'm trying to figure out how to tackle. I'm new
>>>> to Clojure, but I'm interested, and perhaps this will be my excuse to give
>>>> it a try. Any of the following answers would help:
>>>> "What you're describing really sounds like X"
>>>> "You could think of that problem like this, instead"
>>>> "You may want to search for the term 'Y'... it sounds related" (I imagine
>>>> I'm describing some well-established domain... I just don't know
>>>> the right terms to search for)
>>>>
>>>> So, the problem:
>>>> I have an app that is in production doing some fairly complex
>>>> calculations on large-ish (terabyte-range) amounts of data.  The
>>>> calculations are expressed as chains of dependent tasks, where each task
>>>> can have a number of inputs and outputs. But the code has become hard to
>>>> maintain, full of accidental complexity and very difficult for newer
>>>> developers to understand. So, I'm trying to find the right abstractions to
>>>> put in place to keep things simple.
>>>> One of the sources of complexity is the intermingling of code involving
>>>> loading data, dividing up data to be executed in parallel, processing data,
>>>> persisting data, and handling the execution flow on an individual datum
>>>> (configuring pipelines of components, etc.). I'd like to keep the functions
>>>> pure and push the other concerns off to a framework -- and, ideally, not
>>>> have to write that framework.
>>>>
>>>> So I think my problem statement is this:
>>>> I'd like to be able to define functions that specify, somehow, what
>>>> input they want, and perhaps what output they produce. Then I'd like to
>>>> push the concern of how those inputs are calculated -- loaded from a db,
>>>> calculated from source data -- off on some other party.
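The following is a minimal, hypothetical sketch of "functions declare what they want, and someone else supplies it." It uses Python for illustration only and is not a Clojure, Cascalog, or clojure.contrib.dataflow API. The names `produces`, `PRODUCERS`, and `resolve` are made up for this example. Each pure function registers the name of the value it produces, and its inputs are assumed to be declared by its parameter names. A resolver then looks up the producer of any missing input and calls that producer first, caching each result.

```python
import inspect
from typing import Any, Callable, Dict

# Hypothetical registry: output name -> the pure function that produces it.
PRODUCERS: Dict[str, Callable[..., Any]] = {}


def produces(name: str):
    """Register a function as the producer of the value called `name`.

    Its inputs are declared implicitly by its parameter names.
    """
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        PRODUCERS[name] = fn
        return fn
    return register


def resolve(name: str, provided: Dict[str, Any]) -> Any:
    """Return the value `name`, computing it (and its inputs) if missing.

    Values already in `provided` are used as-is. Computed values are cached
    in `provided`, so each producer runs at most once per call tree.
    This sketch does not detect dependency cycles.
    """
    if name in provided:
        return provided[name]
    if name not in PRODUCERS:
        raise KeyError(f"no value or producer for {name!r}")
    fn = PRODUCERS[name]
    # Recursively resolve every parameter the producer asks for.
    args = {p: resolve(p, provided) for p in inspect.signature(fn).parameters}
    provided[name] = fn(**args)
    return provided[name]


# Example pure functions; "raw" stands in for data loaded from a DB.
@produces("foo")
def make_foo(raw):
    return [x * 2 for x in raw]


@produces("total")
def make_total(foo):
    return sum(foo)
```

For example, `resolve("total", {"raw": [1, 2, 3]})` calls `make_foo` and then `make_total`, and returns 12. The pure functions know nothing about loading, caching, or ordering; those concerns live in `resolve`. This sketch runs everything in a single process. Parallelism and persistence would need a real framework.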
>>>>
>>>> For example, if I define a function that requires "foo", and I call
>>>> that function without providing "foo", I'd like for _something_ to step in
>>>> and say, "Ok, you require foo. I have this function over here that produces
>>>> foo. Let me call that for you, then hand you the output." Or perhaps,
>>>> instead of a framework that transparently looks up and executes that
>>>> function and provides a Future for the result, I could explicitly build a
>>>> dependency graph up front containing all the functions required to produce
>>>> the end result, and then execute them all in order... I think the effect is
>>>> the same.
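The second option, building the whole dependency graph up front and then executing it in order, can be sketched with the standard-library `graphlib` module (Python 3.9+). This is an illustration under assumptions, not an existing workflow tool. The task-table layout and the `run_plan` helper are hypothetical, and each task is assumed to be a pure function whose inputs are listed by name.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical task table entry: (input names, pure function of those inputs).
Task = Tuple[List[str], Callable[..., Any]]


def run_plan(tasks: Dict[str, Task], target: str,
             sources: Dict[str, Any]) -> Dict[str, Any]:
    """Compute `target`, running only the tasks it depends on.

    `sources` holds externally supplied values (e.g. data loaded from a DB).
    The plan is built first, then executed in topological order.
    """
    # 1. Walk back from the target to collect the sub-graph that is needed.
    graph: Dict[str, List[str]] = {}
    stack = [target]
    while stack:
        name = stack.pop()
        if name in graph or name in sources:
            continue
        if name not in tasks:
            raise KeyError(f"no value or task for {name!r}")
        inputs, _ = tasks[name]
        graph[name] = inputs          # node -> its predecessors
        stack.extend(inputs)

    # 2. Execute in dependency order; graphlib raises CycleError on cycles.
    values = dict(sources)
    for name in TopologicalSorter(graph).static_order():
        if name in values:
            continue                  # an externally supplied source value
        inputs, fn = tasks[name]
        values[name] = fn(*(values[i] for i in inputs))
    return values


tasks: Dict[str, Task] = {
    "clean": (["raw"], lambda raw: [x for x in raw if x is not None]),
    "total": (["clean"], sum),
    "count": (["clean"], len),
    "mean": (["total", "count"], lambda t, c: t / c),
}
```

For example, `run_plan(tasks, "mean", {"raw": [1, None, 2, 3]})["mean"]` evaluates to 2.0. Because the plan exists before anything runs, it can be inspected, logged, or handed to a scheduler that runs independent tasks in parallel. In this sketch, `total` and `count` are independent. The on-demand version finds the same dependencies, but only while it is already running.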
>>>>
>>>> From a bit of searching I've done today, dataflow programming like
>>>> clojure.contrib.dataflow sounds like it might be close to what I'm looking
>>>> for, but I'd love to hear ideas.   Am I describing something that already
>>>> exists?  Would this actually be simpler than it seems using some clever
>>>> macros? Are there some keywords I should search for to get started?  Or
>>>> perhaps I'm coming at this problem wrong, and I should think about it a
>>>> different way...
>>>>
>>>>  --
>> You received this message because you are subscribed to the Google
>> Groups "Clojure" group.
>> To post to this group, send email to clojure@googlegroups.com
>> Note that posts from new members are moderated - please be patient with
>> your first post.
>> To unsubscribe from this group, send email to
>> clojure+unsubscr...@googlegroups.com
>> For more options, visit this group at
>> http://groups.google.com/group/clojure?hl=en
>>
>
>
>
> --
> Sam Ritchie, Twitter Inc
> 703.662.1337
> @sritchie
>
> (Too brief? Here's why! http://emailcharter.org)
>