On Wed, Mar 25, 2009 at 10:56 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> David,
>
> You are right that this is veering a little bit away from Mahout's central
> focus.  We will have to beg a bit of forgiveness on that.

I'm not picky, certainly... :-)

>
> I have a question for you and some hints about useful directions.
>
> First, is it possible for Scala to move the byte code or other
> representation of a closure to another machine?

Scala closures are just objects. With the compiler plugin I wrote, it's
trivial to serialize closures and send them down the wire. In fact,
that's how SMR works at the moment.

val a = 3

for ((k, v) <- pairs) yield (v, k + a)

translates to

pairs.map( new anonfun$obfuscationgarbage$$1(a) )

for some appropriately magical anonfun, and under the hood SMR
writes that instance to the distributed cache and the worker nodes
pick it up on the other side.
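
A minimal sketch of that round trip, assuming plain Java serialization
and a hand-written class standing in for the compiler-generated anonfun.
The names SwapAndAdd, toBytes and fromBytes are hypothetical and are not
part of SMR; the byte array stands in for the distributed cache.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream,
  ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for the generated anonfun: the captured free
// variable `a` becomes a constructor field, so it travels with the object.
class SwapAndAdd(a: Int)
    extends Function1[(Int, String), (String, Int)] with Serializable {
  def apply(kv: (Int, String)): (String, Int) = (kv._2, kv._1 + a)
}

object ClosureRoundTrip {
  // Serialize any object to bytes (what the sender would ship).
  def toBytes(obj: AnyRef): Array[Byte] = {
    val buf = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buf)
    out.writeObject(obj)
    out.close()
    buf.toByteArray
  }

  // Rebuild the object from bytes (what a worker would do). The class
  // file must already be on the receiving classpath.
  def fromBytes[T](bytes: Array[Byte]): T = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val a = 3
    val pairs = List((1, "x"), (2, "y"))
    val shipped = toBytes(new SwapAndAdd(a))
    val revived = fromBytes[Function1[(Int, String), (String, Int)]](shipped)
    println(pairs.map(revived)) // prints List((x,4), (y,5))
  }
}
```

Only the instance state (here, the value of a) is in the byte stream;
the class definition itself has to reach the other machine separately.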

Moving the classfile for the anonymous function is much more of a pain,
and I rely on Hadoop's jar infrastructure for that. Though I guess you
can do classfile hackery to get at it. You won't be able to use the
interpreter though, because it uses too much magic.

>
> That was my major pain in implementing grool.  I could use closures to
> generate very concise representations of a map-reduce program, but sending
> the closure to another machine was difficult especially since it could have
> references to free variables.
>
> Secondly, Cascading provides a relatively open representation of map-reduce
> flows that it will optimize.  That means that if you can move functions
> around between machines, you could use Scala to define the program and
> Cascading to optimize it and execute it.  The Cascading logical plan can
> include things like grouping and joins.  This substantially decreases the
> effort you need to put in to get to near pig-equivalent functionality.

Interesting... SMR already chains together multiple maps (and filters
and flatMaps) if you use them before calling a reduce. I'll look into
the magic that Cascading does.
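
A rough sketch of what that kind of chaining can look like, assuming
each stage is folded into one per-record function so the input is
traversed only once before the reduce. The Pipeline class and its
methods are hypothetical illustrations, not SMR's actual API.

```scala
// Hypothetical sketch: `stage` turns one input record into zero or more
// output records. map/filter/flatMap only compose functions; no data is
// touched until reduce runs.
final class Pipeline[A, B](val stage: A => Iterator[B]) {
  def map[C](f: B => C): Pipeline[A, C] =
    new Pipeline[A, C](a => stage(a).map(f))

  def filter(p: B => Boolean): Pipeline[A, B] =
    new Pipeline[A, B](a => stage(a).filter(p))

  def flatMap[C](f: B => Iterable[C]): Pipeline[A, C] =
    new Pipeline[A, C](a => stage(a).flatMap(b => f(b).iterator))

  // Single pass over the input: every fused stage runs per record, then
  // the results are combined. Throws on empty output, like Iterator.reduce.
  def reduce(input: Iterable[A])(op: (B, B) => B): B =
    input.iterator.flatMap(stage).reduce(op)
}

object Pipeline {
  // Identity pipeline to start a chain from.
  def start[A]: Pipeline[A, A] = new Pipeline[A, A](a => Iterator.single(a))
}

object PipelineDemo {
  def main(args: Array[String]): Unit = {
    val wordLengths = Pipeline.start[String]
      .flatMap(_.split(" ").toList)
      .filter(_.nonEmpty)
      .map(_.length)
    println(wordLengths.reduce(List("a bb", "ccc"))(_ + _)) // prints 6
  }
}
```

In a real map-reduce setting the fused stage would run inside a single
map task; here a local Iterable stands in for the input split.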

-- David

>
> On Tue, Mar 24, 2009 at 4:34 PM, David Hall <d...@cs.stanford.edu> wrote:
>
>> You are right that Pig is usually more useful for many
>> tasks, and one of my plans is to duplicate some of its functionality,
>> though I actually think I prefer Dryad/LINQ's kind of syntax.
>>
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>
> 111 West Evelyn Ave. Ste. 202
> Sunnyvale, CA 94086
> www.deepdyve.com
> 408-773-0110 ext. 738
> 858-414-0013 (m)
> 408-773-0220 (fax)
>
