Re: Introduction for student interested in GSoC

Ted Dunning Wed, 25 Mar 2009 10:57:29 -0700

David,

You are right that this is veering a little bit away from Mahout's central
focus.  We will have to beg a bit of forgiveness on that.

I have a question for you and some hints about useful directions.

First, is it possible for Scala to move the byte code or other
representation of a closure to another machine?

That was my major pain in implementing grool.  I could use closures to
generate very concise representations of a map-reduce program, but sending
the closure to another machine was difficult, especially since it could have
references to free variables.
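
To make the question concrete, here is a minimal sketch of the round trip I
have in mind, assuming a recent Scala (2.12 or later) where function literals
can go through plain Java serialization.  The names ClosureShipping, toBytes,
fromBytes and demo are made up for illustration.  Note that this only ships
the captured values of the free variables; the receiving JVM still needs the
closure's class on its classpath, which is exactly the part I am asking about.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical helper object, not part of any library.
object ClosureShipping {
  // Serialize an object to bytes with plain Java serialization.
  def toBytes(obj: AnyRef): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    buffer.toByteArray
  }

  // Rebuild an object from bytes; the class must be loadable in this JVM.
  def fromBytes[T](bytes: Array[Byte]): T = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // Round-trip a closure that refers to a free variable.
  def demo(): List[Int] = {
    val threshold = 3                              // free variable captured below
    val keep: Int => Boolean = x => x > threshold  // the closure to "ship"
    val shipped = fromBytes[Int => Boolean](toBytes(keep))
    List(1, 2, 3, 4, 5).filter(shipped)
  }

  def main(args: Array[String]): Unit = println(demo())
}
```

Here the bytes never leave the process, so the class lookup trivially
succeeds.  Across machines the same bytes would only deserialize if the
compiled closure class had been distributed first.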

Secondly, Cascading provides a relatively open representation of map-reduce
flows that it will optimize.  That means that if you can move functions
around between machines, you could use Scala to define the program and
Cascading to optimize it and execute it.  The Cascading logical plan can
include things like grouping and joins.  This substantially decreases the
effort you need to put in to get to near Pig-equivalent functionality.
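
As a rough illustration of what I mean by an open representation, the sketch
below describes a flow as plain data, rewrites it, and only then runs it.
Everything in it (PlanSketch, Source, FlatMapStep, optimize, run, wordCount)
is hypothetical and is not the Cascading API.  It only shows the split between
defining the program in Scala and letting a separate layer optimize and
execute it.

```scala
// Hypothetical toy plan; none of these names come from Cascading.
object PlanSketch {
  sealed trait Plan
  final case class Source(lines: List[String]) extends Plan
  final case class FlatMapStep(input: Plan, f: String => List[String]) extends Plan

  // Toy "optimizer": fuse two adjacent per-record steps into one.
  def optimize(plan: Plan): Plan = plan match {
    case FlatMapStep(FlatMapStep(in, f), g) =>
      optimize(FlatMapStep(in, (s: String) => f(s).flatMap(g)))
    case FlatMapStep(in, f) => FlatMapStep(optimize(in), f)
    case source: Source     => source
  }

  // Toy local "executor" for the plan.
  def run(plan: Plan): List[String] = plan match {
    case Source(lines)      => lines
    case FlatMapStep(in, f) => run(in).flatMap(f)
  }

  // Define the flow as data, optimize it, run it, then group and count.
  def wordCount(lines: List[String]): Map[String, Int] = {
    val tokens: Plan = FlatMapStep(Source(lines), _.split(" ").toList)
    val plan: Plan   = FlatMapStep(tokens, w => List(w.toLowerCase))
    run(optimize(plan)).groupBy(identity).map { case (word, ws) => word -> ws.size }
  }
}
```

The functions stored in the plan nodes are the closures that would have to
move between machines in a real distributed run.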

On Tue, Mar 24, 2009 at 4:34 PM, David Hall <d...@cs.stanford.edu> wrote:

> You are right that Pig is usually more useful for many
> tasks, and one of my plans is to duplicate some of its functionality,
> though I actually think I prefer Dryad/LINQ's kind of syntax.
>

-- 
Ted Dunning, CTO
DeepDyve

111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
www.deepdyve.com
408-773-0110 ext. 738
858-414-0013 (m)
408-773-0220 (fax)
