David,

You are right that this is veering a little bit away from Mahout's central focus. We will have to beg a bit of forgiveness on that.

I have a question for you and some hints about useful directions.

First, is it possible for Scala to move the byte code or other representation of a closure to another machine? That was my major pain in implementing grool. I could use closures to generate very concise representations of a map-reduce program, but sending the closure to another machine was difficult, especially since it could have references to free variables.

Second, Cascading provides a relatively open representation of map-reduce flows that it will optimize. That means that if you can move functions around between machines, you could use Scala to define the program and Cascading to optimize and execute it. The Cascading logical plan can include things like grouping and joins. This substantially decreases the effort you need to put in to get to near Pig-equivalent functionality.

On Tue, Mar 24, 2009 at 4:34 PM, David Hall <d...@cs.stanford.edu> wrote:

> You are right that Pig is usually more useful for many
> tasks, and one of my plans is to duplicate some of its functionality,
> though I actually think I prefer Dryad/LINQ's kind of syntax.
>

--
Ted Dunning, CTO
DeepDyve
111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
www.deepdyve.com
408-773-0110 ext. 738
858-414-0013 (m)
408-773-0220 (fax)