On Mon, Sep 5, 2011 at 9:02 AM, Jake Mannix <[email protected]> wrote:
>
> This is my impression too. The more I play with Spark, the more it looks
> like "the Right Paradigm" for this kind of computation: how many years
> have I been complaining that all I've ever wanted from Hadoop (and/or
> Mahout) is to be able to say something like:
>
> vectors = load("hdfs://mydataFile");
> vectors.map(new Function<Vector, Vector>() {
>     Vector apply(Vector in) { return in.normalize(1); }
>   })
>   .filter(new Predicate<Vector>() {
>     boolean apply(Vector in) { return in.getNumNondefaultElements() < 1000; }
>   })
>   .reduce(new Function<Pair<Vector, Vector>, Vector>() {
>     Vector apply(Pair<Vector, Vector> pair) {
>       return pair.getFirst().plus(pair.getSecond());
>     }
>   });
>
+1 for advocating side-effect-free programming!
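
For anyone curious, here's a rough sketch of what that same pipeline might
look like through Spark's Java-facing API (just a sketch, not something
I've run: parseVector() is a hypothetical helper that turns an input line
into a Mahout Vector, and the master/app-name args are placeholders):

import org.apache.mahout.math.Vector;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;

JavaSparkContext sc = new JavaSparkContext("local", "vector-sum");

// Load the text file and parse each line into a Vector.
JavaRDD<Vector> vectors = sc.textFile("hdfs://mydataFile")
    .map(new Function<String, Vector>() {
      public Vector call(String line) { return parseVector(line); } // hypothetical parser
    });

// L1-normalize each row, keep only the sparse ones, and sum elementwise.
Vector sum = vectors
    .map(new Function<Vector, Vector>() {
      public Vector call(Vector v) { return v.normalize(1); }
    })
    .filter(new Function<Vector, Boolean>() {
      public Boolean call(Vector v) { return v.getNumNondefaultElements() < 1000; }
    })
    .reduce(new Function2<Vector, Vector, Vector>() {
      public Vector call(Vector a, Vector b) { return a.plus(b); }
    });

The nice part is that every closure is a pure function of its inputs, so
the framework is free to re-run, reorder, or pipeline them as it sees fit.
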
Twister is pretty interesting too and can model Hadoop jobs in a functional
style:
http://www.iterativemapreduce.org/