Re: Reproducing the function of a Hadoop Reducer

2014-09-20 Thread Victor Tso-Guillen

So sorry about teasing you with the Scala. But the method is there in Java too; I just checked. On Fri, Sep 19, 2014 at 2:02 PM, Victor Tso-Guillen v...@paxata.com wrote: It might not be the same as a real Hadoop reducer, but I think it would accomplish the same thing. Take a look at: import

Re: Reproducing the function of a Hadoop Reducer

2014-09-20 Thread Steve Lewis

OK, so in Java (pardon the verbosity) I might say something like the code below, but I face the following issues: 1) I need to store all values in memory as I run combineByKey. If I could return an RDD which consumed values that would be great, but I don't know how to do that. 2) In my version of

Re: Reproducing the function of a Hadoop Reducer

2014-09-20 Thread Victor Tso-Guillen

1. Actually, I disagree that combineByKey requires that all values for a key be held in memory. Only the groupByKey use case does that, whereas reduceByKey, foldByKey, and the generic combineByKey do not necessarily impose that requirement. If your combine logic really shrinks the result
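
To illustrate the point about memory, here is a minimal sketch in plain Java of the three functions that combineByKey takes (createCombiner, mergeValue, mergeCombiners). It is not Spark code and not the code from this thread: the class CombineByKeySketch, the SumCount combiner, and the helper methods are hypothetical names made up for the example, and the two lists only stand in for partitions. The combiner for each key is a fixed-size (sum, count) pair, so values are folded in one at a time and never collected into a per-key list.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Plain-Java model of combineByKey semantics; hypothetical names, no Spark dependency.
public class CombineByKeySketch {

    /** Per-key combiner: constant size regardless of how many values are seen. */
    static final class SumCount {
        final long sum;
        final long count;

        SumCount(long sum, long count) { this.sum = sum; this.count = count; }

        static SumCount of(int value) { return new SumCount(value, 1); }             // createCombiner
        SumCount add(int value) { return new SumCount(sum + value, count + 1); }     // mergeValue
        SumCount merge(SumCount other) {                                             // mergeCombiners
            return new SumCount(sum + other.sum, count + other.count);
        }
        double mean() { return (double) sum / count; }
    }

    /** Folds one partition's pairs into per-key combiners, one value at a time. */
    static <K, V, C> Map<K, C> combinePartition(
            List<Map.Entry<K, V>> partition,
            Function<V, C> createCombiner,
            BiFunction<C, V, C> mergeValue) {
        Map<K, C> combiners = new HashMap<>();
        for (Map.Entry<K, V> kv : partition) {
            C existing = combiners.get(kv.getKey());
            combiners.put(kv.getKey(), existing == null
                    ? createCombiner.apply(kv.getValue())
                    : mergeValue.apply(existing, kv.getValue()));
        }
        return combiners;
    }

    /** Merges the per-partition combiners, standing in for the post-shuffle step. */
    static <K, C> Map<K, C> mergePartitions(
            List<Map<K, C>> partials, BinaryOperator<C> mergeCombiners) {
        Map<K, C> merged = new HashMap<>();
        for (Map<K, C> partial : partials) {
            partial.forEach((k, c) -> merged.merge(k, c, mergeCombiners));
        }
        return merged;
    }

    /** Two made-up partitions of (key, value) pairs, reduced to a SumCount per key. */
    static Map<String, SumCount> meanDemo() {
        List<Map.Entry<String, Integer>> p1 =
                List.of(Map.entry("a", 1), Map.entry("b", 10), Map.entry("a", 3));
        List<Map.Entry<String, Integer>> p2 =
                List.of(Map.entry("a", 5), Map.entry("b", 20));
        Map<String, SumCount> c1 = combinePartition(p1, SumCount::of, SumCount::add);
        Map<String, SumCount> c2 = combinePartition(p2, SumCount::of, SumCount::add);
        return mergePartitions(List.of(c1, c2), SumCount::merge);
    }

    public static void main(String[] args) {
        meanDemo().forEach((k, c) -> System.out.println(k + " -> " + c.mean()));
    }
}
```

If the combiner were instead a List that every value is appended to, the same three functions would reproduce groupByKey's behaviour and its memory cost, which is the distinction being drawn here.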