Re: reduce, transform, combine

Manish Amde Sun, 04 May 2014 10:07:22 -0700

Thanks DB. I will work with mapPartition for now.


Question to the community in general: should we consider adding such an 
operation to RDDs especially as a developer API?

On Sun, May 4, 2014 at 1:41 AM, DB Tsai <dbt...@stanford.edu> wrote:

> You could easily achieve this by mapPartition. However, it seems that it
> can not be done by using aggregate type of operation. I can see that it's a
> general useful operation. For now, you could use mapPartition.
> Sincerely,
> DB Tsai
> -------------------------------------------------------
> My Blog: https://www.dbtsai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
> On Sun, May 4, 2014 at 1:12 AM, Manish Amde <manish...@gmail.com> wrote:
>> I am currently using the RDD aggregate operation to reduce (fold) per
>> partition and then combine using the RDD aggregate operation.
>> def aggregate[U: ClassTag](zeroValue: U)(seqOp: (U, T) => U, combOp: (U, U)
>> => U): U
>>
>> I need to perform a transform operation after the seqOp and before the
>> combOp. The signature would look like
>> def foldTransformCombine[U: ClassTag](zeroReduceValue: V, zeroCombineValue:
>> U)(seqOp: (V, T) => V, transformOp: (V) => U, combOp: (U, U) => U): U
>>
>> This is especially useful in the scenario where the transformOp is
>> expensive and should be performed once per partition before combining. Is
>> there a way to accomplish this with existing RDD operations? If yes, great
>> but if not, should we consider adding such a general transformation to the
>> list of RDD operations?
>>
>> -Manish
>>

Re: reduce, transform, combine

Reply via email to