Thanks DB. I will work with mapPartition for now.
Question to the community in general: should we consider adding such an operation to RDDs especially as a developer API? On Sun, May 4, 2014 at 1:41 AM, DB Tsai <dbt...@stanford.edu> wrote: > You could easily achieve this by mapPartition. However, it seems that it > can not be done by using aggregate type of operation. I can see that it's a > general useful operation. For now, you could use mapPartition. > Sincerely, > DB Tsai > ------------------------------------------------------- > My Blog: https://www.dbtsai.com > LinkedIn: https://www.linkedin.com/in/dbtsai > On Sun, May 4, 2014 at 1:12 AM, Manish Amde <manish...@gmail.com> wrote: >> I am currently using the RDD aggregate operation to reduce (fold) per >> partition and then combine using the RDD aggregate operation. >> def aggregate[U: ClassTag](zeroValue: U)(seqOp: (U, T) => U, combOp: (U, U) >> => U): U >> >> I need to perform a transform operation after the seqOp and before the >> combOp. The signature would look like >> def foldTransformCombine[U: ClassTag](zeroReduceValue: V, zeroCombineValue: >> U)(seqOp: (V, T) => V, transformOp: (V) => U, combOp: (U, U) => U): U >> >> This is especially useful in the scenario where the transformOp is >> expensive and should be performed once per partition before combining. Is >> there a way to accomplish this with existing RDD operations? If yes, great >> but if not, should we consider adding such a general transformation to the >> list of RDD operations? >> >> -Manish >>