Github user ankurdave commented on the pull request:

https://github.com/apache/spark/pull/5142#issuecomment-85805531

Hmm, this doesn't really fulfill the role of folds at the moment. The initial values are passed to a "fold" function that otherwise does the same thing as a reduce function, so it's equivalent to the user generating the initial values inside the fold function themselves.

With SPARK-4086, I think @jegonzal and I were imagining a use case where you have an `Iterator[(VertexId, A)]` with duplicate keys and you want a `VertexRDD[Array[A]]`. Currently the only way is to map each element to a one-element array and then concatenate arrays on each merge, which creates a lot of garbage. With folds, you could instead use an empty `ArrayBuffer` as the initial value and insert elements in place as they arrive.

So it only seems to make sense to have a fold version of `aggregateUsingIndex` and of the `VertexRDD.apply` constructor (maybe called `VertexRDD.createWithFold`), not of the joins, because joining two VertexRDDs never involves more than two values per key. Also, in general the fold function has to return a value of the same type as the init value, e.g. `def fold[A](initVal: A, f: (VD, A) => A): VertexRDD[A]`.
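For concreteness, here is a minimal standalone sketch of the allocation difference between the two approaches. It uses a plain `mutable.Map` in place of the real hash-index machinery inside `aggregateUsingIndex`, and the names `aggregateByReduce` and `aggregateByFold` are hypothetical, chosen only to contrast the two patterns:

```scala
import scala.collection.mutable
import scala.reflect.ClassTag

object FoldSketch {
  type VertexId = Long // stands in for org.apache.spark.graphx.VertexId

  // Reduce-style aggregation (the status quo): every element is first
  // lifted to a one-element array, and every merge allocates a fresh
  // concatenated array.
  def aggregateByReduce[A: ClassTag](
      msgs: Iterator[(VertexId, A)]): mutable.Map[VertexId, Array[A]] = {
    val acc = mutable.Map.empty[VertexId, Array[A]]
    msgs.foreach { case (vid, a) =>
      val single = Array(a)             // per-element garbage
      acc(vid) = acc.get(vid) match {
        case Some(arr) => arr ++ single // per-merge garbage
        case None      => single
      }
    }
    acc
  }

  // Fold-style aggregation (the proposal): one ArrayBuffer per key,
  // grown in place as elements arrive, so no intermediate arrays.
  def aggregateByFold[A](
      msgs: Iterator[(VertexId, A)]): mutable.Map[VertexId, mutable.ArrayBuffer[A]] = {
    val acc = mutable.Map.empty[VertexId, mutable.ArrayBuffer[A]]
    msgs.foreach { case (vid, a) =>
      acc.getOrElseUpdate(vid, mutable.ArrayBuffer.empty[A]) += a
    }
    acc
  }

  def main(args: Array[String]): Unit = {
    val msgs = Iterator((1L, "a"), (2L, "b"), (1L, "c"))
    println(aggregateByFold(msgs))
    // e.g. HashMap(1 -> ArrayBuffer(a, c), 2 -> ArrayBuffer(b))
  }
}
```

One subtlety the sketch sidesteps by creating a fresh buffer per key: if `initVal` is itself mutable (like an `ArrayBuffer`), a fold API would need to produce a fresh instance per key rather than sharing a single one, or the accumulators for different keys would alias each other.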