Github user ankurdave commented on the pull request:

https://github.com/apache/spark/pull/5142#issuecomment-85805531

Hmm, this doesn't really fulfill the role of folds at the moment. The initial values are passed to a "fold" function that otherwise does the same thing as a reduce function, so it's equivalent to the user generating the initial values inside the fold function themselves.

With SPARK-4086, I think @jegonzal and I were imagining a use case where you have an `Iterator[(VertexId, A)]` with duplicate keys and you want a `VertexRDD[Array[A]]`. Currently the only way is to map each element to a one-element array and then concatenate arrays on each merge, which creates a lot of garbage. With folds, you could instead use an empty `ArrayBuffer` as the initial value and insert elements in place as they arrive.

So it only seems to make sense to have a fold version of `aggregateUsingIndex` and of the `VertexRDD.apply` constructor (maybe called `VertexRDD.createWithFold`), not of the joins, because joining two VertexRDDs never involves more than two values per key. Also, in general the fold function has to return a value of the same type as the init value, e.g. `def fold[A](initVal: A, f: (VD, A) => A): VertexRDD[A]`.
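For concreteness, here is a minimal standalone sketch of the allocation difference between the two approaches. It uses a plain `mutable.Map` in place of the real hash-index machinery inside `aggregateUsingIndex`, and the names `aggregateByReduce` and `aggregateByFold` are hypothetical, chosen only to contrast the two patterns:

```scala
import scala.collection.mutable
import scala.reflect.ClassTag

object FoldSketch {
  type VertexId = Long // stands in for org.apache.spark.graphx.VertexId

  // Reduce-style aggregation (the status quo): every element is first
  // lifted to a one-element array, and every merge allocates a fresh
  // concatenated array.
  def aggregateByReduce[A: ClassTag](
      msgs: Iterator[(VertexId, A)]): mutable.Map[VertexId, Array[A]] = {
    val acc = mutable.Map.empty[VertexId, Array[A]]
    msgs.foreach { case (vid, a) =>
      val single = Array(a)             // per-element garbage
      acc(vid) = acc.get(vid) match {
        case Some(arr) => arr ++ single // per-merge garbage
        case None      => single
      }
    }
    acc
  }

  // Fold-style aggregation (the proposal): one ArrayBuffer per key,
  // grown in place as elements arrive, so no intermediate arrays.
  def aggregateByFold[A](
      msgs: Iterator[(VertexId, A)]): mutable.Map[VertexId, mutable.ArrayBuffer[A]] = {
    val acc = mutable.Map.empty[VertexId, mutable.ArrayBuffer[A]]
    msgs.foreach { case (vid, a) =>
      acc.getOrElseUpdate(vid, mutable.ArrayBuffer.empty[A]) += a
    }
    acc
  }

  def main(args: Array[String]): Unit = {
    val msgs = Iterator((1L, "a"), (2L, "b"), (1L, "c"))
    println(aggregateByFold(msgs))
    // e.g. HashMap(1 -> ArrayBuffer(a, c), 2 -> ArrayBuffer(b))
  }
}
```

One subtlety the sketch sidesteps by creating a fresh buffer per key: if `initVal` is itself mutable (like an `ArrayBuffer`), a fold API would need to produce a fresh instance per key rather than sharing a single one, or the accumulators for different keys would alias each other.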