On Thu, Jul 10, 2014 at 8:20 AM, Yifan LI <iamyifa...@gmail.com> wrote:
> - how to "build the latest version of Spark from the master branch, which
> contains a fix"?
Instead of downloading a prebuilt Spark release from
http://spark.apache.org/downloads.html, follow the instructions under
"Development Version" on that page. In short:

    git clone git://github.com/apache/spark.git
    cd spark
    sbt/sbt assembly

Then you can run bin/spark-shell and bin/spark-submit as usual, and
Graph.partitionBy should work.

> - how to specify other partition strategy, eg. CanonicalRandomVertexCut,
> EdgePartition1D, EdgePartition2D, RandomVertexCut
> (listed in Scala API document, but seems only "EdgePartition2D" is
> available? I am not sure for this!)

All of those partition strategies should be available -- for example, you
can call graph.partitionBy(PartitionStrategy.RandomVertexCut).

> - Is it possible to add my own partition strategy (hash function, etc.)
> into GraphX?

Yes, you just need to create a subclass of PartitionStrategy as follows:

    import org.apache.spark.graphx._

    object MyPartitionStrategy extends PartitionStrategy {
      override def getPartition(src: VertexId, dst: VertexId,
          numParts: PartitionID): PartitionID = {
        // put your hash function here
      }
    }

Ankur <http://www.ankurdave.com/>
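As a concrete illustration of the stub above (this is my own hypothetical strategy, not part of GraphX -- the name PairHashPartitionStrategy and the choice of hash are assumptions of this sketch), one simple hash function canonicalizes the (src, dst) pair so that edges in either direction between the same two vertices land in the same partition:

```scala
import org.apache.spark.graphx._

// Hypothetical custom strategy: hash the canonical (min, max) vertex-id
// pair so that (a, b) and (b, a) map to the same partition.
object PairHashPartitionStrategy extends PartitionStrategy {
  override def getPartition(src: VertexId, dst: VertexId,
      numParts: PartitionID): PartitionID = {
    // Order the endpoints so the hash is direction-independent.
    val lo = math.min(src, dst)
    val hi = math.max(src, dst)
    // Mix the two ids and map into [0, numParts); math.abs guards
    // against a negative hashCode.
    math.abs((lo, hi).hashCode()) % numParts
  }
}
```

You would then use it like any built-in strategy, e.g. graph.partitionBy(PairHashPartitionStrategy).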