Re: "the default GraphX graph-partition strategy on multicore machine"?

2014-07-21 Thread Yifan LI
Thanks so much, Ankur, :))
Excuse me, but I am wondering (for a chosen partition strategy for my application):
1.1) How to check the size of each partition? Is there any API, or log file?
1.2) How to check the processing cost of each partition (time, memory, etc.)?
2.1) and the global communi

Re: "the default GraphX graph-partition strategy on multicore machine"?

2014-07-18 Thread Ankur Dave
Sorry, I didn't read your vertex replication example carefully, so my previous answer is wrong. Here's the correct one:
On Fri, Jul 18, 2014 at 9:13 AM, Yifan LI wrote:
> I don't understand, for instance, we have 3 edge partition tables (EA: a -> b, a -> c; EB: a -> d, a -> e; EC: d -> c), 2 v

Re: "the default GraphX graph-partition strategy on multicore machine"?

2014-07-18 Thread Ankur Dave
On Fri, Jul 18, 2014 at 9:13 AM, Yifan LI wrote:
> Yes, is it possible to define a custom partition strategy?
Yes, you just need to create a subclass of PartitionStrategy as follows:
import org.apache.spark.graphx._
object MyPartitionStrategy extends PartitionStrategy {
  override def getPartit
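The archived snippet is cut off before the method body. As a minimal sketch of what a PartitionStrategy subclass could look like (not the code from the original reply): the object name SourceHashPartitionStrategy and its assignment rule are hypothetical, chosen only for illustration, and the sketch assumes the spark-graphx library is on the classpath.

```scala
import org.apache.spark.graphx._

// Hypothetical example strategy (not from the thread): assign each edge
// to a partition based on its source vertex ID only, so that all
// out-edges of a given vertex land in the same edge partition.
object SourceHashPartitionStrategy extends PartitionStrategy {
  override def getPartition(
      src: VertexId,          // source vertex ID of the edge (a Long)
      dst: VertexId,          // destination vertex ID (unused by this rule)
      numParts: PartitionID   // total number of edge partitions (an Int)
  ): PartitionID = {
    // floorMod keeps the result in [0, numParts) even for negative IDs.
    java.lang.Math.floorMod(src, numParts.toLong).toInt
  }
}
```

Assuming an existing graph value named graph, such a strategy would be applied with graph.partitionBy(SourceHashPartitionStrategy). Keying on the source ID alone is a deliberately simple rule; a production strategy would likely want a better-mixing hash to avoid skew when vertex IDs are clustered.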

Re: "the default GraphX graph-partition strategy on multicore machine"?

2014-07-18 Thread Yifan LI
Hi Ankur,
Thanks so much! :))
Yes, is it possible to define a custom partition strategy?
And some other questions (2*4-core machine, 24 GB memory):
- If I load one edge file (5 GB) without any cores/partitions setting, what is the default partition in graph construction? And how many cores wi

Re: "the default GraphX graph-partition strategy on multicore machine"?

2014-07-15 Thread Ankur Dave
On Jul 15, 2014, at 12:06 PM, Yifan LI wrote:
> Btw, is there any possibility to customise the partition strategy as we expect?
I'm not sure I understand. Are you asking about defining a custom

Re: "the default GraphX graph-partition strategy on multicore machine"?

2014-07-15 Thread Yifan LI
Hi Ankur,
I have another question, w.r.t. edge/partition scheduling: for instance, I have a 2*4-core machine (L1 cache: 32K) with 32 GB of memory and an 80 GB local edge file on disk. When I load the file using sc.textFile (minPartitions = 16, PartitionStrategy.RandomVertexCut), then what ha

Re: "the default GraphX graph-partition strategy on multicore machine"?

2014-07-15 Thread Yifan LI
Dear Ankur,
Thanks so much! Btw, is there any possibility to customise the partition strategy as we expect?
Best,
Yifan
On Jul 11, 2014, at 10:20 PM, Ankur Dave wrote:
> Hi Yifan,
>
> When you run Spark on a single machine, it uses a local mode where one task per core can be executed at a