Hi,

I have a question about running PageRank on the LiveJournal data, as suggested by the example at org.apache.spark.examples.graphx.LiveJournalPageRank. I ran it with the following options:

bin/run-example org.apache.spark.examples.graphx.LiveJournalPageRank data/graphx/soc-LiveJournal1.txt --numEPart=1

In the Spark UI, the shuffle read size for "mapPartitions at GraphImpl.scala:235" increases steadily, all the way up to 2.1 GB, on a single-node machine. I would expect the shuffle read size to decrease as the number of messages decreases. When I tried 4 partitions, the shuffle read for the mapPartitions stage does decrease as the program progresses, so I am not sure why it keeps increasing with a single partition. It also badly hurts performance for the single-partition run, even though that run spends much less time in the reduce phase than the 4-partition configuration on the same node.

Thanks
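P.S. To make my expectation concrete, here is a toy sketch of why I thought message volume (and hence shuffle read) should shrink over iterations. It models a tolerance-based ("dynamic") PageRank, where a vertex only emits messages while its rank is still changing by more than a tolerance. This is plain Python with a made-up graph and parameters, not GraphX's actual implementation, just an illustration of the idea:

```python
# Toy tolerance-based PageRank: a vertex contributes messages only
# while its rank is still changing by more than `tol`, so the number
# of messages per iteration shrinks as ranks converge.
# Illustrative sketch only -- not GraphX's implementation.

def pagerank_message_counts(edges, tol=1e-3, d=0.85, max_iters=100):
    out = {}
    nodes = set()
    for src, dst in edges:
        out.setdefault(src, []).append(dst)
        nodes.update((src, dst))

    rank = {v: 1.0 for v in nodes}
    changed = set(nodes)  # every vertex is "active" in iteration 0
    counts = []
    for _ in range(max_iters):
        # messages this round = out-edges of still-changing vertices
        counts.append(sum(len(out.get(v, ())) for v in changed))
        if not changed:
            break
        # dense power-iteration update (kept simple on purpose)
        incoming = {v: 0.0 for v in nodes}
        for src, nbrs in out.items():
            for dst in nbrs:
                incoming[dst] += rank[src] / len(nbrs)
        new_rank = {v: (1 - d) + d * incoming[v] for v in nodes}
        changed = {v for v in nodes if abs(new_rank[v] - rank[v]) > tol}
        rank = new_rank
    return counts

# Example: per-iteration message counts fall to zero as ranks converge.
print(pagerank_message_counts([(1, 2), (2, 3), (3, 1), (1, 3), (3, 2)]))
```

Since later iterations send messages for a subset of the edges, I expected the per-iteration shuffle volume to be non-increasing overall, which matches what I see with 4 partitions but not with 1.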