Hi all, I started exploring Spark two months ago. I'm looking for concrete differences between plain Spark and GraphX so that I can decide which one to use, based on which gives the better performance.
According to the documentation, GraphX runs 10x faster than normal Spark, so I ran the PageRank algorithm with both.

For Spark I used:
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/SparkPageRank.scala

For GraphX I used:
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/graphx/LiveJournalPageRank.scala

Input data: http://snap.stanford.edu/data/soc-LiveJournal1.html (1 GB in size)
No. of iterations: 2

Time taken:

Local mode (machine: 8 cores; 16 GB memory; 2.80 GHz Intel i7; executor memory: 4 GB; no. of partitions: 50; no. of iterations: 2)
  Spark PageRank took  -> 21.29 mins
  GraphX PageRank took -> 42.01 mins

Cluster mode (Ubuntu 12.04; Spark 1.1 / Hadoop 2.4 cluster; 3 workers, 1 driver; 8 cores, 30 GB memory; executor memory: 4 GB; no. of edge partitions: 50, random vertex cut; no. of iterations: 2)
  Spark PageRank took  -> 10.54 mins
  GraphX PageRank took -> 7.54 mins

Could you please help me determine when to use Spark and when to use GraphX? If GraphX takes about the same amount of time as Spark, then it seems better to use Spark, because Spark has a variety of operators for dealing with any type of RDD.

Any suggestions, feedback, or pointers would be highly appreciated.

Thanks,
Harihar