Re: New Codes in GraphX
Could it be because my edge list file is in the form (1 2), where there is an edge between node 1 and node 2?

On Tue, Nov 18, 2014 at 4:13 PM, Ankur Dave ankurd...@gmail.com wrote:
> If you could find out which vertices are getting the NaN rank, it might be helpful in tracking down the problem.
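For reference, the GraphX documentation describes the input to GraphLoader.edgeListFile as one edge per line: two integer vertex IDs, source then destination, separated by whitespace, with lines beginning with # skipped. As far as I can tell, that loader is what the Analytics pagerank task (and therefore LiveJournalPageRank) uses to read the file. A line that literally contains parentheses, such as (1 2), would not parse as two numeric IDs, so it may be worth ruling this out. The Spark-free Scala sketch below flags lines that do not match that documented format. EdgeListCheck and badLines are hypothetical names, and the check approximates the documented format rather than reproducing the loader's actual code.

```scala
// Hypothetical helper: flags edge-list lines that do not match the format
// documented for GraphLoader.edgeListFile (two whitespace-separated integer
// vertex IDs per line; empty lines and lines starting with '#' are skipped).
object EdgeListCheck {
  // Returns (lineNumber, line) for every line that is neither skipped
  // nor a valid "srcId dstId" pair. Line numbers are 1-based.
  def badLines(lines: Seq[String]): Seq[(Int, String)] =
    lines.zipWithIndex.collect {
      case (line, idx) if !isSkipped(line) && !isValidEdge(line) => (idx + 1, line)
    }

  // Empty lines and comment lines are ignored, as the documentation describes.
  private def isSkipped(line: String): Boolean =
    line.isEmpty || line.startsWith("#")

  // A valid line starts with at least two tokens that parse as Long IDs.
  private def isValidEdge(line: String): Boolean = {
    val fields = line.trim.split("\\s+")
    fields.length >= 2 && fields.take(2).forall(isLong)
  }

  private def isLong(s: String): Boolean =
    try { s.toLong; true } catch { case _: NumberFormatException => false }

  // Reads the file named on the command line and prints the offending lines.
  def main(args: Array[String]): Unit = {
    val source = scala.io.Source.fromFile(args(0))
    try {
      badLines(source.getLines().toList).foreach { case (n, l) =>
        println(s"line $n: $l")
      }
    } finally source.close()
  }
}
```

Compiled and run with the edge list path as its only argument, it prints the 1-based number of each line that does not look like "srcId dstId".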
New Codes in GraphX
Hi,

I am using Spark 1.0.0. There are two GraphX directories that I can see here:

1. spark-1.0.0/examples/src/main/scala/org/apache/spark/examples/graphx, which contains LiveJournalPageRank.scala
2. spark-1.0.0/graphx/src/main/scala/org/apache/spark/graphx/lib, which contains Analytics.scala, ConnectedComponents.scala, etc.

Now, if I want to add my own code to GraphX, i.e., if I want to write a small application on GraphX, in which directory should I add my code, 1 or 2? And what is the difference? Can anyone tell me something about this?

Thank you
Re: New Codes in GraphX
The code in 2 can be run with the command:

$SPARK_HOME/bin/spark-submit --master local[*] --class org.apache.spark.graphx.lib.Analytics $SPARK_HOME/assembly/target/scala-2.10/spark-assembly-*.jar pagerank /edge-list-file.txt --numEPart=8 --numIter=10 --partStrategy=EdgePartition2D

Now, how do I run LiveJournalPageRank.scala, which is in 1?

On Tue, Nov 18, 2014 at 2:51 PM, Deep Pradhan pradhandeep1...@gmail.com wrote:
> [...]
Re: New Codes in GraphX
At 2014-11-18 14:51:54 +0530, Deep Pradhan pradhandeep1...@gmail.com wrote:
> Now, if I want to add my own code to GraphX, i.e., if I want to write a small application on GraphX, in which directory should I add my code, in 1 or 2? And what is the difference?

If you want to add an algorithm that you can call from the Spark shell and submit as a pull request, you should add it to org.apache.spark.graphx.lib (#2). To run it from the command line, you'll also have to modify Analytics.scala.

If you want to write a separate application, the ideal way is to do it in a separate project that links in Spark as a dependency [1]. It will also work to put it in either #1 or #2, but this will be worse in the long term, because each build cycle will require you to rebuild and restart all of Spark rather than just building your application and calling spark-submit on the new JAR.

Ankur

[1] http://spark.apache.org/docs/1.0.2/quick-start.html#standalone-applications

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
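As a sketch of the separate-project approach, following the sbt pattern in the quick-start guide [1], a minimal build definition might look like the one below. The project name and version are placeholders. Scala 2.10.4 is assumed because the Spark 1.0.x artifacts are published for Scala 2.10, and spark-graphx is the GraphX artifact. Marking Spark as "provided" keeps it out of the application JAR, since spark-submit supplies it at runtime.

```scala
// build.sbt: hypothetical minimal project for a standalone GraphX application
name := "my-graphx-app"   // placeholder project name

version := "0.1"

scalaVersion := "2.10.4"  // Spark 1.0.x artifacts are built for Scala 2.10

// "provided": spark-submit puts Spark on the classpath at runtime,
// so it does not need to be bundled into the application JAR.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"   % "1.0.0" % "provided",
  "org.apache.spark" %% "spark-graphx" % "1.0.0" % "provided"
)
```

After sbt package, the resulting JAR (roughly target/scala-2.10/my-graphx-app_2.10-0.1.jar) would be passed to spark-submit with --class set to your own main object; only this small project needs rebuilding when your code changes, not Spark itself.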
Re: New Codes in GraphX
What command should I use to run LiveJournalPageRank.scala?

> If you want to write a separate application, the ideal way is to do it in a separate project that links in Spark as a dependency [1].

But even for this, I have to do the build every time I change the code, right?

Thank you

On Tue, Nov 18, 2014 at 3:35 PM, Ankur Dave ankurd...@gmail.com wrote:
> [...]
Re: New Codes in GraphX
At 2014-11-18 15:35:13 +0530, Deep Pradhan pradhandeep1...@gmail.com wrote:
> Now, how do I run LiveJournalPageRank.scala, which is in 1?

I think it should work to use

MASTER=local[*] $SPARK_HOME/bin/run-example graphx.LiveJournalPageRank /edge-list-file.txt --numEPart=8 --numIter=10 --partStrategy=EdgePartition2D

Ankur
Re: New Codes in GraphX
Yes, the above command works, but there is a problem: most of the time, the total rank is NaN (Not a Number). Why is that?

Thank you

On Tue, Nov 18, 2014 at 3:48 PM, Deep Pradhan pradhandeep1...@gmail.com wrote:
> [...]
Re: New Codes in GraphX
At 2014-11-18 15:51:52 +0530, Deep Pradhan pradhandeep1...@gmail.com wrote:
> Yes, the above command works, but there is a problem: most of the time, the total rank is NaN (Not a Number). Why is that?

I've also seen this, but I'm not sure why it happens. If you could find out which vertices are getting the NaN rank, it might help in tracking down the problem.

Ankur
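One way to follow this suggestion: the reported total rank is NaN as soon as any single vertex has a NaN rank, because a floating-point sum that includes NaN is itself NaN. So the total alone does not say how many vertices are affected. In a GraphX job, the vertices of the ranked graph could be filtered with something like ranks.vertices.filter { case (_, r) => r.isNaN }, where ranks is assumed to be the Graph[Double, Double] returned by PageRank. The Spark-free sketch below applies the same check to plain (vertexId, rank) pairs, for example after collecting a sample to the driver. The object and method names are hypothetical.

```scala
// Hypothetical helpers for tracking down NaN PageRank values.
// `ranks` is assumed to be (vertexId, rank) pairs, e.g. collected from
// the vertices of the ranked graph.
object NaNRanks {
  // Vertex IDs whose rank is NaN, sorted for stable output.
  def nanVertices(ranks: Seq[(Long, Double)]): Seq[Long] =
    ranks.collect { case (id, r) if r.isNaN => id }.sorted

  // Total rank; a single NaN entry makes the whole sum NaN.
  def totalRank(ranks: Seq[(Long, Double)]): Double =
    ranks.map(_._2).sum
}
```

Once the NaN vertex IDs are known, looking at their edges in the input file is a natural next step in narrowing down the cause.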