Re: New Codes in GraphX

2014-11-24 Thread Deep Pradhan
Could it be because my edge list file is in the form (1 2), where there
is an edge between node 1 and node 2?
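
For reference (hedged; based on the default loader, not on anything stated in this thread): GraphLoader.edgeListFile, which the pagerank driver uses to read the graph, expects one whitespace-separated source/destination vertex ID pair per line, with lines beginning with `#` skipped as comments. A file like the following should parse cleanly, so the "(1 2)" format itself is unlikely to be the cause:

```text
# Comment lines beginning with '#' are skipped
1 2
2 1
3 1
```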

On Tue, Nov 18, 2014 at 4:13 PM, Ankur Dave ankurd...@gmail.com wrote:

 At 2014-11-18 15:51:52 +0530, Deep Pradhan pradhandeep1...@gmail.com
 wrote:
  Yes, the above command works, but there is a problem: most of the time,
  the total rank is NaN (Not a Number). Why is that?

 I've also seen this, but I'm not sure why it happens. If you could find
 out which vertices are getting the NaN rank, it might be helpful in
 tracking down the problem.

 Ankur



New Codes in GraphX

2014-11-18 Thread Deep Pradhan
Hi,
I am using Spark-1.0.0. There are two GraphX directories that I can see here

1. spark-1.0.0/examples/src/main/scala/org/apache/spark/examples/graphx,
which contains LiveJournalPageRank.scala

2. spark-1.0.0/graphx/src/main/scala/org/apache/spark/graphx/lib, which
contains Analytics.scala, ConnectedComponents.scala, etc.

Now, if I want to add my own code to GraphX, i.e., if I want to write a
small application on GraphX, in which directory should I add my code: 1
or 2? And what is the difference?

Can anyone tell me something on this?

Thank You


Re: New Codes in GraphX

2014-11-18 Thread Deep Pradhan
The code in #2 can be run with the command

$SPARK_HOME/bin/spark-submit --master local[*] --class
org.apache.spark.graphx.lib.Analytics
$SPARK_HOME/assembly/target/scala-2.10/spark-assembly-*.jar pagerank
/edge-list-file.txt --numEPart=8 --numIter=10
--partStrategy=EdgePartition2D

Now, how do I run the LiveJournalPageRank.scala that is there in 1?



On Tue, Nov 18, 2014 at 2:51 PM, Deep Pradhan pradhandeep1...@gmail.com
wrote:

 Hi,
 I am using Spark-1.0.0. There are two GraphX directories that I can see
 here

 1. spark-1.0.0/examples/src/main/scala/org/apache/spark/examples/graphx,
 which contains LiveJournalPageRank.scala

 2. spark-1.0.0/graphx/src/main/scala/org/apache/spark/graphx/lib, which
 contains Analytics.scala, ConnectedComponents.scala, etc.

 Now, if I want to add my own code to GraphX, i.e., if I want to write a
 small application on GraphX, in which directory should I add my code: 1
 or 2? And what is the difference?

 Can anyone tell me something on this?

 Thank You



Re: New Codes in GraphX

2014-11-18 Thread Ankur Dave
At 2014-11-18 14:51:54 +0530, Deep Pradhan pradhandeep1...@gmail.com wrote:
 I am using Spark-1.0.0. There are two GraphX directories that I can see here

 1. spark-1.0.0/examples/src/main/scala/org/apache/spark/examples/graphx,
 which contains LiveJournalPageRank.scala

 2. spark-1.0.0/graphx/src/main/scala/org/apache/spark/graphx/lib, which
 contains Analytics.scala, ConnectedComponents.scala, etc.

 Now, if I want to add my own code to GraphX, i.e., if I want to write a
 small application on GraphX, in which directory should I add my code: 1
 or 2? And what is the difference?

If you want to add an algorithm which you can call from the Spark shell and 
submit as a pull request, you should add it to org.apache.spark.graphx.lib 
(#2). To run it from the command line, you'll also have to modify 
Analytics.scala.

If you want to write a separate application, the ideal way is to do it in a 
separate project that links in Spark as a dependency [1]. It will also work to 
put it in either #1 or #2, but this will be worse in the long term because each 
build cycle will require you to rebuild and restart all of Spark rather than 
just building your application and calling spark-submit on the new JAR.
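
As a rough sketch of what such a standalone project could look like (the project name, object name, and tolerance value below are hypothetical, not from this thread): a build.sbt declaring Spark 1.0.0 as a dependency, plus a minimal driver that loads an edge list and prints the total rank.

```scala
// Hypothetical build.sbt for the standalone project:
//   name := "graphx-app"
//   scalaVersion := "2.10.4"
//   libraryDependencies += "org.apache.spark" %% "spark-graphx" % "1.0.0"

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.GraphLoader

object MyGraphXApp {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MyGraphXApp"))
    // Load an edge list in the "src dst" one-pair-per-line format
    val graph = GraphLoader.edgeListFile(sc, args(0))
    // Run PageRank until convergence at the given tolerance
    val ranks = graph.pageRank(0.0001).vertices
    println(s"Total rank: ${ranks.map(_._2).sum()}")
    sc.stop()
  }
}
```

Built with `sbt package` and launched via spark-submit on the resulting JAR, only this small application JAR needs rebuilding after each code change, not Spark itself.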

Ankur

[1] http://spark.apache.org/docs/1.0.2/quick-start.html#standalone-applications

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: New Codes in GraphX

2014-11-18 Thread Deep Pradhan
What command should I use to run LiveJournalPageRank.scala?

 If you want to write a separate application, the ideal way is to do it in
a separate project that links in Spark as a dependency [1].
But even for this, I have to do the build every time I change the code,
right?

Thank You

On Tue, Nov 18, 2014 at 3:35 PM, Ankur Dave ankurd...@gmail.com wrote:

 At 2014-11-18 14:51:54 +0530, Deep Pradhan pradhandeep1...@gmail.com
 wrote:
  I am using Spark-1.0.0. There are two GraphX directories that I can see
 here
 
  1. spark-1.0.0/examples/src/main/scala/org/apache/spark/examples/graphx,
  which contains LiveJournalPageRank.scala

  2. spark-1.0.0/graphx/src/main/scala/org/apache/spark/graphx/lib, which
  contains Analytics.scala, ConnectedComponents.scala, etc.

  Now, if I want to add my own code to GraphX, i.e., if I want to write a
  small application on GraphX, in which directory should I add my code:
  1 or 2? And what is the difference?

 If you want to add an algorithm which you can call from the Spark shell
 and submit as a pull request, you should add it to
 org.apache.spark.graphx.lib (#2). To run it from the command line, you'll
 also have to modify Analytics.scala.

 If you want to write a separate application, the ideal way is to do it in
 a separate project that links in Spark as a dependency [1]. It will also
 work to put it in either #1 or #2, but this will be worse in the long term
 because each build cycle will require you to rebuild and restart all of
 Spark rather than just building your application and calling spark-submit
 on the new JAR.

 Ankur

 [1]
 http://spark.apache.org/docs/1.0.2/quick-start.html#standalone-applications



Re: New Codes in GraphX

2014-11-18 Thread Ankur Dave
At 2014-11-18 15:35:13 +0530, Deep Pradhan pradhandeep1...@gmail.com wrote:
 Now, how do I run the LiveJournalPageRank.scala that is there in 1?

I think it should work to use

MASTER=local[*] $SPARK_HOME/bin/run-example graphx.LiveJournalPageRank
/edge-list-file.txt --numEPart=8 --numIter=10
--partStrategy=EdgePartition2D

Ankur

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: New Codes in GraphX

2014-11-18 Thread Deep Pradhan
Yes, the above command works, but there is a problem: most of the time,
the total rank is NaN (Not a Number). Why is that?

Thank You

On Tue, Nov 18, 2014 at 3:48 PM, Deep Pradhan pradhandeep1...@gmail.com
wrote:

 What command should I use to run the LiveJournalPageRank.scala?

  If you want to write a separate application, the ideal way is to do it
 in a separate project that links in Spark as a dependency [1].
 But even for this, I have to do the build every time I change the code,
 right?

 Thank You

 On Tue, Nov 18, 2014 at 3:35 PM, Ankur Dave ankurd...@gmail.com wrote:

 At 2014-11-18 14:51:54 +0530, Deep Pradhan pradhandeep1...@gmail.com
 wrote:
  I am using Spark-1.0.0. There are two GraphX directories that I can see
 here
 
  1. spark-1.0.0/examples/src/main/scala/org/apache/spark/examples/graphx,
  which contains LiveJournalPageRank.scala

  2. spark-1.0.0/graphx/src/main/scala/org/apache/spark/graphx/lib, which
  contains Analytics.scala, ConnectedComponents.scala, etc.

  Now, if I want to add my own code to GraphX, i.e., if I want to write a
  small application on GraphX, in which directory should I add my code:
  1 or 2? And what is the difference?

 If you want to add an algorithm which you can call from the Spark shell
 and submit as a pull request, you should add it to
 org.apache.spark.graphx.lib (#2). To run it from the command line, you'll
 also have to modify Analytics.scala.

 If you want to write a separate application, the ideal way is to do it in
 a separate project that links in Spark as a dependency [1]. It will also
 work to put it in either #1 or #2, but this will be worse in the long term
 because each build cycle will require you to rebuild and restart all of
 Spark rather than just building your application and calling spark-submit
 on the new JAR.

 Ankur

 [1]
 http://spark.apache.org/docs/1.0.2/quick-start.html#standalone-applications





Re: New Codes in GraphX

2014-11-18 Thread Ankur Dave
At 2014-11-18 15:51:52 +0530, Deep Pradhan pradhandeep1...@gmail.com wrote:
 Yes, the above command works, but there is a problem: most of the time,
 the total rank is NaN (Not a Number). Why is that?

I've also seen this, but I'm not sure why it happens. If you could find out 
which vertices are getting the NaN rank, it might be helpful in tracking down 
the problem.
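
One way a single bad value can dominate the total (a toy illustration only, not GraphX's actual code path): PageRank contributions are computed as rank divided by out-degree, and if any contribution ever becomes NaN, it poisons every rank it reaches and hence the sum. The helper below is a hypothetical minimal PageRank, not Spark code.

```python
import math

def pagerank(edges, n, iters=10, d=0.85):
    """Toy PageRank over an edge list of (src, dst) pairs on n vertices."""
    out_deg = [0] * n
    for s, _ in edges:
        out_deg[s] += 1
    ranks = [1.0 / n] * n
    for _ in range(iters):
        contribs = [0.0] * n
        for s, t in edges:
            # If out_deg[s] were ever 0 here, rank/0 would produce inf/NaN,
            # and that value would spread through every later sum.
            contribs[t] += ranks[s] / out_deg[s]
        ranks = [(1 - d) / n + d * c for c in contribs]
    return ranks

# On a clean 3-cycle the ranks sum to 1.0 as expected.
ranks = pagerank([(0, 1), (1, 2), (2, 0)], 3)
total = sum(ranks)

# A single injected NaN contaminates the total, as observed in the thread.
ranks_bad = ranks[:]
ranks_bad[0] = float("nan")
print(math.isnan(sum(ranks_bad)))  # True: one NaN vertex makes the total NaN
```

So collecting just the NaN-ranked vertices (in GraphX, something along the lines of filtering the rank vertices on `isNaN` and inspecting their in-edges; a hedged sketch, not a tested recipe) would narrow down where the bad value first appears.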

Ankur

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org