[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...

2016-12-15 Thread ankurdave
Github user ankurdave commented on the issue:

https://github.com/apache/spark/pull/16271
  
Merged into master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...

2016-12-15 Thread ankurdave
Github user ankurdave commented on the issue:

https://github.com/apache/spark/pull/16271
  
Thanks @aray for the explanation. I agree with @srowen - this looks 
reasonable to me. I'm going to merge it.





[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...

2016-12-15 Thread aray
Github user aray commented on the issue:

https://github.com/apache/spark/pull/16271
  
Yes, the improvement comes from the sum of the magnitudes of the initial values 
being closer to the (known) sum of the solution. Fiddling with resetProb controls a 
completely different thing. The current implementation has no advantage 
(other than finding the incorrect solution to a star graph one iteration faster).
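
The role of the initial sum can be made explicit with a short derivation. This is a sketch under assumptions not stated in the thread: every vertex has at least one out-edge, the update is the sum-to-N form $x_v \leftarrow r + (1-r)\sum_{u \to v} x_u / \operatorname{outdeg}(u)$ with $r$ standing for resetProb, and $S_k$ denotes the sum of all ranks after $k$ iterations.

```latex
\begin{align*}
S_{k+1} &= \sum_{v} \Bigl( r + (1-r) \sum_{u \to v} \frac{x_u^{(k)}}{\operatorname{outdeg}(u)} \Bigr)
         = N r + (1-r)\, S_k, \\
S_{k+1} - N &= (1-r)\,(S_k - N), \\
S_k - N &= (1-r)^k \,(S_0 - N).
\end{align*}
```

Under these assumptions the fixed point has $S = N$. Starting every vertex at 1 gives $S_0 = N$, so the sum is correct from the first iteration; any other starting sum leaves a gap that shrinks only by a factor of $(1-r)$ per iteration.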





[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...

2016-12-14 Thread aray
Github user aray commented on the issue:

https://github.com/apache/spark/pull/16271
  
**References**
[Pagerank paper](http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf)
> We need to make an initial assignment of the ranks. This assignment can 
be made by one of several strategies. If it is going to iterate until 
convergence, in general the initial values will not affect final values, just 
the rate of convergence. But we can speed
up convergence by choosing a good initial assignment.

Since they are more focused on updating values for one evolving graph (the 
internet), they don't really talk about starting from scratch. But this does 
emphasize that the initial values do not change the answers, just the rate of convergence.

A more direct statement would be 
[Wikipedia](https://en.wikipedia.org/wiki/PageRank)
> PageRank is initialized to the same value for all pages. In the original 
form of PageRank, the sum of PageRank over all pages was the total number of 
pages on the web at that time, so each page in this example would have an 
initial value of 1.

Note that there are two variants of PageRank whose outputs differ only by a 
constant multiple; both are governed by the same damping factor. We use the 
version that sums to N (most other implementations use the other). More from 
Wikipedia:
>The difference between them is that the PageRank values in the first 
formula sum to one, while in the second formula each PageRank is multiplied by 
N and the sum becomes N.

Essentially, starting with the correct sum puts you closer to the actual fixed 
point and thus gets you faster convergence.
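
A small numerical check of this claim, as a sketch only: the code below is plain power iteration for the sum-to-N variant, not the GraphX implementation. The 4-vertex graph, the function names, and the alternative starting value of 0.15 per vertex are all assumptions chosen for illustration.

```python
# Sketch only: plain power iteration for the PageRank variant whose ranks sum to N.
# Not the GraphX code; the graph and the 0.15 starting value are illustrative assumptions.

def pagerank(edges, n, init, num_iter, reset_prob=0.15):
    """Run num_iter synchronous PageRank updates from a uniform starting value."""
    out_degree = [0] * n
    for src, _ in edges:
        out_degree[src] += 1
    ranks = [init] * n
    for _ in range(num_iter):
        incoming = [0.0] * n
        for src, dst in edges:
            # Each vertex splits its current rank evenly across its out-edges.
            incoming[dst] += ranks[src] / out_degree[src]
        ranks = [reset_prob + (1.0 - reset_prob) * x for x in incoming]
    return ranks


def l1_error(a, b):
    """Sum of absolute differences between two rank vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical graph in which every vertex has at least one out-edge.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2), (3, 0)]
N = 4

# 200 iterations is used as an effectively converged reference solution.
reference = pagerank(EDGES, N, init=1.0, num_iter=200)
err_from_one = l1_error(pagerank(EDGES, N, init=1.0, num_iter=10), reference)
err_from_low = l1_error(pagerank(EDGES, N, init=0.15, num_iter=10), reference)

print(f"sum of reference ranks: {sum(reference):.6f}")
print(f"L1 error after 10 iterations, init 1.0:  {err_from_one:.6f}")
print(f"L1 error after 10 iterations, init 0.15: {err_from_low:.6f}")
```

On this graph the run that starts with sum N is closer to the reference after the same number of iterations; both starting points head toward the same fixed point.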

The [NetworkX 
implementation](https://github.com/networkx/networkx/blob/master/networkx/algorithms/link_analysis/pagerank_alg.py#L122)
 uses the variant that sums to 1 hence their initialization values are all 1/N. 
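
The relationship between the two variants can be sketched as follows. This is an illustrative power iteration, not NetworkX or GraphX code; the graph, the helper name `pagerank_step`, and the damping value 0.85 are assumptions. It runs the sum-to-N form from 1 and the sum-to-1 form from 1/N and checks that the results differ only by the factor N.

```python
# Sketch only: the two PageRank normalizations differ by a constant factor N.
# Graph, helper name, and damping value are illustrative assumptions.

def pagerank_step(ranks, edges, out_degree, teleport, damping=0.85):
    """One synchronous update: teleport term plus damped incoming rank."""
    incoming = [0.0] * len(ranks)
    for src, dst in edges:
        incoming[dst] += ranks[src] / out_degree[src]
    return [teleport + damping * x for x in incoming]


# Hypothetical graph in which every vertex has at least one out-edge.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2), (3, 0)]
N = 4
DAMPING = 0.85

OUT_DEGREE = [0] * N
for src, _ in EDGES:
    OUT_DEGREE[src] += 1

sums_to_n = [1.0] * N        # variant whose ranks sum to N, initialized to 1
sums_to_one = [1.0 / N] * N  # variant whose ranks sum to 1, initialized to 1/N

for _ in range(20):
    sums_to_n = pagerank_step(sums_to_n, EDGES, OUT_DEGREE, 1.0 - DAMPING, DAMPING)
    sums_to_one = pagerank_step(sums_to_one, EDGES, OUT_DEGREE, (1.0 - DAMPING) / N, DAMPING)

# Largest difference between the sum-to-N ranks and N times the sum-to-1 ranks.
max_gap = max(abs(a - N * b) for a, b in zip(sums_to_n, sums_to_one))
print(f"largest gap between the variants after rescaling: {max_gap:.2e}")
```

Because the update is linear, scaling both the starting values and the teleport term by N scales every iterate by N, which is why 1/N is the natural starting value for the sum-to-1 form and 1 for the sum-to-N form.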

igraph is unfortunately not comparable, as they use a [more complex linear 
solver 
approach](https://github.com/igraph/igraph/blob/master/src/prpack/prpack_solver.cpp).

Additional credentials (if it matters): PhD Mathematics with dissertation 
in Graph Theory





[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...

2016-12-14 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16271
  
I just emailed @ankurdave and he is going to look at this tonight.






[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...

2016-12-14 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16271
  
I am not sure who, if anyone, would review GraphX at this point, and I am not 
so familiar with the implementation here. If it converges to the same answer 
faster, that's good. It might be nice to understand why this init is better, 
for example from a paper or a similar implementation.





[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...

2016-12-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16271
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70139/
Test PASSed.





[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...

2016-12-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16271
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...

2016-12-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16271
  
**[Test build #70139 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70139/consoleFull)**
 for PR 16271 at commit 
[`8be9a97`](https://github.com/apache/spark/commit/8be9a9765d10331ae1b5c15ff753bb5c2697acfc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...

2016-12-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16271
  
**[Test build #70139 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70139/consoleFull)**
 for PR 16271 at commit 
[`8be9a97`](https://github.com/apache/spark/commit/8be9a9765d10331ae1b5c15ff753bb5c2697acfc).





[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...

2016-12-14 Thread aray
Github user aray commented on the issue:

https://github.com/apache/spark/pull/16271
  
ping @srowen @dbtsai @rxin @ankurdave @jegonzal





[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...

2016-12-14 Thread aray
Github user aray commented on the issue:

https://github.com/apache/spark/pull/16271
  
Updated the above benchmark code to use a log-normal random graph on 10,000 
vertices; the difference is much more drastic.
![](http://i.imgur.com/Zo56dEO.png)
(Take the very bottom of the graph with a grain of salt, as it's in 
comparison to `g.pageRank(0.1)`; the actual error continues to drop.)





[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...

2016-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16271
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70101/
Test PASSed.





[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...

2016-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16271
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...

2016-12-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16271
  
**[Test build #70101 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70101/consoleFull)**
 for PR 16271 at commit 
[`33cd794`](https://github.com/apache/spark/commit/33cd79400d546b60a8fd87c8a7a0612f97ea8ebb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...

2016-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16271
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...

2016-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16271
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70100/
Test PASSed.





[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...

2016-12-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16271
  
**[Test build #70100 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70100/consoleFull)**
 for PR 16271 at commit 
[`7ea03a8`](https://github.com/apache/spark/commit/7ea03a88a3d9caa0ab7a7e6e681b8bf00b5cc128).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...

2016-12-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16271
  
**[Test build #70101 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70101/consoleFull)**
 for PR 16271 at commit 
[`33cd794`](https://github.com/apache/spark/commit/33cd79400d546b60a8fd87c8a7a0612f97ea8ebb).





[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...

2016-12-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16271
  
**[Test build #70100 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70100/consoleFull)**
 for PR 16271 at commit 
[`7ea03a8`](https://github.com/apache/spark/commit/7ea03a88a3d9caa0ab7a7e6e681b8bf00b5cc128).

