[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...
Github user ankurdave commented on the issue: https://github.com/apache/spark/pull/16271 Merged into master. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...
Github user ankurdave commented on the issue: https://github.com/apache/spark/pull/16271 Thanks @aray for the explanation. I agree with @srowen - this looks reasonable to me. I'm going to merge it.
[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...
Github user aray commented on the issue: https://github.com/apache/spark/pull/16271 Yes, the improvement comes from the sum of the magnitudes of the initial values being closer to the (known) sum of the solution. Fiddling with resetProb controls a completely different thing. The current implementation has no advantage (other than finding the incorrect solution to a star graph one iteration faster).
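To spell out why the sum of the initial values matters, here is a short derivation. It is a sketch under two assumptions that the thread does not state: the update rule is $r_i \leftarrow \alpha + (1-\alpha)\sum_{j \to i} r_j / \mathrm{outDeg}(j)$ with $\alpha$ standing for resetProb, and every vertex has at least one out-edge. The symbols $S_k$ (sum of all ranks after $k$ iterations) and $N$ (number of vertices) are notation introduced here.

```latex
\begin{aligned}
S_{k+1} &= \sum_i r_i^{(k+1)}
         = N\alpha + (1-\alpha)\sum_j r_j^{(k)}
         = N\alpha + (1-\alpha)\,S_k, \\
S_{k+1} - N &= (1-\alpha)\,(S_k - N)
\quad\Longrightarrow\quad
S_k - N = (1-\alpha)^k\,(S_0 - N).
\end{aligned}
```

Under these assumptions the fixed point sums to $N$. An initial sum $S_0 = N$ stays exact at every step, whereas any other $S_0$ leaves a gap that shrinks only by a factor of $1-\alpha$ per iteration, independent of the graph.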
[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...
Github user aray commented on the issue: https://github.com/apache/spark/pull/16271

**References**

[Pagerank paper](http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf)

> We need to make an initial assignment of the ranks. This assignment can be made by one of several strategies. If it is going to iterate until convergence, in general the initial values will not affect final values, just the rate of convergence. But we can speed up convergence by choosing a good initial assignment.

Since they are more focused on updating values for one evolving graph (the internet), they don't really talk about starting from scratch. But this does emphasize that there is no change to the answers, just to the rate of convergence.

A more direct statement would be [Wikipedia](https://en.wikipedia.org/wiki/PageRank)

> PageRank is initialized to the same value for all pages. In the original form of PageRank, the sum of PageRank over all pages was the total number of pages on the web at that time, so each page in this example would have an initial value of 1.

Note that there are two variants of PageRank that differ in their outputs by a constant multiple but are otherwise determined by the damping factor; we use the version that sums to N (most other implementations use the other). More Wikipedia:

> The difference between them is that the PageRank values in the first formula sum to one, while in the second formula each PageRank is multiplied by N and the sum becomes N.

Essentially, starting with the correct sum is closer to the actual fixed point and thus gets you faster convergence.

The [NetworkX implementation](https://github.com/networkx/networkx/blob/master/networkx/algorithms/link_analysis/pagerank_alg.py#L122) uses the variant that sums to 1, hence their initialization values are all 1/N.

igraph is unfortunately not comparable, as they use a [more complex linear solver approach](https://github.com/igraph/igraph/blob/master/src/prpack/prpack_solver.cpp).

Additional credentials (if it matters): PhD Mathematics with dissertation in Graph Theory
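The convergence argument above can be tried out on a toy graph. The following is a minimal Python sketch, not the GraphX implementation: it assumes the sums-to-N update `rank = reset_prob + (1 - reset_prob) * sum(rank[j] / out_degree[j])`, a graph in which every vertex has at least one out-edge, and a stopping rule based on the largest per-vertex change. The function name `pagerank`, the four-vertex `GRAPH`, and the alternative starting value `0.15` are all hypothetical choices made for illustration.

```python
def pagerank(out_edges, init, reset_prob=0.15, tol=1e-6, max_iter=1000):
    """Power iteration for the PageRank variant whose ranks sum to N.

    out_edges[v] lists the targets of v's out-edges; every vertex is
    assumed to have at least one out-edge (no dangling vertices).
    Returns the ranks and the number of iterations performed.
    """
    n = len(out_edges)
    ranks = [init] * n
    for iteration in range(1, max_iter + 1):
        incoming = [0.0] * n
        for src, targets in enumerate(out_edges):
            share = ranks[src] / len(targets)  # split rank evenly over out-edges
            for dst in targets:
                incoming[dst] += share
        new_ranks = [reset_prob + (1.0 - reset_prob) * x for x in incoming]
        delta = max(abs(a - b) for a, b in zip(new_ranks, ranks))
        ranks = new_ranks
        if delta < tol:
            return ranks, iteration
    return ranks, max_iter


# Hypothetical toy graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0, 3 -> 2.
GRAPH = [[1, 2], [2], [0], [2]]

# Start with total mass N (each vertex at 1.0) versus a smaller uniform value.
ranks_one, iters_one = pagerank(GRAPH, init=1.0)
ranks_small, iters_small = pagerank(GRAPH, init=0.15)

print("init 1.0 :", iters_one, "iterations, sum =", sum(ranks_one))
print("init 0.15:", iters_small, "iterations, sum =", sum(ranks_small))
```

Both runs approach the same ranks. The run that starts with total mass N keeps that total at every step, so it only has to redistribute mass between vertices, while the other run must also close the gap in the total.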
[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16271 I just emailed @ankurdave and he is going to look at this tonight.
[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16271 I am not sure who, if anyone, would review GraphX at this point, and I am not so familiar with the implementation here. If it converges to the same answer faster, that's good. It might be nice to understand why this init is better, e.g. from a paper or a similar implementation.
[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16271 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70139/ Test PASSed.
[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16271 Merged build finished. Test PASSed.
[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16271 **[Test build #70139 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70139/consoleFull)** for PR 16271 at commit [`8be9a97`](https://github.com/apache/spark/commit/8be9a9765d10331ae1b5c15ff753bb5c2697acfc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16271 **[Test build #70139 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70139/consoleFull)** for PR 16271 at commit [`8be9a97`](https://github.com/apache/spark/commit/8be9a9765d10331ae1b5c15ff753bb5c2697acfc).
[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...
Github user aray commented on the issue: https://github.com/apache/spark/pull/16271 ping @srowen @dbtsai @rxin @ankurdave @jegonzal
[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...
Github user aray commented on the issue: https://github.com/apache/spark/pull/16271 Updated the above benchmark code with a log-normal random graph on 10,000 vertices; the difference is much more drastic. ![](http://i.imgur.com/Zo56dEO.png) (Take the very bottom of the graph with a grain of salt, as it's in comparison to `g.pageRank(0.1)`; the actual error continues to drop.)
[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16271 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70101/ Test PASSed.
[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16271 Merged build finished. Test PASSed.
[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16271 **[Test build #70101 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70101/consoleFull)** for PR 16271 at commit [`33cd794`](https://github.com/apache/spark/commit/33cd79400d546b60a8fd87c8a7a0612f97ea8ebb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16271 Merged build finished. Test PASSed.
[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16271 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70100/ Test PASSed.
[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16271 **[Test build #70100 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70100/consoleFull)** for PR 16271 at commit [`7ea03a8`](https://github.com/apache/spark/commit/7ea03a88a3d9caa0ab7a7e6e681b8bf00b5cc128). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16271 **[Test build #70101 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70101/consoleFull)** for PR 16271 at commit [`33cd794`](https://github.com/apache/spark/commit/33cd79400d546b60a8fd87c8a7a0612f97ea8ebb).
[GitHub] spark issue #16271: [SPARK-18845][GraphX] PageRank has incorrect initializat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16271 **[Test build #70100 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70100/consoleFull)** for PR 16271 at commit [`7ea03a8`](https://github.com/apache/spark/commit/7ea03a88a3d9caa0ab7a7e6e681b8bf00b5cc128).