bonnal-enzo created SPARK-35357:
-----------------------------------

             Summary: Allow to turn off the normalization applied by static 
PageRank utilities
                 Key: SPARK-35357
                 URL: https://issues.apache.org/jira/browse/SPARK-35357
             Project: Spark
          Issue Type: Improvement
          Components: GraphX
    Affects Versions: 3.1.1
            Reporter: bonnal-enzo


Since SPARK-18847, the static PageRank computations in `PageRank.scala` 
normalize the sum of the ranks once the fixed number of iterations has 
completed, and *there is no way for a developer to access the raw, 
non-normalized rank values*.

Since SPARK-29877, one can run a fixed number of PageRank iterations starting 
from the ranks of a previous `preRankGraph`.
 This nice feature opens the door to interesting *incremental algorithms*, for 
example:
 "Run some initial PageRank iterations using `PageRank.runWithOptions`, then 
update the graph's edges and update the ranks with a call to 
`PageRank.runWithOptionsWithPreviousPageRank`, and so on...".
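A minimal sketch of that incremental workflow, assuming the Spark 3.1.1 GraphX signatures `runWithOptions(graph, numIter, resetProb, srcId)` and `runWithOptionsWithPreviousPageRank(graph, numIter, resetProb, srcId, preRankGraph)`. The tiny graph, the iteration counts, and the object name are illustrative assumptions only, and the edge update keeps the vertex set unchanged:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.graphx.lib.PageRank

// Hypothetical driver object, for illustration only.
object IncrementalPageRankSketch {
  def run(): Array[(VertexId, Double)] = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("incremental-pagerank-sketch")
    val sc = new SparkContext(conf)
    try {
      // Initial graph: 1 -> 2 -> 3, so vertex 3 is a sink.
      val edges0 = sc.parallelize(Seq(Edge(1L, 2L, 1.0), Edge(2L, 3L, 1.0)))
      val graph0 = Graph.fromEdges(edges0, 0.0)

      // Some initial iterations; the returned ranks are already normalized.
      val ranks0 = PageRank.runWithOptions(graph0, 10, 0.15, None)

      // Update the edges (add 3 -> 1) while keeping the same vertices.
      val edges1 = edges0.union(sc.parallelize(Seq(Edge(3L, 1L, 1.0))))
      val graph1 = Graph.fromEdges(edges1, 0.0)

      // Resume from the previous ranks instead of starting from scratch.
      val ranks1 = PageRank.runWithOptionsWithPreviousPageRank(graph1, 5, 0.15, None, ranks0)
      ranks1.vertices.collect().sortBy(_._1)
    } finally {
      sc.stop()
    }
  }
}
```

In this sketch the second call can only start from the normalized ranks produced by the first, since the raw values are not exposed.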

Algorithms of this kind would gain precision from being able to manipulate 
the raw ranks directly (rather than the normalized ones) when the graph has a 
substantial proportion of sinks (vertices without outgoing edges).

It would be nice to add a method signature with a boolean parameter that 
turns off the automatic normalization run at the end of 
`PageRank.runWithOptions` and `PageRank.runWithOptionsWithPreviousPageRank`, 
leaving developers free to apply the normalization only when they really 
need it.
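To illustrate what such a flag would change, here is a self-contained plain-Scala sketch (no Spark) of a fixed-iteration PageRank over an in-memory edge list. It is not the GraphX implementation: the object, the `run` and `normalize` helpers, and the `normalized` and `initial` parameters are all hypothetical names chosen for this example.

```scala
// Hypothetical in-memory model of static PageRank, for illustration only.
object StaticPageRankSketch {
  type VertexId = Long

  /** Rescales the ranks so that they sum to the number of vertices. */
  def normalize(ranks: Map[VertexId, Double]): Map[VertexId, Double] = {
    val factor = ranks.size.toDouble / ranks.values.sum
    ranks.map { case (id, r) => id -> r * factor }
  }

  /**
   * Runs `numIter` iterations starting from `initial` (or from 1.0 everywhere).
   * `normalized` is the proposed switch: when false, the raw ranks are returned.
   */
  def run(
      vertices: Set[VertexId],
      edges: Seq[(VertexId, VertexId)],
      numIter: Int,
      resetProb: Double,
      normalized: Boolean,
      initial: Option[Map[VertexId, Double]] = None): Map[VertexId, Double] = {
    val outDegree = edges.groupBy(_._1).map { case (src, es) => src -> es.size }
    var ranks = initial.getOrElse(vertices.map(_ -> 1.0).toMap)
    for (_ <- 0 until numIter) {
      // Each vertex sends rank / outDegree along every outgoing edge; sinks send nothing.
      val received = edges
        .map { case (src, dst) => dst -> ranks(src) / outDegree(src) }
        .groupBy(_._1)
        .map { case (dst, msgs) => dst -> msgs.map(_._2).sum }
      ranks = vertices.map { v =>
        v -> (resetProb + (1.0 - resetProb) * received.getOrElse(v, 0.0))
      }.toMap
    }
    if (normalized) normalize(ranks) else ranks
  }
}
```

For the graph 1 -> 2 -> 3 (vertex 3 being a sink), one iteration with `resetProb = 0.15` and `normalized = false` gives ranks 0.15, 1.0 and 1.0, which sum to 2.15 rather than 3. With `normalized = true` the same ranks are rescaled to sum to 3. Passing the raw result back through `initial` lets a later run resume from the unscaled values.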



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
