On distributed APSP algorithms

Mohit Mohit Daga Fri, 25 Mar 2022 05:46:53 -0700

Hi all,

I wrote a recent email titled: "Idea for a Google Summer of Code Project". This 
email is a further attempt to create some excitement about the APSP results, 
and on reasons to try them on Giraph. This email is rather a brief tutorial of 
some results. Please feel to get back. Also, apologies for long email.


I will discuss some candidate algorithms and try to make the larger point. To 
fix the notations, let's say we have n vertices and m edges. Each vertex needs 
to know the shortest distances to all other vertices. The distinguishing 
feature between these algorithms is what can be the size of the message to be 
passed, to which vertex the messages can be passed (only connected vertices, or 
all other vertices), if the result works for weighted, un-weighted, etc.

1. *Large messages (upto O(n) size) are allowed.* (key idea: vanilla algorithm)
Each vertex maintains a table of distances for all n vertices. In each round it 
updates the table and exchange with its neighbours.

**Features of 1**

* Issue: large messages of size O(n) requires be passed.
* Round complexity. O(n)
* Works for weighted, unweighted etc.*
* communication only occurs on edges in the graph
* deterministic, exact result

2. *Only Small message (upto O(log n) size) are allowed* (key idea: a neat 
technique to parallelize token propagation, appeared in 2012)

Each vertex propagates a single source shortest path protocol. Something 
similar to chapter 4 of Practical Graph Analytics with Apache Giraph [1]. This 
takes O(n) rounds, (O(Diameter) rounds to be precise), and we do n such 
protocols for each vertex, overall taking O(n^2) rounds (or O(n.D) rounds). If 
the graph is unweighted, then we even have a very cute O(n) round algorithm [2, 
see Algorithm 1: 5 line algorithm]. The basic idea is just that we do not wait 
for the finish of the shortest path protocol from a single vertex, instead we 
run this parallelly, using edges, that are already decided to be in BFS of 
earlier the shortest path. 

**Features of 2**

* small messages, so no worries
* Round* complexity. O(n)
* Works for unweighted *
* communication only occurs on edges in the graph
* deterministic, exact result

3. *small messages, randomized, but exact* key idea: randomly selecting 
vertices with 1/2^i probability, ensures that every path of hop  (number of 
edges) distance c.2^i has at least one of the selected vertices)

This is cute, recent (from 2019), simple algorithm, that allows only small 
messages to be passed between connected vertices. One difference from 2, is 
that this also works for weighted graphs.

Higher level Algorithm
S_i = randomly selected vertex set with probability 1/2^i, i ranging from 0 to 
log n
Recursive step (i): 
  run the shortest path protocol in 2 from vertices in S_i upto hop distance 
2^i. // |S_i| X 2^i = O(n)
  broadcast all the 2^i x S_i distances // always O(n) broadcasts only.
  Recursive step (i+1)
  all vertices have enough information to calculate distances from S_i // not 
hard to verify

**Features of 3**

* small messages, so no worries
* Round* complexity. O(n)
* Works for weighted *
* communication only occurs on edges in the graph.
* randomized, exact result

4. The difference here is that messages are allowed to be sent to all nodes, 
irrespective of graph connection. such a model is called Congested Clique, the  
algorithms here have poly log round, or even constant round complexity.  There 
is a lot of recent work in this regard. [4,5].

**Features of 4**

* small messages, so no worries
* Round* complexity O(poly log), or constant round
* Works for weighted, unweighted etc. *
* communication is allowed to all nodes irrespective of graph connection
* randomized, approximation results

Overall, IMHO, it does not make sense to implement 1. **High message cost**. 
Algorithm 2, is pretty simple. Algorithm 3 needs shared randomness. But its 
simple too. They are O(n) round, exchanges small messages, so still a worth 
try, especially due to simplicity. Algorithms mentioned in 4 are complex, may 
take a bit of time to implement. But it is so cool, and the theoretically 
claimed results are poly logorithmic round, or even constant round complexity. 
***It baffles me***, but as much as I know about the results, they are correct 
and provable. The time complexities are pretty cool. It would be really 
interesting to see how does these complex algorithms serve on a system like 
Giraph. 

Best Regards,
Mohit Daga

P.S, My work is mostly on minimum-edge-cuts. But I follow results on shortest 
path. I work on shortest path by me is in progress.


[1] 
https://www.amazon.com/Practical-Graph-Analytics-Apache-Giraph/dp/1484212525 
[2] https://dl.acm.org/doi/pdf/10.1145/2332432.2332504 
[3] https://dl.acm.org/doi/pdf/10.1145/3313276.3316326 
[4] https://dl.acm.org/doi/10.1145/3293611.3331633
[5] https://dl.acm.org/doi/pdf/10.1145/3465084.3467928

On distributed APSP algorithms

Reply via email to