Repository: incubator-madlib Updated Branches: refs/heads/master 67b69eb8a -> 6025c4b0d
Graph Docs: Update BFS design doc Include write-up for BFS impplementation and design. Closes #161 Project: http://git-wip-us.apache.org/repos/asf/incubator-madlib/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-madlib/commit/6025c4b0 Tree: http://git-wip-us.apache.org/repos/asf/incubator-madlib/tree/6025c4b0 Diff: http://git-wip-us.apache.org/repos/asf/incubator-madlib/diff/6025c4b0 Branch: refs/heads/master Commit: 6025c4b0d0d1a83a522a6c08553eb1d0632329be Parents: 67b69eb Author: Rashmi Raghu <rra...@pivotal.io> Authored: Thu Aug 3 16:15:32 2017 -0700 Committer: Nandish Jayaram <njaya...@apache.org> Committed: Fri Aug 11 11:25:05 2017 -0700 ---------------------------------------------------------------------- doc/design/modules/graph.tex | 51 ++++++++++++++++++++++++++++++++++++++- doc/literature.bib | 5 ++++ 2 files changed, 55 insertions(+), 1 deletion(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/incubator-madlib/blob/6025c4b0/doc/design/modules/graph.tex ---------------------------------------------------------------------- diff --git a/doc/design/modules/graph.tex b/doc/design/modules/graph.tex index f9183b3..c631e16 100644 --- a/doc/design/modules/graph.tex +++ b/doc/design/modules/graph.tex @@ -22,7 +22,7 @@ \chapter[Graph]{Graph} \begin{moduleinfo} -\item[Authors] \href{mailto:okis...@pivotal.io}{Orhan Kislal}, \href{mailto:njaya...@pivotal.io}{Nandish Jayaram} +\item[Authors] \href{mailto:okis...@pivotal.io}{Orhan Kislal}, \href{mailto:njaya...@pivotal.io}{Nandish Jayaram}, \href{mailto:rra...@pivotal.io}{Rashmi Raghu} \item[History] \begin{modulehistory} \item[v0.1] Initial version, SSSP only. @@ -30,6 +30,7 @@ \item[v0.3] PageRank \item[v0.4] APSP \item[v0.5] Weakly Connected Components + \item[v0.6] Breadth First Search (BFS) \end{modulehistory} \end{moduleinfo} @@ -616,3 +617,51 @@ associated with each vertex in $G$. The component ID of all the vertices in a component is equal to the smallest vertex ID in the corresponding subgraph. +\section{Breadth-first Search} \label{sec:graph:bfs} + +Given a graph $G$ and a user-specified origin vertex $src$, this algorithm +searches and discovers connected nodes in a graph in breadth-first order +\cite{bfs_wikipedia}. The graph can be treated as either directed or +undirected based on a parameter specified when calling the function. +There is also a parameter to control the number of hops (edges) to traverse +from the source vertex. If not specified, all nodes accessible from the +source node will be discovered. + +\subsection{Implementation Details} +\begin{algorithm}[Breadth First Search$(V, E, src)$] \label{alg:bfs:high} +\begin{algorithmic}[1] + \State Set $dist \leftarrow 0$ + \State Create $message$ table with $src$ vertex, $NULL$ parent, and $dist$ + \State Copy $message$ table to output table $out$ + \Repeat + \State Create $toupdate$ table using $out$ and $message$ tables + \State $dist \leftarrow dist + 1$ + \State Update $message$ table with newly found candidate vertices, parent and $dist$ + \State Copy $message$ table to $out$ + \Until {There are no candidate vertices remaining in $message$ table} +\end{algorithmic} +\end{algorithm} + +The implementation details are similar to SSSP, albeit simpler. We only have to +track the number of hops and not the sum of weights, but other requirements and +restrictions such as finding the parent, distinct updates, etc. are common in +both cases. The output table is initialized only to the $src$ vertex to begin +with. A $message$ table is also maintained that contains the list of vertices +to traverse and explore in the next iteration, which is also initialized with +the $src$ vertex. BFS then runs iteratively until no more candidate vertices +remain in the $message$ table, as outlined in~\ref{alg:bfs:high}. + +At every iteration, $toupdate$ table is updated with vertices that are neighbors +of vertices in the $message$ table, that are not already visited in the past +(one scan of the $out$ table reveals all the vertices that have already been +visited in the past). All such newly found neighboring vertices in the current +iteration will have one or more parents, based on how many vertices in the +$message$ table have a direct edge to them. Each such vertex in the $message$ +table is marked as the parent of such newly found neighboring vertices in +the $toupdate$ table. + +The $message$ table is then cleared and updated with the contents of $toupdate$ +table, except that for each new neighboring vertex considered, only one of the +parents is recorded as its parent (the node with the smallest id among all +parent nodes). The content of this updated $message$ is then copied +to the $out$ table, and this process continues until $message$ table is empty. http://git-wip-us.apache.org/repos/asf/incubator-madlib/blob/6025c4b0/doc/literature.bib ---------------------------------------------------------------------- diff --git a/doc/literature.bib b/doc/literature.bib index aadd65b..225622d 100644 --- a/doc/literature.bib +++ b/doc/literature.bib @@ -949,3 +949,8 @@ Applied Survival Analysis}, Title = {{MLP(III): Back-Propagation}}, Author = {{Yu Hen Hu}} } + +@online{bfs_wikipedia, + title = {Breadth-first search}, + url={https://en.wikipedia.org/wiki/Breadth-first_search} +} \ No newline at end of file