I have a question about how data is distributed across the segments for the various graph processing algorithms we are building.
Do we have guidance for users on how to distribute data? Does the strategy vary by algorithm? What impact will data distribution have on performance?

Section 4.1 of the Pregel paper (https://kowshik.github.io/JPregel/pregel_paper.pdf) describes a default partitioning scheme of hash(ID) mod N, where N is the number of partitions. But it then says: “Some applications work well with the default assignment, but some benefit from defining custom assignment functions to better exploit locality inherent in the graph. For example, a typical heuristic employed for the Web graph is to colocate vertices representing pages of the same site.”

Frank
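
P.S. To make the two options from the paper concrete, here is a rough Python sketch of how I read them. It is only an illustration: the function names (default_partition, site_partition) are hypothetical, CRC32 is just a stand-in for whatever hash we would really use, and none of this reflects Pregel's actual implementation or our own distribution code.

```python
import zlib
from urllib.parse import urlparse


def default_partition(vertex_id: str, num_partitions: int) -> int:
    """Default scheme from the paper: hash(ID) mod N.

    CRC32 is an assumed stand-in hash; Python's built-in hash() is
    salted per process for strings, so it would not be stable across workers.
    """
    return zlib.crc32(vertex_id.encode("utf-8")) % num_partitions


def site_partition(url: str, num_partitions: int) -> int:
    """Custom scheme (hypothetical): hash only the host part of the URL,
    so all pages of the same site land in the same partition."""
    host = urlparse(url).netloc
    return zlib.crc32(host.encode("utf-8")) % num_partitions


if __name__ == "__main__":
    pages = [
        "http://example.com/index.html",
        "http://example.com/about.html",
        "http://example.org/index.html",
    ]
    n = 8  # assumed number of partitions
    for page in pages:
        print(page, default_partition(page, n), site_partition(page, n))
```

With the default scheme, two pages of the same site may end up in different partitions, so messages along intra-site links cross partitions. With the site-based scheme they are always colocated, at the risk of skew if a few sites are very large.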