Hi Greg,

Thank you for this proposal!
I think graph generators will be a very useful addition to Gelly.

I'm not quite familiar with the state-of-the-art algorithms for distributed
graph generation.
I suppose that we could easily provide an efficient random graph generator
and I've also seen some work on parallel/distributed algorithms for R-MAT
[1, 2].
Are you aware of similar work for Erdos-Renyi, Kronecker, or other types of
graphs?
Another place we might want to look at is Giraph's Watts-Strogatz generator
[3].
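
On the requirement that a generator be seedable and produce the same graph
regardless of parallelism, here is a minimal sketch of one possible approach
for an undirected Erdos-Renyi G(n, p) graph. It is not Gelly API: the class
and method names (SeededErdosRenyi, hasEdge, generateSlice, generate) are
made up for illustration, and the parallel tasks are simulated with a plain
loop. The idea is to decide each candidate edge from a hash of (seed, src,
dst) alone, so the result does not depend on how the vertex range is split.
It tests every vertex pair, so it is quadratic in n and only meant to show
the determinism property, not an efficient generator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch, not part of Flink or Gelly.
final class SeededErdosRenyi {

    // SplitMix64-style finalizer: scrambles a 64-bit value into a 64-bit hash.
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // Decide whether edge (src, dst) exists using only (seed, src, dst).
    // No RNG state is shared between tasks, so task boundaries cannot matter.
    static boolean hasEdge(long seed, long n, long src, long dst, double p) {
        long h = mix(seed ^ mix(src * n + dst + 1));
        double u = (h >>> 11) * 0x1.0p-53; // top 53 bits mapped to [0, 1)
        return u < p;
    }

    // Emit the edges whose source vertex lies in [from, to): the slice that
    // one parallel task would own.
    static List<long[]> generateSlice(long seed, long n, double p, long from, long to) {
        List<long[]> edges = new ArrayList<>();
        for (long src = from; src < to; src++) {
            for (long dst = src + 1; dst < n; dst++) { // src < dst: no self-loops, no duplicates
                if (hasEdge(seed, n, src, dst, p)) {
                    edges.add(new long[] {src, dst});
                }
            }
        }
        return edges;
    }

    // Split [0, n) into `parallelism` contiguous slices, generate each slice,
    // and collect the edges as sorted "src-dst" strings for easy comparison.
    static TreeSet<String> generate(long seed, long n, double p, int parallelism) {
        TreeSet<String> result = new TreeSet<>();
        for (int task = 0; task < parallelism; task++) {
            long from = n * task / parallelism;
            long to = n * (task + 1) / parallelism;
            for (long[] e : generateSlice(seed, n, p, from, to)) {
                result.add(e[0] + "-" + e[1]);
            }
        }
        return result;
    }
}
```

For example, SeededErdosRenyi.generate(42L, 200L, 0.05, 1) and
SeededErdosRenyi.generate(42L, 200L, 0.05, 4) return the same edge set,
because no edge decision depends on which slice evaluates it.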

Cheers,
Vasia.

[1]: https://github.com/farkhor/PaRMAT/
[2]: http://arxiv.org/pdf/1210.0187.pdf
[3]: https://giraph.apache.org/apidocs/org/apache/giraph/io/formats/WattsStrogatzVertexInputFormat.html


On 23 September 2015 at 19:49, Greg Hogan <[email protected]> wrote:

> I would like to propose that Flink include a selection of graph generators
> in Gelly. Generated graphs will be useful for performing scalability,
> stress, and regression testing as well as benchmarking and comparing
> algorithms, both for Flink users and developers. Generated data is
> infinitely scalable yet described by a few simple parameters and can often
> substitute for user data or sharing large files when reporting issues.
>
> Spark's GraphX includes a modest GraphGenerators class [1].
>
> The initial implementation would focus on Erdos-Renyi, R-MAT [2], and
> Kronecker [3] generators.
>
> A key consideration is that the graphs should be seedable and generate the
> same Graph regardless of parallelism.
>
> Generated data is a complement to my proposed "Checksum method for DataSet
> and Graph" [4].
>
> [1] http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.graphx.util.GraphGenerators$
> [2] R-MAT: A Recursive Model for Graph Mining;
> http://snap.stanford.edu/class/cs224w-readings/chakrabarti04rmat.pdf
> [3] Kronecker graphs: An Approach to Modeling Networks;
> http://arxiv.org/pdf/0812.4905v2.pdf
> [4] https://issues.apache.org/jira/browse/FLINK-2716
>
> Greg Hogan
>
