Re: Design document for FLINK-2254

Vasiliki Kalavri Thu, 22 Oct 2015 02:11:15 -0700

Hi Saumitra,

could you maybe add your ideas above in a google doc and share the link, so
we can easily comment and/or edit?


Thank you!
-Vasia.

On 22 October 2015 at 10:53, Andra Lungu <lungu.an...@gmail.com> wrote:

> Hi Saumitra,
>
> As you already noticed, the first version (with duplicates) is highly
> inefficient and consumes a lot of memory. So, I suggest we drop it for now.
> The version with the label makes a lot of modifications on the base Graph
> class, and this, in my opinion would make it more difficult to grasp.
> Bipartite Graphs are not as common as regular graphs. And then, when you
> would add an Isomorphic Graph class (stupid example), you'll need to strip
> stuff out again.
>
> I believe we should go with a BipartiteGraph class which extends Graph and
> which has a DataSet of topVertices and a DataSet of bottomVertices. Apart
> from getTopVertices() methods, we would still need functions such as
> getNumberOfTopVertices(). For the time being, just support the funstiones
> mentiones as well as the basics: fromCollection, fromDataSet, etc, get*,
> join*, map*, filter*.
>
> Wait for a few more opinions and dig in. I believe we should discuss this
> even further and propose Jiras for examples of algos for bipartite graphs.
>
> Once we have the design specs well figured out, and if I find some time
> I'll gladly chime in :)
>
> Cheers!
> Andra
>
> On Wed, Oct 21, 2015 at 8:45 PM, Saumitra Shahapure <
> saumitra.offic...@gmail.com> wrote:
>
> > In FLINK-2254 <https://issues.apache.org/jira/browse/FLINK-2254> , we
> > want to extend Graph API of Gelly by adding support for bipartite graphs
> > too. In the long term, the support for newer graph types can be added in
> > the same way. Also specialised algorithms for special types of graphs
> > should be implemented.
> >
> > For bipartite graph, a new class BipartiteGraph can be implemented which
> > extends Graph. Other graph algorithms which are written for Graph can be
> > used for BipartiteGraph too, because of inheritance.
> > Initialisation functions like  public static <K, VV, EV> Graph<K, VV, EV>
> > fromCollection(Collection<Vertex<K, VV>> vertices,Collection<Edge<K, EV>>
> > edges, ExecutionEnvironment context) need to be duplicated as public
> static
> > <K, VV, EV> BipartiteGraph<K, VV, EV> fromCollection(Collection<Vertex<K,
> > VV>> topVertices,Collection<Vertex<K, VV>> bottomVertices,
> > Collection<Edge<K, EV>> edges, ExecutionEnvironment context)
> > Graph validation does not need to happen implicitly. user can call
> > BipartiteGraph.validate() in case he wants to check sanity of data.
> > Vertex modes is primary requirement. BipartiteGraph should have
> functions,
> > getTopVertices() and getBottomVertices(). There are three ways to
> implement
> > it:
> > Simplest and least efficient way is to maintain vertices variable in
> Graph
> > in addition to two more Datasets topVertices, bottomVertices. Benefit of
> > this approach is that Graph class would not need any modification at all.
> > this.vertices variable access inside Graph class would work correctly.
> > Disadvantage is that multiple copies of vertex dataset are created and
> > maintained. So this is inefficient in terms of memory.
> > Another way is to keep topVertices and bottomVertices variables in
> > BipartiteGraph. vertices variable in Graph would stay empty.
> getVertices()
> > function of Graph would be overloaded by BipartiteGraph and reimplemented
> > as union of of topVertices and bottomVertices. Since, vertices is a
> private
> > variable, external view of vertices stays unaffected. All functions of
> > Graph class which use vertices local variable (e.g.
> getVerticesAsTuple2())
> > need to use getVertices() instead of this.vertices. Disadvantage of this
> > method is Graph.vertices variable would have invalid value throughout for
> > BipartiteGraph.
> > Another way is to ‘label’ vertices with an integer. So public class
> > BipartiteGraph<K,VV,EV> extends Graph<K, Tuple2<VV,Integer>, EV> would be
> > the signature. Initialisers would tag the vertices according to their
> mode.
> > getVertices() should be overridden to strip-off the tag values so that
> > users would get consistent view of vertex dataset for Graph and
> > BipartiteGraph. getTopVertices/getBottomVertices would be filter
> functions
> > on vertex tags.
> > I personally feel method 2 to be most elegant.
> > Functions like addVertices(vertexList) are not relevant without
> specifying
> > whether they are top or bottom. A possibility could be to add them to
> > topVertices by default. And have overloaded function
> > addVertices(vertexList, mode) to add them to specific partition.
> > Specialised algorithms for Bipartite graphs can implement new
> > BipartiteGraphAlgorithm interface. It would be similar to GraphAlgorithm.
> > In future, newer types of graphs can be similarly derived from Graph and
> > type hierarchy of Graph can be created. Class hierarchy would be better
> > here rather than interface hierarchy, because the functions which are
> > applicable to all derived classes can be implemented in the base class.
> > As a part of future refactoring, Graph transformation functions should
> > rather be a new class implementing GraphAlgorithm rather than function in
> > Graph class. This may make implementing complex graph types easier.
> > PS. Do I need to add it in wiki too? I can copy the contents to wiki if
> > you say so,
> > -- Saumitra Shahapure
> >
> >
>

Re: Design document for FLINK-2254

Reply via email to