Hello all, We are about to finalise following design. Please review it in the google doc below. And let us know your inputs.
https://docs.google.com/document/d/1SChonW4bXNiqU2Pz9PPjuKDuKrBGmESrPcJXamj4dlI/edit -- Saumitra S. Shahapure On Thu, Oct 22, 2015 at 1:00 PM, Saumitra Shahapure < saumitra.offic...@gmail.com> wrote: > Hi Vasia, > > Here > <https://docs.google.com/document/d/1SChonW4bXNiqU2Pz9PPjuKDuKrBGmESrPcJXamj4dlI/edit?usp=sharing> > is the Google doc. > > -- Saumitra S. Shahapure > > On Thu, Oct 22, 2015 at 11:10 AM, Vasiliki Kalavri < > vasilikikala...@gmail.com> wrote: > >> Hi Saumitra, >> >> could you maybe add your ideas above in a google doc and share the link, >> so >> we can easily comment and/or edit? >> >> Thank you! >> -Vasia. >> >> On 22 October 2015 at 10:53, Andra Lungu <lungu.an...@gmail.com> wrote: >> >> > Hi Saumitra, >> > >> > As you already noticed, the first version (with duplicates) is highly >> > inefficient and consumes a lot of memory. So, I suggest we drop it for >> now. >> > The version with the label makes a lot of modifications on the base >> Graph >> > class, and this, in my opinion would make it more difficult to grasp. >> > Bipartite Graphs are not as common as regular graphs. And then, when you >> > would add an Isomorphic Graph class (stupid example), you'll need to >> strip >> > stuff out again. >> > >> > I believe we should go with a BipartiteGraph class which extends Graph >> and >> > which has a DataSet of topVertices and a DataSet of bottomVertices. >> Apart >> > from getTopVertices() methods, we would still need functions such as >> > getNumberOfTopVertices(). For the time being, just support the >> funstiones >> > mentiones as well as the basics: fromCollection, fromDataSet, etc, get*, >> > join*, map*, filter*. >> > >> > Wait for a few more opinions and dig in. I believe we should discuss >> this >> > even further and propose Jiras for examples of algos for bipartite >> graphs. >> > >> > Once we have the design specs well figured out, and if I find some time >> > I'll gladly chime in :) >> > >> > Cheers! >> > Andra >> > >> > On Wed, Oct 21, 2015 at 8:45 PM, Saumitra Shahapure < >> > saumitra.offic...@gmail.com> wrote: >> > >> > > In FLINK-2254 <https://issues.apache.org/jira/browse/FLINK-2254> , we >> > > want to extend Graph API of Gelly by adding support for bipartite >> graphs >> > > too. In the long term, the support for newer graph types can be added >> in >> > > the same way. Also specialised algorithms for special types of graphs >> > > should be implemented. >> > > >> > > For bipartite graph, a new class BipartiteGraph can be implemented >> which >> > > extends Graph. Other graph algorithms which are written for Graph can >> be >> > > used for BipartiteGraph too, because of inheritance. >> > > Initialisation functions like public static <K, VV, EV> Graph<K, VV, >> EV> >> > > fromCollection(Collection<Vertex<K, VV>> vertices,Collection<Edge<K, >> EV>> >> > > edges, ExecutionEnvironment context) need to be duplicated as public >> > static >> > > <K, VV, EV> BipartiteGraph<K, VV, EV> >> fromCollection(Collection<Vertex<K, >> > > VV>> topVertices,Collection<Vertex<K, VV>> bottomVertices, >> > > Collection<Edge<K, EV>> edges, ExecutionEnvironment context) >> > > Graph validation does not need to happen implicitly. user can call >> > > BipartiteGraph.validate() in case he wants to check sanity of data. >> > > Vertex modes is primary requirement. BipartiteGraph should have >> > functions, >> > > getTopVertices() and getBottomVertices(). There are three ways to >> > implement >> > > it: >> > > Simplest and least efficient way is to maintain vertices variable in >> > Graph >> > > in addition to two more Datasets topVertices, bottomVertices. Benefit >> of >> > > this approach is that Graph class would not need any modification at >> all. >> > > this.vertices variable access inside Graph class would work correctly. >> > > Disadvantage is that multiple copies of vertex dataset are created and >> > > maintained. So this is inefficient in terms of memory. >> > > Another way is to keep topVertices and bottomVertices variables in >> > > BipartiteGraph. vertices variable in Graph would stay empty. >> > getVertices() >> > > function of Graph would be overloaded by BipartiteGraph and >> reimplemented >> > > as union of of topVertices and bottomVertices. Since, vertices is a >> > private >> > > variable, external view of vertices stays unaffected. All functions of >> > > Graph class which use vertices local variable (e.g. >> > getVerticesAsTuple2()) >> > > need to use getVertices() instead of this.vertices. Disadvantage of >> this >> > > method is Graph.vertices variable would have invalid value throughout >> for >> > > BipartiteGraph. >> > > Another way is to ‘label’ vertices with an integer. So public class >> > > BipartiteGraph<K,VV,EV> extends Graph<K, Tuple2<VV,Integer>, EV> >> would be >> > > the signature. Initialisers would tag the vertices according to their >> > mode. >> > > getVertices() should be overridden to strip-off the tag values so that >> > > users would get consistent view of vertex dataset for Graph and >> > > BipartiteGraph. getTopVertices/getBottomVertices would be filter >> > functions >> > > on vertex tags. >> > > I personally feel method 2 to be most elegant. >> > > Functions like addVertices(vertexList) are not relevant without >> > specifying >> > > whether they are top or bottom. A possibility could be to add them to >> > > topVertices by default. And have overloaded function >> > > addVertices(vertexList, mode) to add them to specific partition. >> > > Specialised algorithms for Bipartite graphs can implement new >> > > BipartiteGraphAlgorithm interface. It would be similar to >> GraphAlgorithm. >> > > In future, newer types of graphs can be similarly derived from Graph >> and >> > > type hierarchy of Graph can be created. Class hierarchy would be >> better >> > > here rather than interface hierarchy, because the functions which are >> > > applicable to all derived classes can be implemented in the base >> class. >> > > As a part of future refactoring, Graph transformation functions should >> > > rather be a new class implementing GraphAlgorithm rather than >> function in >> > > Graph class. This may make implementing complex graph types easier. >> > > PS. Do I need to add it in wiki too? I can copy the contents to wiki >> if >> > > you say so, >> > > -- Saumitra Shahapure >> > > >> > > >> > >> > >