Re: What makes 'graph traversals' and 'relational joins' the same?

Joshua Shinavier Sun, 21 Apr 2019 11:59:09 -0700

On the subject of "reified joins", maybe a picture will be worth a few
words. As I said in the thread
<https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ> on
property graph standardization, if you think of vertex labels, edge labels,
and property keys as types, each with projections to two other types, there
is a nice analogy with relations of two columns, and this analogy can be
easily extended to hyper-edges. Here is what the schema of the TinkerPop
classic graph looks like if you make each type (e.g. Person, Project,
knows, name) into a relation:


[image: image.png]


I have made the vertex types salmon-colored, the edge types yellow, the
property types green, and the data types blue. The "o" and "I" columns
represent the out-type (e.g. out-vertex type of Person) and in-type (e.g.
property value type of String) of each relation. More than one arrow from
a column represents a coproduct, e.g. the out-type of "name" is Person OR
Project. Now you can think of out() and in() as joins of two tables on a
primary and foreign key.
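
To make the join reading concrete, here is a minimal Python sketch. It
assumes each vertex type and edge type is stored as a list of rows, with
edge rows carrying "out" and "in" foreign keys into the vertex relation.
The names PERSON, KNOWS, join and out_step are illustrative only; they are
not TinkerPop API, and the ids are a small subset in the style of the
classic graph.

```python
# Vertex relation: one row per Person, keyed by "id" (the primary key).
PERSON = [{"id": 1}, {"id": 2}, {"id": 4}]

# Edge relation: "out" and "in" are foreign keys into PERSON.
KNOWS = [
    {"id": 7, "out": 1, "in": 2},
    {"id": 8, "out": 1, "in": 4},
]


def join(left, right, left_col, right_col):
    """Naive equijoin: pair rows where left[left_col] == right[right_col]."""
    return [(l, r) for l in left for r in right if l[left_col] == r[right_col]]


def out_step(vertices, edges, targets):
    """out(label) as two joins: vertex.id = edge.out, then edge.in = target.id."""
    hops = join(vertices, edges, "id", "out")
    edge_rows = [e for _, e in hops]
    return [t for _, t in join(edge_rows, targets, "in", "id")]


start = [row for row in PERSON if row["id"] == 1]
friends = out_step(start, KNOWS, PERSON)
print([v["id"] for v in friends])  # [2, 4]
```

Swapping the two join columns ("in" first, then "out") would give the
corresponding in() step under the same assumptions.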

We are not limited to "out" and "in", however. Here is the ternary
relationship (hyper-edge) from the hyper-edge slide
<https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49>
of my Graph Day preso, which has three columns/roles/projections:

[image: image.png]


I have drawn Says in light blue to indicate that it is a generalized
element; it has projections other than "out" and "in". Now the line between
relations and edges begins to blur. E.g. in the following, is PlaceEvent a
vertex or a property?

[image: image.png]


With the right type system, we can just speak of graph elements, and use
"vertex", "edge", "property" when it is convenient. In the relational
model, they are relations. If you materialize them in a relational
database, they are rows. In any case, you need two basic graph traversal
operations:

   - project() -- forward traversal of the arrows in the above diagrams.
   Takes you from an element to a component like in-vertex.
   - select() -- reverse traversal of the arrows. Allows you to answer
   questions like "in which Trips is John Doe the rider?"
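
The two operations above can be sketched in a few lines of Python. This is
only an illustration under assumptions: the Trip data, the role names
"rider" and "driver", and the dictionary layout are hypothetical, and the
functions are not an existing TinkerPop API.

```python
# Hypothetical data: each Trip element has named projections ("roles").
TRIPS = {
    "trip1": {"rider": "John Doe", "driver": "Jane Roe"},
    "trip2": {"rider": "Ann Lee", "driver": "John Doe"},
    "trip3": {"rider": "John Doe", "driver": "Ann Lee"},
}


def project(relation, element_id, role):
    """Forward traversal: from an element to the component in the given role."""
    return relation[element_id][role]


def select(relation, role, value):
    """Reverse traversal: all elements whose `role` component equals `value`."""
    return [eid for eid, row in relation.items() if row[role] == value]


print(project(TRIPS, "trip2", "driver"))   # John Doe
print(select(TRIPS, "rider", "John Doe"))  # ['trip1', 'trip3']
```

Here select() is a linear scan; a real store would presumably answer it
from an index on the role column, which is exactly what a foreign-key
index does for a relational join.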


Josh


On Fri, Apr 19, 2019 at 10:03 AM Marko Rodriguez <okramma...@gmail.com>
wrote:

> Hello,
>
> I agree with everything you say. Here is my question:
>
>         Relational database — join: Table x Table x equality function ->
> Table
>         Graph database — traverser: Vertex x edge label -> Vertex
>
> I want a single function that does both. The only thing I could think of was
> to represent traverser() in terms of join():
>
>         Graph database — traverser: Vertices x Vertex x equality function
> -> Vertices
>
> For example,
>
> V().out(‘address’)
>
>         ==>
>
> g.join(V().hasLabel(‘person’).as(‘a’),
>        V().hasLabel(‘addresses’).as(‘b’)).
>          by(‘name’).select(?address vertex?)
>
> That is, join the vertices with themselves based on some predicate to go
> from vertices to vertices.
>
> However, I would like instead to transform the relational database join()
> concept into a traverser() concept. Kuppitz and I were talking the other
> day about a link() type operator that says: “try and link to this thing in
> some specified way.” .. ?? The problem we ran into is again, “link it to
> what?”
>
>         - in graph, the ‘to what’ is hardcoded so you don’t need to
> specify anything.
>         - in rdbms, the ’to what’ is some other specified table.
>
> So what does the link() operator look like?
>
> ——
>
> Some other random thoughts….
>
> Relational databases join on the table (the whole collection)
> Graph databases traverse on the vertex (an element of the whole
> collection)
>
> We can make a relational database join on a single row (by providing a
> filter to a particular primary key). This is the same as a table with one
> row. Likewise, for graph in the join() context above:
>
> V(1).out(‘address’)
>
>         ==>
>
> g.join(V(1).as(‘a’),
>        V().hasLabel(‘addresses’).as(‘b’)).
>          by(‘name’).select(?address vertex?)
>
> More thoughts please….
>
> Marko.
>
> http://rredux.com <http://rredux.com/>
>
>
>
>
> > On Apr 19, 2019, at 4:20 AM, pieter martin <pieter.mar...@gmail.com>
> wrote:
> >
> > Hi,
> > The way I saw it is that the big difference is that graphs have
> > reified joins. This is both a blessing and a curse.
> > A blessing because it's much easier (less text to type, fewer mistakes,
> > clearer semantics...) to traverse an edge than to construct a manual
> > join. A curse because there are almost always far more ways to traverse
> > a data set than just by the edges some architect might have considered
> > when creating the data set. Often the architect is not the domain
> > expert and the edges are a hardcoded layout of the dataset, which
> > almost certainly won't survive the real world's demands. In graphs, if
> > there are no edges then the data is not reachable, except via indexed
> > lookups. This is the standard engineering problem of database design,
> > but it is important and useful that data can be traversed, joined,
> > without having reified edges.
> > In Sqlg at least, but I suspect it generalizes, I want to create the
> > notion of a "virtual edge", which in metadata describes the join, and
> > then the standard to(direction, "virtualEdgeName") will work.
> > In a way this is precisely to keep the graphy nature of Gremlin, i.e.
> > traversing edges, and avoid using the manual join syntax you described.
> > Cheers
> > Pieter
> >
> > On Thu, 2019-04-18 at 14:15 -0600, Marko Rodriguez wrote:
> >> Hi,
> >> *** This is mainly for Kuppitz, but if others care.
> >> Was thinking last night about relational data and Gremlin. The T()
> >> step returns all the tables in the withStructure() RDBMS database.
> >> Tables are ‘complex values’ so they can't leave the VM (only a simple
> >> ‘toString’).
> >> Below is a fake Gremlin session. (and these are just ideas…)
> >>      tables -> a ListLike of rows
> >>      rows -> a MapLike of primitives
> >> gremlin> g.T()
> >> ==>t[people]
> >> ==>t[addresses]
> >> gremlin> g.T(‘people’)
> >> ==>t[people]
> >> gremlin> g.T(‘people’).values()
> >> ==>r[people:1]
> >> ==>r[people:2]
> >> ==>r[people:3]
> >> gremlin> g.T(‘people’).values().asMap()
> >> ==>{name:marko,age:29}
> >> ==>{name:kuppitz,age:10}
> >> ==>{name:josh,age:35}
> >> gremlin> g.T(‘people’).values().has(‘age’,gt(20))
> >> ==>r[people:1]
> >> ==>r[people:3]
> >> gremlin> g.T(‘people’).values().has(‘age’,gt(20)).values(‘name’)
> >> ==>marko
> >> ==>josh
> >> Makes sense. Nice that values() and has() generally apply to all
> >> ListLike and MapLike structures. Also, note how asMap() is the
> >> valueMap() of TP4, but generalizes to anything that is MapLike so it
> >> can be turned into a primitive form as a data-rich result from the
> >> VM.
> >> gremlin> g.T()
> >> ==>t[people]
> >> ==>t[addresses]
> >> gremlin> g.T(‘addresses’).values().asMap()
> >> ==>{name:marko,city:santafe}
> >> ==>{name:kuppitz,city:tucson}
> >> ==>{name:josh,city:desertisland}
> >> gremlin> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
> >>            by(select(‘a’).value(‘name’).is(eq(select(‘b’).value(‘name’)))).
> >>            values().asMap()
> >> ==>{a.name:marko,a.age:29,b.name:marko,b.city:santafe}
> >> ==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.city:tucson}
> >> ==>{a.name:josh,a.age:35,b.name:josh,b.city:desertisland}
> >> gremlin> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
> >>            by(‘name’). // shorthand for equijoin on name column/key
> >>            values().asMap()
> >> ==>{a.name:marko,a.age:29,b.name:marko,b.city:santafe}
> >> ==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.city:tucson}
> >> ==>{a.name:josh,a.age:35,b.name:josh,b.city:desertisland}
> >> gremlin> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
> >>            by(‘name’)
> >> ==>t[people<-name->addresses]  // without asMap(), just the complex value ‘toString’
> >> gremlin>
> >> And of course, all of this is strategized into a SQL call so its
> >> joins aren’t necessarily computed using TP4-VM resources.
> >> Anywho — what I hope to realize is the relationship between “links”
> >> (graph) and “joins” (tables). How can we make (bytecode-wise at
> >> least) RDBMS join operations and graph traversal operations ‘the
> >> same.’?
> >>      Singleton: Integer, String, Float, Double, etc.
> >>      Collection: List, Map (Vertex, Table, Document)
> >>      Linkable: Vertex, Table
> >> Vertices and Tables can be “linked.” Unlike Collections, they don’t
> >> maintain a “parent/child” relationship with the objects they
> >> reference. What does this mean……….?
> >> Take care,
> >> Marko.
> >> http://rredux.com
>
>
