Re: What makes 'graph traversals' and 'relational joins' the same?

Marko Rodriguez Tue, 23 Apr 2019 05:14:19 -0700

Hey Josh,

This gets to the notion I presented in “The Fabled GMachine.”
        http://rredux.com/the-fabled-gmachine.html 
<http://rredux.com/the-fabled-gmachine.html> (first paragraph of “Structures, 
Processes, and Languages” section)


 All that exists are memory addresses that contain either:

        1. A primitive
        2. A set of labeled references to other references or primitives.

Using your work and the above, here is a super low-level ‘bytecode' for 
property graphs.

v.goto("id") => 1
v.goto("label") => person
v.goto("properties").goto("name") => "marko"
v.goto("properties").goto("name").goto(0) => "m"
v.goto("outE").goto("inV") => v[2], v[4]
g.goto("V").goto(1) => v[1]

The goto() instruction moves the “memory reference” (traverser) from the 
current “memory address” to the “memory address” referenced by the goto() 
argument.

The Gremlin expression:

        g.V().has(‘name’,’marko’).out(‘knows’).drop()

..would compile to:

        
g.goto(“V”).filter(goto(“properties”).goto(“name”).is(“marko”)).goto(“outE”).filter(goto(“label”).is(“knows”)).goto(“inV”).free()

…where free() is the opposite of malloc().

If we can get things that “low-level” and still efficient to compile, then we 
can model every data structure. All you are doing is pointer chasing through a 
withStructure() data structure. .

No one would ever want to write strategies for goto()-based Bytecode. Thus, 
perhaps there could be a PropertyGraphDecorationStrategy that does:

g = Gremlin.traversal(machine).withStructure(JanusGraph.class)  // this will 
register the strategy
g.V().has(‘name’,’marko’).out(‘knows’).drop() // this generates goto()-based 
bytecode underneath the covers
        ==submit==>
[goto,V][filter,[goto…]][goto][goto][free]] // Bytecode from the “fundamental 
instruction set” 
[V][has,name,marko][out,knows][drop] // PropertyGraphDecorationStrategy 
converts goto() instructions into a property graph-specific instruction set.
[V-idx,name,marko][out,knows][drop] // JanusGraphProviderStrategy converts 
V().has() into an index lookup instruction.

[I AM NOW GOING OFF THE RAILS]

Like fluent-style Gremlin, we could have an AssemblyLanguage that only has 
goto(), free(), malloc(), filter(), map(), reduce(), flatmap(), barrier(), 
branch(), repeat(), sideEffect() instructions. For instance, if you wanted to 
create an array list (not a linked list! :):

[“marko”,29,true]

you would do:

malloc(childrefs(0,1,2)).sideEffect(goto(0).malloc(“marko”)).sideEffect(goto(1).malloc(29)).sideEffect(goto(2).malloc(true))

This tells the underlying data structure (e.g. database) that you want to 
create a set of “children references" labeled 0, 1, and 2. And then you goto() 
each reference and add primitives. Now, if JanusGraph got this batch of 
instructions, it would do the following:

Vertex refs = graph.addVertex()
refs.addEdge(“childref", graph.addVertex(“value”,”marko”)).property(“ref”,0)
refs.addEdge(“childref", graph.addVertex(“value”,29)).property(“ref”,1)
refs.addEdge(“childref", graph.addVertex(“value”,true)).property(“ref”,2)

The reason for childref, is that if you delete the list, you should delete all 
the children referenced data! In other words, refs-vertex has cascading deletes.

list.drop()
==>
list.sideEffect(goto(0,1,2).free()).free()

JanusGraph would then do:

refs.out(“childref").drop()
refs.drop()

Or, more generally:

refs.emit().repeat(out(“childref”)).drop()

Trippy.

[I AM NOW BACK ON THE RAILS]

Its as if “properties”, “outE”, “label”, “inV”, etc. references mean something 
to property graph providers and they can do more intelligent stuff than what 
MongoDB would do with such information. However, someone, of course, can create 
a MongoDBPropertyGraphStrategy that would make documents look like vertices and 
edges and then use O(log(n)) lookups on ids to walk the graph. However, if that 
didn’t exist, it would still do something that works even if its horribly 
inefficient as every database can make primitives with references between them!

Anywho @Josh, I believe goto() is what you are doing with multi-references off 
an object. How do we make it all clean, easy, and universal?

Marko.

http://rredux.com <http://rredux.com/>




> On Apr 22, 2019, at 6:42 PM, Joshua Shinavier <j...@fortytwo.net> wrote:
> 
> Ah, glad you asked. It's all in the pictures. I have nowhere to put them 
> online at the moment... maybe this attachment will go through to the list?
> 
> Btw. David Spivak gave his talk today at Uber; it was great. Juan Sequeda 
> (relational <--> RDF mapping guy) was also here, and Ryan joined remotely. 
> Really interesting discussion about databases vs. graphs, and what category 
> theory brings to the table.
> 
> 
> On Mon, Apr 22, 2019 at 1:45 PM Marko Rodriguez <okramma...@gmail.com 
> <mailto:okramma...@gmail.com>> wrote:
> Hey Josh,
> 
> I’m digging what you are saying, but the pictures didn’t come through for me 
> ? … Can you provide them again (or if dev@ is filtering them, can you give me 
> URLs to them)?
> 
> Thanks,
> Marko.
> 
> 
> > On Apr 21, 2019, at 12:58 PM, Joshua Shinavier <j...@fortytwo.net 
> > <mailto:j...@fortytwo.net>> wrote:
> > 
> > On the subject of "reified joins", maybe be a picture will be worth a few 
> > words. As I said in the thread 
> > <https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ 
> > <https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ>> 
> > on property graph standardization, if you think of vertex labels, edge 
> > labels, and property keys as types, each with projections to two other 
> > types, there is a nice analogy with relations of two columns, and this 
> > analogy can be easily extended to hyper-edges. Here is what the schema of 
> > the TinkerPop classic graph looks like if you make each type (e.g. Person, 
> > Project, knows, name) into a relation:
> > 
> > 
> > 
> > I have made the vertex types salmon-colored, the edge types yellow, the 
> > property types green, and the data types blue. The "o" and "I" columns 
> > represent the out-type (e.g. out-vertex type of Person) and in-type (e.g. 
> > property value type of String) of each relation. More than two arrows from 
> > a column represent a coproduct, e.g. the out-type of "name" is Person OR 
> > Project. Now you can think of out() and in() as joins of two tables on a 
> > primary and foreign key.
> > 
> > We are not limited to "out" and "in", however. Here is the ternary 
> > relationship (hyper-edge) from hyper-edge slide 
> > <https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49
> >  
> > <https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49>>
> >  of my Graph Day preso, which has three columns/roles/projections:
> > 
> > 
> > 
> > I have drawn Says in light blue to indicate that it is a generalized 
> > element; it has projections other than "out" and "in". Now the line between 
> > relations and edges begins to blur. E.g. in the following, is PlaceEvent a 
> > vertex or a property?
> > 
> > 
> > 
> > With the right type system, we can just speak of graph elements, and use 
> > "vertex", "edge", "property" when it is convenient. In the relational 
> > model, they are relations. If you materialize them in a relational 
> > database, they are rows. In any case, you need two basic graph traversal 
> > operations:
> > project() -- forward traversal of the arrows in the above diagrams. Takes 
> > you from an element to a component like in-vertex.
> > select() -- reverse traversal of the arrows. Allows you to answer questions 
> > like "in which Trips is John Doe the rider?"
> > 
> > Josh
> > 
> > 
> > On Fri, Apr 19, 2019 at 10:03 AM Marko Rodriguez <okramma...@gmail.com 
> > <mailto:okramma...@gmail.com> <mailto:okramma...@gmail.com 
> > <mailto:okramma...@gmail.com>>> wrote:
> > Hello,
> > 
> > I agree with everything you say. Here is my question:
> > 
> >         Relational database — join: Table x Table x equality function -> 
> > Table
> >         Graph database — traverser: Vertex x edge label -> Vertex
> > 
> > I want a single function that does both. The only think was to represent 
> > traverser() in terms of join():
> > 
> >         Graph database — traverser: Vertices x Vertex x equality function 
> > -> Vertices
> > 
> > For example, 
> > 
> > V().out(‘address’)
> > 
> >         ==>
> > 
> > g.join(V().hasLabel(‘person’).as(‘a’)
> >        V().hasLabel(‘addresses’).as(‘b’)).
> >          by(‘name’).select(?address vertex?)
> > 
> > That is, join the vertices with themselves based on some predicate to go 
> > from vertices to vertices.
> > 
> > However, I would like instead to transform the relational database join() 
> > concept into a traverser() concept. Kuppitz and I were talking the other 
> > day about a link() type operator that says: “try and link to this thing in 
> > some specified way.” .. ?? The problem we ran into is again, “link it to 
> > what?”
> > 
> >         - in graph, the ‘to what’ is hardcoded so you don’t need to specify 
> > anything.
> >         - in rdbms, the ’to what’ is some other specified table.
> > 
> > So what does the link() operator look like?
> > 
> > ——
> > 
> > Some other random thoughts….
> > 
> > Relational databases join on the table (the whole collection)
> > Graph databases traverser on the vertex (an element of the whole collection)
> > 
> > We can make a relational database join on single row (by providing a filter 
> > to a particular primary key). This is the same as a table with one row. 
> > Likewise, for graph in the join() context above:
> > 
> > V(1).out(‘address’) 
> > 
> >         ==>
> > 
> > g.join(V(1).as(‘a’)
> >        V().hasLabel(‘addresses’).as(‘b’)).
> >          by(‘name’).select(?address vertex?)
> > 
> > More thoughts please….
> > 
> > Marko.
> > 
> > http://rredux.com <http://rredux.com/> <http://rredux.com/ 
> > <http://rredux.com/>> <http://rredux.com/ <http://rredux.com/> 
> > <http://rredux.com/ <http://rredux.com/>>>
> > 
> > 
> > 
> > 
> > > On Apr 19, 2019, at 4:20 AM, pieter martin <pieter.mar...@gmail.com 
> > > <mailto:pieter.mar...@gmail.com> <mailto:pieter.mar...@gmail.com 
> > > <mailto:pieter.mar...@gmail.com>>> wrote:
> > > 
> > > Hi,
> > > The way I saw it is that the big difference is that graph's have
> > > reified joins. This is both a blessing and a curse.
> > > A blessing because its much easier (less text to type, less mistakes,
> > > clearer semantics...) to traverse an edge than to construct a manual
> > > join.A curse because there are almost always far more ways to traverse
> > > a data set than just by the edges some architect might have considered
> > > when creating the data set. Often the architect is not the domain
> > > expert and the edges are a hardcoded layout of the dataset, which
> > > almost certainly won't survive the real world's demands. In graphs, if
> > > their are no edges then the data is not reachable, except via indexed
> > > lookups. This is the standard engineering problem of database design,
> > > but it is important and useful that data can be traversed, joined,
> > > without having reified edges.
> > > In Sqlg at least, but I suspect it generalizes, I want to create the
> > > notion of a "virtual edge". Which in meta data describes the join and
> > > then the standard to(direction, "virtualEdgeName") will work.
> > > In a way this is precisely to keep the graphy nature of gremlin, i.e.
> > > traversing edges, and avoid using the manual join syntax you described.
> > > CheersPieter
> > > 
> > > On Thu, 2019-04-18 at 14:15 -0600, Marko Rodriguez wrote:
> > >> Hi,
> > >> *** This is mainly for Kuppitz, but if others care. 
> > >> Was thinking last night about relational data and Gremlin. The T()
> > >> step returns all the tables in the withStructure() RDBMS database.
> > >> Tables are ‘complex values’ so they can't leave the VM (only a simple
> > >> ‘toString’).
> > >> Below is a fake Gremlin session. (and these are just ideas…) tables
> > >> -> a ListLike of rows        rows -> a MapLike of primitives
> > >> gremlin> g.T()==>t[people]==>t[addresses]gremlin>
> > >> g.T(‘people’)==>t[people]gremlin>
> > >> g.T(‘people’).values()==>r[people:1]==>r[people:2]==>r[people:3]greml
> > >> in>
> > >> g.T(‘people’).values().asMap()==>{name:marko,age:29}==>{name:kuppitz,
> > >> age:10}==>{name:josh,age:35}gremlin>
> > >> g.T(‘people’).values().has(‘age’,gt(20))==>r[people:1]==>r[people:3]g
> > >> remlin>
> > >> g.T(‘people’).values().has(‘age’,gt(20)).values(‘name’)==>marko==>jos
> > >> h
> > >> Makes sense. Nice that values() and has() generally apply to all
> > >> ListLike and MapLike structures. Also, note how asMap() is the
> > >> valueMap() of TP4, but generalizes to anything that is MapLike so it
> > >> can be turned into a primitive form as a data-rich result from the
> > >> VM.
> > >> gremlin> g.T()==>t[people]==>t[addresses]gremlin>
> > >> g.T(‘addresses’).values().asMap()==>{name:marko,city:santafe}==>{name
> > >> :kuppitz,city:tucson}==>{name:josh,city:desertisland}gremlin>
> > >> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).             by(se
> > >> lect(‘a’).value(’name’).is(eq(select(‘b’).value(’name’))).           
> > >> values().asMap()==>{a.name:marko,a.age:29,b.name:marko,b.city:santafe
> > >> }==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.city:tucson}==>{a.name 
> > >> <http://a.name/> <http://a.name/ <http://a.name/>>:
> > >> josh,a.age:35,b.name:josh,b.city:desertisland}gremlin>
> > >> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).             by(’n
> > >> ame’). // shorthand for equijoin on name
> > >> column/key           values().asMap()==>{a.name:marko,a.age:29,b.name 
> > >> <http://b.name/> <http://b.name/ <http://b.name/>>
> > >> :marko,b.city:santafe}==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.ci 
> > >> <http://b.ci/> <http://b.ci/ <http://b.ci/>>
> > >> ty:tucson}==>{a.name:josh,a.age:35,b.name:josh,b.city:desertisland}gr
> > >> emlin>
> > >> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).             by(’n
> > >> ame’)==>t[people<-name->addresses]  // without asMap(), just the
> > >> complex value ‘toString'gremlin>
> > >> And of course, all of this is strategized into a SQL call so its
> > >> joins aren’t necessarily computed using TP4-VM resources.
> > >> Anywho — what I hope to realize is the relationship between “links”
> > >> (graph) and “joins” (tables). How can we make (bytecode-wise at
> > >> least) RDBMS join operations and graph traversal operations ‘the
> > >> same.’?
> > >>      Singleton: Integer, String, Float, Double, etc. Collection:
> > >> List, Map (Vertex, Table, Document)  Linkable: Vertex, Table
> > >> Vertices and Tables can be “linked.” Unlike Collections, they don’t
> > >> maintain a “parent/child” relationship with the objects they
> > >> reference. What does this mean……….?
> > >> Take care,Marko.
> > >> http://rredux.com <http://rredux.com/> <http://rredux.com/ 
> > >> <http://rredux.com/>> <http://rredux.com/ <http://rredux.com/> 
> > >> <http://rredux.com/ <http://rredux.com/>>> <http://rredux.com/ 
> > >> <http://rredux.com/> <http://rredux.com/ <http://rredux.com/>> 
> > >> <http://rredux.com/ <http://rredux.com/> <http://rredux.com/ 
> > >> <http://rredux.com/>>>>
> 
> <diagrams.zip>

Re: What makes 'graph traversals' and 'relational joins' the same?

Reply via email to