Re: What makes 'graph traversals' and 'relational joins' the same?

Marko Rodriguez Wed, 24 Apr 2019 15:02:04 -0700

Hey,

Thinking through things more and re-reading your emails.


It’s like this:

        From an object you want to be able to go to the relations in which that
        object is a particular entry.
        From that relation you want to go to another object referenced in
        another entry.

For instance assume this set of 3-tuple relations:

talk_table
speaker  listener  statement
marko    josh      “sup bro”
marko    kuppitz   “dude man”

Let’s say I’m at josh and I want to know what marko said to him:

        josh.adjacents(‘talk’,’listener’, …) // and this is why you have from().restrict().to()

Using your from()/restrict()/to() notation:

        josh.from(‘talk’,’listener’).restrict(‘speaker’,marko).to(‘statement’)
        => “sup bro”

I want to get some terminology down:

        Relation: a tuple with key/value entries. (basically a map)
        Key: A relation column name.
        Value: A relation column value.

So there are three operations:

        1. Get the relations in which the current object is a value for the
           specified key. [select] // like a back()
        2. Filter out those relations that don’t have a particular value for a
           particular key. [filter]
        3. Get those objects in the remaining relations associated with a
           particular key. [project] // like a forward()

What did Kuppitz hear from Marko?

        
        kuppitz.select(‘talk’,’listener’).filter(‘speaker’,marko).project(‘statement’)
        => “dude man”
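
A minimal sketch of the select/filter/project chain, assuming talk_table is
nothing more than an in-memory list of dicts. The names TABLES, select,
filter_relations and project are made up for illustration; none of them is a
TinkerPop API.

```python
# Hypothetical in-memory stand-in for talk_table: each relation is a dict
# of key/value entries, as in the terminology above.
TABLES = {
    "talk": [
        {"speaker": "marko", "listener": "josh", "statement": "sup bro"},
        {"speaker": "marko", "listener": "kuppitz", "statement": "dude man"},
    ]
}


def select(obj, table, key):
    """Relations in `table` in which `obj` is the value for `key` (the back() step)."""
    return [r for r in TABLES[table] if r.get(key) == obj]


def filter_relations(relations, key, value):
    """Keep only the relations whose `key` entry equals `value`."""
    return [r for r in relations if r.get(key) == value]


def project(relations, key):
    """Objects stored under `key` in the remaining relations (the forward() step)."""
    return [r[key] for r in relations if key in r]


# What did Kuppitz hear from Marko?
heard = project(
    filter_relations(select("kuppitz", "talk", "listener"), "speaker", "marko"),
    "statement",
)
print(heard)  # ['dude man']
```

Each step takes and returns plain lists, so the three operations compose in
any order a traversal needs.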

So, how do we do this with just goto pointer chasing?

kuppitz.goto(‘listener’).filter(goto(‘speaker’).is(marko)).goto(‘statement’)

That is, I went from Kuppitz to all those relations in which he is a listener. 
I then filtered out those relations that don’t have marko as the speaker. I 
then went to the statements associated with those remaining relations. However,
this model assumes that the “listener” key is unique to talk_table, which is
not a safe assumption…
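
To make that worry concrete, here is a rough sketch of what happens when a
second relation type reuses the “listener” key. The lecture relation and all
function names are hypothetical, invented only to show the collision.

```python
# Hypothetical data: two relation types that both use the key "listener".
RELATIONS = [
    {"_table": "talk", "speaker": "marko", "listener": "kuppitz",
     "statement": "dude man"},
    {"_table": "lecture", "speaker": "marko", "listener": "kuppitz",
     "statement": "graphs are relations"},
]


def goto_back(obj, key):
    """Untyped reverse goto: every relation, in any table, holding obj under key."""
    return [r for r in RELATIONS if r.get(key) == obj]


def goto_back_typed(obj, table, key):
    """Typed variant: the same lookup restricted to one relation type."""
    return [r for r in goto_back(obj, key) if r["_table"] == table]


def statements_from(relations, speaker):
    """Filter on the speaker entry, then follow the statement entry."""
    return [r["statement"] for r in relations if r["speaker"] == speaker]


untyped = statements_from(goto_back("kuppitz", "listener"), "marko")
typed = statements_from(goto_back_typed("kuppitz", "talk", "listener"), "marko")
print(untyped)  # ['dude man', 'graphs are relations']
print(typed)    # ['dude man']
```

The untyped goto picks up the lecture relation as well, which is why
select() carries the relation name in addition to the key.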

Anywho, is this more in line with what you are getting at?

Thanks for your patience,
Marko.

http://rredux.com




> On Apr 24, 2019, at 11:30 AM, Marko Rodriguez <okramma...@gmail.com> wrote:
> 
> Hi,
> 
> I think I understand you now. The concept of local and non-local data is what 
> made me go “ah!”
> 
> So let me reiterate what I think you are saying.
> 
> v[1] is guaranteed to have its id data local to it. All other information 
> could be derived via id-based “equi-joins.” Thus, we can’t assume that a 
> vertex will always have its properties and edges co-located with it. However, 
> we can assume that it knows where to get its property and edge data when 
> requested. Assume the following RDBMS-style data structure that is referenced 
> by com.example.MyGraph.
> 
> vertex_table
> id label
> 1  person
> 2  person
> …
> 
> properties_table
> id  name   age
> 1   marko  29
> 2   josh   35
> …
> 
> edge_table
> id  outV  label  inV
> 0   1     knows  2
> …
> 
> If we want to say that the above data structure is a graph, what is required 
> of “ComplexType” such that we can satisfy both Neo4j-style and RDBMS-style 
> graph encodings? Assume ComplexType is defined as:
> 
> interface ComplexType {
>   <T> Iterator<T> adjacents(String label, Object... identifiers);
> }
> 
> Take this basic Gremlin traversal:
> 
> g.V(1).out(‘knows’).values(‘name’)
> 
> I now believe this should compile to the following:
> 
> [goto,V,1] [goto,outE,knows] [goto,inV] [goto,properties,name]
> 
> Given MyGraph/MyVertex/MyEdge all implement ComplexType and there is no local 
> caching of data on these respective objects, then the bytecode isn’t 
> rewritten and the following cascade of events occurs:
> 
> mygraph
> [goto,V,1] => 
>   mygraph.adjacents(“V”,1) => 
>     SELECT * FROM vertex_table WHERE id=1
> myvertex1
> [goto,outE,knows] => 
>   myvertex1.adjacents(“outE”,”knows”) => 
>     SELECT id FROM edge_table WHERE outV=1 AND label='knows'
> myedge0
> [goto,inV] => 
>   myedge0.adjacents(“inV”) => 
>     SELECT vertex_table.id FROM vertex_table, edge_table WHERE 
> vertex_table.id=edge_table.inV AND edge_table.id=0
> myvertex2
> [goto,properties,name] => 
>   myvertex2.adjacents(“properties”,”name”) => 
>     SELECT name FROM properties_table WHERE id=2
> “josh”
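
The quoted cascade can be tried end to end with a small sketch that uses
Python's built-in sqlite3 module. The tuple encoding of objects as
(kind, id) and the adjacents()/run() helpers are assumptions made for this
illustration; they are not how TP4 or any provider implements ComplexType.

```python
import sqlite3

# In-memory copy of the three tables from the quoted message.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE vertex_table (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE properties_table (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE edge_table (id INTEGER PRIMARY KEY, outV INTEGER, label TEXT, inV INTEGER);
    INSERT INTO vertex_table VALUES (1, 'person'), (2, 'person');
    INSERT INTO properties_table VALUES (1, 'marko', 29), (2, 'josh', 35);
    INSERT INTO edge_table VALUES (0, 1, 'knows', 2);
""")


def adjacents(obj, label, *identifiers):
    """Resolve one relation with one SQL lookup; obj is a (kind, id) pair."""
    kind, ident = obj
    if kind == "graph" and label == "V":
        rows = db.execute("SELECT id FROM vertex_table WHERE id = ?", identifiers)
        return [("vertex", r[0]) for r in rows]
    if kind == "vertex" and label == "outE":
        rows = db.execute(
            "SELECT id FROM edge_table WHERE outV = ? AND label = ?",
            (ident, *identifiers))
        return [("edge", r[0]) for r in rows]
    if kind == "edge" and label == "inV":
        rows = db.execute(
            "SELECT vertex_table.id FROM vertex_table, edge_table "
            "WHERE vertex_table.id = edge_table.inV AND edge_table.id = ?",
            (ident,))
        return [("vertex", r[0]) for r in rows]
    if kind == "vertex" and label == "properties":
        (column,) = identifiers
        if column not in ("name", "age"):  # whitelist: column names cannot be bound
            raise KeyError(column)
        rows = db.execute(
            f"SELECT {column} FROM properties_table WHERE id = ?", (ident,))
        return [r[0] for r in rows]
    raise KeyError((kind, label))


def run(bytecode, start):
    """Flatmap each [goto, label, identifiers...] instruction over the frontier."""
    frontier = [start]
    for _op, label, *identifiers in bytecode:
        frontier = [n for obj in frontier for n in adjacents(obj, label, *identifiers)]
    return frontier


bytecode = [("goto", "V", 1), ("goto", "outE", "knows"),
            ("goto", "inV"), ("goto", "properties", "name")]
result = run(bytecode, ("graph", None))
print(result)  # ['josh']
```

Every instruction costs one query here, which is the unoptimized case the
message describes before any strategy rewrites the bytecode.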
> 
> Let’s review the ComplexType adjacents() method:
> 
> complexType.adjacents(label,identifiers...)
> 
> complexType must have sufficient information to represent the tail of the 
> relation.
> label specifies the relation type (we will always assume that a single String 
> is sufficient)
> identifiers... must contain sufficient information to identify the head of 
> the relation.
> 
> The return value of the adjacents() method is then the object(s) on the other 
> side of the relation(s).
> 
> Now, given the way I have my data structure organized, I could beef up the 
> MyXXX implementation such that MyStrategy rewrites the base bytecode to:
> 
> [goto,V,1] [goto,out,knows] [goto,properties,name]
> 
> The following cascade of events occurs:
> 
> mygraph
> [goto,V,1] => 
>   mygraph.adjacents(“V”,1) => 
>     SELECT * FROM vertex_table WHERE id=1
> myvertex1
> [goto,out,knows] => 
>   myvertex1.adjacents(“out”,”knows”) => 
>     SELECT vertex_table.id FROM vertex_table,edge_table WHERE outV=1 AND 
> label='knows' AND inV=vertex_table.id
> myvertex2
> [goto,properties,name] => 
>   myvertex2.adjacents(“properties”,”name”) => 
>     SELECT name FROM properties_table WHERE id=2
> “josh”
> 
> Now, I could really beef up MyStrategy when I realize that no path 
> information is used in the traversal. Thus, the base bytecode compiles to:
> 
> [my:sql,SELECT name FROM properties_table,vertex_table,edge_table WHERE … 
> lots of join equalities]
> 
> This would then just emit “josh” given the mygraph object.
> 
> ——
> 
> To recap.
> 
>       1. There are primitives.
>       2. There are Maps and Lists.
>       3. There are ComplexTypes.
>       4. ComplexTypes are adjacent to other objects via relations.
>               - These adjacent objects may be cached locally with the
>                 ComplexType instance.
>               - These adjacent objects may require some database lookup.
>               - Regardless, TP4 doesn’t care — it’s up to the provider’s
>                 ComplexType instance to decide how to resolve the adjacency.
>       5. ComplexTypes don’t go over the wire — a ComplexTypeProxy with
>          appropriately provided toString() is all that leaves the TP4 VM.
> 
> Finally, to solve the asMap()/asList() problem, we simply have:
> 
> asMap(‘name’,‘age’) => complexType.adjacents(‘asMap’,‘name’,‘age’)
> asList() => complexType.adjacents(‘asList’)
> 
> It is up to the complexType to manifest a Map or List accordingly.
> 
> I see this as basically a big flatmap system. ComplexTypes just map from self 
> to any number of logical neighbors as specified by the relation.
> 
> Am I getting it?
> Marko.
> 
> http://rredux.com
> 
> 
> 
> 
>> On Apr 24, 2019, at 9:56 AM, Joshua Shinavier <j...@fortytwo.net> wrote:
>> 
>> On Tue, Apr 23, 2019 at 10:28 AM Marko Rodriguez <okramma...@gmail.com>
>> wrote:
>> 
>>> Hi,
>>> 
>>> I think we are very close to something useable for TP4 structure/. Solving
>>> this problem elegantly will open the flood gates on tp4/ development.
>>> 
>> 
>> Yes, and formality often brings elegance. I don't think we can do much
>> better than relational algebra and relational calculus in terms of
>> formality, so to the extent we can reduce the fundamental TP4 traversal
>> steps to basic relational operations, the floodgates will also be open to
>> applications of query validation and query optimization from the last 40+
>> years of research.
>> 
>> 
>> 
>>> I still don’t grok your comeFrom().goto() stuff. I don’t get the benefit
>>> of having two instructions for “pointer chasing” instead of one.
>>> 
>> 
>> There are just a handful of basic operations in relational algebra.
>> Projection, selection, union, complement, Cartesian product. Joins, as well
>> as all other operations, can be derived from these. A lot of graph
>> traversal can be accomplished using only projection and selection, which is
>> why we were able to get away with only to/goto and from/comeFrom in the
>> examples above. However, I believe you do need both operations. You can
>> kind of get away without from() if you assume that each vertex has local
>> inE and outE references to incoming and outgoing edges, but I see that as a
>> kind of pre-materialized from()/select(). If you think of edges strictly as
>> relations, and represent them in a straightforward way with tables, you
>> don't need the local inE and outE; whether you have them depends on the
>> graph back-end.
>> 
>> 
>> 
>>> Let’s put that aside for now and turn to modeling a Vertex. Go back to
>>> my original representation:
>>> 
>>> vertex.goto(‘label’)
>>> vertex.goto(‘id’)
>>> 
>> 
>> Local (in my view). All good.
>> 
>> 
>> 
>>> vertex.goto(‘outE’)
>>> vertex.goto(‘inE’)
>>> vertex.goto(‘properties’)
>>> 
>> 
>> Non-local (in my view). You can use goto(), but if the goal is to bring the
>> relational model into the fold, at a lower level you do have a select()
>> operation. Unless you make projections local to vertices instead of edges,
>> but then you just have the same problem in reverse. Am I making sense?
>> 
>> 
>>> Any object can be converted into a Map. In TinkerPop3 we convert vertices
>>> into maps via:
>>> 
>>>        g.V().has(‘name’,’marko’).valueMap() => {name:marko,age:29}
>>>        g.V().has(‘name’,’marko’).valueMap(true) =>
>>> {id:1,label:person,name:marko,age:29}
>>> 
>> 
>> Maps are A-OK. In the case of properties, I think where we differ is that
>> you see a property like "name" as a key/value pair in a map local to the
>> vertex. I see the property as an element of type "name", with the vertex as
>> a value in its local map, logically if not physically. This allows maximum
>> flexibility in terms of meta-properties -- exotic beasts which seem to be
>> in a kind of limbo state in TP3, but if we're trying to be as general as
>> possible, some data models we might want to pull in, like GRAKN.AI, do
>> allow this kind of flexibility.
>> 
>> 
>> 
>>> In the spirit of instruction reuse, we should have an asMap() instruction
>>> that works for ANY object. (As a side: this gets back to ONLY sending
>>> primitives over the wire, no
>>> Vertex/Edge/Document/Table/Row/XML/ColumnFamily/etc.). Thus, the above is:
>>> 
>>>        g.V().has(‘name’,’marko’).properties().asMap() =>
>>> {name:marko,age:29}
>>>        g.V().has(‘name’,’marko’).asMap() =>
>>> {id:1,label:person,properties:{name:marko,age:29}}
>>> 
>> 
>> Again, no argument here, although I would think of a map as an
>> optimization. IMO, the fundamental projections from v[1] are id:1 and
>> label:Person. You could make a map out of these, or just use an offset,
>> since the keys are always the same. However, you can also build a map
>> including any key you can turn into a function. properties() is such a key.
>> 
>> 
>>> You might ask, why didn’t it go to outE and inE and map-ify that data?
>>> Because those are “sibling” references, not “children” references.
>>> 
>>>        goto(‘outE’) is a “sibling” reference. (a vertex does not contain
>>> an edge)
>>>        goto(‘id’) is a “child” reference. (a vertex contains the id)
>>> 
>> 
>> I agree with both of those statements. A vertex does not contain the edges
>> incident on it. Again, I am thinking of properties a bit more like edges
>> for maximum generality.
>> 
>> 
>> 
>>> Where do we find sibling references?
>>>        Graphs: vertices don’t contain each other.
>>>        OO heaps: many objects don’t contain each other.
>>>        RDBMS: rows are linked by joins, but don’t contain each other.
>>> 
>> 
>> Yep.
>> 
>> 
>>> So, the way in which we structure our references (pointers) determines the
>>> shape of the data and ultimately how different instructions will behave. We
>>> can’t assume that asMap() knows anything about
>>> vertices/edges/documents/rows/tables/etc. It will simply walk all
>>> child-references and create a map.
>>> 
>> 
>> Just to play devil's advocate, you *could* include "inE" and "outE" as keys
>> in the local map of a vertex; it's just a matter of what you choose to do.
>> inE and outE are perfectly good functions from a vertex to a set of edges.
>> 
>> 
>>> We don’t want TP to get involved in “complex data types.”
>> 
>> 
>> Well, how do you feel about algebraic data types? They are simple, and
>> allow you to capture arbitrary relations as elements.
>> 
>> 
>> 
>>> We don’t care. You can propagate MyDatabaseObject through the TP4 VM
>>> pipeline and load your object up with methods for optimizations with your
>>> DB and all that, but for TP4, your object just needs to implement:
>>> 
>>>        ComplexType
>>>                - Iterator<T> children(String label)
>>>                - Iterator<T> siblings(String label)
>>>                - default Iterator<T> references(String label) {
>>>                    return IteratorUtils.concat(children(label), siblings(label)); }
>>>                - String toString()
>>> 
>> 
>> I don't think you need siblings(). I think you need a more generic
>> select(), but since this is graph traversal, select() only needs the
>> identifier of a type (e.g. "knows") and the name of a field (e.g. "out").
>> 
>> 
>> 
>>> When a ComplexType goes over the wire to the user, it is just represented as
>>> a ComplexTypeProxy with a toString() like v[1],
>>> tinkergraph[vertices:10,edges:34], etc. All references are disconnected.
>>> Yes, even children references. We do not want language drivers having to
>>> know about random object types and have to deal with implementing
>>> serializers and all that nonsense. The TP4 serialization protocol is
>>> primitives, maps, lists, bytecode, and traversers. That’s it!
>>> 
>> 
>> No disagreement here. I think the only disconnect is about what keys are
>> local to what elements. Some keys are hard-local, like id and type for all
>> elements, and "in" and "out" for edges and properties. These *should* be
>> carried over the wire. Properties, incident edges, etc. possibly but not
>> necessarily.
>> 
>> 
>> 
>>> *** Only Maps and Lists (that don’t contain complex data types) maintain
>>> their child references “over the wire.”
>>> 
>> 
>> Sure.
>> 
>> 
>> 
>>> I don’t get your hypergraph example, so let me try another example:
>>> 
>>>        tp ==member==> marko, josh
>>> 
>>> TP is a vertex and there is a directed hyperedge with label “member”
>>> connecting to marko and josh vertices.
>>> 
>> 
>> That's kind of an unlabeled hyperedge; I am not sure we need to support
>> those. Look at the GRAKN data model, or at HypergraphDB or earlier
>> hypergraph data models. A hyperedge is essentially a tuple in which each
>> component has a label ("role", in GRAKN). In other words, it is a relation
>> in which some of the columns may be foreign keys. In your example, rather
>> than "member" connecting "tp" to a set of vertices, you might have
>> something like Collaborated{person1:marko, person2:josh, project:tp}. Then
>> a query like "who did marko collaborate with on tp?" becomes:
>> 
>>    tp.from("Collaborated", "project").restrict("person1",
>> "marko").to("person2")
>> 
>> Of course, if you want this relationship to be symmetrical, you can
>> introduce a constraint.
>> 
>> 
>> 
>>> tp.goto(“outE”).filter(goto(“label”).is(“member”)).goto(“inV”)
>>> 
>>> Looks exactly like a property graph query? However, it’s not, because
>>> goto(“inV”) returns 2 vertices, not 1.
>> 
>> 
>> I think your example works well for the type of hypergraph you are
>> referring to. It's just different than the type of hypergraph I am
>> referring to. I think by now you know that I would rather see a from()
>> instead of that goto("outE"). I also agree you can make a function out of
>> outE, and expose it using a map, if you really want to. Under the hood,
>> however, I see this as traversing a projection head to tail rather than
>> tail to head.
>> 
>> 
>> 
>>> EdgeVertexFlatmapFunction works for property graphs and hypergraphs. It
>>> doesn’t care — it just follows goto() pointers! That is, it follows the
>>> ComplexType.references(“inV”). Multi-properties are the same as well.
>>> Likewise for meta-properties. These data model variations are not “special”
>>> to the TP4 VM. It just walks references whether there are 0,1,2, or N of
>>> them.
>>> 
>> 
>> At a high level, I agree with what you are saying. We should have a common
>> data model that unifies traditional property graphs, hypergraphs,
>> relational databases, and any other data model that can be modeled using
>> algebraic data types with references. We define a small set of basic
>> operations on this data model which can be combined into more complex
>> operations that are amenable to static analysis and optimization. We can
>> send graph data over the wire as collections of elements using the bare
>> minimum of local fields, and reconstruct the graph on the other end. We can
>> operate on streams of such elements under suitable conditions (elements
>> sent in an appropriate order). The basic operations are not tied to the
>> JVM, and should be straightforward to implement in other frameworks.
>> 
>> 
>> 
>>> 
>>> Thus, what is crucial to all this is the “shape of the data.” Using your
>>> pointers wisely so instructions produce useful results.
>>> 
>> 
>> +1
>> 
>> 
>> 
>>> Does any of what I wrote update your comeFrom().goto() stuff?
>> 
>> 
>> Sadly, no, though I appreciate that you are coming from a slightly
>> different place w.r.t. properties, hypergraphs, and most importantly, the
>> role of a type system.
>> 
>> 
>> 
>>> If not, can you please explain to me why comeFrom() is cool — sorry for
>>> being dense (aka “being Kuppitz” — that’s right, I said it. boom!).
>>> 
>> 
>> Let's keep iterating until we reach a fixed point. Maybe Daniel's already
>> there.
>> 
>> Josh
>> 
>> 
>> 
>>> 
>>> Thanks,
>>> Marko.
>>> 
>>> http://rredux.com
>>> 
>>> 
>>> 
>>> 
>>>> On Apr 23, 2019, at 10:25 AM, Joshua Shinavier <j...@fortytwo.net> wrote:
>>>> 
>>>> On Tue, Apr 23, 2019 at 5:14 AM Marko Rodriguez <okramma...@gmail.com>
>>>> wrote:
>>>> 
>>>>> Hey Josh,
>>>>> 
>>>>> This gets to the notion I presented in “The Fabled GMachine.”
>>>>>       http://rredux.com/the-fabled-gmachine.html (first paragraph of
>>>>> “Structures, Processes, and Languages” section)
>>>>> 
>>>>> All that exists are memory addresses that contain either:
>>>>> 
>>>>>       1. A primitive
>>>>>       2. A set of labeled references to other references or primitives.
>>>>> 
>>>>> Using your work and the above, here is a super low-level ‘bytecode’ for
>>>>> property graphs.
>>>>> 
>>>>> v.goto("id") => 1
>>>>> 
>>>> 
>>>> LGTM. An id is special because it is uniquely identifying / is a primary
>>>> key for the element. However, it is also just a field of the element, like
>>>> "in"/"inV" and "out"/"outV" are fields of an edge. As an aside, an id would
>>>> only really need to be unique among other elements of the same type. To the
>>>> above, I would add:
>>>> 
>>>> v.type() => Person
>>>> 
>>>> ...a special operation which takes you from an element to its type. This is
>>>> important if unions are supported; e.g. "name" in my example can apply
>>>> either to a Person or a Project.
>>>> 
>>>> 
>>>>> v.goto("label") => person
>>>>> 
>>>> 
>>>> Or that. Like "id", "type"/"label" is special. You can think of it as a
>>>> field; it's just a different sort of field which will have the same value
>>>> for all elements of any given type.
>>>> 
>>>> 
>>>> 
>>>>> v.goto("properties").goto("name") => "marko"
>>>>> 
>>>> 
>>>> OK, properties. Are properties built-in as a separate kind of thing from
>>>> edges, or can we treat them the same as vertices and edges here? I think we
>>>> can treat them the same. A property, in the algebraic model I described
>>>> above, is just an element with two fields, the second of which is a
>>>> primitive value. As I said, I think we need two distinct traversal
>>>> operations -- projection and selection -- and here is where we can use the
>>>> latter. Here, I will call it "comeFrom".
>>>> 
>>>> v.comeFrom("name", "out").goto("in") => {"marko"}
>>>> 
>>>> You can think of this comeFrom as a special case of a select() function
>>>> which takes a type -- "name" -- and a set of key/value pairs {("out", v)}.
>>>> It returns all matching elements of the given type. You then project to the
>>>> "in" value using your goto. I wrote {"marko"} as a set, because comeFrom
>>>> can give you multiple properties, depending on whether multi-properties are
>>>> supported.
>>>> 
>>>> Note how similar this is to an edge traversal:
>>>> 
>>>> v.comeFrom("knows", "out").goto("in") => {v[2], v[4]}
>>>> 
>>>> Of course, you could define "properties" in such a way that a
>>>> goto("properties") does exactly this under the hood, but in terms of low
>>>> level instructions, you need something like comeFrom.
>>>> 
>>>> 
>>>>> v.goto("properties").goto("name").goto(0) => "m"
>>>>> 
>>>> 
>>>> This is where the notion of optionals becomes handy. You can make
>>>> array/list indices into fields like this, but IMO you should also make them
>>>> safe. E.g. borrowing Haskell syntax for a moment:
>>>> 
>>>> v.goto("properties").goto("name").goto(0) => Just 'm'
>>>> 
>>>> v.goto("properties").goto("name").goto(5) => Nothing
>>>> 
>>>> 
>>>>> v.goto("outE").goto("inV") => v[2], v[4]
>>>>> 
>>>> 
>>>> I am not a big fan of untyped "outE", but you can think of this as a union
>>>> of all v.comeFrom(x, "out").goto("in"), where x is any edge type. Only
>>>> "knows" and "created" are edge types which are applicable to "Person", so
>>>> you will only get {v[2], v[4]}. If you want to get really crazy, you can
>>>> allow x to be any type. Then you get {v[2], v[4], 29, "marko"}.
>>>> 
>>>> 
>>>> 
>>>>> g.goto("V").goto(1) => v[1]
>>>>> 
>>>> 
>>>> That, or you give every element a virtual field called "graph". So:
>>>> 
>>>> v.goto("graph") => g
>>>> 
>>>> g.comeFrom("Person", "graph") => {v[1], v[2], v[4], v[6]}
>>>> 
>>>> g.comeFrom("Person", "graph").restrict("id", 1)
>>>> 
>>>> ...where restrict() is the relational "sigma" operation as above, not to be
>>>> confused with TinkerPop's select(), filter(), or has() steps. Again, I
>>>> prefer to specify a type in comeFrom (i.e. we're looking specifically for a
>>>> Person with id of 1), but you could also do a comprehension g.comeFrom(x,
>>>> "graph"), letting x range over all types.
>>>> 
>>>> 
>>>> 
>>>>> The goto() instruction moves the “memory reference” (traverser) from the
>>>>> current “memory address” to the “memory address” referenced by the goto()
>>>>> argument.
>>>>> 
>>>> 
>>>> Agreed, if we also think of primitive values as memory references.
>>>> 
>>>> 
>>>> 
>>>>> The Gremlin expression:
>>>>> 
>>>>>       g.V().has(‘name’,’marko’).out(‘knows’).drop()
>>>>> 
>>>>> ...would compile to:
>>>>> 
>>>>> g.goto(“V”).filter(goto(“properties”).goto(“name”).is(“marko”)).goto(“outE”).filter(goto(“label”).is(“knows”)).goto(“inV”).free()
>>>>> 
>>>> 
>>>> 
>>>> In the alternate universe:
>>>> 
>>>> g.comeFrom("Person", "graph").comeFrom("name", "out").restrict("in",
>>>> "marko").goto("out").comeFrom("knows", "out").goto("in").free()
>>>> 
>>>> I have wimped out on free() and just left it as you had it, but I think it
>>>> would be worthwhile to explore a monadic syntax for traversals with
>>>> side-effects. Different topic.
>>>> 
>>>> Now, all of this "out", "in" business is getting pretty repetitive, right?
>>>> Well, the field names become more diverse if we allow hyper-edges and
>>>> generalized ADTs. E.g. in my Trip example, say I want to know all drop-off
>>>> locations for a given rider:
>>>> 
>>>> u.comeFrom("Trip", "rider").goto("dropoff").goto("place")
>>>> 
>>>> Done.
>>>> 
>>>> 
>>>> 
>>>>> If we can get things that “low-level” and still efficient to compile, then
>>>>> we can model every data structure. All you are doing is pointer chasing
>>>>> through a withStructure() data structure.
>>>>> 
>>>> 
>>>> Agreed.
>>>> 
>>>> 
>>>>> No one would ever want to write strategies for goto()-based Bytecode.
>>>> 
>>>> 
>>>> Also agreed.
>>>> 
>>>> 
>>>> 
>>>>> Thus, perhaps there could be a PropertyGraphDecorationStrategy that does:
>>>>> 
>>>>> [...]
>>>> 
>>>> 
>>>> No argument here, though the alternate-universe "bytecode" would look
>>>> slightly different. And the high-level syntax should also be able to deal
>>>> with generalized relations / data types gracefully. As a thought
>>>> experiment, suppose we were to define the steps to() as your goto(), and
>>>> from() as my comeFrom(). Then traversals like:
>>>> 
>>>> u.from("Trip", "rider").to("dropoff").to("time")
>>>> 
>>>> ...look pretty good as-is, and are not too low-level. However, ordinary
>>>> edge traversals like:
>>>> 
>>>> v.from("knows", "out").to("in")
>>>> 
>>>> ...do look a little Assembly-like. So in/out/both etc. remain as they are,
>>>> but are shorthand for from() and to() steps using "out" or "in":
>>>> 
>>>> v.out("knows") === v.outE("knows").inV() === v.from("knows", "out").to("in")
>>>> 
>>>> 
>>>>> [I AM NOW GOING OFF THE RAILS]
>>>>> [sniiiiip]
>>>>> 
>>>> 
>>>> Sure. Again, I like the idea of wrapping side-effects in monads. What would
>>>> that look like in a Gremlinesque fluent syntax? I don't quite know, but if
>>>> we think of the dot as a monadic bind operation like Haskell's >>=, then
>>>> perhaps the monadic expressions look pretty similar to what you have just
>>>> sketched out. Might have to be careful about what it means to nest
>>>> operations as in your addEdge examples.
>>>> 
>>>> 
>>>> 
>>>>> [I AM NOW BACK ON THE RAILS]
>>>>> 
>>>>> It’s as if “properties”, “outE”, “label”, “inV”, etc. references mean
>>>>> something to property graph providers and they can do more intelligent
>>>>> stuff than what MongoDB would do with such information. However, someone,
>>>>> of course, can create a MongoDBPropertyGraphStrategy that would make
>>>>> documents look like vertices and edges and then use O(log(n)) lookups on
>>>>> ids to walk the graph. However, if that didn’t exist, it would still do
>>>>> something that works, even if it’s horribly inefficient, as every database
>>>>> can make primitives with references between them!
>>>>> 
>>>> 
>>>> I'm on the same pair of rails.
>>>> 
>>>> 
>>>> 
>>>>> Anywho @Josh, I believe goto() is what you are doing with multi-references
>>>>> off an object. How do we make it all clean, easy, and universal?
>>>>> 
>>>> 
>>>> Let me know what you think of the above.
>>>> 
>>>> Josh
>>>> 
>>>> 
>>>> 
>>>>> 
>>>>> Marko.
>>>>> 
>>>>> http://rredux.com
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Apr 22, 2019, at 6:42 PM, Joshua Shinavier <j...@fortytwo.net> wrote:
>>>>>> 
>>>>>> Ah, glad you asked. It's all in the pictures. I have nowhere to put them
>>>>>> online at the moment... maybe this attachment will go through to the list?
>>>>>> 
>>>>>> Btw. David Spivak gave his talk today at Uber; it was great. Juan
>>>>>> Sequeda (relational <--> RDF mapping guy) was also here, and Ryan joined
>>>>>> remotely. Really interesting discussion about databases vs. graphs, and
>>>>>> what category theory brings to the table.
>>>>>> 
>>>>>> 
>>>>>> On Mon, Apr 22, 2019 at 1:45 PM Marko Rodriguez <okramma...@gmail.com> wrote:
>>>>>> Hey Josh,
>>>>>> 
>>>>>> I’m digging what you are saying, but the pictures didn’t come through
>>>>>> for me ? … Can you provide them again (or if dev@ is filtering them, can
>>>>>> you give me URLs to them)?
>>>>>> 
>>>>>> Thanks,
>>>>>> Marko.
>>>>>> 
>>>>>> 
>>>>>>> On Apr 21, 2019, at 12:58 PM, Joshua Shinavier <j...@fortytwo.net> wrote:
>>>>>>> 
>>>>>>> On the subject of "reified joins", maybe a picture will be worth a
>>>>>>> few words. As I said in the thread
>>>>>>> <https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ>
>>>>>>> on property graph standardization, if you think of vertex labels, edge
>>>>>>> labels, and property keys as types, each with projections to two other
>>>>>>> types, there is a nice analogy with relations of two columns, and this
>>>>>>> analogy can be easily extended to hyper-edges. Here is what the schema of
>>>>>>> the TinkerPop classic graph looks like if you make each type (e.g. Person,
>>>>>>> Project, knows, name) into a relation:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> I have made the vertex types salmon-colored, the edge types yellow,
>>>>>>> the property types green, and the data types blue. The "o" and "I" columns
>>>>>>> represent the out-type (e.g. out-vertex type of Person) and in-type (e.g.
>>>>>>> property value type of String) of each relation. More than two arrows from
>>>>>>> a column represent a coproduct, e.g. the out-type of "name" is Person OR
>>>>>>> Project. Now you can think of out() and in() as joins of two tables on a
>>>>>>> primary and foreign key.
>>>>>>> 
>>>>>>> We are not limited to "out" and "in", however. Here is the ternary
>>>>>>> relationship (hyper-edge) from hyper-edge slide
>>>>>>> <https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49>
>>>>>>> of my Graph Day preso, which has three columns/roles/projections:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> I have drawn Says in light blue to indicate that it is a generalized
>>>>>>> element; it has projections other than "out" and "in". Now the line between
>>>>>>> relations and edges begins to blur. E.g. in the following, is PlaceEvent a
>>>>>>> vertex or a property?
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> With the right type system, we can just speak of graph elements, and
>>>>>>> use "vertex", "edge", "property" when it is convenient. In the relational
>>>>>>> model, they are relations. If you materialize them in a relational
>>>>>>> database, they are rows. In any case, you need two basic graph traversal
>>>>>>> operations:
>>>>>>>   - project() -- forward traversal of the arrows in the above diagrams.
>>>>>>>     Takes you from an element to a component like in-vertex.
>>>>>>>   - select() -- reverse traversal of the arrows. Allows you to answer
>>>>>>>     questions like "in which Trips is John Doe the rider?"
>>>>>>> 
>>>>>>> Josh
>>>>>>> 
>>>>>>> 
>>>>>>> On Fri, Apr 19, 2019 at 10:03 AM Marko Rodriguez <okramma...@gmail.com> wrote:
>>>>>>> Hello,
>>>>>>> 
>>>>>>> I agree with everything you say. Here is my question:
>>>>>>> 
>>>>>>>       Relational database — join: Table x Table x equality function -> Table
>>>>>>>       Graph database — traverser: Vertex x edge label -> Vertex
>>>>>>> 
>>>>>>> I want a single function that does both. The only thought was to
>>>>>>> represent traverser() in terms of join():
>>>>>>> 
>>>>>>>       Graph database — traverser: Vertices x Vertex x equality function -> Vertices
>>>>>>> 
>>>>>>> For example,
>>>>>>> 
>>>>>>> V().out(‘address’)
>>>>>>> 
>>>>>>>       ==>
>>>>>>> 
>>>>>>> g.join(V().hasLabel(‘person’).as(‘a’),
>>>>>>>        V().hasLabel(‘addresses’).as(‘b’)).
>>>>>>>          by(‘name’).select(?address vertex?)
>>>>>>> 
>>>>>>> That is, join the vertices with themselves based on some predicate to
>>>>>>> go from vertices to vertices.
>>>>>>> 
>>>>>>> However, I would like instead to transform the relational database
>>>>>>> join() concept into a traverser() concept. Kuppitz and I were talking the
>>>>>>> other day about a link() type operator that says: “try and link to this
>>>>>>> thing in some specified way.” .. ?? The problem we ran into is again,
>>>>>>> “link it to what?”
>>>>>>> 
>>>>>>>       - in graph, the ‘to what’ is hardcoded so you don’t need to
>>>>>>>         specify anything.
>>>>>>>       - in rdbms, the ‘to what’ is some other specified table.
>>>>>>> 
>>>>>>> So what does the link() operator look like?
>>>>>>> 
>>>>>>> ——
>>>>>>> 
>>>>>>> Some other random thoughts….
>>>>>>> 
>>>>>>> Relational databases join on the table (the whole collection).
>>>>>>> Graph databases traverse on the vertex (an element of the whole
>>>>>>> collection).
>>>>>>> 
>>>>>>> We can make a relational database join on a single row (by providing a
>>>>>>> filter to a particular primary key). This is the same as a table with
>>>>>>> one row. Likewise, for graph in the join() context above:
>>>>>>> 
>>>>>>> V(1).out(‘address’)
>>>>>>> 
>>>>>>>       ==>
>>>>>>> 
>>>>>>> g.join(V(1).as(‘a’),
>>>>>>>        V().hasLabel(‘addresses’).as(‘b’)).
>>>>>>>          by(‘name’).select(?address vertex?)
>>>>>>> 
>>>>>>> More thoughts please….
>>>>>>> 
>>>>>>> Marko.
>>>>>>> 
>>>>>>> http://rredux.com
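
A rough sketch of the point above, that a traversal can be read as a join whose "to what" is fixed in advance. It is written in Python with made-up data; `join` and `out_address` are hypothetical helpers for illustration only and do not correspond to any real Gremlin step.

```python
# Hypothetical data mirroring the person/addresses example; vertices are dicts.
people = [
    {"id": 1, "label": "person", "name": "marko"},
    {"id": 2, "label": "person", "name": "kuppitz"},
]
addresses = [
    {"id": 10, "label": "addresses", "name": "marko", "city": "santafe"},
    {"id": 11, "label": "addresses", "name": "kuppitz", "city": "tucson"},
]

def join(left, right, key):
    """Equijoin: pair each left element with each right element agreeing on `key`."""
    return [(a, b) for a in left for b in right if a[key] == b[key]]

def out_address(start):
    """A traversal expressed as a join: the 'to what' is hardcoded to `addresses`."""
    return [b for _, b in join(start, addresses, "name")]

# Like V().out('address'): start from the whole collection.
all_cities = [v["city"] for v in out_address(people)]

# Like V(1).out('address'): a single vertex is a one-row table.
one_city = [v["city"] for v in out_address([p for p in people if p["id"] == 1])]

print(all_cities, one_city)
```

The only difference between the two calls is the size of the left-hand collection, which is the sense in which a join on a one-row table behaves like a traversal from one vertex.
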
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Apr 19, 2019, at 4:20 AM, pieter martin <pieter.mar...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> The way I saw it is that the big difference is that graphs have
>>>>>>>> reified joins. This is both a blessing and a curse.
>>>>>>>> A blessing because it's much easier (less text to type, fewer mistakes,
>>>>>>>> clearer semantics...) to traverse an edge than to construct a manual
>>>>>>>> join.
>>>>>>>> A curse because there are almost always far more ways to traverse
>>>>>>>> a data set than just by the edges some architect might have considered
>>>>>>>> when creating the data set. Often the architect is not the domain
>>>>>>>> expert and the edges are a hardcoded layout of the dataset, which
>>>>>>>> almost certainly won't survive the real world's demands. In graphs, if
>>>>>>>> there are no edges then the data is not reachable, except via indexed
>>>>>>>> lookups. This is the standard engineering problem of database design,
>>>>>>>> but it is important and useful that data can be traversed, joined,
>>>>>>>> without having reified edges.
>>>>>>>> In Sqlg at least, but I suspect it generalizes, I want to create the
>>>>>>>> notion of a "virtual edge", which describes the join in metadata so
>>>>>>>> that the standard to(direction, "virtualEdgeName") will work.
>>>>>>>> In a way this is precisely to keep the graphy nature of Gremlin, i.e.
>>>>>>>> traversing edges, and avoid using the manual join syntax you described.
>>>>>>>> 
>>>>>>>> Cheers
>>>>>>>> Pieter
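
One way to picture the "virtual edge" idea is sketched below in Python. It is only a guess at the shape of such a feature and not how Sqlg implements anything: `VIRTUAL_EDGES`, `TABLES`, the edge name `livesAt`, and this simplified `to()` (which ignores direction) are all hypothetical.

```python
# Hypothetical metadata: a virtual edge names a join instead of storing edge rows.
VIRTUAL_EDGES = {
    "livesAt": {"from": "people", "to": "addresses", "key": "name"},
}

TABLES = {
    "people": [{"name": "marko", "age": 29}, {"name": "josh", "age": 35}],
    "addresses": [
        {"name": "marko", "city": "santafe"},
        {"name": "josh", "city": "desertisland"},
    ],
}

def to(row, edge_name):
    """Traverse a virtual edge by evaluating the join its metadata describes."""
    edge = VIRTUAL_EDGES[edge_name]
    key = edge["key"]
    return [r for r in TABLES[edge["to"]] if r[key] == row[key]]

marko = TABLES["people"][0]
print(to(marko, "livesAt"))
```

The caller still writes an edge-style traversal; the join condition lives in metadata, so new ways of walking the data can be added without materializing edges.
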
>>>>>>>> 
>>>>>>>> On Thu, 2019-04-18 at 14:15 -0600, Marko Rodriguez wrote:
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> *** This is mainly for Kuppitz, but if others care.
>>>>>>>>> 
>>>>>>>>> Was thinking last night about relational data and Gremlin. The T()
>>>>>>>>> step returns all the tables in the withStructure() RDBMS database.
>>>>>>>>> Tables are ‘complex values’ so they can't leave the VM (only a simple
>>>>>>>>> ‘toString’).
>>>>>>>>> 
>>>>>>>>> Below is a fake Gremlin session. (and these are just ideas…)
>>>>>>>>>         tables -> a ListLike of rows
>>>>>>>>>         rows -> a MapLike of primitives
>>>>>>>>> 
>>>>>>>>> gremlin> g.T()
>>>>>>>>> ==>t[people]
>>>>>>>>> ==>t[addresses]
>>>>>>>>> gremlin> g.T(‘people’)
>>>>>>>>> ==>t[people]
>>>>>>>>> gremlin> g.T(‘people’).values()
>>>>>>>>> ==>r[people:1]
>>>>>>>>> ==>r[people:2]
>>>>>>>>> ==>r[people:3]
>>>>>>>>> gremlin> g.T(‘people’).values().asMap()
>>>>>>>>> ==>{name:marko,age:29}
>>>>>>>>> ==>{name:kuppitz,age:10}
>>>>>>>>> ==>{name:josh,age:35}
>>>>>>>>> gremlin> g.T(‘people’).values().has(‘age’,gt(20))
>>>>>>>>> ==>r[people:1]
>>>>>>>>> ==>r[people:3]
>>>>>>>>> gremlin> g.T(‘people’).values().has(‘age’,gt(20)).values(‘name’)
>>>>>>>>> ==>marko
>>>>>>>>> ==>josh
>>>>>>>>> 
>>>>>>>>> Makes sense. Nice that values() and has() generally apply to all
>>>>>>>>> ListLike and MapLike structures. Also, note how asMap() is the
>>>>>>>>> valueMap() of TP4, but generalizes to anything that is MapLike so it
>>>>>>>>> can be turned into a primitive form as a data-rich result from the
>>>>>>>>> VM.
>>>>>>>>> 
>>>>>>>>> gremlin> g.T()
>>>>>>>>> ==>t[people]
>>>>>>>>> ==>t[addresses]
>>>>>>>>> gremlin> g.T(‘addresses’).values().asMap()
>>>>>>>>> ==>{name:marko,city:santafe}
>>>>>>>>> ==>{name:kuppitz,city:tucson}
>>>>>>>>> ==>{name:josh,city:desertisland}
>>>>>>>>> gremlin> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
>>>>>>>>>            by(select(‘a’).value(‘name’).is(eq(select(‘b’).value(‘name’)))).
>>>>>>>>>            values().asMap()
>>>>>>>>> ==>{a.name:marko,a.age:29,b.name:marko,b.city:santafe}
>>>>>>>>> ==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.city:tucson}
>>>>>>>>> ==>{a.name:josh,a.age:35,b.name:josh,b.city:desertisland}
>>>>>>>>> gremlin> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
>>>>>>>>>            by(‘name’). // shorthand for equijoin on name column/key
>>>>>>>>>            values().asMap()
>>>>>>>>> ==>{a.name:marko,a.age:29,b.name:marko,b.city:santafe}
>>>>>>>>> ==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.city:tucson}
>>>>>>>>> ==>{a.name:josh,a.age:35,b.name:josh,b.city:desertisland}
>>>>>>>>> gremlin> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
>>>>>>>>>            by(‘name’)
>>>>>>>>> ==>t[people<-name->addresses]  // without asMap(), just the complex value ‘toString’
>>>>>>>>> gremlin>
>>>>>>>>> 
>>>>>>>>> And of course, all of this is strategized into a SQL call so its
>>>>>>>>> joins aren’t necessarily computed using TP4-VM resources.
>>>>>>>>> 
>>>>>>>>> Anywho, what I hope to realize is the relationship between “links”
>>>>>>>>> (graph) and “joins” (tables). How can we make (bytecode-wise at
>>>>>>>>> least) RDBMS join operations and graph traversal operations ‘the
>>>>>>>>> same’?
>>>>>>>>> 
>>>>>>>>>         Singleton: Integer, String, Float, Double, etc.
>>>>>>>>>         Collection: List, Map (Vertex, Table, Document)
>>>>>>>>>         Linkable: Vertex, Table
>>>>>>>>> 
>>>>>>>>> Vertices and Tables can be “linked.” Unlike Collections, they don’t
>>>>>>>>> maintain a “parent/child” relationship with the objects they
>>>>>>>>> reference. What does this mean……….?
>>>>>>>>> 
>>>>>>>>> Take care,
>>>>>>>>> Marko.
>>>>>>>>> 
>>>>>>>>> http://rredux.com
>>>>>> 
>>>>>> <diagrams.zip>
>>>>> 
>>>>> 
>>> 
>>> 
>
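
For reference, the fake session in the original message can be mimicked with plain Python lists and dicts. This is a minimal sketch under the assumption that a table is a list of rows and a row is a dict. `has` and `equijoin` are hypothetical stand-ins for the proposed has() and join().by() steps, not TP4 code, and the "a."/"b." key prefixes simply copy the output format shown in the session.

```python
# Tables as ListLike of rows, rows as MapLike of primitives (data from the session).
people = [
    {"name": "marko", "age": 29},
    {"name": "kuppitz", "age": 10},
    {"name": "josh", "age": 35},
]
addresses = [
    {"name": "marko", "city": "santafe"},
    {"name": "kuppitz", "city": "tucson"},
    {"name": "josh", "city": "desertisland"},
]

def has(rows, key, predicate):
    """Keep the rows whose value under `key` satisfies `predicate`."""
    return [r for r in rows if predicate(r[key])]

def equijoin(left, right, key, left_alias="a", right_alias="b"):
    """Join rows that agree on `key`, prefixing keys with the table alias."""
    joined = []
    for l in left:
        for r in right:
            if l[key] == r[key]:
                row = {f"{left_alias}.{k}": v for k, v in l.items()}
                row.update({f"{right_alias}.{k}": v for k, v in r.items()})
                joined.append(row)
    return joined

# Roughly g.T('people').values().has('age', gt(20)).values('name')
older = [r["name"] for r in has(people, "age", lambda age: age > 20)]

# Roughly g.join(T('people').as('a'), T('addresses').as('b')).by('name').values().asMap()
rows = equijoin(people, addresses, "name")

print(older)
for row in rows:
    print(row)
```

The nested loop is the naive form of the join; as the original message notes, a real provider would push this down into a SQL call instead of computing it in the VM.
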
