Hey,

Thinking through things more and re-reading your emails.

It’s like this: From an object you want to be able to go to the relations in which that object is a particular entry. From that relation you want to go to another object referenced in another entry. For instance, assume this set of 3-tuple relations:

talk_table
speaker  listener  statement
marko    josh      “sup bro”
marko    kuppitz   “dude man”

Let’s say I’m at josh and I want to know what marko said to him:

josh.adjacents(‘talk’,’listener’, …) // and this is why you have from().restrict().to()

Using your from()/restrict()/to() notation:

josh.from(‘talk’,’listener’).restrict(‘speaker’,marko).to(‘statement’) => “sup bro”

I want to get some terminology down:

Relation: a tuple with key/value entries. (basically a map)
Key: A relation column name.
Value: A relation column value.

So there are three operations:

1. Get the relations in which the current object is a value for the specified key. [select] // like a back()
2. Filter out those relations that don’t have a particular value for a particular key. [filter]
3. Get those objects in the remaining relations associated with a particular key. [project] // like a forward()

What did Kuppitz hear from Marko?

kuppitz.select(‘talk’,’listener’).filter(‘speaker’,marko).project(‘statement’) => “dude man”

So, how do we do this with just goto pointer chasing?

kuppitz.goto(‘listener’).filter(goto(‘speaker’).is(marko)).goto(‘statement’)

That is, I went from Kuppitz to all those relations in which he is a listener. I then filtered out those relations that don’t have marko as the speaker. I then went to the statements associated with those remaining relations. However, with this model, I’m assuming that “listener” is unique to the talk_table and this is not smart…

Anywho, is this more in line with what you are getting at?

Thanks for your patience,
Marko.

http://rredux.com

> On Apr 24, 2019, at 11:30 AM, Marko Rodriguez <okramma...@gmail.com> wrote:
>
> Hi,
>
> I think I understand you now.
> The concept of local and non-local data is what made me go “ah!”
>
> So let me reiterate what I think you are saying.
>
> v[1] is guaranteed to have its id data local to it. All other information could be derived via id-based “equi-joins.” Thus, we can’t assume that a vertex will always have its properties and edges co-located with it. However, we can assume that it knows where to get its property and edge data when requested. Assume the following RDBMS-style data structure that is referenced by com.example.MyGraph.
>
> vertex_table
> id  label
> 1   person
> 2   person
> …
>
> properties_table
> id  name   age
> 1   marko  29
> 2   josh   35
> …
>
> edge_table
> id  outV  label  inV
> 0   1     knows  2
> …
>
> If we want to say that the above data structure is a graph, what is required of “ComplexType” such that we can satisfy both Neo4j-style and RDBMS-style graph encodings? Assume ComplexType is defined as:
>
> interface ComplexType
>   Iterator<T> adjacents(String label, Object... identifiers)
>
> Take this basic Gremlin traversal:
>
> g.V(1).out(‘knows’).values(‘name’)
>
> I now believe this should compile to the following:
>
> [goto,V,1] [goto,outE,knows] [goto,inV] [goto,properties,name]
>
> Given MyGraph/MyVertex/MyEdge all implement ComplexType and there is no local caching of data on these respective objects, then the bytecode isn’t rewritten and the following cascade of events occurs:
>
> mygraph
> [goto,V,1] =>
>   mygraph.adjacents(“V”,1) =>
>     SELECT * FROM vertex_table WHERE id=1
> myvertex1
> [goto,outE,knows] =>
>   myvertex1.adjacents(“outE”,”knows”) =>
>     SELECT id FROM edge_table WHERE outV=1 AND label='knows'
> myedge0
> [goto,inV] =>
>   myedge0.adjacents(“inV”) =>
>     SELECT vertex_table.id FROM vertex_table, edge_table WHERE vertex_table.id=edge_table.inV AND edge_table.id=0
> myvertex2
> [goto,properties,name] =>
>   myvertex2.adjacents(“properties”,”name”) =>
>     SELECT name FROM properties_table WHERE id=2
> “josh”
>
> Let’s review the ComplexType adjacents()-method:
>
> complexType.adjacents(label,identifiers...)
>
> complexType must have sufficient information to represent the tail of the relation.
> label specifies the relation type (we will always assume that a single String is sufficient)
> identifiers... must contain sufficient information to identify the head of the relation.
>
> The return of the method adjacents() is then the object(s) on the other side of the relation(s).
>
> Now, given the way I have my data structure organized, I could beef up the MyXXX implementation such that MyStrategy rewrites the base bytecode to:
>
> [goto,V,1] [goto,out,knows] [goto,properties,name]
>
> The following cascade of events occurs:
>
> mygraph
> [goto,V,1] =>
>   mygraph.adjacents(“V”,1) =>
>     SELECT * FROM vertex_table WHERE id=1
> myvertex1
> [goto,out,knows] =>
>   myvertex1.adjacents(“out”,”knows”) =>
>     SELECT vertex_table.id FROM vertex_table,edge_table WHERE outV=1 AND label='knows' AND inV=vertex_table.id
> myvertex2
> [goto,properties,name] =>
>   myvertex2.adjacents(“properties”,”name”) =>
>     SELECT name FROM properties_table WHERE id=2
> “josh”
>
> Now, I could really beef up MyStrategy when I realize that no path information is used in the traversal. Thus, the base bytecode compiles to:
>
> [my:sql,SELECT name FROM properties_table,vertex_table,edge_table WHERE … lots of join equalities]
>
> This would then just emit “josh” given the mygraph object.
>
> --
>
> To recap.
>
> 1. There are primitives.
> 2. There are Maps and Lists.
> 3. There are ComplexTypes.
> 4. ComplexTypes are adjacent to other objects via relations.
>    - These adjacent objects may be cached locally with the ComplexType instance.
>    - These adjacent objects may require some database lookup.
>    - Regardless, TP4 doesn’t care; it’s up to the provider’s ComplexType instance to decide how to resolve the adjacency.
> 5. ComplexTypes don’t go over the wire; a ComplexTypeProxy with appropriately provided toString() is all that leaves the TP4 VM.
>
> Finally, to solve the asMap()/asList() problem, we simply have:
>
> asMap(’name’,’age’) => complexType.adjacents(‘asMap’,’name’,’age’)
> asList() => complexType.adjacents(‘asList’)
>
> It is up to the complexType to manifest a Map or List accordingly.
>
> I see this as basically a big flatmap system. ComplexTypes just map from self to any number of logical neighbors as specified by the relation.
>
> Am I getting it?,
> Marko.
>
> http://rredux.com
>
>> On Apr 24, 2019, at 9:56 AM, Joshua Shinavier <j...@fortytwo.net> wrote:
>>
>> On Tue, Apr 23, 2019 at 10:28 AM Marko Rodriguez <okramma...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I think we are very close to something useable for TP4 structure/. Solving this problem elegantly will open the flood gates on tp4/ development.
>>
>> Yes, and formality often brings elegance. I don't think we can do much better than relational algebra and relational calculus in terms of formality, so to the extent we can reduce the fundamental TP4 traversal steps to basic relational operations, the floodgates will also be open to applications of query validation and query optimization from the last 40+ years of research.
>>
>>> I still don’t grok your comeFrom().goto() stuff. I don’t get the benefit of having two instructions for “pointer chasing” instead of one.
>>
>> There are just a handful of basic operations in relational algebra. Projection, selection, union, complement, Cartesian product. Joins, as well as all other operations, can be derived from these. A lot of graph traversal can be accomplished using only projection and selection, which is why we were able to get away with only to/goto and from/comeFrom in the examples above.
>> However, I believe you do need both operations. You can kind of get away without from() if you assume that each vertex has local inE and outE references to incoming and outgoing edges, but I see that as a kind of pre-materialized from()/select(). If you think of edges strictly as relations, and represent them in a straightforward way with tables, you don't need the local inE and outE; whether you have them depends on the graph back-end.
>>
>>> Let’s put that aside for now and let’s turn to modeling a Vertex. Go back to my original representation:
>>>
>>> vertex.goto(‘label’)
>>> vertex.goto(‘id’)
>>
>> Local (in my view). All good.
>>
>>> vertex.goto(‘outE’)
>>> vertex.goto(‘inE’)
>>> vertex.goto(‘properties’)
>>
>> Non-local (in my view). You can use goto(), but if the goal is to bring the relational model into the fold, at a lower level you do have a select() operation. Unless you make projections local to vertices instead of edges, but then you just have the same problem in reverse. Am I making sense?
>>
>>> Any object can be converted into a Map. In TinkerPop3 we convert vertices into maps via:
>>>
>>> g.V().has(‘name’,’marko’).valueMap() => {name:marko,age:29}
>>> g.V().has(‘name’,’marko’).valueMap(true) => {id:1,label:person,name:marko,age:29}
>>
>> Maps are A-OK. In the case of properties, I think where we differ is that you see a property like "name" as a key/value pair in a map local to the vertex. I see the property as an element of type "name", with the vertex as a value in its local map, logically if not physically. This allows maximum flexibility in terms of meta-properties -- exotic beasts which seem to be in a kind of limbo state in TP3, but if we're trying to be as general as possible, some data models we might want to pull in, like GRAKN.AI, do allow this kind of flexibility.
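
[Editorial sketch, not part of the original thread.] The points above (edges and properties treated as relations, from() as a reverse lookup, and local inE/outE as a "pre-materialized" from()) can be illustrated with a small Python model. Everything here is an assumption made for illustration: the in-memory RELATIONS store, the function names from_/restrict/to/build_index, and the sample rows (the "talk" rows come from the top message; the "name" rows use the out/in field convention described in this thread). It is not TP4 code.

```python
from collections import defaultdict

# Hypothetical in-memory store: relation type -> list of tuples (dicts).
RELATIONS = {
    "talk": [
        {"speaker": "marko", "listener": "josh", "statement": "sup bro"},
        {"speaker": "marko", "listener": "kuppitz", "statement": "dude man"},
    ],
    # A property modeled as an element of type "name" with "out"/"in" fields.
    "name": [
        {"out": "v1", "in": "marko"},
        {"out": "v2", "in": "josh"},
    ],
}


def from_(obj, rel_type, key):
    """Selection: relations of rel_type in which obj is the value for key."""
    return [r for r in RELATIONS[rel_type] if r[key] == obj]


def restrict(rels, key, value):
    """Keep only the relations whose entry for key equals value."""
    return [r for r in rels if r[key] == value]


def to(rels, key):
    """Projection: the values stored under key in each remaining relation."""
    return [r[key] for r in rels]


def build_index(rel_type, key):
    """Pre-materialize from_(): map each object to the relations that reference it.
    This plays the role of local inE/outE references on a vertex."""
    index = defaultdict(list)
    for r in RELATIONS[rel_type]:
        index[r[key]].append(r)
    return index


# josh.from('talk','listener').restrict('speaker',marko).to('statement')
said_to_josh = to(restrict(from_("josh", "talk", "listener"), "speaker", "marko"), "statement")

# v1.from('name','out').to('in'): a property lookup expressed the same way as an edge hop
v1_names = to(from_("v1", "name", "out"), "in")
```

Whether the scan in from_() or the index in build_index() is used is a back-end decision; both return the same relations for a given object.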
>>
>>> In the spirit of instruction reuse, we should have an asMap() instruction that works for ANY object. (As a side: this gets back to ONLY sending primitives over the wire, no Vertex/Edge/Document/Table/Row/XML/ColumnFamily/etc.). Thus, the above is:
>>>
>>> g.V().has(‘name’,’marko’).properties().asMap() => {name:marko,age:29}
>>> g.V().has(‘name’,’marko’).asMap() => {id:1,label:person,properties:{name:marko,age:29}}
>>
>> Again, no argument here, although I would think of a map as an optimization. IMO, the fundamental projections from v[1] are id:1 and label:Person. You could make a map out of these, or just use an offset, since the keys are always the same. However, you can also build a map including any key you can turn into a function. properties() is such a key.
>>
>>> You might ask, why didn’t it go to outE and inE and map-ify that data? Because those are “sibling” references, not “children” references.
>>>
>>> goto(‘outE’) is a “sibling” reference. (a vertex does not contain an edge)
>>> goto(‘id’) is a “child” reference. (a vertex contains the id)
>>
>> I agree with both of those statements. A vertex does not contain the edges incident on it. Again, I am thinking of properties a bit more like edges for maximum generality.
>>
>>> Where do we find sibling references?
>>> Graphs: vertices don’t contain each other.
>>> OO heaps: many objects don’t contain each other.
>>> RDBMS: rows are linked by joins, but don’t contain each other.
>>
>> Yep.
>>
>>> So, the way in which we structure our references (pointers) determines the shape of the data and ultimately how different instructions will behave. We can’t assume that asMap() knows anything about vertices/edges/documents/rows/tables/etc. It will simply walk all child-references and create a map.
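
[Editorial sketch, not part of the original thread.] The child/sibling distinction and the "asMap() simply walks child references" rule can be shown in a few lines of Python. The ComplexType class below is a hypothetical stand-in with made-up fields (children, siblings); it is only meant to show why asMap() on a vertex yields id, label, and properties but never follows outE.

```python
class ComplexType:
    """Hypothetical stand-in for the ComplexType idea discussed in this thread."""

    def __init__(self, name, children=None, siblings=None):
        self.name = name
        self.children = children or {}  # label -> contained value ("child" reference)
        self.siblings = siblings or {}  # label -> non-contained objects ("sibling" reference)

    def __str__(self):
        return self.name


def as_map(obj):
    """Walk child references only; sibling references are never followed."""
    if not isinstance(obj, ComplexType):
        return obj  # primitives are returned unchanged
    return {label: as_map(child) for label, child in obj.children.items()}


props = ComplexType("vp[1]", children={"name": "marko", "age": 29})
v1 = ComplexType("v[1]", children={"id": 1, "label": "person", "properties": props})
e0 = ComplexType("e[0]", children={"id": 0, "label": "knows"})
v1.siblings["outE"] = [e0]  # a vertex does not contain its edges

v1_map = as_map(v1)
```

Here v1_map has the shape {id, label, properties:{name, age}} from the message above, and the edge reachable through the "outE" sibling reference is left out because as_map() never looks at siblings.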
>>
>> Just to play devil's advocate, you *could* include "inE" and "outE" as keys in the local map of a vertex; it's just a matter of what you choose to do. inE and outE are perfectly good functions from a vertex to a set of edges.
>>
>>> We don’t want TP to get involved in “complex data types.”
>>
>> Well, how do you feel about algebraic data types? They are simple, and allow you to capture arbitrary relations as elements.
>>
>>> We don’t care. You can propagate MyDatabaseObject through the TP4 VM pipeline and load your object up with methods for optimizations with your DB and all that, but for TP4, your object just needs to implement:
>>>
>>> ComplexType
>>>   - Iterator<T> children(String label)
>>>   - Iterator<T> siblings(String label)
>>>   - default Iterator<T> references(String label) { IteratorUtils.concat(children(label), siblings(label)) }
>>>   - String toString()
>>
>> I don't think you need siblings(). I think you need a more generic select(), but since this is graph traversal, select() only needs the identifier of a type (e.g. "knows") and the name of a field (e.g. "out").
>>
>>> When a ComplexType goes over the wire to the user, it is just represented as a ComplexTypeProxy with a toString() like v[1], tinkergraph[vertices:10,edges:34], etc. All references are disconnected. Yes, even children references. We do not want language drivers having to know about random object types and have to deal with implementing serializers and all that nonsense. The TP4 serialization protocol is primitives, maps, lists, bytecode, and traversers. That’s it!
>>
>> No disagreement here. I think the only disconnect is about what keys are local to what elements. Some keys are hard-local, like id and type for all elements, and "in" and "out" for edges and properties. These *should* be carried over the wire. Properties, incident edges, etc. possibly but not necessarily.
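
[Editorial sketch, not part of the original thread.] One way to picture the "only primitives, maps, and lists leave the VM" rule is a recursive conversion that replaces any other object with a proxy carrying just its string form. The names to_wire, ComplexTypeProxy (as a Python class), and Vertex are assumptions for illustration; the sketch follows the simpler proposal (string-only proxy) and does not carry the hard-local keys suggested in the reply above.

```python
class ComplexTypeProxy:
    """Hypothetical wire form: only the string rendering of the object survives."""

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return self.text


class Vertex:
    """Stand-in for a provider-specific object that must not leave the VM."""

    def __init__(self, vid):
        self.vid = vid

    def __str__(self):
        return f"v[{self.vid}]"


PRIMITIVES = (str, int, float, bool, type(None))


def to_wire(obj):
    """Convert a result into primitives, maps, lists, and string-only proxies."""
    if isinstance(obj, PRIMITIVES):
        return obj
    if isinstance(obj, dict):
        return {key: to_wire(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_wire(item) for item in obj]
    # Anything else is treated as a complex type: all references are dropped.
    return ComplexTypeProxy(str(obj))


wire = to_wire({"who": Vertex(1), "ages": [29, Vertex(2)]})
```

After the conversion, wire["who"] prints as v[1] but no longer has an id attribute or any references, so a language driver needs no serializer for Vertex.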
>>
>>> *** Only Maps and Lists (that don’t contain complex data types) maintain their child references “over the wire.”
>>
>> Sure.
>>
>>> I don’t get your hypergraph example, so let me try another example:
>>>
>>> tp ==member==> marko, josh
>>>
>>> TP is a vertex and there is a directed hyperedge with label “member” connecting to marko and josh vertices.
>>
>> That's kind of an unlabeled hyperedge; I am not sure we need to support those. Look at the GRAKN data model, or at HypergraphDB or earlier hypergraph data models. A hyperedge is essentially a tuple in which each component has a label ("role", in GRAKN). In other words, it is a relation in which some of the columns may be foreign keys. In your example, rather than "member" connecting "tp" to a set of vertices, you might have something like Collaborated{person1:marko, person2:josh, project:tp}. Then a query like "who did marko collaborate with on tp?" becomes:
>>
>> tp.from("Collaborated", "project").restrict("person1", "marko").to("person2")
>>
>> Of course, if you want this relationship to be symmetrical, you can introduce a constraint.
>>
>>> tp.goto(“outE”).filter(goto(“label”).is(“member”)).goto(“inV”)
>>>
>>> Looks exactly like a property graph query? However, it’s not because goto(“inV”) returns 2 vertices, not 1.
>>
>> I think your example works well for the type of hypergraph you are referring to. It's just different than the type of hypergraph I am referring to. I think by now you know that I would rather see a from() instead of that goto("outE"). I also agree you can make a function out of outE, and expose it using a map, if you really want to. Under the hood, however, I see this as traversing a projection head to tail rather than tail to head.
>>
>>> EdgeVertexFlatmapFunction works for property graphs and hypergraphs. It doesn’t care; it just follows goto() pointers! That is, it follows the ComplexType.references(“inV”). Multi-properties are the same as well. Likewise for meta-properties. These data model variations are not “special” to the TP4 VM. It just walks references whether there are 0,1,2, or N of them.
>>
>> At a high level, I agree with what you are saying. We should have a common data model that unifies traditional property graphs, hypergraphs, relational databases, and any other data model that can be modeled using algebraic data types with references. We define a small set of basic operations on this data model which can be combined into more complex operations that are amenable to static analysis and optimization. We can send graph data over the wire as collections of elements using the bare minimum of local fields, and reconstruct the graph on the other end. We can operate on streams of such elements under suitable conditions (elements sent in an appropriate order). The basic operations are not tied to the JVM, and should be straightforward to implement in other frameworks.
>>
>>> Thus, what is crucial to all this is the “shape of the data.” Using your pointers wisely so instructions produce useful results.
>>
>> +1
>>
>>> Does any of what I wrote update your comeFrom().goto() stuff?
>>
>> Sadly, no, though I appreciate that you are coming from a slightly different place w.r.t. properties, hypergraphs, and most importantly, the role of a type system.
>>
>>> If not, can you please explain to me why comeFrom() is cool? Sorry for being dense (aka “being Kuppitz”; that’s right, I said it. boom!).
>>
>> Let's keep iterating until we reach a fixed point. Maybe Daniel's already there.
>>
>> Josh
>>
>>> Thanks,
>>> Marko.
>>>
>>> http://rredux.com
>>>
>>>> On Apr 23, 2019, at 10:25 AM, Joshua Shinavier <j...@fortytwo.net> wrote:
>>>>
>>>> On Tue, Apr 23, 2019 at 5:14 AM Marko Rodriguez <okramma...@gmail.com> wrote:
>>>>
>>>>> Hey Josh,
>>>>>
>>>>> This gets to the notion I presented in “The Fabled GMachine.”
>>>>> http://rredux.com/the-fabled-gmachine.html (first paragraph of “Structures, Processes, and Languages” section)
>>>>>
>>>>> All that exists are memory addresses that contain either:
>>>>>
>>>>> 1. A primitive
>>>>> 2. A set of labeled references to other references or primitives.
>>>>>
>>>>> Using your work and the above, here is a super low-level ‘bytecode’ for property graphs.
>>>>>
>>>>> v.goto("id") => 1
>>>>
>>>> LGTM. An id is special because it is uniquely identifying / is a primary key for the element. However, it is also just a field of the element, like "in"/"inV" and "out"/"outV" are fields of an edge. As an aside, an id would only really need to be unique among other elements of the same type. To the above, I would add:
>>>>
>>>> v.type() => Person
>>>>
>>>> ...a special operation which takes you from an element to its type. This is important if unions are supported; e.g. "name" in my example can apply either to a Person or a Project.
>>>>
>>>>> v.goto("label") => person
>>>>
>>>> Or that. Like "id", "type"/"label" is special. You can think of it as a field; it's just a different sort of field which will have the same value for all elements of any given type.
>>>>
>>>>> v.goto("properties").goto("name") => "marko"
>>>>
>>>> OK, properties. Are properties built-in as a separate kind of thing from edges, or can we treat them the same as vertices and edges here? I think we can treat them the same. A property, in the algebraic model I described above, is just an element with two fields, the second of which is a primitive value. As I said, I think we need two distinct traversal operations -- projection and selection -- and here is where we can use the latter. Here, I will call it "comeFrom".
>>>>
>>>> v.comeFrom("name", "out").goto("in") => {"marko"}
>>>>
>>>> You can think of this comeFrom as a special case of a select() function which takes a type -- "name" -- and a set of key/value pairs {("out", v)}. It returns all matching elements of the given type. You then project to the "in" value using your goto. I wrote {"marko"} as a set, because comeFrom can give you multiple properties, depending on whether multi-properties are supported.
>>>>
>>>> Note how similar this is to an edge traversal:
>>>>
>>>> v.comeFrom("knows", "out").goto("in") => {v[2], v[4]}
>>>>
>>>> Of course, you could define "properties" in such a way that a goto("properties") does exactly this under the hood, but in terms of low level instructions, you need something like comeFrom.
>>>>
>>>>> v.goto("properties").goto("name").goto(0) => "m"
>>>>
>>>> This is where the notion of optionals becomes handy. You can make array/list indices into fields like this, but IMO you should also make them safe. E.g. borrowing Haskell syntax for a moment:
>>>>
>>>> v.goto("properties").goto("name").goto(0) => Just 'm'
>>>>
>>>> v.goto("properties").goto("name").goto(5) => Nothing
>>>>
>>>>> v.goto("outE").goto("inV") => v[2], v[4]
>>>>
>>>> I am not a big fan of untyped "outE", but you can think of this as a union of all v.comeFrom(x, "out").goto("in"), where x is any edge type. Only "knows" and "created" are edge types which are applicable to "Person", so you will only get {v[2], v[4]}. If you want to get really crazy, you can allow x to be any type. Then you get {v[2], v[4], 29, "marko"}.
>>>>
>>>>> g.goto("V").goto(1) => v[1]
>>>>
>>>> That, or you give every element a virtual field called "graph". So:
>>>>
>>>> v.goto("graph") => g
>>>>
>>>> g.comeFrom("Person", "graph") => {v[1], v[2], v[4], v[6]}
>>>>
>>>> g.comeFrom("Person", "graph").restrict("id", 1)
>>>>
>>>> ...where restrict() is the relational "sigma" operation as above, not to be confused with TinkerPop's select(), filter(), or has() steps. Again, I prefer to specify a type in comeFrom (i.e. we're looking specifically for a Person with id of 1), but you could also do a comprehension g.comeFrom(x, "graph"), letting x range over all types.
>>>>
>>>>> The goto() instruction moves the “memory reference” (traverser) from the current “memory address” to the “memory address” referenced by the goto() argument.
>>>>
>>>> Agreed, if we also think of primitive values as memory references.
>>>>
>>>>> The Gremlin expression:
>>>>>
>>>>> g.V().has(‘name’,’marko’).out(‘knows’).drop()
>>>>>
>>>>> ..would compile to:
>>>>>
>>>>> g.goto(“V”).filter(goto(“properties”).goto(“name”).is(“marko”)).goto(“outE”).filter(goto(“label”).is(“knows”)).goto(“inV”).free()
>>>>
>>>> In the alternate universe:
>>>>
>>>> g.comeFrom("Person", "graph").comeFrom("name", "out").restrict("in", "marko").goto("out").comeFrom("knows", "out").goto("in").free()
>>>>
>>>> I have wimped out on free() and just left it as you had it, but I think it would be worthwhile to explore a monadic syntax for traversals with side-effects. Different topic.
>>>>
>>>> Now, all of this "out", "in" business is getting pretty repetitive, right? Well, the field names become more diverse if we allow hyper-edges and generalized ADTs. E.g. in my Trip example, say I want to know all drop-off locations for a given rider:
>>>>
>>>> u.comeFrom("Trip", "rider").goto("dropoff").goto("place")
>>>>
>>>> Done.
>>>>
>>>>> If we can get things that “low-level” and still efficient to compile, then we can model every data structure. All you are doing is pointer chasing through a withStructure() data structure.
>>>>
>>>> Agreed.
>>>>
>>>>> No one would ever want to write strategies for goto()-based Bytecode.
>>>>
>>>> Also agreed.
>>>>
>>>>> Thus, perhaps there could be a PropertyGraphDecorationStrategy that does:
>>>>>
>>>>> [...]
>>>>
>>>> No argument here, though the alternate-universe "bytecode" would look slightly different. And the high-level syntax should also be able to deal with generalized relations / data types gracefully. As a thought experiment, suppose we were to define the steps to() as your goto(), and from() as my comeFrom(). Then traversals like:
>>>>
>>>> u.from("Trip", "rider").to("dropoff").to("time")
>>>>
>>>> ...look pretty good as-is, and are not too low-level. However, ordinary edge traversals like:
>>>>
>>>> v.from("knows", "out").to("in")
>>>>
>>>> ...do look a little Assembly-like. So in/out/both etc. remain as they are, but are shorthand for from() and to() steps using "out" or "in":
>>>>
>>>> v.out("knows") === v.outE("knows").inV() === v.from("knows", "out").to("in")
>>>>
>>>>> [I AM NOW GOING OFF THE RAILS]
>>>>> [sniiiiip]
>>>>
>>>> Sure. Again, I like the idea of wrapping side-effects in monads. What would that look like in a Gremlinesque fluent syntax? I don't quite know, but if we think of the dot as a monadic bind operation like Haskell's >>=, then perhaps the monadic expressions look pretty similar to what you have just sketched out. Might have to be careful about what it means to nest operations as in your addEdge examples.
>>>>
>>>>> [I AM NOW BACK ON THE RAILS]
>>>>>
>>>>> It’s as if “properties”, “outE”, “label”, “inV”, etc. references mean something to property graph providers and they can do more intelligent stuff than what MongoDB would do with such information. However, someone, of course, can create a MongoDBPropertyGraphStrategy that would make documents look like vertices and edges and then use O(log(n)) lookups on ids to walk the graph. However, if that didn’t exist, it would still do something that works even if it’s horribly inefficient as every database can make primitives with references between them!
>>>>
>>>> I'm on the same pair of rails.
>>>>
>>>>> Anywho @Josh, I believe goto() is what you are doing with multi-references off an object. How do we make it all clean, easy, and universal?
>>>>
>>>> Let me know what you think of the above.
>>>>
>>>> Josh
>>>>
>>>>> Marko.
>>>>>
>>>>> http://rredux.com
>>>>>
>>>>>> On Apr 22, 2019, at 6:42 PM, Joshua Shinavier <j...@fortytwo.net> wrote:
>>>>>>
>>>>>> Ah, glad you asked. It's all in the pictures. I have nowhere to put them online at the moment... maybe this attachment will go through to the list?
>>>>>>
>>>>>> Btw. David Spivak gave his talk today at Uber; it was great. Juan Sequeda (relational <--> RDF mapping guy) was also here, and Ryan joined remotely. Really interesting discussion about databases vs. graphs, and what category theory brings to the table.
>>>>>>
>>>>>> On Mon, Apr 22, 2019 at 1:45 PM Marko Rodriguez <okramma...@gmail.com> wrote:
>>>>>> Hey Josh,
>>>>>>
>>>>>> I’m digging what you are saying, but the pictures didn’t come through for me ? … Can you provide them again (or if dev@ is filtering them, can you give me URLs to them)?
>>>>>>
>>>>>> Thanks,
>>>>>> Marko.
>>>>>>
>>>>>>> On Apr 21, 2019, at 12:58 PM, Joshua Shinavier <j...@fortytwo.net> wrote:
>>>>>>>
>>>>>>> On the subject of "reified joins", maybe a picture will be worth a few words. As I said in the thread <https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ> on property graph standardization, if you think of vertex labels, edge labels, and property keys as types, each with projections to two other types, there is a nice analogy with relations of two columns, and this analogy can be easily extended to hyper-edges. Here is what the schema of the TinkerPop classic graph looks like if you make each type (e.g. Person, Project, knows, name) into a relation:
>>>>>>>
>>>>>>> I have made the vertex types salmon-colored, the edge types yellow, the property types green, and the data types blue. The "o" and "I" columns represent the out-type (e.g. out-vertex type of Person) and in-type (e.g. property value type of String) of each relation. More than two arrows from a column represent a coproduct, e.g. the out-type of "name" is Person OR Project. Now you can think of out() and in() as joins of two tables on a primary and foreign key.
>>>>>>>
>>>>>>> We are not limited to "out" and "in", however. Here is the ternary relationship (hyper-edge) from the hyper-edge slide <https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49> of my Graph Day preso, which has three columns/roles/projections:
>>>>>>>
>>>>>>> I have drawn Says in light blue to indicate that it is a generalized element; it has projections other than "out" and "in". Now the line between relations and edges begins to blur. E.g. in the following, is PlaceEvent a vertex or a property?
>>>>>>>
>>>>>>> With the right type system, we can just speak of graph elements, and use "vertex", "edge", "property" when it is convenient. In the relational model, they are relations. If you materialize them in a relational database, they are rows. In any case, you need two basic graph traversal operations:
>>>>>>> project() -- forward traversal of the arrows in the above diagrams. Takes you from an element to a component like in-vertex.
>>>>>>> select() -- reverse traversal of the arrows. Allows you to answer questions like "in which Trips is John Doe the rider?"
>>>>>>>
>>>>>>> Josh
>>>>>>>
>>>>>>> On Fri, Apr 19, 2019 at 10:03 AM Marko Rodriguez <okramma...@gmail.com> wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I agree with everything you say.
>>>>>>> Here is my question:
>>>>>>>
>>>>>>> Relational database (join): Table x Table x equality function -> Table
>>>>>>> Graph database (traverser): Vertex x edge label -> Vertex
>>>>>>>
>>>>>>> I want a single function that does both. The only thing I could think of was to represent traverser() in terms of join():
>>>>>>>
>>>>>>> Graph database (traverser): Vertices x Vertex x equality function -> Vertices
>>>>>>>
>>>>>>> For example,
>>>>>>>
>>>>>>> V().out(‘address’)
>>>>>>>
>>>>>>> ==>
>>>>>>>
>>>>>>> g.join(V().hasLabel(‘person’).as(‘a’),
>>>>>>>        V().hasLabel(‘addresses’).as(‘b’)).
>>>>>>>   by(‘name’).select(?address vertex?)
>>>>>>>
>>>>>>> That is, join the vertices with themselves based on some predicate to go from vertices to vertices.
>>>>>>>
>>>>>>> However, I would like instead to transform the relational database join() concept into a traverser() concept. Kuppitz and I were talking the other day about a link() type operator that says: “try and link to this thing in some specified way.” .. ?? The problem we ran into is again, “link it to what?”
>>>>>>>
>>>>>>> - in graph, the ‘to what’ is hardcoded so you don’t need to specify anything.
>>>>>>> - in rdbms, the ’to what’ is some other specified table.
>>>>>>>
>>>>>>> So what does the link() operator look like?
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Some other random thoughts….
>>>>>>>
>>>>>>> Relational databases join on the table (the whole collection)
>>>>>>> Graph databases traverse on the vertex (an element of the whole collection)
>>>>>>>
>>>>>>> We can make a relational database join on a single row (by providing a filter to a particular primary key). This is the same as a table with one row. Likewise, for graph in the join() context above:
>>>>>>>
>>>>>>> V(1).out(‘address’)
>>>>>>>
>>>>>>> ==>
>>>>>>>
>>>>>>> g.join(V(1).as(‘a’),
>>>>>>>        V().hasLabel(‘addresses’).as(‘b’)).
>>>>>>>   by(‘name’).select(?address vertex?)
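
[Editorial sketch, not part of the original thread.] The idea of expressing a traversal step as a join, with a single vertex treated as a one-row table, can be written out in Python as follows. The functions join and traverse_by_join, the same_name predicate, and the sample rows are all hypothetical; the rows only mimic the person/addresses data used as an example in this thread.

```python
def join(left, right, equal):
    """Nested-loop join: Table x Table x equality function -> table of row pairs."""
    return [(a, b) for a in left for b in right if equal(a, b)]


# Hypothetical sample data: "person" rows and "addresses" rows linked by name.
people = [
    {"id": 1, "label": "person", "name": "marko"},
    {"id": 2, "label": "person", "name": "josh"},
]
addresses = [
    {"id": 10, "label": "addresses", "name": "marko", "city": "santafe"},
    {"id": 11, "label": "addresses", "name": "kuppitz", "city": "tucson"},
    {"id": 12, "label": "addresses", "name": "josh", "city": "desertisland"},
]


def same_name(a, b):
    """The equality function playing the role of by('name')."""
    return a["name"] == b["name"]


def traverse_by_join(vertices, targets, equal):
    """Vertices x Vertices x equality function -> Vertices: keep the right side of each pair."""
    return [b for _, b in join(vertices, targets, equal)]


# V().out('address') expressed as a join over all person vertices
all_cities = [b["city"] for b in traverse_by_join(people, addresses, same_name)]

# V(1).out('address'): the same join, but the left "table" has exactly one row
marko_cities = [b["city"] for b in traverse_by_join(people[:1], addresses, same_name)]
```

In this form the "to what" of the join is the explicit targets argument, which is exactly the piece a graph edge hardcodes.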
>>>>>>>
>>>>>>> More thoughts please….
>>>>>>>
>>>>>>> Marko.
>>>>>>>
>>>>>>> http://rredux.com
>>>>>>>
>>>>>>>
>>>>>>>> On Apr 19, 2019, at 4:20 AM, pieter martin <pieter.mar...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>> The way I saw it is that the big difference is that graphs have reified joins. This is both a blessing and a curse.
>>>>>>>> A blessing because it's much easier (less text to type, fewer mistakes, clearer semantics...) to traverse an edge than to construct a manual join. A curse because there are almost always far more ways to traverse a data set than just by the edges some architect might have considered when creating the data set. Often the architect is not the domain expert and the edges are a hardcoded layout of the dataset, which almost certainly won't survive the real world's demands. In graphs, if there are no edges then the data is not reachable, except via indexed lookups. This is the standard engineering problem of database design, but it is important and useful that data can be traversed, joined, without having reified edges.
>>>>>>>> In Sqlg at least, but I suspect it generalizes, I want to create the notion of a "virtual edge", which describes the join in metadata, so that the standard to(direction, "virtualEdgeName") will work.
>>>>>>>> In a way this is precisely to keep the graphy nature of Gremlin, i.e. traversing edges, and avoid using the manual join syntax you described.
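
One way to picture the "virtual edge" idea from the message above is the following Python sketch. It is only an illustration under stated assumptions, not Sqlg's or TinkerPop's implementation: the VirtualEdge record, the TABLES and VIRTUAL_EDGES registries, the edge name "livesAt" and the to() helper are all hypothetical, and rows are plain dictionaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualEdge:
    """Metadata describing a join; no edge rows are stored anywhere."""
    name: str
    out_table: str
    out_key: str
    in_table: str
    in_key: str


# Hypothetical tables and one virtual edge joining them on the name column.
TABLES = {
    "people": [{"name": "marko", "age": 29}, {"name": "josh", "age": 35}],
    "addresses": [
        {"name": "marko", "city": "santafe"},
        {"name": "josh", "city": "desertisland"},
    ],
}
VIRTUAL_EDGES = {
    "livesAt": VirtualEdge("livesAt", "people", "name", "addresses", "name"),
}


def to(source_table, row, direction, edge_name):
    """Traverse a virtual edge from `row` by evaluating the join it describes."""
    edge = VIRTUAL_EDGES[edge_name]
    if direction == "out":
        src_table, src_key = edge.out_table, edge.out_key
        dst_table, dst_key = edge.in_table, edge.in_key
    elif direction == "in":
        src_table, src_key = edge.in_table, edge.in_key
        dst_table, dst_key = edge.out_table, edge.out_key
    else:
        raise ValueError(f"unknown direction: {direction!r}")
    if source_table != src_table:
        return []  # this row cannot be the source of the edge in that direction
    return [r for r in TABLES[dst_table] if r[dst_key] == row[src_key]]


josh = TABLES["people"][1]
josh_addresses = to("people", josh, "out", "livesAt")
santafe_residents = to("addresses", TABLES["addresses"][0], "in", "livesAt")
```

The caller only names the edge and a direction; the join columns live in the metadata, which is the property the message asks for.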
>>>>>>>> Cheers,
>>>>>>>> Pieter
>>>>>>>>
>>>>>>>> On Thu, 2019-04-18 at 14:15 -0600, Marko Rodriguez wrote:
>>>>>>>>> Hi,
>>>>>>>>> *** This is mainly for Kuppitz, but if others care.
>>>>>>>>> Was thinking last night about relational data and Gremlin. The T() step returns all the tables in the withStructure() RDBMS database. Tables are ‘complex values’ so they can't leave the VM (only a simple ‘toString’).
>>>>>>>>> Below is a fake Gremlin session. (and these are just ideas…)
>>>>>>>>>     tables -> a ListLike of rows
>>>>>>>>>     rows -> a MapLike of primitives
>>>>>>>>> gremlin> g.T()
>>>>>>>>> ==>t[people]
>>>>>>>>> ==>t[addresses]
>>>>>>>>> gremlin> g.T('people')
>>>>>>>>> ==>t[people]
>>>>>>>>> gremlin> g.T('people').values()
>>>>>>>>> ==>r[people:1]
>>>>>>>>> ==>r[people:2]
>>>>>>>>> ==>r[people:3]
>>>>>>>>> gremlin> g.T('people').values().asMap()
>>>>>>>>> ==>{name:marko,age:29}
>>>>>>>>> ==>{name:kuppitz,age:10}
>>>>>>>>> ==>{name:josh,age:35}
>>>>>>>>> gremlin> g.T('people').values().has('age',gt(20))
>>>>>>>>> ==>r[people:1]
>>>>>>>>> ==>r[people:3]
>>>>>>>>> gremlin> g.T('people').values().has('age',gt(20)).values('name')
>>>>>>>>> ==>marko
>>>>>>>>> ==>josh
>>>>>>>>> Makes sense. Nice that values() and has() generally apply to all ListLike and MapLike structures. Also, note how asMap() is the valueMap() of TP4, but generalizes to anything that is MapLike so it can be turned into a primitive form as a data-rich result from the VM.
>>>>>>>>> gremlin> g.T()
>>>>>>>>> ==>t[people]
>>>>>>>>> ==>t[addresses]
>>>>>>>>> gremlin> g.T('addresses').values().asMap()
>>>>>>>>> ==>{name:marko,city:santafe}
>>>>>>>>> ==>{name:kuppitz,city:tucson}
>>>>>>>>> ==>{name:josh,city:desertisland}
>>>>>>>>> gremlin> g.join(T('people').as('a'),T('addresses').as('b')).
>>>>>>>>>            by(select('a').value('name').is(eq(select('b').value('name')))).
>>>>>>>>>            values().asMap()
>>>>>>>>> ==>{a.name:marko,a.age:29,b.name:marko,b.city:santafe}
>>>>>>>>> ==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.city:tucson}
>>>>>>>>> ==>{a.name:josh,a.age:35,b.name:josh,b.city:desertisland}
>>>>>>>>> gremlin> g.join(T('people').as('a'),T('addresses').as('b')).
>>>>>>>>>            by('name'). // shorthand for equijoin on name column/key
>>>>>>>>>            values().asMap()
>>>>>>>>> ==>{a.name:marko,a.age:29,b.name:marko,b.city:santafe}
>>>>>>>>> ==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.city:tucson}
>>>>>>>>> ==>{a.name:josh,a.age:35,b.name:josh,b.city:desertisland}
>>>>>>>>> gremlin> g.join(T('people').as('a'),T('addresses').as('b')).
>>>>>>>>>            by('name')
>>>>>>>>> ==>t[people<-name->addresses] // without asMap(), just the complex value ‘toString’
>>>>>>>>> gremlin>
>>>>>>>>> And of course, all of this is strategized into a SQL call so its joins aren’t necessarily computed using TP4-VM resources.
>>>>>>>>> Anywho, what I hope to realize is the relationship between “links” (graph) and “joins” (tables). How can we make (bytecode-wise at least) RDBMS join operations and graph traversal operations ‘the same’?
>>>>>>>>>     Singleton: Integer, String, Float, Double, etc.
>>>>>>>>>     Collection: List, Map (Vertex, Table, Document)
>>>>>>>>>     Linkable: Vertex, Table
>>>>>>>>> Vertices and Tables can be “linked.” Unlike Collections, they don’t maintain a “parent/child” relationship with the objects they reference. What does this mean……….?
>>>>>>>>> Take care,
>>>>>>>>> Marko.
>>>>>>>>> http://rredux.com
>>>>>>
>>>>>> <diagrams.zip>
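
The fake Gremlin session in the oldest message of this thread treats a table as a list of rows and a row as a map of primitives. The following Python sketch mimics that shape with plain lists and dicts. It is an assumption-laden illustration, not the TP4 VM: has(), values() and equijoin() are hypothetical helper names, and the alias prefixes "a." and "b." simply copy the output format shown in the session.

```python
# Toy stand-ins for the people and addresses tables from the session.
people = [
    {"name": "marko", "age": 29},
    {"name": "kuppitz", "age": 10},
    {"name": "josh", "age": 35},
]
addresses = [
    {"name": "marko", "city": "santafe"},
    {"name": "kuppitz", "city": "tucson"},
    {"name": "josh", "city": "desertisland"},
]


def has(rows, key, predicate):
    """Keep the rows whose value for `key` satisfies `predicate`."""
    return [row for row in rows if predicate(row[key])]


def values(rows, key):
    """Project a single column out of a list of rows."""
    return [row[key] for row in rows]


def equijoin(left, right, key, left_alias="a", right_alias="b"):
    """Join two tables on equal values of `key`, prefixing columns with an alias."""
    joined = []
    for left_row in left:
        for right_row in right:
            if left_row[key] == right_row[key]:
                row = {f"{left_alias}.{k}": v for k, v in left_row.items()}
                row.update({f"{right_alias}.{k}": v for k, v in right_row.items()})
                joined.append(row)
    return joined


# Analogue of g.T('people').values().has('age',gt(20)).values('name')
older_names = values(has(people, "age", lambda age: age > 20), "name")

# Analogue of g.join(T('people').as('a'),T('addresses').as('b')).by('name')
joined_rows = equijoin(people, addresses, "name")
```

Because has() and values() only rely on "list of maps", the same helpers apply to the joined rows as well, which is the generality the session points out for ListLike and MapLike structures.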