Hey Josh,
> One more thing is needed: disjoint unions. I described these in my email on
> algebraic property graphs. They are the "plus" operator to complement the
> "times" operator in our type algebra. A disjoint union type is just like a
> tuple type, but instead of having values for field a AND field b AND field
> c, an instance of a union type has a value for field a XOR field b XOR
> field c. Let me know if you are not completely sold on union types, and I
> will provide additional motivation.

Huh. That is an interesting concept. Can you please provide examples?

>> The instructions:
>> 1. relations can be "queried" for matching tuples.

Yes. One thing I want to stress: the "universal bytecode" is just standard [op,arg*]* bytecode, save that data access is via the "universal model's" db() instruction. Thus, AND/OR/pattern matching/etc. are all available. Likewise, union(), repeat(), coalesce(), choose(), etc. are all available.

db().and(
  as('a').values('knows').as('b'),
  or(
    as('a').has('name','marko'),
    as('a').values('created').count().is(gt(1))),
  as('b').values('created').as('c')).
    path('c')

As you can see, and()/or() pattern matching is possible and can be nested.

*** SIDENOTE: In TP3, such nested and()/or() pattern matching is expressed using match(), where the root grouping is assumed to be and()'d together.

*** SIDENOTE: In TP4, I want to get rid of an explicit match() bytecode instruction and replace it with and()/or() instructions with prefix/suffix as()s.

*** SIDENOTE: In TP4, in general, any nested bytecode that starts with as(x) is path(x), and any bytecode that ends with as(y) is where(eq(path(y))).

>> 2. tuple values can be projected out to yield primitives.

> Or other tuples, or tagged values. E.g. any edge projects to two vertices,
> which are (trivial) tuples as opposed to primitive values.

Good point. I started to do some modeling, and I've been getting some good mileage from a new "pointer" primitive.
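Going back to your union types for a second, so I make sure I'm reading you right: here is how I currently picture product vs. disjoint union, as a toy Python sketch. The names (inject/match) are mine, not a proposed API, so correct me if I'm misreading.

```python
# "Times" vs. "plus" in the type algebra (my reading of Josh's description).
# A tuple (product) type has a value for field a AND field b AND field c.
# A disjoint union (sum) type has a value for exactly ONE alternative,
# plus a tag remembering which one.

# Product: a person tuple carries ALL of its fields.
person = {"name": "marko", "age": 29}  # name AND age

def inject(tag, value):
    """Build an instance of a union type: XOR, not AND."""
    return {"tag": tag, "value": value}

def match(tagged, **handlers):
    """Eliminate a union value by dispatching on its tag."""
    return handlers[tagged["tag"]](tagged["value"])

# The out-type of "name" as (Person OR Project): a tagged value.
name_of_person = inject("person", {"name": "marko"})
name_of_project = inject("project", {"name": "lop"})

# A consumer of the union handles each alternative separately.
label = match(name_of_person,
              person=lambda p: "person:" + p["name"],
              project=lambda p: "project:" + p["name"])
# => "person:marko"
```

If that is roughly what you mean, then yes, I can see the appeal of plus/times symmetry.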
Assume every N-Tuple has a unique ID (outside the data model's ID space). If so, the TinkerPop toy graph as N-Tuples is:

[0][id:1,name:marko,age:29,created:*1,knows:*2]
[1][0:*3]
[2][0:*4,1:*5]
[3][id:3,name:lop,lang:java]
[4][id:2,name:vadas,age:27]
[5][id:4,name:josh,age:32,created:*…]

I know you are thinking that vertices don't have "outE" projections, so this isn't in line with your thinking. However, check this out. If we assume that pointers are automatically dereferenced on reference, then:

db().has('name','marko').values('knows').values('name') => vadas, josh

Pointers are useful when a tuple has another tuple as a value. Instead of nesting, you "blank node" it. DocumentDBs (with nested lists/maps) would use this extensively.

> Grumble... db() is just an alias for select()... grumble…

select() and project() are existing instructions in TP3 (TP4?).

SELECT
- db() will iterate all N-Tuples.
- has() will filter out those N-Tuples without the respective key/values.
- and()/or() are used for nested pattern matching.

PROJECT
- values() will project out the N-Tuple's values.

> Here, we are kind of mixing fields with property keys. Yes,
> db().has('name', 'marko') can be used to search for elements of any type...
> if that type agrees with the out-type of the "name" relation. In my
> TinkerPop Classic example, the out type of "name" is (Person OR Project),
> so your query will get you people or projects.

As with indices, I don't think we should introduce types. But this is up for further discussion...

> Which is to say that we define the out-type of "name" to be the disjoint
> union of all element types. The type becomes trivial. However, we can also
> be more selective if we want to, restricting "name" only to a small subset
> of types.

Hm… I'm listening. I'm running into problems in my modeling when trying to fit things generically into relational tables. Maybe typing is necessary :(.

> Good idea.
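As an aside, the pointer/auto-dereference behavior can be sketched as a toy evaluator over the N-Tuples above (illustrative Python, not real bytecode; I've left josh's elided created pointer out):

```python
# Toy N-Tuple store keyed by tuple ID; "*n" is a pointer to tuple n.
# Encoding taken from the toy-graph listing above.
TUPLES = {
    0: {"id": 1, "name": "marko", "age": 29, "created": "*1", "knows": "*2"},
    1: {0: "*3"},
    2: {0: "*4", 1: "*5"},
    3: {"id": 3, "name": "lop", "lang": "java"},
    4: {"id": 2, "name": "vadas", "age": 27},
    5: {"id": 4, "name": "josh", "age": 32},
}

def deref(value):
    """Pointers are automatically dereferenced on reference."""
    if isinstance(value, str) and value.startswith("*"):
        return TUPLES[int(value[1:])]
    return value

def db():
    """SELECT: iterate all N-Tuples."""
    return list(TUPLES.values())

def has(tuples, key, value):
    """SELECT: keep only N-Tuples with the respective key/value."""
    return [t for t in tuples if t.get(key) == value]

def values(tuples, key):
    """PROJECT: project out (and dereference) the N-Tuple's values."""
    out = []
    for t in tuples:
        v = deref(t.get(key))
        if isinstance(v, dict) and all(isinstance(k, int) for k in v):
            # a "list" tuple (integer slots) fans out to its slot values
            out.extend(deref(x) for x in v.values())
        elif v is not None:
            out.append(v)
    return out

# db().has('name','marko').values('knows').values('name')
result = values(values(has(db(), "name", "marko"), "knows"), "name")
# => ['vadas', 'josh']
```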
> TP4 can provide several "flavors" of interfaces, each of which
> is idiomatic for each major class of database provider. Meeting the
> providers halfway will make integration that much easier.

Yes. With respect to graphdb providers, they want to think in terms of Vertex/Edge/etc. We want to put the bytecode in their language so:

1. It is easier for them to write custom strategies.
2. inV() can operate on their Vertex object without them having to implement inV().

*** Basically, just like TP3 is now. GraphDB providers implement Graph/Vertex/Edge and everything works! However, they will then want to write custom instructions/strategies to use their database's optimizations, such as vertex-centric indices for outE('knows').has('stars',gt(3)).inV().

> I think we will see steps like V() and R() in Gremlin, but do not need them
> in bytecode. Again, db() is just select(), V() is just select(), etc. The
> model-specific interfaces adapt V() to select() etc.

Hm. See my points above. Having providers reason at the "universal model" level seems intense. ?

> select(foafPerson)
>
> The second expression becomes:
>
> value("marko").select(foafName, "in").project("out")
>
> ...which you can rewrite with has(); I just think the above is clear w.r.t.
> low-level operations. The value() is just providing a start of "marko",
> which is a string value. No need for xsd:string if we have a deep mapping
> between RDF and APG.

Hm… I see your type "slots" model and fear the global typing in a (potentially) schemaless world. For me, everything should be standard has()/values() TP bytecode off of a "get all" db()… ? However, I'm open to seeing examples that demonstrate easier reasoning.

Here are some examples I've been playing with using db()/has()/values() over DocumentDB data:

https://gist.github.com/okram/764033e215906787217bc3176bb3bb15

> Yes, nice.
> We can even take things a step further and decouple the query
> language from the database. Have a property graph database, but want to
> evaluate SPARQL? No problem. Have a relational database but want to do
> Gremlin traversals? No worries.

Yes. That is the whole point of this rabbit hole!

* any query language -> universal model -> any data model

> Not sure about vendor-specific instructions; a
> lot can be done in the mapping of relations to instructions which live
> entirely within the black box of the vendor code.

Vendor instructions are crucial for allowing the vendor to interact with their database's custom optimizations.

V().has('name','marko') => jg:v-index('name','marko')
outE('knows').has('stars',gt(3)).inV() => jg:vcentric-index-out('knows','stars',gt(3))

However, I would like to understand (via examples) what you are talking about, as that sounds super interesting!

> Back to indexes. IMO there should be a vendor-neutral API. Even extremely
> vendor-specific indexes like geotemporal indexes could be exposed through a
> common API, e.g.
>
> select("Dropoffs", {lat:37.7740, lon:122.4149, time:1556899302149})
>
> which resolves to a vendor-specific index.

I really don't think so. There are too many variations on indexing. TP doesn't need to go down that rat's nest. That is what vendor-specific strategies/instructions are for — let them decide how to fold has().has().has() into a single index lookup. We see everything as linear scans.

> I actually like your term "GMachine", and I don't think it's a bad idea to
> keep "graph" front and center. Yes, TP4 shall have the flexibility to
> interoperate with a variety of non-graph databases, but what it adds is a
> unifying graph abstraction.

I do like GMachine too, but I think "TP4 VM" is best for now. I don't think graph should be front and center. Graph is just another data model, much like RDF, Document, Relational, etc.
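To make the earlier strategy point concrete: here is a sketch of how a vendor strategy might fold a has()-chain into a single index lookup (illustrative Python over [op,arg*]* lists, not TP4 API; "jg:index" is a made-up instruction in the spirit of the jg:v-index example above).

```python
# Bytecode as [op, args...] lists. A vendor strategy rewrites a
# db().has(k1,v1).has(k2,v2)... prefix into ONE vendor index instruction,
# leaving the remainder of the bytecode untouched.

def fold_has_chain(bytecode):
    """Rewrite [db, has, has, ...] into [jg:index, {k: v, ...}] + rest."""
    if not bytecode or bytecode[0][0] != "db":
        return bytecode  # strategy only applies to full scans
    predicates, rest = {}, []
    for i, (op, *args) in enumerate(bytecode[1:], start=1):
        if op == "has":
            predicates[args[0]] = args[1]
        else:
            rest = bytecode[i:]  # first non-has() instruction onward
            break
    if not predicates:
        return bytecode
    return [["jg:index", predicates]] + rest

query = [["db"], ["has", "name", "marko"], ["has", "age", 29],
         ["values", "knows"]]
folded = fold_has_chain(query)
# => [['jg:index', {'name': 'marko', 'age': 29}], ['values', 'knows']]
```

The point being: TP sees linear scans; the vendor decides, in their own strategy, what their indexes can absorb.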
In fact, "graph" itself will have numerous flavors: graph with multi-properties, meta-properties, vertex multi-labels, … all captured in the pg/ interfaces. How exactly, not sure.

Awesome stuff. Excited to receive your response.

Marko.

http://rredux.com