Hey Josh,
> One more thing is needed: disjoint unions. I described these in my email on
> algebraic property graphs. They are the "plus" operator to complement the
> "times" operator in our type algebra. A disjoint union type is just like a
> tuple type, but instead of having values for field a AND field b AND field
> c, an instance of a union type has a value for field a XOR field b XOR
> field c. Let me know if you are not completely sold on union types, and I
> will provide additional motivation.

Huh. That is an interesting concept. Can you please provide examples?

>> The instructions:
>> 1. relations can be "queried" for matching tuples.

Yes. One thing I want to stress: the "universal bytecode" is just standard [op,arg*]* bytecode, save that data access is via the "universal model's" db() instruction. Thus, AND/OR/pattern matching/etc. are all available. Likewise, union(), repeat(), coalesce(), choose(), etc. are all available.

db().and(
  as('a').values('knows').as('b'),
  or(
    as('a').has('name','marko'),
    as('a').values('created').count().is(gt(1))),
  as('b').values('created').as('c')).
    path('c')

As you can see, and()/or() pattern matching is possible and can be nested.

*** SIDENOTE: In TP3, such nested and()/or() pattern matching is expressed using match(), where the root grouping is assumed to be and()'d together.

*** SIDENOTE: In TP4, I want to get rid of an explicit match() bytecode instruction and replace it with and()/or() instructions with prefix/suffix as()s.

*** SIDENOTE: In TP4, in general, any nested bytecode that starts with as(x) is path(x), and any bytecode that ends with as(y) is where(eq(path(y))).

>> 2. tuple values can be projected out to yield primitives.

> Or other tuples, or tagged values. E.g. any edge projects to two vertices,
> which are (trivial) tuples as opposed to primitive values.

Good point. I started to do some modeling, and I've been getting some good mileage from a new "pointer" primitive.
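Going back to your union types for a second, so I make sure I'm reading you right: here is how I currently picture product vs. disjoint union, as a toy Python sketch. The names (inject/match) are mine, not a proposed API, so correct me if I'm misreading.

```python
# "Times" vs. "plus" in the type algebra (my reading of Josh's description).
# A tuple (product) type has a value for field a AND field b AND field c.
# A disjoint union (sum) type has a value for exactly ONE alternative,
# plus a tag remembering which one.

# Product: a person tuple carries ALL of its fields.
person = {"name": "marko", "age": 29}  # name AND age

def inject(tag, value):
    """Build an instance of a union type: XOR, not AND."""
    return {"tag": tag, "value": value}

def match(tagged, **handlers):
    """Eliminate a union value by dispatching on its tag."""
    return handlers[tagged["tag"]](tagged["value"])

# The out-type of "name" as (Person OR Project): a tagged value.
name_of_person = inject("person", {"name": "marko"})
name_of_project = inject("project", {"name": "lop"})

# A consumer of the union handles each alternative separately.
label = match(name_of_person,
              person=lambda p: "person:" + p["name"],
              project=lambda p: "project:" + p["name"])
# => "person:marko"
```

If that is roughly what you mean, then yes, I can see the appeal of plus/times symmetry.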
Assume every N-Tuple has a unique ID (outside the data model's ID space). If so, the TinkerPop toy graph as N-Tuples is:

[0][id:1,name:marko,age:29,created:*1,knows:*2]
[1][0:*3]
[2][0:*4,1:*5]
[3][id:3,name:lop,lang:java]
[4][id:2,name:vadas,age:27]
[5][id:4,name:josh,age:32,created:*…]

I know you are thinking that vertices don't have "outE" projections, so this isn't in line with your thinking. However, check this out. If we assume that pointers are automatically dereferenced on reference, then:

db().has('name','marko').values('knows').values('name') => vadas, josh

Pointers are useful when a tuple has another tuple as a value. Instead of nesting, you "blank node" it. DocumentDBs (with nested lists/maps) would use this extensively.

> Grumble... db() is just an alias for select()... grumble…

select() and project() are existing instructions in TP3 (TP4?).

SELECT
- db() will iterate all N-Tuples.
- has() will filter out those N-Tuples without the respective key/values.
- and()/or() are used for nested pattern matching.

PROJECT
- values() will project out the N-Tuple's values.

> Here, we are kind of mixing fields with property keys. Yes,
> db().has('name', 'marko') can be used to search for elements of any type...
> if that type agrees with the out-type of the "name" relation. In my
> TinkerPop Classic example, the out type of "name" is (Person OR Project),
> so your query will get you people or projects.

As with indices, I don't think we should introduce types. But this is up for further discussion...

> Which is to say that we define the out-type of "name" to be the disjoint
> union of all element types. The type becomes trivial. However, we can also
> be more selective if we want to, restricting "name" only to a small subset
> of types.

Hm… I'm listening. I'm running into problems in my modeling when trying to fit things generically into relational tables. Maybe typing is necessary :(.

> Good idea.
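As an aside, the pointer/auto-dereference behavior can be sketched as a toy evaluator over the N-Tuples above (illustrative Python, not real bytecode; I've left josh's elided created pointer out):

```python
# Toy N-Tuple store keyed by tuple ID; "*n" is a pointer to tuple n.
# Encoding taken from the toy-graph listing above.
TUPLES = {
    0: {"id": 1, "name": "marko", "age": 29, "created": "*1", "knows": "*2"},
    1: {0: "*3"},
    2: {0: "*4", 1: "*5"},
    3: {"id": 3, "name": "lop", "lang": "java"},
    4: {"id": 2, "name": "vadas", "age": 27},
    5: {"id": 4, "name": "josh", "age": 32},
}

def deref(value):
    """Pointers are automatically dereferenced on reference."""
    if isinstance(value, str) and value.startswith("*"):
        return TUPLES[int(value[1:])]
    return value

def db():
    """SELECT: iterate all N-Tuples."""
    return list(TUPLES.values())

def has(tuples, key, value):
    """SELECT: keep only N-Tuples with the respective key/value."""
    return [t for t in tuples if t.get(key) == value]

def values(tuples, key):
    """PROJECT: project out (and dereference) the N-Tuple's values."""
    out = []
    for t in tuples:
        v = deref(t.get(key))
        if isinstance(v, dict) and all(isinstance(k, int) for k in v):
            # a "list" tuple (integer slots) fans out to its slot values
            out.extend(deref(x) for x in v.values())
        elif v is not None:
            out.append(v)
    return out

# db().has('name','marko').values('knows').values('name')
result = values(values(has(db(), "name", "marko"), "knows"), "name")
# => ['vadas', 'josh']
```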
> TP4 can provide several "flavors" of interfaces, each of which
> is idiomatic for each major class of database provider. Meeting the
> providers halfway will make integration that much easier.

Yes. With respect to graphdb providers, they want to think in terms of Vertex/Edge/etc. We want to put the bytecode in their language so:

1. It is easier for them to write custom strategies.
2. inV() can operate on their Vertex object without them having to implement inV().

*** Basically, just like TP3 is now. GraphDB providers implement Graph/Vertex/Edge and everything works! However, they will then want to write custom instructions/strategies to use their database's optimizations, such as vertex-centric indices for outE('knows').has('stars',gt(3)).inV().

> I think we will see steps like V() and R() in Gremlin, but do not need them
> in bytecode. Again, db() is just select(), V() is just select(), etc. The
> model-specific interfaces adapt V() to select() etc.

Hm. See my points above. Having providers reason at the "universal model" level seems intense. ?

> select(foafPerson)
>
> The second expression becomes:
>
> value("marko").select(foafName, "in").project("out")
>
> ...which you can rewrite with has(); I just think the above is clear w.r.t.
> low-level operations. The value() is just providing a start of "marko",
> which is a string value. No need for xsd:string if we have a deep mapping
> between RDF and APG.

Hm… I see your type "slots" model and fear the global typing in a (potentially) schemaless world. For me, everything should be standard has()/values() TP bytecode off of a "get all" db()… ? However, I'm open to seeing examples that demonstrate easier reasoning.

Here are some examples I've been playing with using db()/has()/values() over DocumentDB data:

https://gist.github.com/okram/764033e215906787217bc3176bb3bb15

> Yes, nice.
> We can even take things a step further and decouple the query
> language from the database. Have a property graph database, but want to
> evaluate SPARQL? No problem. Have a relational database but want to do
> Gremlin traversals? No worries.

Yes. That is the whole point of this rabbit hole!

* any query language -> universal model -> any data model

> Not sure about vendor-specific instructions; a
> lot can be done in the mapping of relations to instructions which live
> entirely within the black box of the vendor code.

Vendor instructions are crucial for allowing the vendor to interact with their database's custom optimizations.

V().has('name','marko') => jg:v-index('name','marko')
outE('knows').has('stars',gt(3)).inV() => jg:vcentric-index-out('knows','stars',gt(3))

However, I would like to understand (via examples) what you are talking about, as that sounds super interesting!

> Back to indexes. IMO there should be a vendor-neutral API. Even extremely
> vendor-specific indexes like geotemporal indexes could be exposed through a
> common API, e.g.
>
> select("Dropoffs", {lat:37.7740, lon:122.4149, time:1556899302149})
>
> which resolves to a vendor-specific index.

I really don't think so. There are too many variations on indexing. TP doesn't need to go down that rat's nest. That is what vendor-specific strategies/instructions are for — let them decide how to fold has().has().has() into a single index lookup. We see everything as linear scans.

> I actually like your term "GMachine", and I don't think it's a bad idea to
> keep "graph" front and center. Yes, TP4 shall have the flexibility to
> interoperate with a variety of non-graph databases, but what it adds is a
> unifying graph abstraction.

I do like GMachine too, but I think "TP4 VM" is best for now. I don't think graph should be front and center. Graph is just another data model, much like RDF, Document, Relational, etc.
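To make the earlier strategy point concrete: here is a sketch of how a vendor strategy might fold a has()-chain into a single index lookup (illustrative Python over [op,arg*]* lists, not TP4 API; "jg:index" is a made-up instruction in the spirit of the jg:v-index example above).

```python
# Bytecode as [op, args...] lists. A vendor strategy rewrites a
# db().has(k1,v1).has(k2,v2)... prefix into ONE vendor index instruction,
# leaving the remainder of the bytecode untouched.

def fold_has_chain(bytecode):
    """Rewrite [db, has, has, ...] into [jg:index, {k: v, ...}] + rest."""
    if not bytecode or bytecode[0][0] != "db":
        return bytecode  # strategy only applies to full scans
    predicates, rest = {}, []
    for i, (op, *args) in enumerate(bytecode[1:], start=1):
        if op == "has":
            predicates[args[0]] = args[1]
        else:
            rest = bytecode[i:]  # first non-has() instruction onward
            break
    if not predicates:
        return bytecode
    return [["jg:index", predicates]] + rest

query = [["db"], ["has", "name", "marko"], ["has", "age", 29],
         ["values", "knows"]]
folded = fold_has_chain(query)
# => [['jg:index', {'name': 'marko', 'age': 29}], ['values', 'knows']]
```

The point being: TP sees linear scans; the vendor decides, in their own strategy, what their indexes can absorb.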
In fact, "graph" itself will have numerous flavors: graph with multi-properties, meta-properties, vertex multi-labels, … all captured in the pg/ interfaces. How exactly, not sure.

Awesome stuff. Excited to receive your response.

Marko.

http://rredux.com