Re: The Fundamental Structure Instructions Already Exist! (w/ RDBMS Example)

Joshua Shinavier Sat, 11 May 2019 06:59:45 -0700

OK, beginning at the beginning.

On Mon, May 6, 2019 at 3:58 AM Marko Rodriguez <[email protected]> wrote:

> Hey Josh,
>
>
> > One more thing is needed: disjoint unions. I described these in my email
> on
> > algebraic property graphs. They are the "plus" operator to complement the
> > "times" operator in our type algebra. A disjoint union type is just like
> a
> > tuple type, but instead of having values for field a AND field b AND
> field
> > c, an instance of a union type has a value for field a XOR field b XOR
> > field c. Let me know if you are not completely sold on union types, and I
> > will provide additional motivation.
>
> Huh. That is an interesting concept. Can you please provide examples?
>

Yes. If you think back to your elementary school algebra, you will recall
four basic associative operations: addition, multiplication, subtraction,
and division. Simple stuff, but let's make things even simpler by throwing
out inverses. So we have: addition and multiplication. You also need unit
elements 0 and 1 which have the usual properties. This structure is called
a semiring <https://en.wikipedia.org/wiki/Semiring>, and with it, you can
build up a rich type system, and allows you to reason on equations of
types. Multiplication represents the concatenation of tuples -- a × b × c
is a type that has a AND b and c -- whereas addition represents a choice -- a +
b + c is a type that has a XOR b XOR c.

Examples of multiplication are edges (e.g. a knows edge type is the product
of Person and Person; the out-vertex is a person, and the in-vertex is a
person) and properties (e.g. age is a product of Person and the primitive
integer type). For example, you could express the knows type as Person
× Person or as prod{out=Person, in=Person} if you want to give names to the
components of tuples (fields).

Examples of addition are in- or out-types which are a disjunction of other
types. For example, in the TInkerPop classic graph, the name property can
attach to either a Person or a Project, so the type is (Person +
Project) × string, or prod{out=sum{person=Person, project=Project},
in=string} if you want field names.

Just as the teacher made you do at the blackboard, you can distribute
multiplication over a sum, so

(Person + Project) × string = (Person × string) + (Project × string)

 In other words, a name property which can attach either to a person or
project is equivalent to two distinct properties, maybe call them personName
and projectName, which each attach to only one type of vertex.

Other fun things you can build with unions include lists, trees, and other
recursive data structures. How do you formalize a "list of people" as a
type? Well, you can think of it in this way:

ListOfPeople = () + (Person) + (Person × Person) + (Person × Person ×
Person) + ...

In other words, a list of people can be either the additive unit (0-tuple),
a single person, a pair of people, a triplet of people... an n-tuple of
people for any n >= 0. You could also write:

ListOfPeople = () + (Person × ListOfPeople)

Products let you concatenate types and tuples to build larger types and
tuples; sums enable choices and pattern matching.

> One thing I want to stress. The “universal bytecode” is just standard
> [op,arg*]* bytecode save that data access is via the “universal model's"
> db() instruction. Thus, AND/OR/pattern matching/etc. is all available.
> Likewise union(), repeat(), coalesce(), choose(), etc. are all available.
>
> db().and(as('a').values('knows').as('b'),
>          or(as('a').has('name','marko'),
>             as('a').values(‘created').count().is(gt(1))),
>          as('b').values(’created').as('c')).
>      path(‘c')
>

No disagreement. This is essentially functional pattern matching as
motivated above, though it includes a condition we wouldn't include in the
type system itself: the "created" count.

> As you can see, and()/or() pattern matching is possible and can be nested.
>   *** SIDENOTE: In TP3, such nested and()/or() pattern matching is
> expressed using match() where the root grouping is assumed to be and()’d
> together.
>

Yep.

>   *** SIDENOTE: In TP4, I want to get rid of an explicit match() bytecode
> instruction and replace it with and()/or() instructions with prefix/suffix
> as()s.
>

Hmm. I think the match() syntax is useful, even if you can build match()
expressions out of and() and or(). Or maybe we just point users to
OpenCypher if they want conjunctive query patterns. Jeremy Hanna and I
chatted about this at the conference earlier this week... it is really just
a matter of providing the best syntactic sugar. You CAN do everything that
match() or OpenCypher can do in Gremlin, but this is not to say you always
SHOULD.

>    [...]

> Or other tuples, or tagged values. E.g. any edge projects to two vertices,
> > which are (trivial) tuples as opposed to primitive values.
>
> Good point. I started to do some modeling and I’ve been getting some good
> mileage from a new “pointer” primitive. Assume every N-Tuple has a unique ID

Minor point: it's also OK to have tuples with no id; not everything needs
to be an entity.

> (outside the data models id space). If so, the TinkerPop toy graph as
> N-Tuples is:
>
> [0][id:1,name:marko,age:29,created:*1,knows:*2]
> [1][0:*3]
> [2][0:*4,1:*5]
> [3][id:3,name:lop,lang:java]
> [4][id:2,name:vadas,age:27]
> [5][id:4,name:josh,age:32,created*:…]
>

I don't think we are quite aligned on what belongs in a tuple (which you
note below). I would rewrite these as:

{id=1, label=person}
{id=4, label=person}
{id=8, label=knows, out=1, in=4}
{id=13, label=name, out=1, in="marko"}

etc., where just for readability, we are putting id and label in the same
namespace as the fields of the tuple.

> I know you are thinking that vertices don’t have “outE” projections so
> this isn’t inline with your thinking.

True...

> However, check this out. If we assume that pointers are automatically
> dereferenced on reference then:
>
> db().has(‘name’,’marko’).values(‘knows’).values(‘name’) => vadas, josh
>

This is more compact than something like:

  value("marko").select("name", "in").project("out").select("knows",
"out").project("in").select("name", "out").project("in")

 but that is what I see as the most low-level bytecode for your expression.
I see your has() as sugar.

Pointers are useful when a tuple has another tuple as a value. Instead of
> nesting, you “blank node.” DocumentDBs (with nested list/maps) would use
> this extensively.
>

I agree, although in the case of lists/maps, again you don't always need a
"pointer" because the value doesn't always need to be an entity. You are
actually making explicit here that all tuples are to be treated as
entities, which is OK -- RDF does it -- but IMO not necessary. If you just
want a list of 100 people, you don't care whether the list notes have a
unique identity in the graph; in fact, it may be more efficient not to give
them any.

> > Grumble... db() is just an alias for select()... grumble…
>
> select() and project() are existing instructions in TP3 (TP4?).
>

Kinda.

> Like indices, I don’t think we should introduce types. But this is up for
> further discussion...
>

Let's keep discussing. With types comes a lot of opportunity for static
analysis and optimization, not to mention functional pattern matching.

> > Which is to say that we define the out-type of "name" to be the disjoint
> > union of all element types. The type becomes trivial. However, we can
> also
> > be more selective if we want to, restricting "name" only to a small
> subset
> > of types.
>
> Hm… I’m listening. I’m running into problems in my modeling when trying to
> generically fit things into relational tables. Maybe typing is necessary :(.
>

Types are your friend, and union types let you keep your friend at arm's
distance if you want to.

> [...]
> > I think we will see steps like V() and R() in Gremlin, but do not need
> them
> > in bytecode. Again, db() is just select(), V() is just select(), etc. The
> > model-specific interfaces adapt V() to select() etc.
>
> Hm. See my points above. Having providers reason at the “universal
> model”-level seems intense. ?
>

I have warmed up to the idea of provider (model-) specific instructions,
which compile down to universal bytecode when providers are done with their
optimizations.

>
>
> >    select(foafPerson)
> >
> > The second expression becomes:
> >
> >    value("marko").select(foafName, "in").project("out”)
> > ...which you can rewrite with has(); I just think the above is clear
> w.r.t.
> > low-level operations. The value() is just providing a start of "marko",
> > which is a string value. No need for xsd:string if we have a deep mapping
> > between RDF and APG.
>
>
> Hm… I see your type “slots” model and fear the global typing in a
> (potentially) schemaless world. For me, everything should be standard
> has()/values() TP bytecode off of an “get all” db()…  ? However, I’m open
> to seeing examples that demonstrate easier reasoning.
>

I agree that we want loosely-typed steps as well, but this is where "sugar"
like has() comes in. E.g. my

value("marko").select("name", "in").project("out")

or your

db().has("name", "marko")

both hide the fact that "name" is a property type which allows either a
person or a project as its out-vertex. However, if we do have that type
defined, then we can do some inference. E.g. we know that the result of
this traversal is a collection of either people or projects, and now we can
do smarter things if the traversal is composed together with other
traversals. The type system should be opt-in.
\

> [...]
> Yes. That is the whole point of this rabbit hole!
>         * any query language -> universal model -> any data model.
>

+1

> Vendor instructions are crucial to allow the vendor to interact with their
> database’s custom optimizations.
>

I'm sold.

> [...]
> I really don’t think so. There are too many variations to indexing.

Ah, but I would claim that all of them can be boiled down to algebraic data
types. Give me an example of an index, and I will give you the type. For
example, consider a geospatial index that resolves a (lat, lon) point to a
collection of nearby Restaurant vertices: (double × double × Restaurant),
where we resolve keys to values in the index from left to right. Give me a
double, and I will give you a (double × Restaurant) index. Give me another
double, and I will give you restaurants. The ordering of elements returned
by the index is not part of the type system; it is up to the vendor.

How about a vertex-centric index of Trips on drop-off time? That's even
simpler: (long × Trip). Again, give me a long, and I will give you a
collection of trips.

Let me know if/when I have convinced you that this not a "rat's nest" and
provides further opportunities for optimization.

> [...]
> I do like GMachine too. But, I think TP4 VM is best for now.
>

Sure. I thought you were suggesting a new name "Database Virtual Machine",
which would put us in congested air space.

I don’t think graph should be front-and-center. Graph is just another data
> model much like RDF, Document, Relational, etc.

Yes, but instead of saying "we are not only graphs", you can say "these
other things can also be treated as graphs", which is more exciting to me,
because I like graphs.

> In fact, “graph” will have numerous flavors:
>
>         Graph w/ multi-properties, meta-properties, vertex multi-labels, …
>                 - All captured in the pg/ interfaces. How exactly, not
> sure.
>

Agreed.

> Awesome stuff. Excited to receive your response.
>

Might not get through all of your emails today, but I will get to them. If
I were to starting implementing the type system, would you be open to Scala
as an implementation language?

Josh

>
> Marko.
>
> http://rredux.com <http://rredux.com/>
>
>

Re: The Fundamental Structure Instructions Already Exist! (w/ RDBMS Example)

Reply via email to