Re: N-Tuples, Pointers, Data Model Interfaces, and Bytecode Instructions

Marko Rodriguez Tue, 07 May 2019 09:21:19 -0700

I’m in the pocket so I’m just going to riff…

The concept of a pointer as I have defined it is a "0-argument function.” That 
is, we currently have Pointer as:


Pointer<E> implements Supplier<Iterator<E extends Tuple>>

However, pointers can be n-arg functions!

Pointer<S,E> implements Function<List<S>, Iterator<E extends Tuple>>

If so, providers can support dynamic tuple fields.

{#id:1, #label:person, name:marko, age:29, #outE_by_label:*{#outV=*{#id:1}, 
#label=@[0]}}

I’m using @[0] to denote the first argument passed into the pointer.

What this tuple is saying is that the provider is able to do a direct lookup of 
all incident edges based on edge label. Thus, as TP4, we can optimize:

values(‘#outE’).has(‘#label’,‘knows’)
        == to ==>
values(‘#outE_by_label’).by(‘knows’)

Lets say a user is doing outE(‘knows’,’likes’). With #outE_by_label we can only 
reference one edge sequence as we only use one argument. Thus, the compiler 
would have to write out the following bytecode:

union(values(‘#outE_by_label’).by(‘knows’), 
values(‘#outE_by_label’).by(‘likes’))

However, lets say another provider has this field on their vertex-tuples:

{#id:1, #label:person, name:marko, #outE_by_labels:*{#outV=*{#id:1}, #label 
within @}}

This tuple is saying that it can get a single sequence from a pointer that 
references an arbitrary number of incident edges by label, where “within” is a 
predicate like =. Thus, TP4 can compile outE(‘knows’,’likes’) to the following 
bytecode:

values(‘#outE_by_labels’).by(‘knows’).by(‘likes’)

*** I’m starting to see #fields as bytecode! …………….. foggy vision right now.

——————

I’m starting to think that there are no graph interfaces, relational 
interfaces, document interfaces, etc. No — instead, everything is always in 
terms of tuples!

        1. Database providers communicate with TP4 by giving it tuples.
                - these tuples “abstractly” represent objects like vertices, 
edges, rows, documents, etc.
        2. TP4 communicates with database providers by either 
                (1) joining tuple sequences to create a new tuple sequence (for 
relational-style data) or 
                (2) jumping to a tuple sequence via #fields (for pointer-style 
data).

Thats it. Thats all there is to the game.

// Gremlin
g.V().out(‘knows’).values(’name’)

// TP4 Bytecode
        == minimum tuple-field requirement to call yourself a graphdb ==>
db().values(‘#V’).values(‘#outE’).has(‘#label’,’knows’).values(‘#inV’).values(‘name’)
        == edge-label vertex-centric index optimization ==>
db().values(‘#V’).values(‘#out_label’).by(‘knows’).values(‘name’)
        == denormalized adjacent vertex property data for read-heavy systems ==>
db().values('#V’).values(‘#out_label_key’).by(‘knows’).by(‘name’)

So what makes a graphdb? The tuple types and their respective fields.
        - you have to have #type=vertex and #type=edge (you can have other 
types in there too!, but the graphdb instruction set only cares about vertices 
and edges).
        - you have to have #id and #label
        - you have to have #outE and #inE
        - you have to have #outV and #inV (that reference a single #type=vertex)
If your n-tuple representation has all that, then you are a “graphdb” and 
Gremlin will be able to process you.

Now check this mind-numbing entailment. Your n-tuple representation could have 
other tuple #types. For instance, #type=row.

{#type:row, #table:yearly_expenses, person:*{#type:vertex,#id:1}, january:$10, 
february:$12, march:$54, april:…. }

I have a graphdb mixed with a relationaldb all within the same db! You could 
effectively encode this in Cassandra (same keyspace) or an RDBMS (same 
database). Watch and learn:

        // what are the names of the friends of the person that spent the least 
last year?
        sql("SELECT person FROM yearly_expenses WHERE 
MIN(january+february+…)”).out(‘knows’).values(‘name’)

We aren’t talking “multi-model” …. no my friend, we are talking “hybrid-model.”

Its all just tuples and pointers.

Outz,
Marko.

http://rredux.com <http://rredux.com/>




> On May 7, 2019, at 7:26 AM, Marko Rodriguez <[email protected]> wrote:
> 
> Whoa.
> 
> Check out this trippy trick.
> 
> First, here is how you define a pointer to a map-tuple.
> 
>       *{k1?v1, k2?v2, …, kn?vn}
>               * says “this is a pointer to a map" { }
>               ? is some comparator like =, >, <, !=, contains(), etc.
> 
> Assume the vertex map tuple v[1]:
> 
> {#id:1, #label:person, name:marko, age:29} 
> 
> Now, we can add the following fields:
> 
> 1. #outE:*{#outV=*{#id=1}}  // references all tuples that have an outV field 
> that is a pointer to the the v[1] vertex tuple.
> 2. #outE.knows:*{#outV=*{#id=1},#label=knows} // references all outgoing 
> knows-edges.
> 3. #outE.knows.weight_gt_85:*{#outV=*{#id=1},#label=knows,weight>0.85} // 
> references all strong outgoing knows-edges
> 
> By using different types of pointers, a graph database provider can make 
> explicit their internal structure. Assume all three fields above are in the 
> v[1] vertex tuple. This means that:
> 
>       1. all of v[1]’s outgoing edges are group together. <— linear scan
>       2. all of v[1]’s outgoing knows-edges are group together. <— indexed by 
> label
>       3. all of v[1]’s strong outgoing knows-edges are group together <— 
> indexed by label and weight
> 
> Thus, a graph database provider can describe the way in which it internally 
> organizes adjacent edges — i.e. vertex-centric indices! This means then that 
> TP4 can do vertex-centric index optimizations automatically for providers!
> 
>       1. values(“#outE”).hasLabel(‘knows’).has(‘weight’,gt(0.85)) // grab all 
> edges, then filter on label, then filter on weight.
>       2. values(“#outE.knows”).has(‘weight’,gt(0.85)) // grab all 
> knows-edges, then filter on weight.
>       3. values(“#outE.knows.weight_gt_85”) // grab all strong knows-edges.
> 
> *** Realize that Gremlin outE() will just compile to bytecode values(“#outE”).
> 
> Freakin’ crazy! … Josh was interested in using the n-tuple structure to 
> describe indices. I was against it. I believe I still am. However, this is 
> pretty neat. As Josh was saying though, without a rich enough n-tuple 
> description of the underlying database, there should be no reason for 
> providers to have to write custom strategies and instructions ?!?!?!?!? 
> crazy!?
> 
> Marko.
> 
> http://rredux.com <http://rredux.com/>
> 
> 
> 
> 
>> On May 7, 2019, at 4:44 AM, Marko Rodriguez <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> Hey Josh,
>> 
>>> I think of your Pointer<T> as a reference to an entity. It does not contain
>>> the entity it refers to, but it contains the primary key of that entity.
>> 
>> Exactly! I was just thinking that last night. Tuples don’t need a separate 
>> ID system. No -- pointers reference the primary key of a tuple! Better yet 
>> perhaps, they can reference one-to-many. For instance:
>> 
>> { id:1, label:person, name:marko, age:29, outE:*(outV=id) }
>> 
>> Thus, a pointer is defined by a pattern match. Haven’t thought through the 
>> consequences, but … :)
>> 
>>> Here, I have invented an Entity class to indicate that the pointer resolves
>>> to a vertex (an entity without a tuple, or rather with a 0-tuple -- the
>>> unit element).
>> 
>> Ah — the 0-tuple. Neat thought.
>> 
>> I look forward to your slides from the Knowledge Graph Conference. If I 
>> wasn’t such a reclusive hermit, I would have loved to have joined you there.
>> 
>> Take care,
>> Marko.
>> 
>> http://rredux.com <http://rredux.com/>
>> 
>> 
>>> On Mon, May 6, 2019 at 9:38 PM Marko Rodriguez <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> 
>>>> Hey Josh,
>>>> 
>>>>> I am feeling the tuples... as long as they can be typed, e.g.
>>>>> 
>>>>>    <V> myTuple.get(Integer) -- int-indexed tuples
>>>>>    <V> myTuple.get(String) -- string-indexed tuples
>>>>> In most programming languages, "tuples" are not lists, though they are
>>>> typed by a list of element types. E.g. in Haskell you might have a tuple
>>>> with the type
>>>>>    (Double, Double, Bool)
>>>> 
>>>> 
>>>> Yes, we have Pair<A,B>, Triple<A,B,C>, Quadruple<A,B,C,D>, etc. However
>>>> for base Tuple<A> of unknown length, the best I can do in Java is <A>. :|
>>>> You can see my stubs in the gist:
>>>>        https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8 
>>>> <https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8> <
>>>> https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8 
>>>> <https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8>> (LINES
>>>> #21-42)
>>>> 
>>>>> If this is in line with your proposal, then we agree that tuples should
>>>> be the atomic unit of data in TP4.
>>>> 
>>>> Yep. Vertices, Edges, Rows, Documents, etc. are all just tuples. However,
>>>> I suspect that we will disagree on some of my tweaks. Thus, I’d really like
>>>> to get your feedback on:
>>>> 
>>>>        1. pointers (tuple entries referencing tuples).
>>>>        2. sequences (multi-value tuple entries).
>>>>        3. # hidden map keys :|
>>>>                - sorta ghetto.
>>>> 
>>>> Also, I’m still not happy with db().has().has().as(‘x’).db().where()… its
>>>> an intense syntax and its hard to strategize.
>>>> 
>>>> I really want to nail down this “universal model” (tuple structure and
>>>> tuple-oriented instructions) as then I can get back on the codebase and
>>>> start to flush this stuff out with confidence.
>>>> 
>>>> See ya,
>>>> Marko.
>>>> 
>>>> http://rredux.com <http://rredux.com/> <http://rredux.com/ 
>>>> <http://rredux.com/>>
>>>> 
>>>> 
>>>>> 
>>>>> Josh
>>>>> 
>>>>> 
>>>>> On Mon, May 6, 2019 at 5:34 PM Marko Rodriguez <[email protected] 
>>>>> <mailto:[email protected]>
>>>> <mailto:[email protected] <mailto:[email protected]>>> wrote:
>>>>> Hi,
>>>>> 
>>>>> I spent this afternoon playing with n-tuples, pointers, data model
>>>> interfaces, and bytecode instructions.
>>>>> 
>>>>>        https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8 
>>>>> <https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8> <
>>>> https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8 
>>>> <https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8>> <
>>>> https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8 
>>>> <https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8> <
>>>> https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8 
>>>> <https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8>>>
>>>>> 
>>>>> *** Kuppitz: They are tuples :). A Map<K,V> extends Tuple<Pair<K,V>>.
>>>> Tada!
>>>>> 
>>>>> What I like about this is that it combines the best of both worlds
>>>> (Josh+Marko).
>>>>>        * just flat tuples of arbitrary length.
>>>>>                * pattern matching for arbitrary joins. (k1=k2 AND k3=k4
>>>> …)
>>>>>                * pointers chasing for direct links. (edges, foreign
>>>> keys, document _id references, URI resolutions, …)
>>>>>        * sequences are a special type of tuple used for multi-valued
>>>> entries.
>>>>>        * has()/values()/etc. work on all tuple types! (maps, lists,
>>>> tuples, vertices, edges, rows, statements, documents, etc.)
>>>>> 
>>>>> Thoughts?,
>>>>> Marko.
>>>>> 
>>>>> http://rredux.com <http://rredux.com/> <http://rredux.com/ 
>>>>> <http://rredux.com/>> <http://rredux.com/ <http://rredux.com/> <
>>>> http://rredux.com/ <http://rredux.com/>>>
>> 
>

Re: N-Tuples, Pointers, Data Model Interfaces, and Bytecode Instructions

Reply via email to