Re: N-Tuples, Pointers, Data Model Interfaces, and Bytecode Instructions

Marko Rodriguez Tue, 07 May 2019 06:26:59 -0700

Whoa.

Check out this trippy trick.


First, here is how you define a pointer to a map-tuple.

        *{k1?v1, k2?v2, …, kn?vn}
                * says "the { } that follows is a pointer to a map"
                ? is some comparator like =, >, <, !=, contains(), etc.

Assume the vertex map tuple v[1]:

{#id:1, #label:person, name:marko, age:29} 

Now, we can add the following fields:

1. #outE:*{#outV=*{#id=1}}  // references all tuples that have an #outV field that is a pointer to the v[1] vertex tuple.
2. #outE.knows:*{#outV=*{#id=1},#label=knows} // references all outgoing knows-edges.
3. #outE.knows.weight_gt_85:*{#outV=*{#id=1},#label=knows,weight>0.85} // references all strong outgoing knows-edges.
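To make the pointer semantics concrete, here is a minimal Java sketch under stated assumptions: a pointer is a conjunction of (key, comparator, value) constraints, resolving it scans a list of map-tuples and returns every match (one-to-many), and two pointers are equal when their patterns are equal, so an edge's #outV field can itself hold *{#id=1}. The class names Constraint, Pointer, and PointerDemo, the four edge tuples, and the restriction to =, !=, >, < are hypothetical illustrations, not TP4 code.

```java
import java.util.*;

// Hypothetical sketch, not TP4 code: a pointer *{k1?v1, ..., kn?vn} modeled as a
// conjunction of (key, comparator, value) constraints over map-tuples.
final class Constraint {
    final String key;
    final String op;    // only "=", "!=", ">", "<" here; contains() etc. are omitted
    final Object value; // may itself be a Pointer, e.g. #outV=*{#id=1}

    Constraint(String key, String op, Object value) {
        this.key = key;
        this.op = op;
        this.value = value;
    }

    boolean test(Map<String, Object> tuple) {
        if (!tuple.containsKey(key)) return false; // a missing key never matches
        Object actual = tuple.get(key);
        switch (op) {
            case "=":  return Objects.equals(actual, value);
            case "!=": return !Objects.equals(actual, value);
            case ">":  return ((Number) actual).doubleValue() > ((Number) value).doubleValue();
            case "<":  return ((Number) actual).doubleValue() < ((Number) value).doubleValue();
            default:   throw new IllegalArgumentException("unsupported comparator: " + op);
        }
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Constraint)) return false;
        Constraint c = (Constraint) o;
        return key.equals(c.key) && op.equals(c.op) && Objects.equals(value, c.value);
    }

    @Override public int hashCode() { return Objects.hash(key, op, value); }
}

final class Pointer {
    final List<Constraint> pattern;

    Pointer(Constraint... constraints) { this.pattern = List.of(constraints); }

    // Resolving a pointer yields every tuple that satisfies all constraints (linear scan).
    List<Map<String, Object>> resolve(List<Map<String, Object>> tuples) {
        List<Map<String, Object>> matches = new ArrayList<>();
        for (Map<String, Object> t : tuples) {
            if (pattern.stream().allMatch(c -> c.test(t))) matches.add(t);
        }
        return matches;
    }

    // Pointers with equal patterns are equal, so an edge whose #outV holds *{#id=1}
    // satisfies the constraint #outV=*{#id=1}.
    @Override public boolean equals(Object o) {
        return o instanceof Pointer && pattern.equals(((Pointer) o).pattern);
    }

    @Override public int hashCode() { return pattern.hashCode(); }
}

final class PointerDemo {
    static final Pointer V1 = new Pointer(new Constraint("#id", "=", 1));

    // The three fields added to the v[1] vertex tuple.
    static final Pointer OUT_E = new Pointer(new Constraint("#outV", "=", V1));
    static final Pointer OUT_E_KNOWS = new Pointer(
            new Constraint("#outV", "=", V1), new Constraint("#label", "=", "knows"));
    static final Pointer OUT_E_KNOWS_STRONG = new Pointer(
            new Constraint("#outV", "=", V1), new Constraint("#label", "=", "knows"),
            new Constraint("weight", ">", 0.85));

    static Map<String, Object> tuple(Object... kvs) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < kvs.length; i += 2) m.put((String) kvs[i], kvs[i + 1]);
        return m;
    }

    // Hypothetical edge tuples; edge 10 hangs off a different vertex, v[2].
    static List<Map<String, Object>> edges() {
        Pointer v1 = new Pointer(new Constraint("#id", "=", 1));
        Pointer v2 = new Pointer(new Constraint("#id", "=", 2));
        return List.of(
                tuple("#id", 7, "#label", "knows", "#outV", v1, "weight", 0.5),
                tuple("#id", 8, "#label", "knows", "#outV", v1, "weight", 1.0),
                tuple("#id", 9, "#label", "created", "#outV", v1, "weight", 0.4),
                tuple("#id", 10, "#label", "knows", "#outV", v2, "weight", 0.9));
    }
}
```

With this toy data, PointerDemo.OUT_E.resolve(PointerDemo.edges()) returns edges 7, 8, and 9; OUT_E_KNOWS returns 7 and 8; OUT_E_KNOWS_STRONG returns only edge 8. Edge 10 never matches because its #outV points at v[2]. Every resolution here is a linear scan; a provider would presumably back the more specific fields with real indices.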

By using different types of pointers, a graph database provider can make its
internal structure explicit. Assume all three fields above are in the
v[1] vertex tuple. This means that:

        1. all of v[1]’s outgoing edges are grouped together. <-- linear scan
        2. all of v[1]’s outgoing knows-edges are grouped together. <-- indexed by label
        3. all of v[1]’s strong outgoing knows-edges are grouped together. <-- indexed by label and weight

Thus, a graph database provider can describe the way in which it internally
organizes adjacent edges, i.e. its vertex-centric indices! This means that
TP4 can do vertex-centric index optimizations automatically for providers!

        1. values("#outE").hasLabel('knows').has('weight',gt(0.85)) // grab all edges, then filter on label, then filter on weight.
        2. values("#outE.knows").has('weight',gt(0.85)) // grab all knows-edges, then filter on weight.
        3. values("#outE.knows.weight_gt_85") // grab all strong knows-edges.

*** Realize that Gremlin outE() will just compile to bytecode values("#outE").
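The rewrite such a strategy could perform is sketched below in Java. It assumes each pointer field on the vertex tuple advertises its pattern as a set of constraint strings, and that hasLabel('knows') and has('weight',gt(0.85)) have already been normalized to "#label=knows" and "weight>0.85". Because outE() compiles to values("#outE"), the strategy starts from that field, picks the field whose pattern is the base pattern plus the largest subset of the requested filters, and keeps only the filters that field does not already cover. IndexStrategy, Plan, and rewrite() are hypothetical names, not TP4 APIs.

```java
import java.util.*;

// Hypothetical sketch of a vertex-centric index strategy, not TP4's actual one.
// Each pointer field of a vertex tuple is described by its pattern, written here
// as a set of constraint strings such as "#label=knows" or "weight>0.85".
final class IndexStrategy {
    static final class Plan {
        final String field;                 // pointer field to read with values(...)
        final List<String> residualFilters; // filters still to apply afterwards

        Plan(String field, List<String> residualFilters) {
            this.field = field;
            this.residualFilters = residualFilters;
        }
    }

    // Rewrites values(base) followed by conjunctive filters into values(field)
    // followed by whatever filters the chosen field's pattern does not already cover.
    static Plan rewrite(Map<String, Set<String>> fields, String base, List<String> filters) {
        Set<String> basePattern = fields.get(base);
        if (basePattern == null) throw new IllegalArgumentException("unknown field: " + base);
        String best = base;
        int bestCovered = 0;
        for (Map.Entry<String, Set<String>> e : fields.entrySet()) {
            Set<String> extra = new HashSet<>(e.getValue());
            if (!extra.containsAll(basePattern)) continue; // must refine the base pointer
            extra.removeAll(basePattern);
            if (!filters.containsAll(extra)) continue;     // may only add constraints the query asks for
            if (extra.size() > bestCovered) {
                best = e.getKey();
                bestCovered = extra.size();
            }
        }
        Set<String> covered = fields.get(best);
        List<String> residual = new ArrayList<>();
        for (String f : filters) {
            if (!covered.contains(f)) residual.add(f);
        }
        return new Plan(best, residual);
    }
}
```

For the three fields above, filters ["#label=knows", "weight>0.85"] select #outE.knows.weight_gt_85 with no residual filters, while ["#label=knows", "weight>0.5"] fall back to #outE.knows plus a residual weight>0.5 filter. Since constraints are compared as strings, this sketch does not notice that weight>0.9 could also be served by weight_gt_85 plus a residual filter; that would need predicate implication.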

Freakin’ crazy! … Josh was interested in using the n-tuple structure to
describe indices. I was against it, and I believe I still am. However, this is
pretty neat. As Josh was saying, though, with a rich enough n-tuple
description of the underlying database, there should be no reason for providers
to have to write custom strategies and instructions ?!?!?!?!? crazy!?

Marko.

http://rredux.com <http://rredux.com/>




> On May 7, 2019, at 4:44 AM, Marko Rodriguez <[email protected]> wrote:
> 
> Hey Josh,
> 
>> I think of your Pointer<T> as a reference to an entity. It does not contain
>> the entity it refers to, but it contains the primary key of that entity.
> 
> Exactly! I was just thinking that last night. Tuples don’t need a separate ID 
> system. No -- pointers reference the primary key of a tuple! Better yet 
> perhaps, they can reference one-to-many. For instance:
> 
> { id:1, label:person, name:marko, age:29, outE:*(outV=id) }
> 
> Thus, a pointer is defined by a pattern match. Haven’t thought through the 
> consequences, but … :)
> 
>> Here, I have invented an Entity class to indicate that the pointer resolves
>> to a vertex (an entity without a tuple, or rather with a 0-tuple -- the
>> unit element).
> 
> Ah — the 0-tuple. Neat thought.
> 
> I look forward to your slides from the Knowledge Graph Conference. If I 
> wasn’t such a reclusive hermit, I would have loved to have joined you there.
> 
> Take care,
> Marko.
> 
> 
> 
>> On Mon, May 6, 2019 at 9:38 PM Marko Rodriguez <[email protected]> wrote:
>> 
>>> Hey Josh,
>>> 
>>>> I am feeling the tuples... as long as they can be typed, e.g.
>>>>
>>>>    <V> myTuple.get(Integer) -- int-indexed tuples
>>>>    <V> myTuple.get(String) -- string-indexed tuples
>>>>
>>>> In most programming languages, "tuples" are not lists, though they are
>>>> typed by a list of element types. E.g. in Haskell you might have a tuple
>>>> with the type
>>>>    (Double, Double, Bool)
>>> 
>>> 
>>> Yes, we have Pair<A,B>, Triple<A,B,C>, Quadruple<A,B,C,D>, etc. However,
>>> for a base Tuple<A> of unknown length, the best I can do in Java is <A>. :|
>>> You can see my stubs in the gist (lines #21-42):
>>>        https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8
>>> 
>>>> If this is in line with your proposal, then we agree that tuples should
>>>> be the atomic unit of data in TP4.
>>> 
>>> Yep. Vertices, Edges, Rows, Documents, etc. are all just tuples. However,
>>> I suspect that we will disagree on some of my tweaks. Thus, I’d really like
>>> to get your feedback on:
>>> 
>>>        1. pointers (tuple entries referencing tuples).
>>>        2. sequences (multi-value tuple entries).
>>>        3. # hidden map keys :|
>>>                - sorta ghetto.
>>> 
>>> Also, I’m still not happy with db().has().has().as('x').db().where()…: it's
>>> an intense syntax and it's hard to strategize.
>>> 
>>> I really want to nail down this “universal model” (tuple structure and
>>> tuple-oriented instructions), as then I can get back on the codebase and
>>> start to flesh this stuff out with confidence.
>>> 
>>> See ya,
>>> Marko.
>>> 
>>> 
>>> 
>>>> 
>>>> Josh
>>>> 
>>>> 
>>>> On Mon, May 6, 2019 at 5:34 PM Marko Rodriguez <[email protected]> wrote:
>>>> Hi,
>>>> 
>>>> I spent this afternoon playing with n-tuples, pointers, data model
>>>> interfaces, and bytecode instructions.
>>>> 
>>>>        https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8
>>>> 
>>>> *** Kuppitz: They are tuples :). A Map<K,V> extends Tuple<Pair<K,V>>.
>>>> Tada!
>>>> 
>>>> What I like about this is that it combines the best of both worlds
>>>> (Josh+Marko).
>>>>        * just flat tuples of arbitrary length.
>>>>                * pattern matching for arbitrary joins. (k1=k2 AND k3=k4 …)
>>>>                * pointer chasing for direct links. (edges, foreign keys,
>>>>                  document _id references, URI resolutions, …)
>>>>        * sequences are a special type of tuple used for multi-valued entries.
>>>>        * has()/values()/etc. work on all tuple types! (maps, lists, tuples,
>>>>          vertices, edges, rows, statements, documents, etc.)
>>>> 
>>>> Thoughts?
>>>> Marko.
>>>> 
>
