Whoa.
Check out this trippy trick.
First, here is how you define a pointer to a map-tuple.
*{k1?v1, k2?v2, …, kn?vn}
* says “this is a pointer” and { } says “to a map”.
? is some comparator like =, >, <, !=, contains(), etc.
Assume the vertex map tuple v[1]:
{#id:1, #label:person, name:marko, age:29}
Now, we can add the following fields:
1. #outE:*{#outV=*{#id=1}} // references all tuples that have an #outV field
that is a pointer to the v[1] vertex tuple.
2. #outE.knows:*{#outV=*{#id=1},#label=knows} // references all outgoing
knows-edges.
3. #outE.knows.weight_gt_85:*{#outV=*{#id=1},#label=knows,weight>0.85} //
references all strong outgoing knows-edges
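To make the pattern-match reading of these three pointers concrete, here is a rough Java sketch. Everything in it is an assumption for illustration: the Condition and Pointer records, the resolve() helper, and the sample edge tuples (ids 7, 8, 9 and their weights) are made up and are not TP4 or TinkerPop APIs. Dereferencing is done by a naive scan over a toy list of map-tuples, and only the = and > comparators are handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch; none of these types are TinkerPop APIs.
final class PointerSketch {

    // One "k?v" entry of a pointer: key, comparator symbol, value.
    record Condition(String key, String op, Object value) {}

    // *{k1?v1, ..., kn?vn}. Records compare structurally, so two
    // separately built *{#id=1} pointers are equal to each other.
    record Pointer(List<Condition> conditions) {
        static Pointer of(Condition... cs) { return new Pointer(List.of(cs)); }
    }

    static Condition eq(String key, Object value) { return new Condition(key, "=", value); }
    static Condition gt(String key, double value) { return new Condition(key, ">", value); }

    // *{#id=1}: pointer to the v[1] vertex tuple.
    static final Pointer V1 = Pointer.of(eq("#id", 1));
    // The three fields from the list above.
    static final Pointer OUT_E = Pointer.of(eq("#outV", V1));
    static final Pointer OUT_E_KNOWS = Pointer.of(eq("#outV", V1), eq("#label", "knows"));
    static final Pointer OUT_E_KNOWS_STRONG =
            Pointer.of(eq("#outV", V1), eq("#label", "knows"), gt("weight", 0.85));

    // Does a single condition hold for the value found in the tuple?
    static boolean holds(Condition c, Object actual) {
        switch (c.op()) {
            case "=": return Objects.equals(actual, c.value());
            case ">": return actual instanceof Number
                    && ((Number) actual).doubleValue() > ((Number) c.value()).doubleValue();
            default: throw new IllegalArgumentException("unsupported comparator: " + c.op());
        }
    }

    // A map-tuple matches a pointer when every condition holds.
    static boolean matches(Map<String, Object> tuple, Pointer pointer) {
        for (Condition c : pointer.conditions()) {
            if (!holds(c, tuple.get(c.key()))) return false;
        }
        return true;
    }

    // Dereference a pointer by scanning every tuple in the toy database.
    static List<Map<String, Object>> resolve(List<Map<String, Object>> db, Pointer pointer) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> t : db) {
            if (matches(t, pointer)) out.add(t);
        }
        return out;
    }

    // v[1] plus three made-up outgoing edge tuples (illustrative data only).
    static List<Map<String, Object>> sampleDb() {
        return List.of(
            Map.<String, Object>of("#id", 1, "#label", "person", "name", "marko", "age", 29),
            Map.<String, Object>of("#id", 7, "#label", "knows", "#outV", V1, "weight", 0.5),
            Map.<String, Object>of("#id", 8, "#label", "knows", "#outV", V1, "weight", 1.0),
            Map.<String, Object>of("#id", 9, "#label", "created", "#outV", V1, "weight", 0.4));
    }

    public static void main(String[] args) {
        List<Map<String, Object>> db = sampleDb();
        System.out.println(resolve(db, OUT_E).size());              // all outgoing edges
        System.out.println(resolve(db, OUT_E_KNOWS).size());        // outgoing knows-edges
        System.out.println(resolve(db, OUT_E_KNOWS_STRONG).size()); // strong knows-edges
    }
}
```

In this toy version each pointer is resolved by a full scan. A provider would presumably resolve each field straight from however it physically groups those edges.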
By using different types of pointers, a graph database provider can make
explicit their internal structure. Assume all three fields above are in the
v[1] vertex tuple. This means that:
1. all of v[1]’s outgoing edges are grouped together. <-- linear scan
2. all of v[1]’s outgoing knows-edges are grouped together. <-- indexed by
label
3. all of v[1]’s strong outgoing knows-edges are grouped together. <--
indexed by label and weight
Thus, a graph database provider can describe the way in which it internally
organizes adjacent edges — i.e. vertex-centric indices! This means then that
TP4 can do vertex-centric index optimizations automatically for providers!
1. values("#outE").hasLabel('knows').has('weight',gt(0.85)) // grab all
edges, then filter on label, then filter on weight.
2. values("#outE.knows").has('weight',gt(0.85)) // grab all
knows-edges, then filter on weight.
3. values("#outE.knows.weight_gt_85") // grab all strong knows-edges.
*** Realize that Gremlin outE() will just compile to bytecode values("#outE").
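Here is a rough sketch of how the choice between the three rewrites above could be made mechanically. It is only a guess at the idea, not how TP4 does it. The plan() helper, the Plan record, and the encoding of conditions as plain strings such as "#label=knows" are all hypothetical. The shared #outV=*{#id=1} condition is left implicit. A field is considered usable only if everything its pointer guarantees was actually asked for, and the most specific usable field wins. Whatever it does not cover stays behind as ordinary filters.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not a TinkerPop strategy or API.
final class VertexCentricIndexSketch {

    // Outcome of planning: which field to read, and which filters still have to run.
    record Plan(String field, List<String> residualFilters) {}

    // fields: field name -> conditions its pointer already guarantees.
    // wanted: conditions the traversal asks for after outE().
    static Plan plan(Map<String, List<String>> fields, List<String> wanted) {
        String best = null;
        int bestCovered = -1;
        for (Map.Entry<String, List<String>> e : fields.entrySet()) {
            List<String> guaranteed = e.getValue();
            // A field that guarantees more than was asked for would drop
            // edges the traversal should return, so it is skipped.
            if (wanted.containsAll(guaranteed) && guaranteed.size() > bestCovered) {
                best = e.getKey();
                bestCovered = guaranteed.size();
            }
        }
        if (best == null) throw new IllegalStateException("no usable field");
        List<String> residual = new ArrayList<>(wanted);
        residual.removeAll(fields.get(best));
        return new Plan(best, residual);
    }

    // The three fields assumed to be present on the v[1] tuple.
    static Map<String, List<String>> allFields() {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("#outE", List.of());
        fields.put("#outE.knows", List.of("#label=knows"));
        fields.put("#outE.knows.weight_gt_85", List.of("#label=knows", "weight>0.85"));
        return fields;
    }

    public static void main(String[] args) {
        // outE().hasLabel('knows').has('weight',gt(0.85))
        System.out.println(plan(allFields(), List.of("#label=knows", "weight>0.85")));
        // outE().hasLabel('created'): only the plain #outE field is safe to use.
        System.out.println(plan(allFields(), List.of("#label=created")));
    }
}
```

Matching conditions by string equality is the crudest possible comparison. A real planner would have to reason about predicates, e.g. that weight>0.9 is also covered by a weight>0.85 grouping plus a residual filter.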
Freakin’ crazy! … Josh was interested in using the n-tuple structure to
describe indices. I was against it. I believe I still am. However, this is
pretty neat. As Josh was saying though, with a rich enough n-tuple
description of the underlying database, there should be no reason for providers
to have to write custom strategies and instructions ?!?!?!?!? crazy!?
Marko.
http://rredux.com
> On May 7, 2019, at 4:44 AM, Marko Rodriguez <[email protected]> wrote:
>
> Hey Josh,
>
>> I think of your Pointer<T> as a reference to an entity. It does not contain
>> the entity it refers to, but it contains the primary key of that entity.
>
> Exactly! I was just thinking that last night. Tuples don’t need a separate ID
> system. No -- pointers reference the primary key of a tuple! Better yet
> perhaps, they can reference one-to-many. For instance:
>
> { id:1, label:person, name:marko, age:29, outE:*(outV=id) }
>
> Thus, a pointer is defined by a pattern match. Haven’t thought through the
> consequences, but … :)
>
>> Here, I have invented an Entity class to indicate that the pointer resolves
>> to a vertex (an entity without a tuple, or rather with a 0-tuple -- the
>> unit element).
>
> Ah — the 0-tuple. Neat thought.
>
> I look forward to your slides from the Knowledge Graph Conference. If I
> wasn’t such a reclusive hermit, I would have loved to have joined you there.
>
> Take care,
> Marko.
>
> http://rredux.com
>
>
>> On Mon, May 6, 2019 at 9:38 PM Marko Rodriguez <[email protected]> wrote:
>>
>>> Hey Josh,
>>>
>>>> I am feeling the tuples... as long as they can be typed, e.g.
>>>>
>>>> <V> myTuple.get(Integer) -- int-indexed tuples
>>>> <V> myTuple.get(String) -- string-indexed tuples
>>>> In most programming languages, "tuples" are not lists, though they are
>>>> typed by a list of element types. E.g. in Haskell you might have a tuple
>>>> with the type
>>>> (Double, Double, Bool)
>>>
>>>
>>> Yes, we have Pair<A,B>, Triple<A,B,C>, Quadruple<A,B,C,D>, etc. However
>>> for base Tuple<A> of unknown length, the best I can do in Java is <A>. :|
>>> You can see my stubs in the gist:
>>> https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8 (lines
>>> #21-42)
>>>
>>>> If this is in line with your proposal, then we agree that tuples should
>>>> be the atomic unit of data in TP4.
>>>
>>> Yep. Vertices, Edges, Rows, Documents, etc. are all just tuples. However,
>>> I suspect that we will disagree on some of my tweaks. Thus, I’d really like
>>> to get your feedback on:
>>>
>>> 1. pointers (tuple entries referencing tuples).
>>> 2. sequences (multi-value tuple entries).
>>> 3. # hidden map keys :|
>>> - sorta ghetto.
>>>
>>> Also, I’m still not happy with db().has().has().as('x').db().where()… it’s
>>> an intense syntax and it’s hard to strategize.
>>>
>>> I really want to nail down this “universal model” (tuple structure and
>>> tuple-oriented instructions) as then I can get back on the codebase and
>>> start to flesh this stuff out with confidence.
>>>
>>> See ya,
>>> Marko.
>>>
>>> http://rredux.com
>>>
>>>
>>>>
>>>> Josh
>>>>
>>>>
>>>> On Mon, May 6, 2019 at 5:34 PM Marko Rodriguez <[email protected]> wrote:
>>>> Hi,
>>>>
>>>> I spent this afternoon playing with n-tuples, pointers, data model
>>>> interfaces, and bytecode instructions.
>>>>
>>>> https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8
>>>>
>>>> *** Kuppitz: They are tuples :). A Map<K,V> extends Tuple<Pair<K,V>>.
>>>> Tada!
>>>>
>>>> What I like about this is that it combines the best of both worlds
>>>> (Josh+Marko).
>>>> * just flat tuples of arbitrary length.
>>>> * pattern matching for arbitrary joins. (k1=k2 AND k3=k4 …)
>>>> * pointer chasing for direct links. (edges, foreign
>>>> keys, document _id references, URI resolutions, …)
>>>> * sequences are a special type of tuple used for multi-valued
>>>> entries.
>>>> * has()/values()/etc. work on all tuple types! (maps, lists,
>>>> tuples, vertices, edges, rows, statements, documents, etc.)
>>>>
>>>> Thoughts?,
>>>> Marko.
>>>>
>>>> http://rredux.com
>