I’m in the pocket so I’m just going to riff…
The concept of a pointer as I have defined it is a "0-argument function.” That
is, we currently have Pointer as:
Pointer<E> implements Supplier<Iterator<E extends Tuple>>
However, pointers can be n-arg functions!
Pointer<S,E> implements Function<List<S>, Iterator<E extends Tuple>>
If so, providers can support dynamic tuple fields.
{#id:1, #label:person, name:marko, age:29, #outE_by_label:*{#outV=*{#id:1},
#label=@[0]}}
I’m using @[0] to denote the first argument passed into the pointer.
What this tuple is saying is that the provider is able to do a direct lookup of
all incident edges based on edge label. Thus, as TP4, we can optimize:
values(‘#outE’).has(‘#label’,‘knows’)
== to ==>
values(‘#outE_by_label’).by(‘knows’)
Lets say a user is doing outE(‘knows’,’likes’). With #outE_by_label we can only
reference one edge sequence as we only use one argument. Thus, the compiler
would have to write out the following bytecode:
union(values(‘#outE_by_label’).by(‘knows’),
values(‘#outE_by_label’).by(‘likes’))
However, lets say another provider has this field on their vertex-tuples:
{#id:1, #label:person, name:marko, #outE_by_labels:*{#outV=*{#id:1}, #label
within @}}
This tuple is saying that it can get a single sequence from a pointer that
references an arbitrary number of incident edges by label, where “within” is a
predicate like =. Thus, TP4 can compile outE(‘knows’,’likes’) to the following
bytecode:
values(‘#outE_by_labels’).by(‘knows’).by(‘likes’)
*** I’m starting to see #fields as bytecode! …………….. foggy vision right now.
——————
I’m starting to think that there are no graph interfaces, relational
interfaces, document interfaces, etc. No — instead, everything is always in
terms of tuples!
1. Database providers communicate with TP4 by giving it tuples.
- these tuples “abstractly” represent objects like vertices,
edges, rows, documents, etc.
2. TP4 communicates with database providers by either
(1) joining tuple sequences to create a new tuple sequence (for
relational-style data) or
(2) jumping to a tuple sequence via #fields (for pointer-style
data).
Thats it. Thats all there is to the game.
// Gremlin
g.V().out(‘knows’).values(’name’)
// TP4 Bytecode
== minimum tuple-field requirement to call yourself a graphdb ==>
db().values(‘#V’).values(‘#outE’).has(‘#label’,’knows’).values(‘#inV’).values(‘name’)
== edge-label vertex-centric index optimization ==>
db().values(‘#V’).values(‘#out_label’).by(‘knows’).values(‘name’)
== denormalized adjacent vertex property data for read-heavy systems ==>
db().values('#V’).values(‘#out_label_key’).by(‘knows’).by(‘name’)
So what makes a graphdb? The tuple types and their respective fields.
- you have to have #type=vertex and #type=edge (you can have other
types in there too!, but the graphdb instruction set only cares about vertices
and edges).
- you have to have #id and #label
- you have to have #outE and #inE
- you have to have #outV and #inV (that reference a single #type=vertex)
If your n-tuple representation has all that, then you are a “graphdb” and
Gremlin will be able to process you.
Now check this mind-numbing entailment. Your n-tuple representation could have
other tuple #types. For instance, #type=row.
{#type:row, #table:yearly_expenses, person:*{#type:vertex,#id:1}, january:$10,
february:$12, march:$54, april:…. }
I have a graphdb mixed with a relationaldb all within the same db! You could
effectively encode this in Cassandra (same keyspace) or an RDBMS (same
database). Watch and learn:
// what are the names of the friends of the person that spent the least
last year?
sql("SELECT person FROM yearly_expenses WHERE
MIN(january+february+…)”).out(‘knows’).values(‘name’)
We aren’t talking “multi-model” …. no my friend, we are talking “hybrid-model.”
Its all just tuples and pointers.
Outz,
Marko.
http://rredux.com <http://rredux.com/>
> On May 7, 2019, at 7:26 AM, Marko Rodriguez <[email protected]> wrote:
>
> Whoa.
>
> Check out this trippy trick.
>
> First, here is how you define a pointer to a map-tuple.
>
> *{k1?v1, k2?v2, …, kn?vn}
> * says “this is a pointer to a map" { }
> ? is some comparator like =, >, <, !=, contains(), etc.
>
> Assume the vertex map tuple v[1]:
>
> {#id:1, #label:person, name:marko, age:29}
>
> Now, we can add the following fields:
>
> 1. #outE:*{#outV=*{#id=1}} // references all tuples that have an outV field
> that is a pointer to the the v[1] vertex tuple.
> 2. #outE.knows:*{#outV=*{#id=1},#label=knows} // references all outgoing
> knows-edges.
> 3. #outE.knows.weight_gt_85:*{#outV=*{#id=1},#label=knows,weight>0.85} //
> references all strong outgoing knows-edges
>
> By using different types of pointers, a graph database provider can make
> explicit their internal structure. Assume all three fields above are in the
> v[1] vertex tuple. This means that:
>
> 1. all of v[1]’s outgoing edges are group together. <— linear scan
> 2. all of v[1]’s outgoing knows-edges are group together. <— indexed by
> label
> 3. all of v[1]’s strong outgoing knows-edges are group together <—
> indexed by label and weight
>
> Thus, a graph database provider can describe the way in which it internally
> organizes adjacent edges — i.e. vertex-centric indices! This means then that
> TP4 can do vertex-centric index optimizations automatically for providers!
>
> 1. values(“#outE”).hasLabel(‘knows’).has(‘weight’,gt(0.85)) // grab all
> edges, then filter on label, then filter on weight.
> 2. values(“#outE.knows”).has(‘weight’,gt(0.85)) // grab all
> knows-edges, then filter on weight.
> 3. values(“#outE.knows.weight_gt_85”) // grab all strong knows-edges.
>
> *** Realize that Gremlin outE() will just compile to bytecode values(“#outE”).
>
> Freakin’ crazy! … Josh was interested in using the n-tuple structure to
> describe indices. I was against it. I believe I still am. However, this is
> pretty neat. As Josh was saying though, without a rich enough n-tuple
> description of the underlying database, there should be no reason for
> providers to have to write custom strategies and instructions ?!?!?!?!?
> crazy!?
>
> Marko.
>
> http://rredux.com <http://rredux.com/>
>
>
>
>
>> On May 7, 2019, at 4:44 AM, Marko Rodriguez <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>> Hey Josh,
>>
>>> I think of your Pointer<T> as a reference to an entity. It does not contain
>>> the entity it refers to, but it contains the primary key of that entity.
>>
>> Exactly! I was just thinking that last night. Tuples don’t need a separate
>> ID system. No -- pointers reference the primary key of a tuple! Better yet
>> perhaps, they can reference one-to-many. For instance:
>>
>> { id:1, label:person, name:marko, age:29, outE:*(outV=id) }
>>
>> Thus, a pointer is defined by a pattern match. Haven’t thought through the
>> consequences, but … :)
>>
>>> Here, I have invented an Entity class to indicate that the pointer resolves
>>> to a vertex (an entity without a tuple, or rather with a 0-tuple -- the
>>> unit element).
>>
>> Ah — the 0-tuple. Neat thought.
>>
>> I look forward to your slides from the Knowledge Graph Conference. If I
>> wasn’t such a reclusive hermit, I would have loved to have joined you there.
>>
>> Take care,
>> Marko.
>>
>> http://rredux.com <http://rredux.com/>
>>
>>
>>> On Mon, May 6, 2019 at 9:38 PM Marko Rodriguez <[email protected]
>>> <mailto:[email protected]>> wrote:
>>>
>>>> Hey Josh,
>>>>
>>>>> I am feeling the tuples... as long as they can be typed, e.g.
>>>>>
>>>>> <V> myTuple.get(Integer) -- int-indexed tuples
>>>>> <V> myTuple.get(String) -- string-indexed tuples
>>>>> In most programming languages, "tuples" are not lists, though they are
>>>> typed by a list of element types. E.g. in Haskell you might have a tuple
>>>> with the type
>>>>> (Double, Double, Bool)
>>>>
>>>>
>>>> Yes, we have Pair<A,B>, Triple<A,B,C>, Quadruple<A,B,C,D>, etc. However
>>>> for base Tuple<A> of unknown length, the best I can do in Java is <A>. :|
>>>> You can see my stubs in the gist:
>>>> https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8
>>>> <https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8> <
>>>> https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8
>>>> <https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8>> (LINES
>>>> #21-42)
>>>>
>>>>> If this is in line with your proposal, then we agree that tuples should
>>>> be the atomic unit of data in TP4.
>>>>
>>>> Yep. Vertices, Edges, Rows, Documents, etc. are all just tuples. However,
>>>> I suspect that we will disagree on some of my tweaks. Thus, I’d really like
>>>> to get your feedback on:
>>>>
>>>> 1. pointers (tuple entries referencing tuples).
>>>> 2. sequences (multi-value tuple entries).
>>>> 3. # hidden map keys :|
>>>> - sorta ghetto.
>>>>
>>>> Also, I’m still not happy with db().has().has().as(‘x’).db().where()… its
>>>> an intense syntax and its hard to strategize.
>>>>
>>>> I really want to nail down this “universal model” (tuple structure and
>>>> tuple-oriented instructions) as then I can get back on the codebase and
>>>> start to flush this stuff out with confidence.
>>>>
>>>> See ya,
>>>> Marko.
>>>>
>>>> http://rredux.com <http://rredux.com/> <http://rredux.com/
>>>> <http://rredux.com/>>
>>>>
>>>>
>>>>>
>>>>> Josh
>>>>>
>>>>>
>>>>> On Mon, May 6, 2019 at 5:34 PM Marko Rodriguez <[email protected]
>>>>> <mailto:[email protected]>
>>>> <mailto:[email protected] <mailto:[email protected]>>> wrote:
>>>>> Hi,
>>>>>
>>>>> I spent this afternoon playing with n-tuples, pointers, data model
>>>> interfaces, and bytecode instructions.
>>>>>
>>>>> https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8
>>>>> <https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8> <
>>>> https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8
>>>> <https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8>> <
>>>> https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8
>>>> <https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8> <
>>>> https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8
>>>> <https://gist.github.com/okram/25d50724da89452853a3f4fa894bcbe8>>>
>>>>>
>>>>> *** Kuppitz: They are tuples :). A Map<K,V> extends Tuple<Pair<K,V>>.
>>>> Tada!
>>>>>
>>>>> What I like about this is that it combines the best of both worlds
>>>> (Josh+Marko).
>>>>> * just flat tuples of arbitrary length.
>>>>> * pattern matching for arbitrary joins. (k1=k2 AND k3=k4
>>>> …)
>>>>> * pointers chasing for direct links. (edges, foreign
>>>> keys, document _id references, URI resolutions, …)
>>>>> * sequences are a special type of tuple used for multi-valued
>>>> entries.
>>>>> * has()/values()/etc. work on all tuple types! (maps, lists,
>>>> tuples, vertices, edges, rows, statements, documents, etc.)
>>>>>
>>>>> Thoughts?,
>>>>> Marko.
>>>>>
>>>>> http://rredux.com <http://rredux.com/> <http://rredux.com/
>>>>> <http://rredux.com/>> <http://rredux.com/ <http://rredux.com/> <
>>>> http://rredux.com/ <http://rredux.com/>>>
>>
>