Hello,

*** I believe the following email provides the most elegant TP4 structure/ 
proposal to date. ***

Every database (withStructure()) is understood by TP4 as an unordered sequence 
of #type'd tuples.

{#type:?, ...}
{#type:?, ...}
{#type:?, ...}
...

This is the TP4 “universal model.”

Every graph database is understood by TP4 as an unordered sequence of tuples 
with the following #types.

{#type:index, ...}
{#type:vertex, ...}
{#type:edge, ...}
...

This is the TP4 “property graph model."

It is up to the underlying database to organize tuples in memory and on disk as 
it sees fit. For example, an RDBMS-based graph database may want an 
indices-table, a vertices-table, and an edges-table. Or, for schema-oriented 
graph data, it may want a person-table, a knows-table, a project-table, a 
created-table, etc. The TP4 VM doesn't care how the tuples are organized by the 
database. The TP4 VM interprets the entire database (data + indices) as an 
unordered heterogenous sequence of #type’d tuples. 

————————

All structure providers (i.e. database vendors) must implement a 
db(string...)-instruction which will emit all tuples of a particular #type.

db()                 -> a sequence of all tuples
db('vertex')         -> a sequence of all #type=vertex tuples
db('index','vertex') -> a sequence of all #type=index and #type=vertex tuples.

The difference between a graph database, a relational database, a document 
database, etc. is wholly dependent on the tuple #types and their respective 
keys.

###################################################
### A TP4 Property Graph Database Specification ###
###################################################

#type=index OR vertex OR edge
...
#type=index.#vertex-ids=(pointer(object...) to sequence<vertex>)          
[OPTIONAL]
#type=index.#vertex-property=(pointer(string,object) to sequence<vertex>) 
[OPTIONAL]
...
#type=vertex.#id=(object)
#type=vertex.#label=(string)
#type=vertex.#outE=(pointer to sequence<edge>)
#type=vertex.#inE=(pointer to sequence<edge>)
#type=vertex.#outE-labels=(pointer(string...) to sequence<edge>)  [OPTIONAL]
#type=vertex.#inE-labels=(pointer(string...) to sequence<edge>)   [OPTIONAL]
#type=vertex.#out-labels=(pointer(string...) to sequence<vertex>) [OPTIONAL]
#type=vertex.#in-labels=(pointer(string...) to sequence<vertex>)  [OPTIONAL]
...
#type=edge.#id=(object)
#type=edge.#label=(string)
#type=edge.#outV=(pointer to vertex)
#type=edge.#inV=(pointer to vertex)

The above specifies how the TP4 “universal model” is constrained to encode a 
TP4 "property graph model."

=== EXAMPLE #1 ===

Suppose a graph database provider supports #vertex-ids and #out-labels. The 
graph database's Structure implementation would express this fact via its 
getStrategies() implementation returning: 
GraphCentricIndexStrategy.create(‘vertex-ids')
VertexCentricIndexStrategy.create(‘out-labels')

These TP4 provided strategies would then perform the following bytecode 
rewrites:

g.V(1).out('knows').values('name’)
  == Gremlin compiles to TP4 universal bytecode ==>
db('vertex').has('#id',1).values('#outE').has('#label','knows').values('#inV').values('name')
  == GraphCentricIndexStrategy rewrites the bytecode to ==>
db('index').values('#vertex.ids').apply(1).values('#outE').has('#label','knows').values('#inV').values('name')
  == VertexCentricIndexStrategy rewrites the bytecode to ==>
db('index').values('#vertex.ids').apply(1).values('#out.labels').apply('knows').values('name')

If the original TP4 universal bytecode were to execute, it would do a linear 
scan of all vertex-tuples and filter out those that don’t have an #id=1. It 
would then do a linear scan of all outgoing incident edge-tuples to v[1] and 
filter out those edges that don’t have #label=knows. The result would be 
semantically correct, but the evaluation would be painfully slow for large 
graph datasets. Indices can be used to speed up data access. In the TP4 
universal model, an index is simply a pointer to a sub-sequence of tuples in 
db().

GraphCentricIndexStrategy re-writes the original compiled TP4 universal 
bytecode to use a graph-centric index (pointer) to directly access the vertex 
with #id=1.
VertexCentricIndexStrategy re-writes the previous strategized TP4 bytecode to 
use a vertex-centric index (pointer) to directly access all the incident 
knows-edges of v[1]. 

NOTES:
  values(‘#vertex-ids') references an n-arg pointer. apply(object...) resolves 
the pointer to its tuple sequence using the provided arguments.
  values('#outE') references a 0-arg point. apply() is not required as the 
pointer is immediately resolved to its tuple sequence.

[TO BE CONTINUED…]

Tomorrow I will write this same email but from the perspective of a relational 
data structure encoded in the TP4 universal model.

Feedback is much appreciated.

Take care,
Marko.

http://rredux.com <http://rredux.com/>




Reply via email to