The Fundamental Structure Instructions Already Exist! (w/ RDBMS Example)

Marko Rodriguez Mon, 29 Apr 2019 07:35:11 -0700

Hi,

*** This email is primarily for Josh (and Kuppitz). However, if others are 
interested… ***


So I did a lot of thinking this weekend about structure/, and this morning I prototyped both graph/ and rdbms/.

This is the way I’m currently thinking of things:

        1. There are 4 base types in structure/.
                - Primitive: string, long, float, int, … (will constrain these at some point).
                - TTuple<K,V>: key/value map.
                - TSequence<V>: an iterable of V objects.
                - TSymbol: like Ruby’s symbols, I think we need “enum-like” symbols (e.g., #id, #label).
        
        2. Every structure has a “root.”
                - For graph, it’s a TGraph that implements TSequence<TVertex>.
                - For rdbms, it’s a TDatabase that implements TTuple<String,TTable>.

        3. Roots implement Structure and thus are what StructureFactory.mint() generates.
                - The structure is defined using withStructure().
                - For graph, it’s accessible via V().
                - For rdbms, it’s accessible via db().

        4. There is a list of core instructions for dealing with these base objects.
                - value(K key): gets the TTuple value for the provided key.
                - values(K key): gets an iterator over the value for the provided key (e.g., the rows of a TSequence).
                - entries(): gets an iterator of T2Tuple objects for the incoming TTuple.
                - hasXXX(A,B): various has()-based filters for looking into a TTuple and a TSequence.
                - db()/V()/etc.: jump to the “root” of the withStructure() structure.
                - drop()/add(): behave as one would expect.
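To make the value/values/entries/has semantics concrete, here is a minimal sketch that models a TTuple as a java.util.Map and a TSequence as a java.util.List. The class CoreInstructions and its static helpers are hypothetical illustrations, not the actual structure/ API; the point is only the difference between value() (return the stored object) and values() (unroll a sequence into its elements).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration: a TTuple is modeled as a Map and a TSequence as a List.
final class CoreInstructions {

    // value(K key): the TTuple's value for the provided key.
    static Object value(Map<String, Object> tuple, String key) {
        return tuple.get(key);
    }

    // values(K key): like value(), but a sequence-valued result is unrolled
    // so that each element is emitted as its own object.
    static List<Object> values(Map<String, Object> tuple, String key) {
        Object v = tuple.get(key);
        List<Object> out = new ArrayList<>();
        if (v instanceof List) {
            out.addAll((List<?>) v);
        } else if (v != null) {
            out.add(v);
        }
        return out;
    }

    // entries(): the key/value pairs (T2Tuple-like) of the incoming TTuple.
    static List<Map.Entry<String, Object>> entries(Map<String, Object> tuple) {
        return new ArrayList<>(tuple.entrySet());
    }

    // has(K key, V value): a filter that keeps the tuple only if tuple[key] equals value.
    static boolean has(Map<String, Object> tuple, String key, Object expected) {
        return Objects.equals(tuple.get(key), expected);
    }

    public static void main(String[] args) {
        Map<String, Object> marko = new LinkedHashMap<>();
        marko.put("NAME", "marko");
        marko.put("AGE", 29);
        Map<String, Object> db = new LinkedHashMap<>();
        db.put("people", List.of(marko));

        System.out.println(value(db, "people"));         // [{NAME=marko, AGE=29}] (the table itself)
        System.out.println(values(db, "people").size()); // 1 (one row unrolled)
        System.out.println(entries(marko));              // [NAME=marko, AGE=29]
        System.out.println(has(marko, "AGE", 29));       // true
    }
}
```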

————

For RDBMS, we have three interfaces in rdbms/ (machine/machine-core/structure/rdbms):

        1. TDatabase implements TTuple<String,TTable> // the root structure that indexes the tables
        2. TTable implements TSequence<TRow<?>> // a table is a sequence of rows
        3. TRow<V> implements TTuple<String,V> // a row has string column names
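Spelled out as Java generics, the three rdbms/ interfaces might look like the sketch below (in Java, one interface extends rather than implements another). The base types are re-declared in a simplified form so the example compiles on its own, and MemDatabase, MemTable, and MemRow are hypothetical in-memory stand-ins, not the jdbc/ classes.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Usage: the equivalent of db().values("people").value("NAME") over an in-memory database.
final class RdbmsDemo {
    public static void main(String[] args) {
        TDatabase db = new MemDatabase().with("people", new MemTable(List.<TRow<?>>of(
                new MemRow(Map.<String, Object>of("NAME", "marko", "AGE", 29)),
                new MemRow(Map.<String, Object>of("NAME", "josh", "AGE", 32)))));
        for (TRow<?> row : db.value("people")) {
            System.out.println(row.value("NAME")); // marko, then josh
        }
    }
}

// Simplified, hypothetical stand-ins for the structure/ base types.
interface TTuple<K, V> { V value(K key); }
interface TSequence<V> extends Iterable<V> { }

// The three rdbms/ interfaces.
interface TRow<V> extends TTuple<String, V> { }        // a row has string column names
interface TTable extends TSequence<TRow<?>> { }        // a table is a sequence of rows
interface TDatabase extends TTuple<String, TTable> { } // the root: indexes tables by name

// Hypothetical in-memory implementations.
final class MemRow implements TRow<Object> {
    private final Map<String, Object> columns;
    MemRow(Map<String, Object> columns) { this.columns = columns; }
    public Object value(String column) { return columns.get(column); }
}

final class MemTable implements TTable {
    private final List<TRow<?>> rows;
    MemTable(List<TRow<?>> rows) { this.rows = rows; }
    public Iterator<TRow<?>> iterator() { return rows.iterator(); }
}

final class MemDatabase implements TDatabase {
    private final Map<String, TTable> tables = new LinkedHashMap<>();
    MemDatabase with(String name, TTable table) { tables.put(name, table); return this; }
    public TTable value(String name) { return tables.get(name); }
}
```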
        
I then created a new project at machine/structure/jdbc/. The classes in here implement the above rdbms/ interfaces.

Here is an RDBMS session:

final Machine machine = LocalMachine.open();
final TraversalSource jdbc =
        Gremlin.traversal(machine).
                        withProcessor(PipesProcessor.class).
                        withStructure(JDBCStructure.class,
                                Map.of(JDBCStructure.JDBC_CONNECTION, "jdbc:h2:/tmp/test"));
        
System.out.println(jdbc.db().toList());
System.out.println(jdbc.db().entries().toList());
System.out.println(jdbc.db().value("people").toList());
System.out.println(jdbc.db().values("people").toList());
System.out.println(jdbc.db().values("people").value("name").toList());
System.out.println(jdbc.db().values("people").entries().toList());

This yields:

[<database#conn1: url=jdbc:h2:/tmp/test user=>]
[PEOPLE:<table#PEOPLE>]
[<table#people>]
[<row#PEOPLE:1>, <row#PEOPLE:2>]
[marko, josh]
[NAME:marko, AGE:29, NAME:josh, AGE:32]

The bytecode of the last query is:

[db(<database#conn1: url=jdbc:h2:/tmp/test user=>), values(people), entries]

JDBCDatabase implements TDatabase, Structure.
        *** JDBCDatabase is the root structure and is referenced by db() *** (CRUCIAL POINT)
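Because db(...) carries the root structure as its argument (see the bytecode above), jumping to the root can be read as an instruction that ignores its input and emits the root. The sketch below interprets [db(<database>), values(people), entries] as a chain of flat-mapping steps over plain Java maps and lists; the Instruction type and the db/values/entries/run helpers are hypothetical, and the real machine's compilation is surely more involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of interpreting [db(<database>), values(people), entries].
final class BytecodeSketch {

    // An instruction maps one incoming object to zero or more outgoing objects.
    interface Instruction extends Function<Object, List<Object>> { }

    // db(root): ignore the incoming object and emit the root structure,
    // which travels inside the bytecode as the instruction's argument.
    static Instruction db(Object root) {
        return in -> List.of(root);
    }

    // values(key): look up key in the incoming tuple; unroll a sequence value.
    static Instruction values(String key) {
        return in -> {
            Object v = ((Map<?, ?>) in).get(key);
            List<Object> out = new ArrayList<>();
            if (v instanceof List) {
                out.addAll((List<?>) v);
            } else if (v != null) {
                out.add(v);
            }
            return out;
        };
    }

    // entries(): emit each key/value pair of the incoming tuple.
    static Instruction entries() {
        return in -> new ArrayList<Object>(((Map<?, ?>) in).entrySet());
    }

    // Apply each instruction to every object emitted by the previous one.
    static List<Object> run(List<Instruction> bytecode) {
        List<Object> current = new ArrayList<>();
        current.add(null); // a single placeholder start object; db() ignores it
        for (Instruction instruction : bytecode) {
            List<Object> next = new ArrayList<>();
            for (Object o : current) {
                next.addAll(instruction.apply(o));
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> marko = new LinkedHashMap<>();
        marko.put("NAME", "marko");
        marko.put("AGE", 29);
        Map<String, Object> josh = new LinkedHashMap<>();
        josh.put("NAME", "josh");
        josh.put("AGE", 32);
        Map<String, Object> database = new LinkedHashMap<>();
        database.put("people", List.of(marko, josh));

        System.out.println(run(List.of(db(database), values("people"), entries())));
        // prints [NAME=marko, AGE=29, NAME=josh, AGE=32]
    }
}
```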
        
Assume another table called ADDRESSES with two columns: name and city.

jdbc.db().values("people").as("x").db().values("addresses").has("name",eq(path("x").by("name"))).value("city")
        
The above is equivalent to:

SELECT city FROM people,addresses WHERE people.name=addresses.name
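Read operationally, that traversal is a nested-loop join: for each person x, jump back to the root, scan addresses, keep those whose name equals x's name, and emit the city. The sketch below assumes rows are plain Java maps; the NestedLoopJoin class and the sample data (including the city "Springfield") are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class NestedLoopJoin {
    // db().values("people").as("x").db().values("addresses")
    //   .has("name", eq(path("x").by("name"))).value("city")
    static List<Object> cities(List<Map<String, Object>> people,
                               List<Map<String, Object>> addresses) {
        List<Object> out = new ArrayList<>();
        for (Map<String, Object> x : people) {               // values("people").as("x")
            for (Map<String, Object> address : addresses) {  // db().values("addresses")
                if (Objects.equals(address.get("name"), x.get("name"))) { // has(...)
                    out.add(address.get("city"));            // value("city")
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> people = List.of(
                Map.<String, Object>of("name", "marko", "age", 29),
                Map.<String, Object>of("name", "josh", "age", 32));
        List<Map<String, Object>> addresses = List.of(
                Map.<String, Object>of("name", "marko", "city", "Springfield"));
        System.out.println(cities(people, addresses)); // [Springfield]
    }
}
```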

If you want the joined rows themselves (an inner join, i.e., a filtered product), you do this:

        
jdbc.db().values("people").as("x").db().values("addresses").has("name",eq(path("x").by("name"))).as("y").path("x","y")

The above is equivalent to:

SELECT * FROM addresses INNER JOIN people ON people.name=addresses.name
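The path("x","y") variant keeps both matched rows instead of projecting a single column. Here is a sketch with the same kind of hypothetical in-memory rows, where each emitted path is a two-element list [x, y]; the InnerJoinSketch class is illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class InnerJoinSketch {
    // ...has("name", eq(path("x").by("name"))).as("y").path("x","y")
    static List<List<Map<String, Object>>> innerJoin(List<Map<String, Object>> people,
                                                     List<Map<String, Object>> addresses) {
        List<List<Map<String, Object>>> paths = new ArrayList<>();
        for (Map<String, Object> x : people) {
            for (Map<String, Object> y : addresses) {
                if (Objects.equals(x.get("name"), y.get("name"))) {
                    paths.add(List.of(x, y)); // path("x","y")
                }
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> people = List.of(
                Map.<String, Object>of("name", "marko"),
                Map.<String, Object>of("name", "josh"));
        List<Map<String, Object>> addresses = List.of(
                Map.<String, Object>of("name", "josh", "city", "Springfield"));
        // One path: josh's row paired with josh's address; marko has no match.
        System.out.println(innerJoin(people, addresses).size()); // 1
    }
}
```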

NOTES:
        1. Instead of select(), we simply jump to the root via db() (or V() for graph).
        2. Instead of project(), we simply use value() or values().
        3. Instead of select() being overloaded with by() join syntax, we use has() and path().
                - Like TP3, we will be smart about dropping path() data once it’s no longer referenced.
        4. We can also do LEFT and RIGHT JOINs (I haven’t thought through FULL OUTER JOIN yet).
                - However, we don’t support null in TP, so I don’t know whether we want to support these null-producing joins.
        
LEFT JOIN:
        * If an address doesn’t exist for the person, emit a “null”-filled path.

jdbc.db().values("people").as("x").
  db().values("addresses").as("y").
    choose(has("name",eq(path("x").by("name"))),
      identity(),
      path("y").by(null).as("y")).
  path("x","y")

SELECT * FROM people LEFT JOIN addresses ON people.name=addresses.name

RIGHT JOIN:

jdbc.db().values("people").as("x").
  db().values("addresses").as("y").
    choose(has("name",eq(path("x").by("name"))),
      identity(),
      path("x").by(null).as("x")).
  path("x","y")
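For reference, SQL's outer joins pad with null only when a row has no match at all: people LEFT JOIN addresses keeps every person, and people RIGHT JOIN addresses keeps every address. A per-pair choose() would instead emit a null-filled path for every non-matching (person, address) pair. The sketch below implements the SQL semantics over hypothetical in-memory rows; OuterJoinSketch and its methods are illustrative names, and Arrays.asList is used because List.of rejects null elements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class OuterJoinSketch {
    // LEFT JOIN: every left row appears at least once; when no right row
    // matches, it is paired with null (a "null"-filled path).
    static List<List<Map<String, Object>>> leftJoin(List<Map<String, Object>> left,
                                                    List<Map<String, Object>> right,
                                                    String key) {
        List<List<Map<String, Object>>> paths = new ArrayList<>();
        for (Map<String, Object> x : left) {
            boolean matched = false;
            for (Map<String, Object> y : right) {
                if (Objects.equals(x.get(key), y.get(key))) {
                    paths.add(Arrays.asList(x, y));
                    matched = true;
                }
            }
            if (!matched) {
                paths.add(Arrays.asList(x, null));
            }
        }
        return paths;
    }

    // RIGHT JOIN: the mirror image; every right row appears at least once,
    // and each path is kept in (left, right) order.
    static List<List<Map<String, Object>>> rightJoin(List<Map<String, Object>> left,
                                                     List<Map<String, Object>> right,
                                                     String key) {
        List<List<Map<String, Object>>> paths = new ArrayList<>();
        for (List<Map<String, Object>> p : leftJoin(right, left, key)) {
            paths.add(Arrays.asList(p.get(1), p.get(0)));
        }
        return paths;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> people = List.of(
                Map.<String, Object>of("name", "marko"),
                Map.<String, Object>of("name", "josh"));
        List<Map<String, Object>> addresses = List.of(
                Map.<String, Object>of("name", "josh", "city", "Springfield"));
        System.out.println(leftJoin(people, addresses, "name").size());  // 2
        System.out.println(rightJoin(people, addresses, "name").size()); // 1
    }
}
```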


SUMMARY:

There are no “low-level” instructions. Everything is based on the standard instructions that we know and love. Finally, in case it isn’t apparent, the above bytecode chunks would ultimately be strategized into a single SQL query (breadth-first) instead of one-off queries (depth-first) to improve performance.
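As a rough illustration of that strategy, here is a toy compiler that folds a linear [db, values(table), has(column, value)*, value(column)] chain into one SELECT string instead of issuing a query per step. The Step representation and toSql method are hypothetical, and the quoting is naive (illustration only, no parameter binding); a real strategy would also need to handle joins and many more instructions.

```java
import java.util.List;

final class SqlStrategySketch {
    // Hypothetical bytecode step: an instruction name plus its arguments.
    static final class Step {
        final String op;
        final List<String> args;
        Step(String op, String... args) { this.op = op; this.args = List.of(args); }
    }

    // Fold a linear chain of steps into a single SELECT statement.
    static String toSql(List<Step> bytecode) {
        String table = null;
        String column = "*";
        StringBuilder where = new StringBuilder();
        for (Step s : bytecode) {
            switch (s.op) {
                case "db":
                    break;                          // root: contributes nothing to the SQL
                case "values":
                    table = s.args.get(0);          // FROM <table>
                    break;
                case "has":                         // WHERE <column>='<value>' (naive quoting)
                    where.append(where.length() == 0 ? " WHERE " : " AND ")
                         .append(s.args.get(0)).append("='").append(s.args.get(1)).append("'");
                    break;
                case "value":
                    column = s.args.get(0);         // SELECT <column>
                    break;
                default:
                    throw new IllegalArgumentException("unsupported: " + s.op);
            }
        }
        if (table == null) throw new IllegalArgumentException("no table selected");
        return "SELECT " + column + " FROM " + table + where;
    }

    public static void main(String[] args) {
        // db().values("people").has("name","marko").value("age")
        System.out.println(toSql(List.of(
                new Step("db"),
                new Step("values", "people"),
                new Step("has", "name", "marko"),
                new Step("value", "age"))));
        // prints SELECT age FROM people WHERE name='marko'
    }
}
```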

Neat?
Marko.

http://rredux.com
