Hey,
Check this out:
############################
Machine machine = LocalMachine.open();
TraversalSource jdbc =
Gremlin.traversal(machine).
withProcessor(PipesProcessor.class).
withStructure(JDBCStructure.class,
Map.of(JDBCStructure.JDBC_CONNECTION, "jdbc:h2:/tmp/test"));
System.out.println(jdbc.db().values("people").as("x”).
db().values("addresses").as("y").has("name",
__.path("x").by("name")).
path("x", "y").toList());
System.out.println(“\n\n”)
System.out.println(jdbc.db().values("people").as("x”).
db().values("addresses").as("y").has("name",
__.path("x").by("name")).
path("x", "y").explain().toList());
############################
[[{NAME=marko, AGE=29}, {CITY=santa fe, NAME=marko}], [{NAME=josh, AGE=32},
{CITY=san jose, NAME=josh}]]
[Original [db, values(people)@x, db,
values(addresses)@y, hasKeyValue(name,[path(x,[value(name)])]), path(x,y,|)]
JDBCStrategy [db(<database#conn9:
url=jdbc:h2:/tmp/test user=>), values(people)@x, db(<database#conn10:
url=jdbc:h2:/tmp/test user=>), values(addresses)@y,
hasKeyValue(name,[path(x,[value(name)])]), path(x,y,|)]
JDBCQueryStrategy [jdbc:sql(conn9: url=jdbc:h2:/tmp/test
user=,x,y,SELECT x.*, y.* FROM people AS x, addresses AS y WHERE x.name=y.name)]
PipesStrategy [jdbc:sql(conn9: url=jdbc:h2:/tmp/test
user=,x,y,SELECT x.*, y.* FROM people AS x, addresses AS y WHERE x.name=y.name)]
CoefficientStrategy [jdbc:sql(conn9: url=jdbc:h2:/tmp/test
user=,x,y,SELECT x.*, y.* FROM people AS x, addresses AS y WHERE x.name=y.name)]
CoefficientVerificationStrategy [jdbc:sql(conn9: url=jdbc:h2:/tmp/test
user=,x,y,SELECT x.*, y.* FROM people AS x, addresses AS y WHERE x.name=y.name)]
-------------------------------
Compilation [FlatMapInitial]
Execution Plan [PipesProcessor] [InitialStep[FlatMapInitial]]]
I basically look for a db.values.db.values.has-pattern in the bytecode and if I
find it, I try and roll it into a single provider-specific instruction that
does a SELECT query.
Here is JDBCQueryStrategy (its ghetto and error prone, but I just wanted to get
the basic concept working):
https://github.com/apache/tinkerpop/blob/7142dc16d8fc81ad8bd4090096b42e5b9b1744f4/java/machine/structure/jdbc/src/main/java/org/apache/tinkerpop/machine/structure/jdbc/strategy/JDBCQueryStrategy.java
<https://github.com/apache/tinkerpop/blob/7142dc16d8fc81ad8bd4090096b42e5b9b1744f4/java/machine/structure/jdbc/src/main/java/org/apache/tinkerpop/machine/structure/jdbc/strategy/JDBCQueryStrategy.java>
Here is SqlFlatMapStep (hyper-ghetto… but whateva’):
https://github.com/apache/tinkerpop/blob/7142dc16d8fc81ad8bd4090096b42e5b9b1744f4/java/machine/structure/jdbc/src/main/java/org/apache/tinkerpop/machine/structure/jdbc/function/flatmap/SqlFlatMap.java
<https://github.com/apache/tinkerpop/blob/7142dc16d8fc81ad8bd4090096b42e5b9b1744f4/java/machine/structure/jdbc/src/main/java/org/apache/tinkerpop/machine/structure/jdbc/function/flatmap/SqlFlatMap.java>
Na na!,
Marko.
http://rredux.com <http://rredux.com/>
> On Apr 29, 2019, at 11:50 AM, Marko Rodriguez <[email protected]> wrote:
>
> Hello Kuppitz,
>
>> I don't think it's a good idea to keep this mindset for TP4; NULLs are too
>> important in RDBMS. I don't know, maybe you can convince SQL people that
>> dropping a value is the same as setting its value to NULL. It would work
>> for you and me and everybody else who's familiar with Gremlin, but SQL
>> people really love their NULLs….
>
> Hmm……. I don’t like nulls. Perhaps with time a clever solution will emerge.
> ????
>
>> I'd prefer to just have special accessors for these. E.g. g.V().meta("id").
>> At least valueMaps would then only have String-keys.
>> I see the issue with that (naming collisions), but it's still better than
>> the enums in my opinion (which became a pain when started to implement
>> GLVs).
>
> So, TSymbols are not Java enums. They are simply a “primitive”-type that will
> have a serialization like:
>
> symbol[id]
>
> Meaning, that people can make up Symbols all day long without having to
> update serializers. How I see them working is that they are Strings prefixed
> with #.
>
> g.V().outE() <=> g.V().values(“#outE”)
> g.V().id() <=> g.V().value(“#id”)
> g.V().hasLabel(“person") <=> g.V().has(“#label”,”person”)
>
> Now that I type this out, perhaps we don’t even have a TSymbol-class.
> Instead, any String that starts with # is considered a symbol. Now watch this:
>
> g.V().label() <=> g.V().value(“#label”)
> g.V().labels() <=> g.V().values(“#label”)
>
> In this way, we can support Neo4j multi-labels as a Neo4jVertex’s #label-Key
> references a TSequence<String>.
>
> g.V(1).label() => TSequence<String>
> g.V(1).labels() => String, String, String, …
> g.V(1).label().add(“programmer”)
> g.V(1).label().drop(“person”)
>
> So we could do “meta()”, but then you need respective “hasXXX”-meta()
> methods. I think #symbol is easiest .. ?
>
>> Also, what I'm wondering about now: Have you thought about Stored
>> Procedures and Views in RDBMS? Views can be treated as tables, easy, but
>> what about stored procedures? SPs can be found in many more DBMS, would be
>> bad to not support them (or hack something ugly together later in the
>> development process).
>
> I’m not super versed in RDBMS technology. Can you please explain to me how to
> create a StoreProcedure and the range of outputs a StoredProcedure produces?
> From there, I can try and “Bytecode-ize” it.
>
> Thanks Kuppitz,
> Marko.
>
> http://rredux.com <http://rredux.com/>
>
>
>
>
>> On Mon, Apr 29, 2019 at 7:34 AM Marko Rodriguez <[email protected]
>> <mailto:[email protected]>>
>> wrote:
>>
>>> Hi,
>>>
>>> *** This email is primarily for Josh (and Kuppitz). However, if others are
>>> interested… ***
>>>
>>> So I did a lot of thinking this weekend about structure/ and this morning,
>>> I prototyped both graph/ and rdbms/.
>>>
>>> This is the way I’m currently thinking of things:
>>>
>>> 1. There are 4 base types in structure/.
>>> - Primitive: string, long, float, int, … (will constrain
>>> these at some point).
>>> - TTuple<K,V>: key/value map.
>>> - TSequence<V>: an iterable of v objects.
>>> - TSymbol: like Ruby, I think we need “enum-like” symbols
>>> (e.g., #id, #label).
>>>
>>> 2. Every structure has a “root.”
>>> - for graph its TGraph implements TSequence<TVertex>
>>> - for rdbms its a TDatabase implements
>>> TTuple<String,TTable>
>>>
>>> 3. Roots implement Structure and thus, are what is generated by
>>> StructureFactory.mint().
>>> - defined using withStructure().
>>> - For graph, its accessible via V().
>>> - For rdbms, its accessible via db().
>>>
>>> 4. There is a list of core instructions for dealing with these
>>> base objects.
>>> - value(K key): gets the TTuple value for the provided key.
>>> - values(K key): gets an iterator of the value for the
>>> provided key.
>>> - entries(): gets an iterator of T2Tuple objects for the
>>> incoming TTuple.
>>> - hasXXX(A,B): various has()-based filters for looking
>>> into a TTuple and a TSequence
>>> - db()/V()/etc.: jump to the “root” of the withStructure()
>>> structure.
>>> - drop()/add(): behave as one would expect and thus.
>>>
>>> ————
>>>
>>> For RDBMS, we have three interfaces in rdbms/.
>>> (machine/machine-core/structure/rdbms)
>>>
>>> 1. TDatabase implements TTuple<String,TTable> // the root
>>> structure that indexes the tables.
>>> 2. TTable implements TSequence<TRow<?>> // a table is a sequence
>>> of rows
>>> 3. TRow<V> implements TTuple<String,V>> // a row has string column
>>> names
>>>
>>> I then created a new project at machine/structure/jdbc). The classes in
>>> here implement the above rdbms/ interfaces/
>>>
>>> Here is an RDBMS session:
>>>
>>> final Machine machine = LocalMachine.open();
>>> final TraversalSource jdbc =
>>> Gremlin.traversal(machine).
>>> withProcessor(PipesProcessor.class).
>>> withStructure(JDBCStructure.class,
>>> Map.of(JDBCStructure.JDBC_CONNECTION, "jdbc:h2:/tmp/test"));
>>>
>>> System.out.println(jdbc.db().toList());
>>> System.out.println(jdbc.db().entries().toList());
>>> System.out.println(jdbc.db().value("people").toList());
>>> System.out.println(jdbc.db().values("people").toList());
>>> System.out.println(jdbc.db().values("people").value("name").toList());
>>> System.out.println(jdbc.db().values("people").entries().toList());
>>>
>>> This yields:
>>>
>>> [<database#conn1: url=jdbc:h2:/tmp/test user=>]
>>> [PEOPLE:<table#PEOPLE>]
>>> [<table#people>]
>>> [<row#PEOPLE:1>, <row#PEOPLE:2>]
>>> [marko, josh]
>>> [NAME:marko, AGE:29, NAME:josh, AGE:32]
>>>
>>> The bytecode of the last query is:
>>>
>>> [db(<database#conn1: url=jdbc:h2:/tmp/test user=>), values(people),
>>> entries]
>>>
>>> JDBCDatabase implements TDatabase, Structure.
>>> *** JDBCDatabase is the root structure and is referenced by db()
>>> *** (CRUCIAL POINT)
>>>
>>> Assume another table called ADDRESSES with two columns: name and city.
>>>
>>>
>>> jdbc.db().values(“people”).as(“x”).db().values(“addresses”).has(“name”,eq(path(“x”).by(“name”))).value(“city”)
>>>
>>> The above is equivalent to:
>>>
>>> SELECT city FROM people,addresses WHERE people.name=addresses.name
>>>
>>> If you want to do an inner join (a product), you do this:
>>>
>>>
>>> jdbc.db().values(“people”).as(“x”).db().values(“addresses”).has(“name”,eq(path(“x”).by(“name”))).as(“y”).path(“x”,”y")
>>>
>>> The above is equivalent to:
>>>
>>> SELECT * FROM addresses INNER JOIN people ON people.name=addresses.name
>>>
>>> NOTES:
>>> 1. Instead of select(), we simply jump to the root via db() (or
>>> V() for graph).
>>> 2. Instead of project(), we simply use value() or values().
>>> 3. Instead of select() being overloaded with by() join syntax, we
>>> use has() and path().
>>> - like TP3 we will be smart about dropping path() data
>>> once its no longer referenced.
>>> 4. We can also do LEFT and RIGHT JOINs (haven’t thought through
>>> FULL OUTER JOIN yet).
>>> - however, we don’t support ‘null' in TP so I don’t know
>>> if we want to support these null-producing joins. ?
>>>
>>> LEFT JOIN:
>>> * If an address doesn’t exist for the person, emit a “null”-filled
>>> path.
>>>
>>> jdbc.db().values(“people”).as(“x”).
>>> db().values(“addresses”).as(“y”).
>>> choose(has(“name”,eq(path(“x”).by(“name”))),
>>> identity(),
>>> path(“y”).by(null).as(“y”)).
>>> path(“x”,”y")
>>>
>>> SELECT * FROM addresses LEFT JOIN people ON people.name=addresses.name
>>>
>>> RIGHT JOIN:
>>>
>>> jdbc.db().values(“people”).as(“x”).
>>> db().values(“addresses”).as(“y”).
>>> choose(has(“name”,eq(path(“x”).by(“name”))),
>>> identity(),
>>> path(“x”).by(null).as(“x”)).
>>> path(“x”,”y")
>>>
>>>
>>> SUMMARY:
>>>
>>> There are no “low level” instructions. Everything is based on the standard
>>> instructions that we know and love. Finally, if not apparent, the above
>>> bytecode chunks would ultimately get strategized into a single SQL query
>>> (breadth-first) instead of one-off queries (depth-first) to improve
>>> performance.
>>>
>>> Neat?,
>>> Marko.
>>>
>>> http://rredux.com <http://rredux.com/> <http://rredux.com/
>>> <http://rredux.com/>>
>