Hello Dmitry,
> In TP3 compilation to Bytecode can happen on Gremlin Client side or Gremlin
> Server side:
>
> 1. If compilation is simple, it is possible to implement it for all Gremlin
> Clients: Java, Python, JavaScript, .NET...
> 2. If compilation is complex, it is possible to create a plugin for Gremlin
> Server. Clients send query string, and server does the compilation.
Yes, but not for the reasons you state. Every TP3-compliant language must be
able to compile to TP3 bytecode. That bytecode is then submitted, evaluated by
the TP3 VM, and a traverser iterator is returned.
However, TP3’s GremlinServer also supports a JSR223 ScriptEngine, which can
compile query language Strings server side and then return a traverser
iterator. This exists so people can submit complex Groovy/Python/JS scripts to
GremlinServer. The problem with this access point is that arbitrary code can be
submitted and thus while(true) { } can hang the system! dar.
> For example, in Cypher for Gremlin it is possible to use compilation to
> Bytecode in JVM client, or on the server when using [other language
> clients][1].
I’m not too familiar with the GremlinServer plugin stuff, so I don’t know. I would
say that all TP3-compliant query languages must be able to compile to TP3
bytecode.
> My current understanding is that TP4 Server would serve only for I/O purposes.
This is still up in the air, but I believe that we should:
  1. Only support one data access point.
       TP4 bytecode in and traversers out.
  2. The TP4 server should have two components.
       (1) One (or many) bytecode input locations (IP/port) that pass
           the bytecode to the TP4 VM.
       (2) Multiple traverser output locations where distributed
           processors can directly send halted traversers back to the client.
For me, that’s it. However, I’m not a network-server guy, so I don’t have a clear
understanding of what is absolutely necessary.
> Where do you see "Query language -> Universal Bytecode" part in TP4
> architecture? Will it be in the VM? Or in middleware? How will clients look
> like in TP4?
TP4 will publish a binary serialization specification.
It will be dead simple compared to TP3’s binary specification.
The only types of objects are: Bytecode, Instruction, Traverser, Tuple, and
Primitive.
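To make the idea of a small binary format concrete, here is a toy round trip in Java. It is only a sketch under assumptions: the Instruction and Bytecode classes below are hypothetical stand-ins (not the real TP4 classes), arguments are restricted to strings, and the length-prefixed layout is invented for illustration. The actual TP4 specification may look quite different.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an instruction: an op name plus string arguments.
final class Instruction {
    final String op;
    final List<String> args;
    Instruction(String op, List<String> args) { this.op = op; this.args = args; }
}

// Hypothetical stand-in for bytecode: an ordered list of instructions.
final class Bytecode {
    final List<Instruction> instructions;
    Bytecode(List<Instruction> instructions) { this.instructions = instructions; }
}

final class ToySerializer {
    // Invented layout: instruction count, then per instruction: op, arg count, args.
    static byte[] write(Bytecode bytecode) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(bytecode.instructions.size());
            for (Instruction instruction : bytecode.instructions) {
                out.writeUTF(instruction.op);
                out.writeInt(instruction.args.size());
                for (String arg : instruction.args) {
                    out.writeUTF(arg);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads back exactly what write() produced.
    static Bytecode read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int instructionCount = in.readInt();
            List<Instruction> instructions = new ArrayList<>();
            for (int i = 0; i < instructionCount; i++) {
                String op = in.readUTF();
                int argCount = in.readInt();
                List<String> args = new ArrayList<>();
                for (int j = 0; j < argCount; j++) {
                    args.add(in.readUTF());
                }
                instructions.add(new Instruction(op, args));
            }
            return new Bytecode(instructions);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A client in any language would only need to reproduce this kind of write/read pair, which is why a small, fixed set of types keeps the per-language work low.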
Every query language designer that wants to have their query language execute
on the TP4 VM (and thus, against all supporting processing engines and data
storage systems) will need to have a compiler from their language to TP4
bytecode.
We will provide 2 tools in all the popular programming languages (Java, Python,
JS, …).
1. A TP4 serializer and deserializer.
2. A lightweight network client to submit serialized bytecode and
deserialize Iterator<Traverser> into objects in that language.
Thus, if the Cypher-TP4 compiler is written in Scala, you would:
1. build up a org.apache.tinkerpop.machine.bytecode.Bytecode object
during your compilation process.
2. use our org.apache.tinkerpop.machine.io.RemoteMachine object to send the
Bytecode and get back Iterator<Traverser> objects.
- RemoteMachine does the serialization and deserialization for
you.
I originally wrote out how it currently looks in the tp4/ branch, but realized
that it asks you to write one too many classes. Thus, I think we will probably
go with something like this:
Machine machine = RemoteMachine.
    withStructure(NeptuneStructure.class, config1).
    withProcessor(AkkaProcessor.class, config2).
    open(config0);
Iterator<Traverser> results =
    machine.submit(CypherCompiler.compile("MATCH (x)-[knows]->(y)"));
Thus, you would only have to provide a single CypherCompiler class.
If you have any better ideas, please say so. I don’t like that you would have
to create a CypherCompiler class (even if it’s just a wrapper) for all popular
programming languages. :(
Perhaps TP4 has a Compiler interface and compilation happens server side….? But
then that requires language designers to write their compiler in Java … hmm…..
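For what it is worth, the server-side option could be sketched roughly as follows. Everything here is hypothetical: QueryCompiler, CompilerRegistry, and the toy ";"-separated language are invented for illustration, and a List<String> stands in for a real Bytecode object. Nothing like this exists in the tp4/ branch as far as this email states.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface a language designer would implement (in Java).
// List<String> is a placeholder for a real Bytecode object.
interface QueryCompiler {
    List<String> compile(String query);
}

// Toy "language": one instruction per ';'-separated clause, e.g. "V; has name marko".
final class ToyCompiler implements QueryCompiler {
    @Override
    public List<String> compile(String query) {
        List<String> instructions = new ArrayList<>();
        for (String clause : query.split(";")) {
            String[] tokens = clause.trim().split("\\s+");
            if (tokens[0].isEmpty()) {
                continue; // skip empty clauses
            }
            // First token is the op, the remaining tokens are its arguments.
            String args = String.join(",", Arrays.copyOfRange(tokens, 1, tokens.length));
            instructions.add(tokens[0] + "(" + args + ")");
        }
        return instructions;
    }
}

// Hypothetical server-side registry: clients send (language, query string),
// the server looks up the compiler and produces the bytecode itself.
final class CompilerRegistry {
    private final Map<String, QueryCompiler> compilers = new HashMap<>();

    void register(String language, QueryCompiler compiler) {
        compilers.put(language, compiler);
    }

    List<String> compile(String language, String query) {
        QueryCompiler compiler = compilers.get(language);
        if (compiler == null) {
            throw new IllegalArgumentException("No compiler registered for: " + language);
        }
        return compiler.compile(query);
    }
}
```

The trade-off is visible in the sketch: clients stay trivial because they only send strings, but every compiler has to be a JVM class the server can load.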
Hope I’m clear,
Marko.
http://rredux.com