Hello,
Gremlin bytecode provides a language-agnostic way of sending Gremlin traversals
between machines, whether physical or virtual. For instance, it is possible to
send bytecode from one JVM to another or from CPython to the JVM across the
network. Once bytecode is received, it needs to be translated into a
representation that the processing VM can then evaluate.
GremlinServer is smart in that when bytecode is received it will analyze it for
lambdas. If there are lambdas, written in language X, then it will use
XTranslator and XScriptEngine to evaluate the bytecode and create a Traversal
for evaluation. However, if there are no lambdas, then it will use
JavaTranslator to create a Traversal for evaluation.
So, the question for me is:
Is JavaTranslator (which uses Java reflection to convert bytecode to a
Traversal) faster than GroovyTranslator/GroovyScriptEngine (which creates a
String script and evaluates it in the ScriptEngine)?
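As a rough illustration of the two strategies being compared, here is a minimal, self-contained Java sketch. It is not TinkerPop code: Instruction, ToyTraversal, translate, toScript, and sample are hypothetical stand-ins, and nested traversals such as repeat(out()) are left out. It only shows the shape of the idea: a reflective translator replays each (step name, arguments) pair as a method call, while a script translator renders the same pairs into a String that a script engine still has to parse and compile.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these types are TinkerPop classes.
class ReflectiveTranslatorSketch {

    // One bytecode instruction: a step name plus its arguments.
    static final class Instruction {
        final String operator;
        final Object[] arguments;

        Instruction(String operator, Object... arguments) {
            this.operator = operator;
            this.arguments = arguments;
        }
    }

    // Stand-in for a traversal: every step records itself and returns this.
    public static final class ToyTraversal {
        private final List<String> steps = new ArrayList<>();

        public ToyTraversal V() { steps.add("V()"); return this; }
        public ToyTraversal has(String key, String value) { steps.add("has(" + key + "," + value + ")"); return this; }
        public ToyTraversal out() { steps.add("out()"); return this; }
        public ToyTraversal groupCount() { steps.add("groupCount()"); return this; }

        @Override
        public String toString() { return steps.toString(); }
    }

    // Reflection route: find each method by name and arity, then invoke it.
    static ToyTraversal translate(List<Instruction> bytecode) {
        Object current = new ToyTraversal();
        try {
            for (Instruction instruction : bytecode) {
                Method match = null;
                for (Method method : current.getClass().getMethods()) {
                    if (method.getName().equals(instruction.operator)
                            && method.getParameterCount() == instruction.arguments.length) {
                        match = method;
                        break;
                    }
                }
                if (match == null) {
                    throw new IllegalStateException("No step named " + instruction.operator);
                }
                current = match.invoke(current, instruction.arguments);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return (ToyTraversal) current;
    }

    // Script route: render the same instructions as text for a script engine.
    // Every argument is quoted as a string, which is enough for this sample.
    static String toScript(String source, List<Instruction> bytecode) {
        StringBuilder script = new StringBuilder(source);
        for (Instruction instruction : bytecode) {
            script.append('.').append(instruction.operator).append('(');
            for (int i = 0; i < instruction.arguments.length; i++) {
                if (i > 0) script.append(',');
                script.append('"').append(instruction.arguments[i]).append('"');
            }
            script.append(')');
        }
        return script.toString();
    }

    // A simplified, flat version of the traversal used in this post.
    static List<Instruction> sample() {
        return Arrays.asList(
                new Instruction("V"),
                new Instruction("has", "name", "marko"),
                new Instruction("out"),
                new Instruction("groupCount"));
    }

    public static void main(String[] args) {
        System.out.println(translate(sample()));
        System.out.println(toScript("g", sample()));
    }
}
```

The reflective route ends with a ready-to-run object, whereas the String produced by the script route is only the input to a further parse-and-compile step, which is where the extra cost would be expected to come from.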
Let's see. Here is the script in total.
import org.apache.tinkerpop.gremlin.jsr223.JavaTranslator
import org.apache.tinkerpop.gremlin.groovy.jsr223.GroovyTranslator
//// EXECUTED LOCALLY (e.g. CLIENT APPLICATION) ////
g = EmptyGraph.instance().traversal()
t = g.V().has('name','marko').
      repeat(out()).times(2).
      groupCount().by('name'); []
bytecode = t.bytecode
// send the bytecode over the wire
//// EXECUTED REMOTELY (e.g. GREMLIN SERVER) ////
groovy = new GremlinGroovyScriptEngine()
bindings = groovy.createBindings()
bindings.put('g',g)
compiled = groovy.compile(GroovyTranslator.of('g').translate(bytecode))
x = JavaTranslator.of(g).translate(bytecode); []
y = compiled.eval(bindings); []
z = groovy.eval(GroovyTranslator.of('g').translate(bytecode), bindings); []
x == y
y == z
z == x
x.toString()
clock(1000){ JavaTranslator.of(g).translate(bytecode) }
clock(1000){ compiled.eval(bindings) } // caching
clock(1000){ groovy.reset();
             groovy.eval(GroovyTranslator.of('g').translate(bytecode), bindings) } // no caching
First, let's make sure they all return the same traversal:
gremlin> x = JavaTranslator.of(g).translate(bytecode); []
gremlin> y = compiled.eval(bindings); []
gremlin> z = groovy.eval(GroovyTranslator.of('g').translate(bytecode), bindings); []
gremlin> x == y
==>true
gremlin> y == z
==>true
gremlin> z == x
==>true
gremlin> x.toString()
==>[GraphStep(vertex,[]), HasStep([name.eq(marko)]), RepeatStep([VertexStep(OUT,vertex), RepeatEndStep],until(loops(2)),emit(false)), GroupCountStep(value(name))]
gremlin>
Great. They do. Now let's see how fast they are.
gremlin> clock(1000){ JavaTranslator.of(g).translate(bytecode) }
==>0.004768085
gremlin> clock(1000){ compiled.eval(bindings) } // caching
==>0.015168259
gremlin> clock(1000){ groovy.reset();
             groovy.eval(GroovyTranslator.of('g').translate(bytecode), bindings) } // no caching
==>40.790075693
gremlin>
Cool. JavaTranslator is about 8,500x faster than evaluating an uncached String
script and about 3x faster than evaluating a compiled script. JavaTranslator
takes about 5 microseconds to translate the bytecode, while an uncached String
script takes about 40 milliseconds.
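The ratios can be rechecked directly from the three clock() results printed above. The small Java sketch below does only that arithmetic; it assumes clock() reports the mean time per iteration in milliseconds (consistent with reading 40.79 as roughly 40 milliseconds), and the class and constant names are made up for illustration.

```java
// Arithmetic check on the clock() output above; names are illustrative only.
class ClockRatios {

    // Mean milliseconds per iteration, copied from the console session.
    static final double JAVA_TRANSLATOR_MS = 0.004768085;
    static final double COMPILED_SCRIPT_MS = 0.015168259;
    static final double UNCACHED_SCRIPT_MS = 40.790075693;

    // How many times slower the first timing is than the second.
    static double ratio(double slowerMs, double fasterMs) {
        return slowerMs / fasterMs;
    }

    // Converts milliseconds to microseconds.
    static double toMicros(double ms) {
        return ms * 1000.0;
    }

    public static void main(String[] args) {
        System.out.printf("JavaTranslator:  %.2f microseconds%n", toMicros(JAVA_TRANSLATOR_MS));
        System.out.printf("Compiled script: %.2f microseconds%n", toMicros(COMPILED_SCRIPT_MS));
        System.out.printf("Uncached script: %.2f microseconds%n", toMicros(UNCACHED_SCRIPT_MS));
        System.out.printf("Uncached vs JavaTranslator: %.0fx%n", ratio(UNCACHED_SCRIPT_MS, JAVA_TRANSLATOR_MS));
        System.out.printf("Compiled vs JavaTranslator: %.1fx%n", ratio(COMPILED_SCRIPT_MS, JAVA_TRANSLATOR_MS));
        System.out.printf("Uncached vs compiled:       %.0fx%n", ratio(UNCACHED_SCRIPT_MS, COMPILED_SCRIPT_MS));
    }
}
```

The last ratio is the one that matters for script users: the gap between an uncached and a cached script is far larger than the gap between a cached script and JavaTranslator.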
So, what did we learn?
1. Bytecode is slick in that we don’t have to use Gremlin-Groovy to
evaluate it (if there are no lambdas) and thus, can do everything in Java and
fast!
2. It is very important to always use parameterized queries with
GremlinServer/etc. as you can see how costly it is to evaluate a String script
repeatedly.
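To see why parameterization helps, here is a small Java sketch of a compiled-script cache keyed on the exact script text. It is an assumption-laden toy, not GremlinServer's implementation: ScriptCacheSketch, compile, and compilesFor are hypothetical names, and "compiling" is just a counter. The point is that inlining a different literal into each script produces a different cache key every time, while a parameterized script keeps one key and passes the value through bindings.

```java
import java.util.HashMap;
import java.util.Map;

// Toy cache keyed on script text; not how any real engine is implemented.
class ScriptCacheSketch {

    private final Map<String, String> compiled = new HashMap<>();
    int compileCount = 0;

    // Returns a stand-in "compiled" script, compiling only on a cache miss.
    String compile(String script) {
        return compiled.computeIfAbsent(script, text -> {
            compileCount++; // the expensive step in a real engine
            return "compiled:" + text;
        });
    }

    // Counts compilations needed to run one lookup per name.
    static int compilesFor(boolean parameterized, String... names) {
        ScriptCacheSketch cache = new ScriptCacheSketch();
        for (String name : names) {
            String script = parameterized
                    ? "g.V().has('name', x)"                // value would travel in bindings as x
                    : "g.V().has('name', '" + name + "')";  // value baked into the script text
            cache.compile(script);
        }
        return cache.compileCount;
    }

    public static void main(String[] args) {
        System.out.println(compilesFor(false, "marko", "vadas", "josh"));
        System.out.println(compilesFor(true, "marko", "vadas", "josh"));
    }
}
```

With three distinct names, the inlined form compiles three times and the parameterized form compiles once; in terms of the timings above, that is the difference between paying the uncached cost on every request and paying it only on the first.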
What is crazy is that my JavaTranslator code is gheeeeeetto.
https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/jsr223/JavaTranslator.java
If anyone wants to submit a PR to make JavaTranslator more efficient, please
do. However, we are still doing well with what we have regardless.
Take care,
Marko.
http://markorodriguez.com