Hi,
I have been optimizing Sqlg of late and eventually arrived at TinkerPop
code.
The Gremlin I am particularly interested in is path queries.
Here is the test that I am running in JMH.
//@Setup
Vertex a = graph.addVertex(T.label, "A", "name", "a1");
for (int i = 1; i < 1_000_001; i++) {
    Vertex b = graph.addVertex(T.label, "B", "name", "name_" + i);
    a.addEdge("outB", b);
    for (int j = 0; j < 1; j++) {
        Vertex c = graph.addVertex(T.label, "C", "name", "name_" + i + " " + j);
        b.addEdge("outC", c);
    }
}
And the query being benchmarked is:
GraphTraversal<Vertex, Path> traversal =
        g.V(a).as("a").out().as("b").out().as("c").path();
while (traversal.hasNext()) {
    Path path = traversal.next();
}
Before the optimization (as things are now):
Benchmark Mode Cnt Score Error Units
GremlinPathBenchmark.g_path avgt 100 1.086 ± 0.020 s/op
The optimization I made is in AbstractStep.prepareTraversalForNextStep:
it does not call addLabels() for path gremlins, because the labels are
known by the step and do not change again, so there is no need to keep
adding them.
private final Traverser.Admin<E> prepareTraversalForNextStep(final Traverser.Admin<E> traverser) {
    if (!this.traverserStepIdAndLabelsSetByChild) {
        traverser.setStepId(this.nextStep.getId());
        // Skip addLabels() for path traversers; every other traverser type is unchanged.
        if (!(traverser instanceof B_LP_O_P_S_SE_SL_Traverser)) {
            traverser.addLabels(this.labels);
        }
    }
    return traverser;
}
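The reasoning behind the change can be illustrated with a toy model. This is only a sketch using plain java.util collections, not TinkerPop's traverser or path classes; LabelRedundancySketch and its addLabels helper are hypothetical names. It shows that once a label set already contains the step's labels, adding them again changes nothing. It says nothing about whether the labels have in fact already been recorded at that point in TinkerPop.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: hypothetical names, not TinkerPop's API.
final class LabelRedundancySketch {

    // Adds the step's labels and reports whether the set actually changed.
    static boolean addLabels(Set<String> traverserLabels, List<String> stepLabels) {
        return traverserLabels.addAll(stepLabels);
    }

    public static void main(String[] args) {
        Set<String> labels = new LinkedHashSet<>();
        List<String> stepLabels = List.of("b");

        // The first call records the label, so the set changes.
        System.out.println(addLabels(labels, stepLabels)); // prints "true"
        // A repeated call with the same labels is a no-op.
        System.out.println(addLabels(labels, stepLabels)); // prints "false"
    }
}
```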
After the optimization:
Benchmark Mode Cnt Score Error Units
GremlinPathBenchmark.g_path avgt 100 0.680 ± 0.004 s/op
1.086 vs 0.680 seconds for the traversal.
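To put the two benchmark scores in relative terms, here is a small arithmetic sketch using the values from the two JMH tables (1.086 s/op before, 0.680 s/op after). The class and method names are purely illustrative, and the error margins reported by JMH are ignored.

```java
import java.util.Locale;

// Illustrative helper for comparing two JMH average-time scores.
final class SpeedupSketch {

    // Ratio of the old time to the new time (greater than 1 means faster).
    static double speedup(double beforeSeconds, double afterSeconds) {
        return beforeSeconds / afterSeconds;
    }

    // Percentage of the original time that was saved.
    static double reductionPercent(double beforeSeconds, double afterSeconds) {
        return 100.0 * (beforeSeconds - afterSeconds) / beforeSeconds;
    }

    public static void main(String[] args) {
        double before = 1.086; // s/op, from the first table
        double after = 0.680;  // s/op, from the second table

        // prints "speedup: 1.60x"
        System.out.println(String.format(Locale.ROOT, "speedup: %.2fx", speedup(before, after)));
        // prints "reduction: 37.4%"
        System.out.println(String.format(Locale.ROOT, "reduction: %.1f%%", reductionPercent(before, after)));
    }
}
```

So the traversal takes roughly 37% less time, or about 1.6 times faster, on this data set.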
I ran the Structure and Process test suites. Two tests fail with
this optimization.
InjectTest.g_VX1X_out_name_injectXdanielX_asXaX_mapXlengthX_path fails with
"java.lang.IllegalArgumentException: The step with label a does not exist"
and
SerializationTest.shouldSerializePathAsDetached fails with
"Caused by: java.lang.IllegalArgumentException: Class is not registered:
java.util.Collections$UnmodifiableSet"
Before I investigate the failures, is this optimization worth pursuing?
Thanks
Pieter