Hi Pieter, I’m still really interested in your work in this area. Are you still doing this?
Marko. http://markorodriguez.com > On Aug 7, 2016, at 9:12 AM, Pieter Martin <[email protected]> wrote: > > To avoid the collection logic alltogether. For most steps there is no need to > check the labels as it is known that they are added, immutable and correct. > > Also with the current strategy the `ImmutablePath.currentLabels` is exactly > the > same collection as that of the step. It is not a copy. > > `Traverser.addLabels()` is only called for the steps previously mentioned. > They > too can be optimized as there is no need to create a new `ImmutablePath` just > to > set the labels. There could be a `Traverer.split()` method that creates the > initial `ImmutablePath` with the correct labels to start with. As things stand > now the `ImmutablePath` is created just to be replaced 1 or 2 millis later. I > have however not benchmarked any of those queries so am not touching that at > the moment. > > In some ways its a matter of style. The label logic in `AbstractStep` is not > relevant to all of its subclasses and should be higher in the inheritance > hierarchy. > > Cheers > Pieter > > > Excerpts from Marko Rodriguez's message of August 7, 2016 3:54 : >> Why not just check to see if the labels to be added already exist, if they >> do, don’t addLabels() and thus, don’t create a new collection. >> Marko. >> http://markorodriguez.com >>> On Aug 7, 2016, at 6:07 AM, Pieter Martin <[email protected]> wrote: >>> Here is what I have come up with so far. >>> The idea is that `Traverser.split(r, step)` already copies the labels to the >>> traverser so there is no need to call `Traverser.addLabels(labels)` again. >>> I removed the `Traverser.addLabels(labels)` call from `AbstractStep`. >>> For the traversers that do not call `Traverer.split(r, step)` I manually >>> added >>> the `traverser.addLabels(labels)` call in `processNextStart()`. This was >>> done >>> by fixing test failures rather than searching and investigating all calls to >>> `Traverser.split()`. >>> The following steps needed the labels to be added manually. >>> `AggregateStep` >>> `CollectingBarrierStep` >>> `FilterStep` >>> `SideEffectStep` >>> `StartStep` >>> `NoOpBarrierStep` >>> Further seeing as `Step.getLabels()` already returns a unmodifiable >>> collection and >>> `ImmutablePath` is well immutable there is no need for it to have its own >>> copy >>> of the labels. I set the labels directly on the path as oppose to making a >>> copy. >>> `TinkerGraphProcessComputerTest` passes. >>> `TinkerGraphProcessStandardTest` passes. >>> `TinkerGraphStructureStandardTest` has one failure. >>> `SerializationTest$GryoTest.shouldSerializePathAsDetached` fails with >>> `java.lang.IllegalArgumentException: Class is not registered: >>> java.util.Collections$UnmodifiableSet` >>> Let me know what you think. >>> Thanks >>> Pieter >>> Excerpts from Ted Wilmes's message of August 7, 2016 12:16 : >>>> Neat find Pieter. With regards to the update to ImmutablePath, I think >>>> that defensive copying of inbound collections is generally a good idea but >>>> if we can target specific areas where we can reap big gains from not >>>> creating new collections it may be worth it to relax that constraint, >>>> especially if we're only talking about internal APIs. If we do relax that >>>> constraint though, we'll need to careful during code review and unit >>>> testing because those bugs can be difficult to track down. The other nice >>>> thing that the defensive copy gets us, particularly in the case of the >>>> ImmutablePath, is that it isolates ImmutablePath from whatever the subclass >>>> of set was that the caller passed in. I think that's what is causing the >>>> serialization test failure in this case since the caller passed in an >>>> unmodifiable set. >>>> --Ted >>>> On Fri, Aug 5, 2016, 2:31 PM Marko Rodriguez <[email protected]> wrote: >>>>> Hello, >>>>> This is cool. Check out also ImmutablePath.extend(labels) as that is >>>>> ultimately what Traverser.addLabels() calls. We have a lot of set copying >>>>> and I don’t know if its needed (as you seem to be demonstrating). What I >>>>> don’t like about your solution is the explicit reference to the >>>>> B_L_P…Traverser in AbstractStep. See if you can work your solution without >>>>> it. >>>>> Good luck, >>>>> Marko. >>>>> http://markorodriguez.com >>>>> > On Aug 5, 2016, at 12:44 PM, pieter-gmail <[email protected]> >>>>> wrote: >>>>> > >>>>> > Sorry forgot to add a rather important part. >>>>> > >>>>> > I changed ImmutablePath's constructor to >>>>> > >>>>> > private ImmutablePath(final ImmutablePathImpl previousPath, final >>>>> > Object currentObject, final Set<String> currentLabels) { >>>>> > this.previousPath = previousPath; >>>>> > this.currentObject = currentObject; >>>>> > this.currentLabels = currentLabels; >>>>> > // this.currentLabels.addAll(currentLabels); >>>>> > } >>>>> > >>>>> > Setting the collection directly as oppose to `addAll` >>>>> > >>>>> > Thanks >>>>> > Pieter >>>>> > >>>>> > >>>>> > On 05/08/2016 20:40, pieter-gmail wrote: >>>>> >> Hi, >>>>> >> >>>>> >> I have been optimizing Sqlg of late and eventually arrived at TinkerPop >>>>> >> code. >>>>> >> >>>>> >> The gremlin in particular that I am interested is path queries. >>>>> >> >>>>> >> Here is the test that I am running in jmh. >>>>> >> >>>>> >> //@Setup >>>>> >> Vertex a = graph.addVertex(T.label, "A", "name", "a1"); >>>>> >> for (int i = 1; i < 1_000_001; i++) { >>>>> >> Vertex b = graph.addVertex(T.label, "B", "name", "name_" + >>>>> i); >>>>> >> a.addEdge("outB", b); >>>>> >> for (int j = 0; j < 1; j++) { >>>>> >> Vertex c = graph.addVertex(T.label, "C", "name", "name_" >>>>> >> + i + " " + j); >>>>> >> b.addEdge("outC", c); >>>>> >> } >>>>> >> } >>>>> >> >>>>> >> And the query being benchmarked is >>>>> >> >>>>> >> GraphTraversal<Vertex, Path> traversal = >>>>> >> g.V(a).as("a").out().as("b").out().as("c").path(); >>>>> >> while (traversal.hasNext()) { >>>>> >> Path path = traversal.next(); >>>>> >> } >>>>> >> >>>>> >> Before the optimization, (as things are now) >>>>> >> >>>>> >> Benchmark Mode Cnt Score Error >>>>> Units >>>>> >> GremlinPathBenchmark.g_path avgt 100 1.086 ± 0.020 s/op >>>>> >> >>>>> >> The optimization I did is in AbstractStep.prepareTraversalForNextStep, >>>>> >> to not call addLabels() for path gremlins as the labels are known by >>>>> >> the >>>>> >> step and do not change again so there is not need to keep adding them. >>>>> >> >>>>> >> private final Traverser.Admin<E> prepareTraversalForNextStep(final >>>>> >> Traverser.Admin<E> traverser) { >>>>> >> if (!this.traverserStepIdAndLabelsSetByChild) { >>>>> >> traverser.setStepId(this.nextStep.getId()); >>>>> >> if (traverser instanceof B_LP_O_P_S_SE_SL_Traverser) { >>>>> >> } else { >>>>> >> traverser.addLabels(this.labels); >>>>> >> } >>>>> >> } >>>>> >> return traverser; >>>>> >> } >>>>> >> >>>>> >> After optimization, >>>>> >> >>>>> >> Benchmark Mode Cnt Score Error >>>>> Units >>>>> >> GremlinPathBenchmark.g_path avgt 100 0.680 ± 0.004 s/op >>>>> >> >>>>> >> 1.086 vs 0.689 seconds for the traversal. >>>>> >> >>>>> >> I ran the Structured and Process test suites. 2 tests are failing with >>>>> >> this optimization. >>>>> >> >>>>> >> InjectTest.g_VX1X_out_name_injectXdanielX_asXaX_mapXlengthX_path fails >>>>> with >>>>> >> >>>>> >> "java.lang.IllegalArgumentException: The step with label a does not >>>>> exist" >>>>> >> >>>>> >> and >>>>> >> >>>>> >> SerializationTest.shouldSerializePathAsDetached fails with >>>>> >> >>>>> >> "Caused by: java.lang.IllegalArgumentException: Class is not >>>>> registered: >>>>> >> java.util.Collections$UnmodifiableSet" >>>>> >> >>>>> >> Before investigating the failures is this optimization worth pursuing? >>>>> >> >>>>> >> Thanks >>>>> >> Pieter >>>>> >> >>>>> >
