The branch is TINKERPOP-1404 https://github.com/apache/tinkerpop/commit/c1556fe82c58527dc4425d23d1d69ce324e62cfa
Cheers Pieter On 01/11/2016 15:23, Marko Rodriguez wrote: > What branch are you in? Perhaps give me URLs (to GitHub) to the files > touched? (if its not too many) > > Marko. > > http://markorodriguez.com > > > >> On Nov 1, 2016, at 7:19 AM, pieter-gmail <pieter.mar...@gmail.com> wrote: >> >> Hi, >> >> Yes I am but afraid I do not have time at present to concentrate on it. >> I just noticed your ImmutablePath ticket which will overlap with some of >> what I have done. >> >> I'd suggest to pull my branch and look at what I did there. It was very >> little, but dangerous code, which is why I was reluctant to submit a PR >> at first. If you don't continue with it, I should in 2 or 3 weeks be >> able to look at it again. >> >> Thanks >> Pieter >> >> >> On 01/11/2016 15:11, Marko Rodriguez wrote: >>> Hi Pieter, >>> >>> I’m still really interested in your work in this area. Are you still doing >>> this? >>> >>> Marko. >>> >>> http://markorodriguez.com >>> >>> >>> >>>> On Aug 7, 2016, at 9:12 AM, Pieter Martin <pieter.mar...@gmail.com> wrote: >>>> >>>> To avoid the collection logic alltogether. For most steps there is no need >>>> to >>>> check the labels as it is known that they are added, immutable and correct. >>>> >>>> Also with the current strategy the `ImmutablePath.currentLabels` is >>>> exactly the >>>> same collection as that of the step. It is not a copy. >>>> >>>> `Traverser.addLabels()` is only called for the steps previously mentioned. >>>> They >>>> too can be optimized as there is no need to create a new `ImmutablePath` >>>> just to >>>> set the labels. There could be a `Traverer.split()` method that creates the >>>> initial `ImmutablePath` with the correct labels to start with. As things >>>> stand >>>> now the `ImmutablePath` is created just to be replaced 1 or 2 millis >>>> later. I >>>> have however not benchmarked any of those queries so am not touching that >>>> at >>>> the moment. >>>> >>>> In some ways its a matter of style. The label logic in `AbstractStep` is >>>> not >>>> relevant to all of its subclasses and should be higher in the inheritance >>>> hierarchy. >>>> >>>> Cheers >>>> Pieter >>>> >>>> >>>> Excerpts from Marko Rodriguez's message of August 7, 2016 3:54 : >>>>> Why not just check to see if the labels to be added already exist, if >>>>> they do, don’t addLabels() and thus, don’t create a new collection. >>>>> Marko. >>>>> http://markorodriguez.com >>>>>> On Aug 7, 2016, at 6:07 AM, Pieter Martin <pieter.mar...@gmail.com> >>>>>> wrote: >>>>>> Here is what I have come up with so far. >>>>>> The idea is that `Traverser.split(r, step)` already copies the labels to >>>>>> the >>>>>> traverser so there is no need to call `Traverser.addLabels(labels)` >>>>>> again. >>>>>> I removed the `Traverser.addLabels(labels)` call from `AbstractStep`. >>>>>> For the traversers that do not call `Traverer.split(r, step)` I manually >>>>>> added >>>>>> the `traverser.addLabels(labels)` call in `processNextStart()`. This was >>>>>> done >>>>>> by fixing test failures rather than searching and investigating all >>>>>> calls to >>>>>> `Traverser.split()`. >>>>>> The following steps needed the labels to be added manually. >>>>>> `AggregateStep` >>>>>> `CollectingBarrierStep` >>>>>> `FilterStep` >>>>>> `SideEffectStep` >>>>>> `StartStep` >>>>>> `NoOpBarrierStep` >>>>>> Further seeing as `Step.getLabels()` already returns a unmodifiable >>>>>> collection and >>>>>> `ImmutablePath` is well immutable there is no need for it to have its >>>>>> own copy >>>>>> of the labels. I set the labels directly on the path as oppose to making >>>>>> a copy. >>>>>> `TinkerGraphProcessComputerTest` passes. >>>>>> `TinkerGraphProcessStandardTest` passes. >>>>>> `TinkerGraphStructureStandardTest` has one failure. >>>>>> `SerializationTest$GryoTest.shouldSerializePathAsDetached` fails with >>>>>> `java.lang.IllegalArgumentException: Class is not registered: >>>>>> java.util.Collections$UnmodifiableSet` >>>>>> Let me know what you think. >>>>>> Thanks >>>>>> Pieter >>>>>> Excerpts from Ted Wilmes's message of August 7, 2016 12:16 : >>>>>>> Neat find Pieter. With regards to the update to ImmutablePath, I think >>>>>>> that defensive copying of inbound collections is generally a good idea >>>>>>> but >>>>>>> if we can target specific areas where we can reap big gains from not >>>>>>> creating new collections it may be worth it to relax that constraint, >>>>>>> especially if we're only talking about internal APIs. If we do relax >>>>>>> that >>>>>>> constraint though, we'll need to careful during code review and unit >>>>>>> testing because those bugs can be difficult to track down. The other >>>>>>> nice >>>>>>> thing that the defensive copy gets us, particularly in the case of the >>>>>>> ImmutablePath, is that it isolates ImmutablePath from whatever the >>>>>>> subclass >>>>>>> of set was that the caller passed in. I think that's what is causing >>>>>>> the >>>>>>> serialization test failure in this case since the caller passed in an >>>>>>> unmodifiable set. >>>>>>> --Ted >>>>>>> On Fri, Aug 5, 2016, 2:31 PM Marko Rodriguez <okramma...@gmail.com> >>>>>>> wrote: >>>>>>>> Hello, >>>>>>>> This is cool. Check out also ImmutablePath.extend(labels) as that is >>>>>>>> ultimately what Traverser.addLabels() calls. We have a lot of set >>>>>>>> copying >>>>>>>> and I don’t know if its needed (as you seem to be demonstrating). What >>>>>>>> I >>>>>>>> don’t like about your solution is the explicit reference to the >>>>>>>> B_L_P…Traverser in AbstractStep. See if you can work your solution >>>>>>>> without >>>>>>>> it. >>>>>>>> Good luck, >>>>>>>> Marko. >>>>>>>> http://markorodriguez.com >>>>>>>>> On Aug 5, 2016, at 12:44 PM, pieter-gmail <pieter.mar...@gmail.com> >>>>>>>> wrote: >>>>>>>>> Sorry forgot to add a rather important part. >>>>>>>>> >>>>>>>>> I changed ImmutablePath's constructor to >>>>>>>>> >>>>>>>>> private ImmutablePath(final ImmutablePathImpl previousPath, final >>>>>>>>> Object currentObject, final Set<String> currentLabels) { >>>>>>>>> this.previousPath = previousPath; >>>>>>>>> this.currentObject = currentObject; >>>>>>>>> this.currentLabels = currentLabels; >>>>>>>>> // this.currentLabels.addAll(currentLabels); >>>>>>>>> } >>>>>>>>> >>>>>>>>> Setting the collection directly as oppose to `addAll` >>>>>>>>> >>>>>>>>> Thanks >>>>>>>>> Pieter >>>>>>>>> >>>>>>>>> >>>>>>>>> On 05/08/2016 20:40, pieter-gmail wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I have been optimizing Sqlg of late and eventually arrived at >>>>>>>>>> TinkerPop >>>>>>>>>> code. >>>>>>>>>> >>>>>>>>>> The gremlin in particular that I am interested is path queries. >>>>>>>>>> >>>>>>>>>> Here is the test that I am running in jmh. >>>>>>>>>> >>>>>>>>>> //@Setup >>>>>>>>>> Vertex a = graph.addVertex(T.label, "A", "name", "a1"); >>>>>>>>>> for (int i = 1; i < 1_000_001; i++) { >>>>>>>>>> Vertex b = graph.addVertex(T.label, "B", "name", "name_" + >>>>>>>> i); >>>>>>>>>> a.addEdge("outB", b); >>>>>>>>>> for (int j = 0; j < 1; j++) { >>>>>>>>>> Vertex c = graph.addVertex(T.label, "C", "name", >>>>>>>>>> "name_" >>>>>>>>>> + i + " " + j); >>>>>>>>>> b.addEdge("outC", c); >>>>>>>>>> } >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> And the query being benchmarked is >>>>>>>>>> >>>>>>>>>> GraphTraversal<Vertex, Path> traversal = >>>>>>>>>> g.V(a).as("a").out().as("b").out().as("c").path(); >>>>>>>>>> while (traversal.hasNext()) { >>>>>>>>>> Path path = traversal.next(); >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> Before the optimization, (as things are now) >>>>>>>>>> >>>>>>>>>> Benchmark Mode Cnt Score Error >>>>>>>> Units >>>>>>>>>> GremlinPathBenchmark.g_path avgt 100 1.086 ± 0.020 s/op >>>>>>>>>> >>>>>>>>>> The optimization I did is in >>>>>>>>>> AbstractStep.prepareTraversalForNextStep, >>>>>>>>>> to not call addLabels() for path gremlins as the labels are known by >>>>>>>>>> the >>>>>>>>>> step and do not change again so there is not need to keep adding >>>>>>>>>> them. >>>>>>>>>> >>>>>>>>>> private final Traverser.Admin<E> prepareTraversalForNextStep(final >>>>>>>>>> Traverser.Admin<E> traverser) { >>>>>>>>>> if (!this.traverserStepIdAndLabelsSetByChild) { >>>>>>>>>> traverser.setStepId(this.nextStep.getId()); >>>>>>>>>> if (traverser instanceof B_LP_O_P_S_SE_SL_Traverser) { >>>>>>>>>> } else { >>>>>>>>>> traverser.addLabels(this.labels); >>>>>>>>>> } >>>>>>>>>> } >>>>>>>>>> return traverser; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> After optimization, >>>>>>>>>> >>>>>>>>>> Benchmark Mode Cnt Score Error >>>>>>>> Units >>>>>>>>>> GremlinPathBenchmark.g_path avgt 100 0.680 ± 0.004 s/op >>>>>>>>>> >>>>>>>>>> 1.086 vs 0.689 seconds for the traversal. >>>>>>>>>> >>>>>>>>>> I ran the Structured and Process test suites. 2 tests are failing >>>>>>>>>> with >>>>>>>>>> this optimization. >>>>>>>>>> >>>>>>>>>> InjectTest.g_VX1X_out_name_injectXdanielX_asXaX_mapXlengthX_path >>>>>>>>>> fails >>>>>>>> with >>>>>>>>>> "java.lang.IllegalArgumentException: The step with label a does not >>>>>>>> exist" >>>>>>>>>> and >>>>>>>>>> >>>>>>>>>> SerializationTest.shouldSerializePathAsDetached fails with >>>>>>>>>> >>>>>>>>>> "Caused by: java.lang.IllegalArgumentException: Class is not >>>>>>>> registered: >>>>>>>>>> java.util.Collections$UnmodifiableSet" >>>>>>>>>> >>>>>>>>>> Before investigating the failures is this optimization worth >>>>>>>>>> pursuing? >>>>>>>>>> >>>>>>>>>> Thanks >>>>>>>>>> Pieter >>>>>>>>>>