Re: path query optimization
Hi, So the one thing I just copied into TINKERPOP-1372 was your ImmutablePath.currentLabels = currentLabels (vs. copying the set). You are right, step labels are immutable and thus, there is no need to copy the set over — a direct reference is sufficient. Regarding the AbstractStep.traverserStepIdAndLabelsSetByChild changes — this isn’t so important as this is only used in OLAP and thus, should be negligible relative to what else is going on in OLAP. However, to be good, I added the following: @Override public Path extend(final Set labels) { if (labels.isEmpty()) return this; else { final Set newLabels = new LinkedHashSet<>(); newLabels.addAll(this.currentLabels); newLabels.addAll(labels); return new ImmutablePath(this.previousPath, this.currentObject, newLabels); } } Thus, if the labels are empty, don’t clone. This work along with the other recursion work in TINKERPOP-1372 provided a solid speed up. I’ll present benchmark results in the subsequent PR. Thanks Pieter, Marko. http://markorodriguez.com > On Nov 1, 2016, at 7:43 AM, pieter-gmail wrote: > > The branch is TINKERPOP-1404 > https://github.com/apache/tinkerpop/commit/c1556fe82c58527dc4425d23d1d69ce324e62cfa > > Cheers > Pieter > > On 01/11/2016 15:23, Marko Rodriguez wrote: >> What branch are you in? Perhaps give me URLs (to GitHub) to the files >> touched? (if its not too many) >> >> Marko. >> >> http://markorodriguez.com >> >> >> >>> On Nov 1, 2016, at 7:19 AM, pieter-gmail wrote: >>> >>> Hi, >>> >>> Yes I am but afraid I do not have time at present to concentrate on it. >>> I just noticed your ImmutablePath ticket which will overlap with some of >>> what I have done. >>> >>> I'd suggest to pull my branch and look at what I did there. It was very >>> little, but dangerous code, which is why I was reluctant to submit a PR >>> at first. If you don't continue with it, I should in 2 or 3 weeks be >>> able to look at it again. >>> >>> Thanks >>> Pieter >>> >>> >>> On 01/11/2016 15:11, Marko Rodriguez wrote: Hi Pieter, I’m still really interested in your work in this area. Are you still doing this? Marko. http://markorodriguez.com > On Aug 7, 2016, at 9:12 AM, Pieter Martin wrote: > > To avoid the collection logic alltogether. For most steps there is no > need to > check the labels as it is known that they are added, immutable and > correct. > > Also with the current strategy the `ImmutablePath.currentLabels` is > exactly the > same collection as that of the step. It is not a copy. > > `Traverser.addLabels()` is only called for the steps previously > mentioned. They > too can be optimized as there is no need to create a new `ImmutablePath` > just to > set the labels. There could be a `Traverer.split()` method that creates > the > initial `ImmutablePath` with the correct labels to start with. As things > stand > now the `ImmutablePath` is created just to be replaced 1 or 2 millis > later. I > have however not benchmarked any of those queries so am not touching that > at > the moment. > > In some ways its a matter of style. The label logic in `AbstractStep` is > not > relevant to all of its subclasses and should be higher in the inheritance > hierarchy. > > Cheers > Pieter > > > Excerpts from Marko Rodriguez's message of August 7, 2016 3:54 : >> Why not just check to see if the labels to be added already exist, if >> they do, don’t addLabels() and thus, don’t create a new collection. >> Marko. >> http://markorodriguez.com >>> On Aug 7, 2016, at 6:07 AM, Pieter Martin >>> wrote: >>> Here is what I have come up with so far. >>> The idea is that `Traverser.split(r, step)` already copies the labels >>> to the >>> traverser so there is no need to call `Traverser.addLabels(labels)` >>> again. >>> I removed the `Traverser.addLabels(labels)` call from `AbstractStep`. >>> For the traversers that do not call `Traverer.split(r, step)` I >>> manually added >>> the `traverser.addLabels(labels)` call in `processNextStart()`. This >>> was done >>> by fixing test failures rather than searching and investigating all >>> calls to >>> `Traverser.split()`. >>> The following steps needed the labels to be added manually. >>> `AggregateStep` >>> `CollectingBarrierStep` >>> `FilterStep` >>> `SideEffectStep` >>> `StartStep` >>> `NoOpBarrierStep` >>> Further seeing as `Step.getLabels()` already returns a unmodifiable >>> collection and >>> `ImmutablePath` is well immutable there is no need for it to have its >>> own copy >>> of the labels. I set the labels directly on the path as oppose to >>> making a copy. >>> `TinkerGraphProcessComputerT
Re: path query optimization
The branch is TINKERPOP-1404 https://github.com/apache/tinkerpop/commit/c1556fe82c58527dc4425d23d1d69ce324e62cfa Cheers Pieter On 01/11/2016 15:23, Marko Rodriguez wrote: > What branch are you in? Perhaps give me URLs (to GitHub) to the files > touched? (if its not too many) > > Marko. > > http://markorodriguez.com > > > >> On Nov 1, 2016, at 7:19 AM, pieter-gmail wrote: >> >> Hi, >> >> Yes I am but afraid I do not have time at present to concentrate on it. >> I just noticed your ImmutablePath ticket which will overlap with some of >> what I have done. >> >> I'd suggest to pull my branch and look at what I did there. It was very >> little, but dangerous code, which is why I was reluctant to submit a PR >> at first. If you don't continue with it, I should in 2 or 3 weeks be >> able to look at it again. >> >> Thanks >> Pieter >> >> >> On 01/11/2016 15:11, Marko Rodriguez wrote: >>> Hi Pieter, >>> >>> I’m still really interested in your work in this area. Are you still doing >>> this? >>> >>> Marko. >>> >>> http://markorodriguez.com >>> >>> >>> On Aug 7, 2016, at 9:12 AM, Pieter Martin wrote: To avoid the collection logic alltogether. For most steps there is no need to check the labels as it is known that they are added, immutable and correct. Also with the current strategy the `ImmutablePath.currentLabels` is exactly the same collection as that of the step. It is not a copy. `Traverser.addLabels()` is only called for the steps previously mentioned. They too can be optimized as there is no need to create a new `ImmutablePath` just to set the labels. There could be a `Traverer.split()` method that creates the initial `ImmutablePath` with the correct labels to start with. As things stand now the `ImmutablePath` is created just to be replaced 1 or 2 millis later. I have however not benchmarked any of those queries so am not touching that at the moment. In some ways its a matter of style. The label logic in `AbstractStep` is not relevant to all of its subclasses and should be higher in the inheritance hierarchy. Cheers Pieter Excerpts from Marko Rodriguez's message of August 7, 2016 3:54 : > Why not just check to see if the labels to be added already exist, if > they do, don’t addLabels() and thus, don’t create a new collection. > Marko. > http://markorodriguez.com >> On Aug 7, 2016, at 6:07 AM, Pieter Martin >> wrote: >> Here is what I have come up with so far. >> The idea is that `Traverser.split(r, step)` already copies the labels to >> the >> traverser so there is no need to call `Traverser.addLabels(labels)` >> again. >> I removed the `Traverser.addLabels(labels)` call from `AbstractStep`. >> For the traversers that do not call `Traverer.split(r, step)` I manually >> added >> the `traverser.addLabels(labels)` call in `processNextStart()`. This was >> done >> by fixing test failures rather than searching and investigating all >> calls to >> `Traverser.split()`. >> The following steps needed the labels to be added manually. >> `AggregateStep` >> `CollectingBarrierStep` >> `FilterStep` >> `SideEffectStep` >> `StartStep` >> `NoOpBarrierStep` >> Further seeing as `Step.getLabels()` already returns a unmodifiable >> collection and >> `ImmutablePath` is well immutable there is no need for it to have its >> own copy >> of the labels. I set the labels directly on the path as oppose to making >> a copy. >> `TinkerGraphProcessComputerTest` passes. >> `TinkerGraphProcessStandardTest` passes. >> `TinkerGraphStructureStandardTest` has one failure. >> `SerializationTest$GryoTest.shouldSerializePathAsDetached` fails with >> `java.lang.IllegalArgumentException: Class is not registered: >> java.util.Collections$UnmodifiableSet` >> Let me know what you think. >> Thanks >> Pieter >> Excerpts from Ted Wilmes's message of August 7, 2016 12:16 : >>> Neat find Pieter. With regards to the update to ImmutablePath, I think >>> that defensive copying of inbound collections is generally a good idea >>> but >>> if we can target specific areas where we can reap big gains from not >>> creating new collections it may be worth it to relax that constraint, >>> especially if we're only talking about internal APIs. If we do relax >>> that >>> constraint though, we'll need to careful during code review and unit >>> testing because those bugs can be difficult to track down. The other >>> nice >>> thing that the defensive copy gets us, particularly in the case of the >>> ImmutablePath, is that it isolates ImmutablePath from whatever the >>> subclass >>> of set was that the caller passed in. I think that's what is causing >>> the
Re: path query optimization
What branch are you in? Perhaps give me URLs (to GitHub) to the files touched? (if its not too many) Marko. http://markorodriguez.com > On Nov 1, 2016, at 7:19 AM, pieter-gmail wrote: > > Hi, > > Yes I am but afraid I do not have time at present to concentrate on it. > I just noticed your ImmutablePath ticket which will overlap with some of > what I have done. > > I'd suggest to pull my branch and look at what I did there. It was very > little, but dangerous code, which is why I was reluctant to submit a PR > at first. If you don't continue with it, I should in 2 or 3 weeks be > able to look at it again. > > Thanks > Pieter > > > On 01/11/2016 15:11, Marko Rodriguez wrote: >> Hi Pieter, >> >> I’m still really interested in your work in this area. Are you still doing >> this? >> >> Marko. >> >> http://markorodriguez.com >> >> >> >>> On Aug 7, 2016, at 9:12 AM, Pieter Martin wrote: >>> >>> To avoid the collection logic alltogether. For most steps there is no need >>> to >>> check the labels as it is known that they are added, immutable and correct. >>> >>> Also with the current strategy the `ImmutablePath.currentLabels` is exactly >>> the >>> same collection as that of the step. It is not a copy. >>> >>> `Traverser.addLabels()` is only called for the steps previously mentioned. >>> They >>> too can be optimized as there is no need to create a new `ImmutablePath` >>> just to >>> set the labels. There could be a `Traverer.split()` method that creates the >>> initial `ImmutablePath` with the correct labels to start with. As things >>> stand >>> now the `ImmutablePath` is created just to be replaced 1 or 2 millis later. >>> I >>> have however not benchmarked any of those queries so am not touching that at >>> the moment. >>> >>> In some ways its a matter of style. The label logic in `AbstractStep` is not >>> relevant to all of its subclasses and should be higher in the inheritance >>> hierarchy. >>> >>> Cheers >>> Pieter >>> >>> >>> Excerpts from Marko Rodriguez's message of August 7, 2016 3:54 : Why not just check to see if the labels to be added already exist, if they do, don’t addLabels() and thus, don’t create a new collection. Marko. http://markorodriguez.com > On Aug 7, 2016, at 6:07 AM, Pieter Martin wrote: > Here is what I have come up with so far. > The idea is that `Traverser.split(r, step)` already copies the labels to > the > traverser so there is no need to call `Traverser.addLabels(labels)` again. > I removed the `Traverser.addLabels(labels)` call from `AbstractStep`. > For the traversers that do not call `Traverer.split(r, step)` I manually > added > the `traverser.addLabels(labels)` call in `processNextStart()`. This was > done > by fixing test failures rather than searching and investigating all calls > to > `Traverser.split()`. > The following steps needed the labels to be added manually. > `AggregateStep` > `CollectingBarrierStep` > `FilterStep` > `SideEffectStep` > `StartStep` > `NoOpBarrierStep` > Further seeing as `Step.getLabels()` already returns a unmodifiable > collection and > `ImmutablePath` is well immutable there is no need for it to have its own > copy > of the labels. I set the labels directly on the path as oppose to making > a copy. > `TinkerGraphProcessComputerTest` passes. > `TinkerGraphProcessStandardTest` passes. > `TinkerGraphStructureStandardTest` has one failure. > `SerializationTest$GryoTest.shouldSerializePathAsDetached` fails with > `java.lang.IllegalArgumentException: Class is not registered: > java.util.Collections$UnmodifiableSet` > Let me know what you think. > Thanks > Pieter > Excerpts from Ted Wilmes's message of August 7, 2016 12:16 : >> Neat find Pieter. With regards to the update to ImmutablePath, I think >> that defensive copying of inbound collections is generally a good idea >> but >> if we can target specific areas where we can reap big gains from not >> creating new collections it may be worth it to relax that constraint, >> especially if we're only talking about internal APIs. If we do relax >> that >> constraint though, we'll need to careful during code review and unit >> testing because those bugs can be difficult to track down. The other >> nice >> thing that the defensive copy gets us, particularly in the case of the >> ImmutablePath, is that it isolates ImmutablePath from whatever the >> subclass >> of set was that the caller passed in. I think that's what is causing the >> serialization test failure in this case since the caller passed in an >> unmodifiable set. >> --Ted >> On Fri, Aug 5, 2016, 2:31 PM Marko Rodriguez >> wrote: >>> Hello, >>> This is cool. Check out also ImmutablePath.extend(labels) as that is >>> ultimately what Traverser.addLabel
Re: path query optimization
Hi, Yes I am but afraid I do not have time at present to concentrate on it. I just noticed your ImmutablePath ticket which will overlap with some of what I have done. I'd suggest to pull my branch and look at what I did there. It was very little, but dangerous code, which is why I was reluctant to submit a PR at first. If you don't continue with it, I should in 2 or 3 weeks be able to look at it again. Thanks Pieter On 01/11/2016 15:11, Marko Rodriguez wrote: > Hi Pieter, > > I’m still really interested in your work in this area. Are you still doing > this? > > Marko. > > http://markorodriguez.com > > > >> On Aug 7, 2016, at 9:12 AM, Pieter Martin wrote: >> >> To avoid the collection logic alltogether. For most steps there is no need to >> check the labels as it is known that they are added, immutable and correct. >> >> Also with the current strategy the `ImmutablePath.currentLabels` is exactly >> the >> same collection as that of the step. It is not a copy. >> >> `Traverser.addLabels()` is only called for the steps previously mentioned. >> They >> too can be optimized as there is no need to create a new `ImmutablePath` >> just to >> set the labels. There could be a `Traverer.split()` method that creates the >> initial `ImmutablePath` with the correct labels to start with. As things >> stand >> now the `ImmutablePath` is created just to be replaced 1 or 2 millis later. I >> have however not benchmarked any of those queries so am not touching that at >> the moment. >> >> In some ways its a matter of style. The label logic in `AbstractStep` is not >> relevant to all of its subclasses and should be higher in the inheritance >> hierarchy. >> >> Cheers >> Pieter >> >> >> Excerpts from Marko Rodriguez's message of August 7, 2016 3:54 : >>> Why not just check to see if the labels to be added already exist, if they >>> do, don’t addLabels() and thus, don’t create a new collection. >>> Marko. >>> http://markorodriguez.com On Aug 7, 2016, at 6:07 AM, Pieter Martin wrote: Here is what I have come up with so far. The idea is that `Traverser.split(r, step)` already copies the labels to the traverser so there is no need to call `Traverser.addLabels(labels)` again. I removed the `Traverser.addLabels(labels)` call from `AbstractStep`. For the traversers that do not call `Traverer.split(r, step)` I manually added the `traverser.addLabels(labels)` call in `processNextStart()`. This was done by fixing test failures rather than searching and investigating all calls to `Traverser.split()`. The following steps needed the labels to be added manually. `AggregateStep` `CollectingBarrierStep` `FilterStep` `SideEffectStep` `StartStep` `NoOpBarrierStep` Further seeing as `Step.getLabels()` already returns a unmodifiable collection and `ImmutablePath` is well immutable there is no need for it to have its own copy of the labels. I set the labels directly on the path as oppose to making a copy. `TinkerGraphProcessComputerTest` passes. `TinkerGraphProcessStandardTest` passes. `TinkerGraphStructureStandardTest` has one failure. `SerializationTest$GryoTest.shouldSerializePathAsDetached` fails with `java.lang.IllegalArgumentException: Class is not registered: java.util.Collections$UnmodifiableSet` Let me know what you think. Thanks Pieter Excerpts from Ted Wilmes's message of August 7, 2016 12:16 : > Neat find Pieter. With regards to the update to ImmutablePath, I think > that defensive copying of inbound collections is generally a good idea but > if we can target specific areas where we can reap big gains from not > creating new collections it may be worth it to relax that constraint, > especially if we're only talking about internal APIs. If we do relax that > constraint though, we'll need to careful during code review and unit > testing because those bugs can be difficult to track down. The other nice > thing that the defensive copy gets us, particularly in the case of the > ImmutablePath, is that it isolates ImmutablePath from whatever the > subclass > of set was that the caller passed in. I think that's what is causing the > serialization test failure in this case since the caller passed in an > unmodifiable set. > --Ted > On Fri, Aug 5, 2016, 2:31 PM Marko Rodriguez wrote: >> Hello, >> This is cool. Check out also ImmutablePath.extend(labels) as that is >> ultimately what Traverser.addLabels() calls. We have a lot of set copying >> and I don’t know if its needed (as you seem to be demonstrating). What I >> don’t like about your solution is the explicit reference to the >> B_L_P…Traverser in AbstractStep. See if you can work your solution >> without >> it. >> Good luck, >> Marko. >> http://markorodriguez.com >>> On Aug 5, 2016
Re: path query optimization
Hi Pieter, I’m still really interested in your work in this area. Are you still doing this? Marko. http://markorodriguez.com > On Aug 7, 2016, at 9:12 AM, Pieter Martin wrote: > > To avoid the collection logic alltogether. For most steps there is no need to > check the labels as it is known that they are added, immutable and correct. > > Also with the current strategy the `ImmutablePath.currentLabels` is exactly > the > same collection as that of the step. It is not a copy. > > `Traverser.addLabels()` is only called for the steps previously mentioned. > They > too can be optimized as there is no need to create a new `ImmutablePath` just > to > set the labels. There could be a `Traverer.split()` method that creates the > initial `ImmutablePath` with the correct labels to start with. As things stand > now the `ImmutablePath` is created just to be replaced 1 or 2 millis later. I > have however not benchmarked any of those queries so am not touching that at > the moment. > > In some ways its a matter of style. The label logic in `AbstractStep` is not > relevant to all of its subclasses and should be higher in the inheritance > hierarchy. > > Cheers > Pieter > > > Excerpts from Marko Rodriguez's message of August 7, 2016 3:54 : >> Why not just check to see if the labels to be added already exist, if they >> do, don’t addLabels() and thus, don’t create a new collection. >> Marko. >> http://markorodriguez.com >>> On Aug 7, 2016, at 6:07 AM, Pieter Martin wrote: >>> Here is what I have come up with so far. >>> The idea is that `Traverser.split(r, step)` already copies the labels to the >>> traverser so there is no need to call `Traverser.addLabels(labels)` again. >>> I removed the `Traverser.addLabels(labels)` call from `AbstractStep`. >>> For the traversers that do not call `Traverer.split(r, step)` I manually >>> added >>> the `traverser.addLabels(labels)` call in `processNextStart()`. This was >>> done >>> by fixing test failures rather than searching and investigating all calls to >>> `Traverser.split()`. >>> The following steps needed the labels to be added manually. >>> `AggregateStep` >>> `CollectingBarrierStep` >>> `FilterStep` >>> `SideEffectStep` >>> `StartStep` >>> `NoOpBarrierStep` >>> Further seeing as `Step.getLabels()` already returns a unmodifiable >>> collection and >>> `ImmutablePath` is well immutable there is no need for it to have its own >>> copy >>> of the labels. I set the labels directly on the path as oppose to making a >>> copy. >>> `TinkerGraphProcessComputerTest` passes. >>> `TinkerGraphProcessStandardTest` passes. >>> `TinkerGraphStructureStandardTest` has one failure. >>> `SerializationTest$GryoTest.shouldSerializePathAsDetached` fails with >>> `java.lang.IllegalArgumentException: Class is not registered: >>> java.util.Collections$UnmodifiableSet` >>> Let me know what you think. >>> Thanks >>> Pieter >>> Excerpts from Ted Wilmes's message of August 7, 2016 12:16 : Neat find Pieter. With regards to the update to ImmutablePath, I think that defensive copying of inbound collections is generally a good idea but if we can target specific areas where we can reap big gains from not creating new collections it may be worth it to relax that constraint, especially if we're only talking about internal APIs. If we do relax that constraint though, we'll need to careful during code review and unit testing because those bugs can be difficult to track down. The other nice thing that the defensive copy gets us, particularly in the case of the ImmutablePath, is that it isolates ImmutablePath from whatever the subclass of set was that the caller passed in. I think that's what is causing the serialization test failure in this case since the caller passed in an unmodifiable set. --Ted On Fri, Aug 5, 2016, 2:31 PM Marko Rodriguez wrote: > Hello, > This is cool. Check out also ImmutablePath.extend(labels) as that is > ultimately what Traverser.addLabels() calls. We have a lot of set copying > and I don’t know if its needed (as you seem to be demonstrating). What I > don’t like about your solution is the explicit reference to the > B_L_P…Traverser in AbstractStep. See if you can work your solution without > it. > Good luck, > Marko. > http://markorodriguez.com > > On Aug 5, 2016, at 12:44 PM, pieter-gmail > wrote: > > > > Sorry forgot to add a rather important part. > > > > I changed ImmutablePath's constructor to > > > >private ImmutablePath(final ImmutablePathImpl previousPath, final > > Object currentObject, final Set currentLabels) { > >this.previousPath = previousPath; > >this.currentObject = currentObject; > >this.currentLabels = currentLabels; > > //this.currentLabels.addAll(currentLabels); > >} > > > > Setting the collection directly as oppose to `addAll`
Re: path query optimization
To avoid the collection logic alltogether. For most steps there is no need to check the labels as it is known that they are added, immutable and correct. Also with the current strategy the `ImmutablePath.currentLabels` is exactly the same collection as that of the step. It is not a copy. `Traverser.addLabels()` is only called for the steps previously mentioned. They too can be optimized as there is no need to create a new `ImmutablePath` just to set the labels. There could be a `Traverer.split()` method that creates the initial `ImmutablePath` with the correct labels to start with. As things stand now the `ImmutablePath` is created just to be replaced 1 or 2 millis later. I have however not benchmarked any of those queries so am not touching that at the moment. In some ways its a matter of style. The label logic in `AbstractStep` is not relevant to all of its subclasses and should be higher in the inheritance hierarchy. Cheers Pieter Excerpts from Marko Rodriguez's message of August 7, 2016 3:54 : Why not just check to see if the labels to be added already exist, if they do, don’t addLabels() and thus, don’t create a new collection. Marko. http://markorodriguez.com On Aug 7, 2016, at 6:07 AM, Pieter Martin wrote: Here is what I have come up with so far. The idea is that `Traverser.split(r, step)` already copies the labels to the traverser so there is no need to call `Traverser.addLabels(labels)` again. I removed the `Traverser.addLabels(labels)` call from `AbstractStep`. For the traversers that do not call `Traverer.split(r, step)` I manually added the `traverser.addLabels(labels)` call in `processNextStart()`. This was done by fixing test failures rather than searching and investigating all calls to `Traverser.split()`. The following steps needed the labels to be added manually. `AggregateStep` `CollectingBarrierStep` `FilterStep` `SideEffectStep` `StartStep` `NoOpBarrierStep` Further seeing as `Step.getLabels()` already returns a unmodifiable collection and `ImmutablePath` is well immutable there is no need for it to have its own copy of the labels. I set the labels directly on the path as oppose to making a copy. `TinkerGraphProcessComputerTest` passes. `TinkerGraphProcessStandardTest` passes. `TinkerGraphStructureStandardTest` has one failure. `SerializationTest$GryoTest.shouldSerializePathAsDetached` fails with `java.lang.IllegalArgumentException: Class is not registered: java.util.Collections$UnmodifiableSet` Let me know what you think. Thanks Pieter Excerpts from Ted Wilmes's message of August 7, 2016 12:16 : Neat find Pieter. With regards to the update to ImmutablePath, I think that defensive copying of inbound collections is generally a good idea but if we can target specific areas where we can reap big gains from not creating new collections it may be worth it to relax that constraint, especially if we're only talking about internal APIs. If we do relax that constraint though, we'll need to careful during code review and unit testing because those bugs can be difficult to track down. The other nice thing that the defensive copy gets us, particularly in the case of the ImmutablePath, is that it isolates ImmutablePath from whatever the subclass of set was that the caller passed in. I think that's what is causing the serialization test failure in this case since the caller passed in an unmodifiable set. --Ted On Fri, Aug 5, 2016, 2:31 PM Marko Rodriguez wrote: Hello, This is cool. Check out also ImmutablePath.extend(labels) as that is ultimately what Traverser.addLabels() calls. We have a lot of set copying and I don’t know if its needed (as you seem to be demonstrating). What I don’t like about your solution is the explicit reference to the B_L_P…Traverser in AbstractStep. See if you can work your solution without it. Good luck, Marko. http://markorodriguez.com > On Aug 5, 2016, at 12:44 PM, pieter-gmail wrote: > > Sorry forgot to add a rather important part. > > I changed ImmutablePath's constructor to > >private ImmutablePath(final ImmutablePathImpl previousPath, final > Object currentObject, final Set currentLabels) { >this.previousPath = previousPath; >this.currentObject = currentObject; >this.currentLabels = currentLabels; > //this.currentLabels.addAll(currentLabels); >} > > Setting the collection directly as oppose to `addAll` > > Thanks > Pieter > > > On 05/08/2016 20:40, pieter-gmail wrote: >> Hi, >> >> I have been optimizing Sqlg of late and eventually arrived at TinkerPop >> code. >> >> The gremlin in particular that I am interested is path queries. >> >> Here is the test that I am running in jmh. >> >>//@Setup >>Vertex a = graph.addVertex(T.label, "A", "name", "a1"); >>for (int i = 1; i < 1_000_001; i++) { >>Vertex b = graph.addVertex(T.label, "B", "name", "name_" + i); >>a.addEdge("outB", b); >>for (int j = 0; j < 1; j++) { >>Verte
Re: path query optimization
Why not just check to see if the labels to be added already exist, if they do, don’t addLabels() and thus, don’t create a new collection. Marko. http://markorodriguez.com > On Aug 7, 2016, at 6:07 AM, Pieter Martin wrote: > > Here is what I have come up with so far. > > The idea is that `Traverser.split(r, step)` already copies the labels to the > traverser so there is no need to call `Traverser.addLabels(labels)` again. > > I removed the `Traverser.addLabels(labels)` call from `AbstractStep`. > For the traversers that do not call `Traverer.split(r, step)` I manually added > the `traverser.addLabels(labels)` call in `processNextStart()`. This was done > by fixing test failures rather than searching and investigating all calls to > `Traverser.split()`. > > The following steps needed the labels to be added manually. > > `AggregateStep` > `CollectingBarrierStep` > `FilterStep` > `SideEffectStep` > `StartStep` > `NoOpBarrierStep` > > Further seeing as `Step.getLabels()` already returns a unmodifiable > collection and > `ImmutablePath` is well immutable there is no need for it to have its own copy > of the labels. I set the labels directly on the path as oppose to making a > copy. > > `TinkerGraphProcessComputerTest` passes. > `TinkerGraphProcessStandardTest` passes. > `TinkerGraphStructureStandardTest` has one failure. > `SerializationTest$GryoTest.shouldSerializePathAsDetached` fails with > `java.lang.IllegalArgumentException: Class is not registered: > java.util.Collections$UnmodifiableSet` > > Let me know what you think. > > Thanks > Pieter > > > Excerpts from Ted Wilmes's message of August 7, 2016 12:16 : >> Neat find Pieter. With regards to the update to ImmutablePath, I think >> that defensive copying of inbound collections is generally a good idea but >> if we can target specific areas where we can reap big gains from not >> creating new collections it may be worth it to relax that constraint, >> especially if we're only talking about internal APIs. If we do relax that >> constraint though, we'll need to careful during code review and unit >> testing because those bugs can be difficult to track down. The other nice >> thing that the defensive copy gets us, particularly in the case of the >> ImmutablePath, is that it isolates ImmutablePath from whatever the subclass >> of set was that the caller passed in. I think that's what is causing the >> serialization test failure in this case since the caller passed in an >> unmodifiable set. >> --Ted >> On Fri, Aug 5, 2016, 2:31 PM Marko Rodriguez wrote: >>> Hello, >>> >>> This is cool. Check out also ImmutablePath.extend(labels) as that is >>> ultimately what Traverser.addLabels() calls. We have a lot of set copying >>> and I don’t know if its needed (as you seem to be demonstrating). What I >>> don’t like about your solution is the explicit reference to the >>> B_L_P…Traverser in AbstractStep. See if you can work your solution without >>> it. >>> >>> Good luck, >>> Marko. >>> >>> http://markorodriguez.com >>> >>> >>> >>> > On Aug 5, 2016, at 12:44 PM, pieter-gmail >>> wrote: >>> > >>> > Sorry forgot to add a rather important part. >>> > >>> > I changed ImmutablePath's constructor to >>> > >>> >private ImmutablePath(final ImmutablePathImpl previousPath, final >>> > Object currentObject, final Set currentLabels) { >>> >this.previousPath = previousPath; >>> >this.currentObject = currentObject; >>> >this.currentLabels = currentLabels; >>> > //this.currentLabels.addAll(currentLabels); >>> >} >>> > >>> > Setting the collection directly as oppose to `addAll` >>> > >>> > Thanks >>> > Pieter >>> > >>> > >>> > On 05/08/2016 20:40, pieter-gmail wrote: >>> >> Hi, >>> >> >>> >> I have been optimizing Sqlg of late and eventually arrived at TinkerPop >>> >> code. >>> >> >>> >> The gremlin in particular that I am interested is path queries. >>> >> >>> >> Here is the test that I am running in jmh. >>> >> >>> >>//@Setup >>> >>Vertex a = graph.addVertex(T.label, "A", "name", "a1"); >>> >>for (int i = 1; i < 1_000_001; i++) { >>> >>Vertex b = graph.addVertex(T.label, "B", "name", "name_" + >>> i); >>> >>a.addEdge("outB", b); >>> >>for (int j = 0; j < 1; j++) { >>> >>Vertex c = graph.addVertex(T.label, "C", "name", "name_" >>> >> + i + " " + j); >>> >>b.addEdge("outC", c); >>> >>} >>> >>} >>> >> >>> >> And the query being benchmarked is >>> >> >>> >>GraphTraversal traversal = >>> >> g.V(a).as("a").out().as("b").out().as("c").path(); >>> >>while (traversal.hasNext()) { >>> >>Path path = traversal.next(); >>> >>} >>> >> >>> >> Before the optimization, (as things are now) >>> >> >>> >> Benchmark Mode Cnt Score Error >>> Units >>> >> GremlinPathBenchmark.g_path avgt 100 1.086 ± 0.020 s/op >>> >> >>> >> The optimization I did is in A
Re: path query optimization
Here is what I have come up with so far. The idea is that `Traverser.split(r, step)` already copies the labels to the traverser so there is no need to call `Traverser.addLabels(labels)` again. I removed the `Traverser.addLabels(labels)` call from `AbstractStep`. For the traversers that do not call `Traverer.split(r, step)` I manually added the `traverser.addLabels(labels)` call in `processNextStart()`. This was done by fixing test failures rather than searching and investigating all calls to `Traverser.split()`. The following steps needed the labels to be added manually. `AggregateStep` `CollectingBarrierStep` `FilterStep` `SideEffectStep` `StartStep` `NoOpBarrierStep` Further seeing as `Step.getLabels()` already returns a unmodifiable collection and `ImmutablePath` is well immutable there is no need for it to have its own copy of the labels. I set the labels directly on the path as oppose to making a copy. `TinkerGraphProcessComputerTest` passes. `TinkerGraphProcessStandardTest` passes. `TinkerGraphStructureStandardTest` has one failure. `SerializationTest$GryoTest.shouldSerializePathAsDetached` fails with `java.lang.IllegalArgumentException: Class is not registered: java.util.Collections$UnmodifiableSet` Let me know what you think. Thanks Pieter Excerpts from Ted Wilmes's message of August 7, 2016 12:16 : Neat find Pieter. With regards to the update to ImmutablePath, I think that defensive copying of inbound collections is generally a good idea but if we can target specific areas where we can reap big gains from not creating new collections it may be worth it to relax that constraint, especially if we're only talking about internal APIs. If we do relax that constraint though, we'll need to careful during code review and unit testing because those bugs can be difficult to track down. The other nice thing that the defensive copy gets us, particularly in the case of the ImmutablePath, is that it isolates ImmutablePath from whatever the subclass of set was that the caller passed in. I think that's what is causing the serialization test failure in this case since the caller passed in an unmodifiable set. --Ted On Fri, Aug 5, 2016, 2:31 PM Marko Rodriguez wrote: Hello, This is cool. Check out also ImmutablePath.extend(labels) as that is ultimately what Traverser.addLabels() calls. We have a lot of set copying and I don’t know if its needed (as you seem to be demonstrating). What I don’t like about your solution is the explicit reference to the B_L_P…Traverser in AbstractStep. See if you can work your solution without it. Good luck, Marko. http://markorodriguez.com > On Aug 5, 2016, at 12:44 PM, pieter-gmail wrote: > > Sorry forgot to add a rather important part. > > I changed ImmutablePath's constructor to > >private ImmutablePath(final ImmutablePathImpl previousPath, final > Object currentObject, final Set currentLabels) { >this.previousPath = previousPath; >this.currentObject = currentObject; >this.currentLabels = currentLabels; > //this.currentLabels.addAll(currentLabels); >} > > Setting the collection directly as oppose to `addAll` > > Thanks > Pieter > > > On 05/08/2016 20:40, pieter-gmail wrote: >> Hi, >> >> I have been optimizing Sqlg of late and eventually arrived at TinkerPop >> code. >> >> The gremlin in particular that I am interested is path queries. >> >> Here is the test that I am running in jmh. >> >>//@Setup >>Vertex a = graph.addVertex(T.label, "A", "name", "a1"); >>for (int i = 1; i < 1_000_001; i++) { >>Vertex b = graph.addVertex(T.label, "B", "name", "name_" + i); >>a.addEdge("outB", b); >>for (int j = 0; j < 1; j++) { >>Vertex c = graph.addVertex(T.label, "C", "name", "name_" >> + i + " " + j); >>b.addEdge("outC", c); >>} >>} >> >> And the query being benchmarked is >> >>GraphTraversal traversal = >> g.V(a).as("a").out().as("b").out().as("c").path(); >>while (traversal.hasNext()) { >>Path path = traversal.next(); >>} >> >> Before the optimization, (as things are now) >> >> Benchmark Mode Cnt Score Error Units >> GremlinPathBenchmark.g_path avgt 100 1.086 ± 0.020 s/op >> >> The optimization I did is in AbstractStep.prepareTraversalForNextStep, >> to not call addLabels() for path gremlins as the labels are known by the >> step and do not change again so there is not need to keep adding them. >> >>private final Traverser.Admin prepareTraversalForNextStep(final >> Traverser.Admin traverser) { >>if (!this.traverserStepIdAndLabelsSetByChild) { >>traverser.setStepId(this.nextStep.getId()); >>if (traverser instanceof B_LP_O_P_S_SE_SL_Traverser) { >>} else { >>traverser.addLabels(this.labels); >>} >>} >>return traverser; >>} >> >> After optimization, >>
Re: path query optimization
Neat find Pieter. With regards to the update to ImmutablePath, I think that defensive copying of inbound collections is generally a good idea but if we can target specific areas where we can reap big gains from not creating new collections it may be worth it to relax that constraint, especially if we're only talking about internal APIs. If we do relax that constraint though, we'll need to careful during code review and unit testing because those bugs can be difficult to track down. The other nice thing that the defensive copy gets us, particularly in the case of the ImmutablePath, is that it isolates ImmutablePath from whatever the subclass of set was that the caller passed in. I think that's what is causing the serialization test failure in this case since the caller passed in an unmodifiable set. --Ted On Fri, Aug 5, 2016, 2:31 PM Marko Rodriguez wrote: > Hello, > > This is cool. Check out also ImmutablePath.extend(labels) as that is > ultimately what Traverser.addLabels() calls. We have a lot of set copying > and I don’t know if its needed (as you seem to be demonstrating). What I > don’t like about your solution is the explicit reference to the > B_L_P…Traverser in AbstractStep. See if you can work your solution without > it. > > Good luck, > Marko. > > http://markorodriguez.com > > > > > On Aug 5, 2016, at 12:44 PM, pieter-gmail > wrote: > > > > Sorry forgot to add a rather important part. > > > > I changed ImmutablePath's constructor to > > > >private ImmutablePath(final ImmutablePathImpl previousPath, final > > Object currentObject, final Set currentLabels) { > >this.previousPath = previousPath; > >this.currentObject = currentObject; > >this.currentLabels = currentLabels; > > //this.currentLabels.addAll(currentLabels); > >} > > > > Setting the collection directly as oppose to `addAll` > > > > Thanks > > Pieter > > > > > > On 05/08/2016 20:40, pieter-gmail wrote: > >> Hi, > >> > >> I have been optimizing Sqlg of late and eventually arrived at TinkerPop > >> code. > >> > >> The gremlin in particular that I am interested is path queries. > >> > >> Here is the test that I am running in jmh. > >> > >>//@Setup > >>Vertex a = graph.addVertex(T.label, "A", "name", "a1"); > >>for (int i = 1; i < 1_000_001; i++) { > >>Vertex b = graph.addVertex(T.label, "B", "name", "name_" + > i); > >>a.addEdge("outB", b); > >>for (int j = 0; j < 1; j++) { > >>Vertex c = graph.addVertex(T.label, "C", "name", "name_" > >> + i + " " + j); > >>b.addEdge("outC", c); > >>} > >>} > >> > >> And the query being benchmarked is > >> > >>GraphTraversal traversal = > >> g.V(a).as("a").out().as("b").out().as("c").path(); > >>while (traversal.hasNext()) { > >>Path path = traversal.next(); > >>} > >> > >> Before the optimization, (as things are now) > >> > >> Benchmark Mode Cnt Score Error > Units > >> GremlinPathBenchmark.g_path avgt 100 1.086 ± 0.020 s/op > >> > >> The optimization I did is in AbstractStep.prepareTraversalForNextStep, > >> to not call addLabels() for path gremlins as the labels are known by the > >> step and do not change again so there is not need to keep adding them. > >> > >>private final Traverser.Admin prepareTraversalForNextStep(final > >> Traverser.Admin traverser) { > >>if (!this.traverserStepIdAndLabelsSetByChild) { > >>traverser.setStepId(this.nextStep.getId()); > >>if (traverser instanceof B_LP_O_P_S_SE_SL_Traverser) { > >>} else { > >>traverser.addLabels(this.labels); > >>} > >>} > >>return traverser; > >>} > >> > >> After optimization, > >> > >> Benchmark Mode Cnt Score Error > Units > >> GremlinPathBenchmark.g_path avgt 100 0.680 ± 0.004 s/op > >> > >> 1.086 vs 0.689 seconds for the traversal. > >> > >> I ran the Structured and Process test suites. 2 tests are failing with > >> this optimization. > >> > >> InjectTest.g_VX1X_out_name_injectXdanielX_asXaX_mapXlengthX_path fails > with > >> > >> "java.lang.IllegalArgumentException: The step with label a does not > exist" > >> > >> and > >> > >> SerializationTest.shouldSerializePathAsDetached fails with > >> > >> "Caused by: java.lang.IllegalArgumentException: Class is not > registered: > >> java.util.Collections$UnmodifiableSet" > >> > >> Before investigating the failures is this optimization worth pursuing? > >> > >> Thanks > >> Pieter > >> > > > >
Re: path query optimization
Hello, This is cool. Check out also ImmutablePath.extend(labels) as that is ultimately what Traverser.addLabels() calls. We have a lot of set copying and I don’t know if its needed (as you seem to be demonstrating). What I don’t like about your solution is the explicit reference to the B_L_P…Traverser in AbstractStep. See if you can work your solution without it. Good luck, Marko. http://markorodriguez.com > On Aug 5, 2016, at 12:44 PM, pieter-gmail wrote: > > Sorry forgot to add a rather important part. > > I changed ImmutablePath's constructor to > >private ImmutablePath(final ImmutablePathImpl previousPath, final > Object currentObject, final Set currentLabels) { >this.previousPath = previousPath; >this.currentObject = currentObject; >this.currentLabels = currentLabels; > //this.currentLabels.addAll(currentLabels); >} > > Setting the collection directly as oppose to `addAll` > > Thanks > Pieter > > > On 05/08/2016 20:40, pieter-gmail wrote: >> Hi, >> >> I have been optimizing Sqlg of late and eventually arrived at TinkerPop >> code. >> >> The gremlin in particular that I am interested is path queries. >> >> Here is the test that I am running in jmh. >> >>//@Setup >>Vertex a = graph.addVertex(T.label, "A", "name", "a1"); >>for (int i = 1; i < 1_000_001; i++) { >>Vertex b = graph.addVertex(T.label, "B", "name", "name_" + i); >>a.addEdge("outB", b); >>for (int j = 0; j < 1; j++) { >>Vertex c = graph.addVertex(T.label, "C", "name", "name_" >> + i + " " + j); >>b.addEdge("outC", c); >>} >>} >> >> And the query being benchmarked is >> >>GraphTraversal traversal = >> g.V(a).as("a").out().as("b").out().as("c").path(); >>while (traversal.hasNext()) { >>Path path = traversal.next(); >>} >> >> Before the optimization, (as things are now) >> >> Benchmark Mode Cnt Score Error Units >> GremlinPathBenchmark.g_path avgt 100 1.086 ± 0.020 s/op >> >> The optimization I did is in AbstractStep.prepareTraversalForNextStep, >> to not call addLabels() for path gremlins as the labels are known by the >> step and do not change again so there is not need to keep adding them. >> >>private final Traverser.Admin prepareTraversalForNextStep(final >> Traverser.Admin traverser) { >>if (!this.traverserStepIdAndLabelsSetByChild) { >>traverser.setStepId(this.nextStep.getId()); >>if (traverser instanceof B_LP_O_P_S_SE_SL_Traverser) { >>} else { >>traverser.addLabels(this.labels); >>} >>} >>return traverser; >>} >> >> After optimization, >> >> Benchmark Mode Cnt Score Error Units >> GremlinPathBenchmark.g_path avgt 100 0.680 ± 0.004 s/op >> >> 1.086 vs 0.689 seconds for the traversal. >> >> I ran the Structured and Process test suites. 2 tests are failing with >> this optimization. >> >> InjectTest.g_VX1X_out_name_injectXdanielX_asXaX_mapXlengthX_path fails with >> >> "java.lang.IllegalArgumentException: The step with label a does not exist" >> >> and >> >> SerializationTest.shouldSerializePathAsDetached fails with >> >> "Caused by: java.lang.IllegalArgumentException: Class is not registered: >> java.util.Collections$UnmodifiableSet" >> >> Before investigating the failures is this optimization worth pursuing? >> >> Thanks >> Pieter >> >
Re: path query optimization
Sorry forgot to add a rather important part. I changed ImmutablePath's constructor to private ImmutablePath(final ImmutablePathImpl previousPath, final Object currentObject, final Set currentLabels) { this.previousPath = previousPath; this.currentObject = currentObject; this.currentLabels = currentLabels; //this.currentLabels.addAll(currentLabels); } Setting the collection directly as oppose to `addAll` Thanks Pieter On 05/08/2016 20:40, pieter-gmail wrote: > Hi, > > I have been optimizing Sqlg of late and eventually arrived at TinkerPop > code. > > The gremlin in particular that I am interested is path queries. > > Here is the test that I am running in jmh. > > //@Setup > Vertex a = graph.addVertex(T.label, "A", "name", "a1"); > for (int i = 1; i < 1_000_001; i++) { > Vertex b = graph.addVertex(T.label, "B", "name", "name_" + i); > a.addEdge("outB", b); > for (int j = 0; j < 1; j++) { > Vertex c = graph.addVertex(T.label, "C", "name", "name_" > + i + " " + j); > b.addEdge("outC", c); > } > } > > And the query being benchmarked is > > GraphTraversal traversal = > g.V(a).as("a").out().as("b").out().as("c").path(); > while (traversal.hasNext()) { > Path path = traversal.next(); > } > > Before the optimization, (as things are now) > > Benchmark Mode Cnt Score Error Units > GremlinPathBenchmark.g_path avgt 100 1.086 ± 0.020 s/op > > The optimization I did is in AbstractStep.prepareTraversalForNextStep, > to not call addLabels() for path gremlins as the labels are known by the > step and do not change again so there is not need to keep adding them. > > private final Traverser.Admin prepareTraversalForNextStep(final > Traverser.Admin traverser) { > if (!this.traverserStepIdAndLabelsSetByChild) { > traverser.setStepId(this.nextStep.getId()); > if (traverser instanceof B_LP_O_P_S_SE_SL_Traverser) { > } else { > traverser.addLabels(this.labels); > } > } > return traverser; > } > > After optimization, > > Benchmark Mode Cnt Score Error Units > GremlinPathBenchmark.g_path avgt 100 0.680 ± 0.004 s/op > > 1.086 vs 0.689 seconds for the traversal. > > I ran the Structured and Process test suites. 2 tests are failing with > this optimization. > > InjectTest.g_VX1X_out_name_injectXdanielX_asXaX_mapXlengthX_path fails with > > "java.lang.IllegalArgumentException: The step with label a does not exist" > > and > > SerializationTest.shouldSerializePathAsDetached fails with > > "Caused by: java.lang.IllegalArgumentException: Class is not registered: > java.util.Collections$UnmodifiableSet" > > Before investigating the failures is this optimization worth pursuing? > > Thanks > Pieter >
RE: path query optimization
Hi, I have been optimizing Sqlg of late and eventually arrived at TinkerPop code. The gremlin in particular that I am interested is path queries. Here is the test that I am running in jmh. //@Setup Vertex a = graph.addVertex(T.label, "A", "name", "a1"); for (int i = 1; i < 1_000_001; i++) { Vertex b = graph.addVertex(T.label, "B", "name", "name_" + i); a.addEdge("outB", b); for (int j = 0; j < 1; j++) { Vertex c = graph.addVertex(T.label, "C", "name", "name_" + i + " " + j); b.addEdge("outC", c); } } And the query being benchmarked is GraphTraversal traversal = g.V(a).as("a").out().as("b").out().as("c").path(); while (traversal.hasNext()) { Path path = traversal.next(); } Before the optimization, (as things are now) Benchmark Mode Cnt Score Error Units GremlinPathBenchmark.g_path avgt 100 1.086 ± 0.020 s/op The optimization I did is in AbstractStep.prepareTraversalForNextStep, to not call addLabels() for path gremlins as the labels are known by the step and do not change again so there is not need to keep adding them. private final Traverser.Admin prepareTraversalForNextStep(final Traverser.Admin traverser) { if (!this.traverserStepIdAndLabelsSetByChild) { traverser.setStepId(this.nextStep.getId()); if (traverser instanceof B_LP_O_P_S_SE_SL_Traverser) { } else { traverser.addLabels(this.labels); } } return traverser; } After optimization, Benchmark Mode Cnt Score Error Units GremlinPathBenchmark.g_path avgt 100 0.680 ± 0.004 s/op 1.086 vs 0.689 seconds for the traversal. I ran the Structured and Process test suites. 2 tests are failing with this optimization. InjectTest.g_VX1X_out_name_injectXdanielX_asXaX_mapXlengthX_path fails with "java.lang.IllegalArgumentException: The step with label a does not exist" and SerializationTest.shouldSerializePathAsDetached fails with "Caused by: java.lang.IllegalArgumentException: Class is not registered: java.util.Collections$UnmodifiableSet" Before investigating the failures is this optimization worth pursuing? Thanks Pieter