Hi,
So the one thing I just copied into TINKERPOP-1372 was your
ImmutablePath.currentLabels = currentLabels (vs. copying the set). You are
right, step labels are immutable and thus, there is no need to copy the set
over — a direct reference is sufficient. Regarding the
AbstractStep.traverserStepIdAndLabelsSetByChild changes — this isn’t so
important as this is only used in OLAP and thus, should be negligible relative
to what else is going on in OLAP. However, to be good, I added the following:
@Override
public Path extend(final Set<String> labels) {
if (labels.isEmpty())
return this;
else {
final Set<String> newLabels = new LinkedHashSet<>();
newLabels.addAll(this.currentLabels);
newLabels.addAll(labels);
return new ImmutablePath(this.previousPath, this.currentObject,
newLabels);
}
}
Thus, if the labels are empty, don’t clone.
This work along with the other recursion work in TINKERPOP-1372 provided a
solid speed up. I’ll present benchmark results in the subsequent PR.
Thanks Pieter,
Marko.
http://markorodriguez.com
> On Nov 1, 2016, at 7:43 AM, pieter-gmail <[email protected]> wrote:
>
> The branch is TINKERPOP-1404
> https://github.com/apache/tinkerpop/commit/c1556fe82c58527dc4425d23d1d69ce324e62cfa
>
> Cheers
> Pieter
>
> On 01/11/2016 15:23, Marko Rodriguez wrote:
>> What branch are you in? Perhaps give me URLs (to GitHub) to the files
>> touched? (if its not too many)
>>
>> Marko.
>>
>> http://markorodriguez.com
>>
>>
>>
>>> On Nov 1, 2016, at 7:19 AM, pieter-gmail <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> Yes I am but afraid I do not have time at present to concentrate on it.
>>> I just noticed your ImmutablePath ticket which will overlap with some of
>>> what I have done.
>>>
>>> I'd suggest to pull my branch and look at what I did there. It was very
>>> little, but dangerous code, which is why I was reluctant to submit a PR
>>> at first. If you don't continue with it, I should in 2 or 3 weeks be
>>> able to look at it again.
>>>
>>> Thanks
>>> Pieter
>>>
>>>
>>> On 01/11/2016 15:11, Marko Rodriguez wrote:
>>>> Hi Pieter,
>>>>
>>>> I’m still really interested in your work in this area. Are you still doing
>>>> this?
>>>>
>>>> Marko.
>>>>
>>>> http://markorodriguez.com
>>>>
>>>>
>>>>
>>>>> On Aug 7, 2016, at 9:12 AM, Pieter Martin <[email protected]> wrote:
>>>>>
>>>>> To avoid the collection logic alltogether. For most steps there is no
>>>>> need to
>>>>> check the labels as it is known that they are added, immutable and
>>>>> correct.
>>>>>
>>>>> Also with the current strategy the `ImmutablePath.currentLabels` is
>>>>> exactly the
>>>>> same collection as that of the step. It is not a copy.
>>>>>
>>>>> `Traverser.addLabels()` is only called for the steps previously
>>>>> mentioned. They
>>>>> too can be optimized as there is no need to create a new `ImmutablePath`
>>>>> just to
>>>>> set the labels. There could be a `Traverer.split()` method that creates
>>>>> the
>>>>> initial `ImmutablePath` with the correct labels to start with. As things
>>>>> stand
>>>>> now the `ImmutablePath` is created just to be replaced 1 or 2 millis
>>>>> later. I
>>>>> have however not benchmarked any of those queries so am not touching that
>>>>> at
>>>>> the moment.
>>>>>
>>>>> In some ways its a matter of style. The label logic in `AbstractStep` is
>>>>> not
>>>>> relevant to all of its subclasses and should be higher in the inheritance
>>>>> hierarchy.
>>>>>
>>>>> Cheers
>>>>> Pieter
>>>>>
>>>>>
>>>>> Excerpts from Marko Rodriguez's message of August 7, 2016 3:54 :
>>>>>> Why not just check to see if the labels to be added already exist, if
>>>>>> they do, don’t addLabels() and thus, don’t create a new collection.
>>>>>> Marko.
>>>>>> http://markorodriguez.com
>>>>>>> On Aug 7, 2016, at 6:07 AM, Pieter Martin <[email protected]>
>>>>>>> wrote:
>>>>>>> Here is what I have come up with so far.
>>>>>>> The idea is that `Traverser.split(r, step)` already copies the labels
>>>>>>> to the
>>>>>>> traverser so there is no need to call `Traverser.addLabels(labels)`
>>>>>>> again.
>>>>>>> I removed the `Traverser.addLabels(labels)` call from `AbstractStep`.
>>>>>>> For the traversers that do not call `Traverer.split(r, step)` I
>>>>>>> manually added
>>>>>>> the `traverser.addLabels(labels)` call in `processNextStart()`. This
>>>>>>> was done
>>>>>>> by fixing test failures rather than searching and investigating all
>>>>>>> calls to
>>>>>>> `Traverser.split()`.
>>>>>>> The following steps needed the labels to be added manually.
>>>>>>> `AggregateStep`
>>>>>>> `CollectingBarrierStep`
>>>>>>> `FilterStep`
>>>>>>> `SideEffectStep`
>>>>>>> `StartStep`
>>>>>>> `NoOpBarrierStep`
>>>>>>> Further seeing as `Step.getLabels()` already returns a unmodifiable
>>>>>>> collection and
>>>>>>> `ImmutablePath` is well immutable there is no need for it to have its
>>>>>>> own copy
>>>>>>> of the labels. I set the labels directly on the path as oppose to
>>>>>>> making a copy.
>>>>>>> `TinkerGraphProcessComputerTest` passes.
>>>>>>> `TinkerGraphProcessStandardTest` passes.
>>>>>>> `TinkerGraphStructureStandardTest` has one failure.
>>>>>>> `SerializationTest$GryoTest.shouldSerializePathAsDetached` fails with
>>>>>>> `java.lang.IllegalArgumentException: Class is not registered:
>>>>>>> java.util.Collections$UnmodifiableSet`
>>>>>>> Let me know what you think.
>>>>>>> Thanks
>>>>>>> Pieter
>>>>>>> Excerpts from Ted Wilmes's message of August 7, 2016 12:16 :
>>>>>>>> Neat find Pieter. With regards to the update to ImmutablePath, I think
>>>>>>>> that defensive copying of inbound collections is generally a good idea
>>>>>>>> but
>>>>>>>> if we can target specific areas where we can reap big gains from not
>>>>>>>> creating new collections it may be worth it to relax that constraint,
>>>>>>>> especially if we're only talking about internal APIs. If we do relax
>>>>>>>> that
>>>>>>>> constraint though, we'll need to careful during code review and unit
>>>>>>>> testing because those bugs can be difficult to track down. The other
>>>>>>>> nice
>>>>>>>> thing that the defensive copy gets us, particularly in the case of the
>>>>>>>> ImmutablePath, is that it isolates ImmutablePath from whatever the
>>>>>>>> subclass
>>>>>>>> of set was that the caller passed in. I think that's what is causing
>>>>>>>> the
>>>>>>>> serialization test failure in this case since the caller passed in an
>>>>>>>> unmodifiable set.
>>>>>>>> --Ted
>>>>>>>> On Fri, Aug 5, 2016, 2:31 PM Marko Rodriguez <[email protected]>
>>>>>>>> wrote:
>>>>>>>>> Hello,
>>>>>>>>> This is cool. Check out also ImmutablePath.extend(labels) as that is
>>>>>>>>> ultimately what Traverser.addLabels() calls. We have a lot of set
>>>>>>>>> copying
>>>>>>>>> and I don’t know if its needed (as you seem to be demonstrating).
>>>>>>>>> What I
>>>>>>>>> don’t like about your solution is the explicit reference to the
>>>>>>>>> B_L_P…Traverser in AbstractStep. See if you can work your solution
>>>>>>>>> without
>>>>>>>>> it.
>>>>>>>>> Good luck,
>>>>>>>>> Marko.
>>>>>>>>> http://markorodriguez.com
>>>>>>>>>> On Aug 5, 2016, at 12:44 PM, pieter-gmail <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>> Sorry forgot to add a rather important part.
>>>>>>>>>>
>>>>>>>>>> I changed ImmutablePath's constructor to
>>>>>>>>>>
>>>>>>>>>> private ImmutablePath(final ImmutablePathImpl previousPath, final
>>>>>>>>>> Object currentObject, final Set<String> currentLabels) {
>>>>>>>>>> this.previousPath = previousPath;
>>>>>>>>>> this.currentObject = currentObject;
>>>>>>>>>> this.currentLabels = currentLabels;
>>>>>>>>>> // this.currentLabels.addAll(currentLabels);
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> Setting the collection directly as oppose to `addAll`
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Pieter
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 05/08/2016 20:40, pieter-gmail wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I have been optimizing Sqlg of late and eventually arrived at
>>>>>>>>>>> TinkerPop
>>>>>>>>>>> code.
>>>>>>>>>>>
>>>>>>>>>>> The gremlin in particular that I am interested is path queries.
>>>>>>>>>>>
>>>>>>>>>>> Here is the test that I am running in jmh.
>>>>>>>>>>>
>>>>>>>>>>> //@Setup
>>>>>>>>>>> Vertex a = graph.addVertex(T.label, "A", "name", "a1");
>>>>>>>>>>> for (int i = 1; i < 1_000_001; i++) {
>>>>>>>>>>> Vertex b = graph.addVertex(T.label, "B", "name", "name_" +
>>>>>>>>> i);
>>>>>>>>>>> a.addEdge("outB", b);
>>>>>>>>>>> for (int j = 0; j < 1; j++) {
>>>>>>>>>>> Vertex c = graph.addVertex(T.label, "C", "name",
>>>>>>>>>>> "name_"
>>>>>>>>>>> + i + " " + j);
>>>>>>>>>>> b.addEdge("outC", c);
>>>>>>>>>>> }
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> And the query being benchmarked is
>>>>>>>>>>>
>>>>>>>>>>> GraphTraversal<Vertex, Path> traversal =
>>>>>>>>>>> g.V(a).as("a").out().as("b").out().as("c").path();
>>>>>>>>>>> while (traversal.hasNext()) {
>>>>>>>>>>> Path path = traversal.next();
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> Before the optimization, (as things are now)
>>>>>>>>>>>
>>>>>>>>>>> Benchmark Mode Cnt Score Error
>>>>>>>>> Units
>>>>>>>>>>> GremlinPathBenchmark.g_path avgt 100 1.086 ± 0.020 s/op
>>>>>>>>>>>
>>>>>>>>>>> The optimization I did is in
>>>>>>>>>>> AbstractStep.prepareTraversalForNextStep,
>>>>>>>>>>> to not call addLabels() for path gremlins as the labels are known
>>>>>>>>>>> by the
>>>>>>>>>>> step and do not change again so there is not need to keep adding
>>>>>>>>>>> them.
>>>>>>>>>>>
>>>>>>>>>>> private final Traverser.Admin<E> prepareTraversalForNextStep(final
>>>>>>>>>>> Traverser.Admin<E> traverser) {
>>>>>>>>>>> if (!this.traverserStepIdAndLabelsSetByChild) {
>>>>>>>>>>> traverser.setStepId(this.nextStep.getId());
>>>>>>>>>>> if (traverser instanceof B_LP_O_P_S_SE_SL_Traverser) {
>>>>>>>>>>> } else {
>>>>>>>>>>> traverser.addLabels(this.labels);
>>>>>>>>>>> }
>>>>>>>>>>> }
>>>>>>>>>>> return traverser;
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> After optimization,
>>>>>>>>>>>
>>>>>>>>>>> Benchmark Mode Cnt Score Error
>>>>>>>>> Units
>>>>>>>>>>> GremlinPathBenchmark.g_path avgt 100 0.680 ± 0.004 s/op
>>>>>>>>>>>
>>>>>>>>>>> 1.086 vs 0.689 seconds for the traversal.
>>>>>>>>>>>
>>>>>>>>>>> I ran the Structured and Process test suites. 2 tests are failing
>>>>>>>>>>> with
>>>>>>>>>>> this optimization.
>>>>>>>>>>>
>>>>>>>>>>> InjectTest.g_VX1X_out_name_injectXdanielX_asXaX_mapXlengthX_path
>>>>>>>>>>> fails
>>>>>>>>> with
>>>>>>>>>>> "java.lang.IllegalArgumentException: The step with label a does not
>>>>>>>>> exist"
>>>>>>>>>>> and
>>>>>>>>>>>
>>>>>>>>>>> SerializationTest.shouldSerializePathAsDetached fails with
>>>>>>>>>>>
>>>>>>>>>>> "Caused by: java.lang.IllegalArgumentException: Class is not
>>>>>>>>> registered:
>>>>>>>>>>> java.util.Collections$UnmodifiableSet"
>>>>>>>>>>>
>>>>>>>>>>> Before investigating the failures is this optimization worth
>>>>>>>>>>> pursuing?
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Pieter
>>>>>>>>>>>
>