Re: path query optimization

Marko Rodriguez Tue, 01 Nov 2016 07:19:54 -0700

Hi,

So the one thing I just copied into TINKERPOP-1372 was your 
ImmutablePath.currentLabels = currentLabels (vs. copying the set). You are 
right, step labels are immutable and thus, there is no need to copy the set 
over — a direct reference is sufficient. Regarding the 
AbstractStep.traverserStepIdAndLabelsSetByChild changes — this isn’t so 
important as this is only used in OLAP  and thus, should be negligible relative 
to what else is going on in OLAP. However, to be good, I added the following:
@Override
public Path extend(final Set<String> labels) {
    if (labels.isEmpty())
        return this;
    else {
        final Set<String> newLabels = new LinkedHashSet<>();
        newLabels.addAll(this.currentLabels);
        newLabels.addAll(labels);
        return new ImmutablePath(this.previousPath, this.currentObject, 
newLabels);
    }
}
Thus, if the labels are empty, don’t clone.


This work along with the other recursion work in TINKERPOP-1372 provided a 
solid speed up. I’ll present benchmark results in the subsequent PR.

Thanks Pieter,
Marko.

http://markorodriguez.com



> On Nov 1, 2016, at 7:43 AM, pieter-gmail <[email protected]> wrote:
> 
> The branch is TINKERPOP-1404
> https://github.com/apache/tinkerpop/commit/c1556fe82c58527dc4425d23d1d69ce324e62cfa
> 
> Cheers
> Pieter
> 
> On 01/11/2016 15:23, Marko Rodriguez wrote:
>> What branch are you in? Perhaps give me URLs (to GitHub) to the files 
>> touched? (if its not too many)
>> 
>> Marko.
>> 
>> http://markorodriguez.com
>> 
>> 
>> 
>>> On Nov 1, 2016, at 7:19 AM, pieter-gmail <[email protected]> wrote:
>>> 
>>> Hi,
>>> 
>>> Yes I am but afraid I do not have time at present to concentrate on it.
>>> I just noticed your ImmutablePath ticket which will overlap with some of
>>> what I have done.
>>> 
>>> I'd suggest to pull my branch and look at what I did there. It was very
>>> little, but dangerous code, which is why I was reluctant to submit a PR
>>> at first. If you don't continue with it, I should in 2 or 3 weeks be
>>> able to look at it again.
>>> 
>>> Thanks
>>> Pieter
>>> 
>>> 
>>> On 01/11/2016 15:11, Marko Rodriguez wrote:
>>>> Hi Pieter,
>>>> 
>>>> I’m still really interested in your work in this area. Are you still doing 
>>>> this?
>>>> 
>>>> Marko.
>>>> 
>>>> http://markorodriguez.com
>>>> 
>>>> 
>>>> 
>>>>> On Aug 7, 2016, at 9:12 AM, Pieter Martin <[email protected]> wrote:
>>>>> 
>>>>> To avoid the collection logic alltogether. For most steps there is no 
>>>>> need to
>>>>> check the labels as it is known that they are added, immutable and 
>>>>> correct.
>>>>> 
>>>>> Also with the current strategy the `ImmutablePath.currentLabels` is 
>>>>> exactly the
>>>>> same collection as that of the step. It is not a copy.
>>>>> 
>>>>> `Traverser.addLabels()` is only called for the steps previously 
>>>>> mentioned. They
>>>>> too can be optimized as there is no need to create a new `ImmutablePath` 
>>>>> just to
>>>>> set the labels. There could be a `Traverer.split()` method that creates 
>>>>> the
>>>>> initial `ImmutablePath` with the correct labels to start with. As things 
>>>>> stand
>>>>> now the `ImmutablePath` is created just to be replaced 1 or 2 millis 
>>>>> later. I
>>>>> have however not benchmarked any of those queries so am not touching that 
>>>>> at
>>>>> the moment.
>>>>> 
>>>>> In some ways its a matter of style. The label logic in `AbstractStep` is 
>>>>> not
>>>>> relevant to all of its subclasses and should be higher in the inheritance
>>>>> hierarchy.
>>>>> 
>>>>> Cheers
>>>>> Pieter
>>>>> 
>>>>> 
>>>>> Excerpts from Marko Rodriguez's message of August 7, 2016 3:54 :
>>>>>> Why not just check to see if the labels to be added already exist, if 
>>>>>> they do, don’t addLabels() and thus, don’t create a new collection.
>>>>>> Marko.
>>>>>> http://markorodriguez.com
>>>>>>> On Aug 7, 2016, at 6:07 AM, Pieter Martin <[email protected]> 
>>>>>>> wrote:
>>>>>>> Here is what I have come up with so far.
>>>>>>> The idea is that `Traverser.split(r, step)` already copies the labels 
>>>>>>> to the
>>>>>>> traverser so there is no need to call `Traverser.addLabels(labels)` 
>>>>>>> again.
>>>>>>> I removed the `Traverser.addLabels(labels)` call from `AbstractStep`.
>>>>>>> For the traversers that do not call `Traverer.split(r, step)` I 
>>>>>>> manually added
>>>>>>> the `traverser.addLabels(labels)` call in `processNextStart()`. This 
>>>>>>> was done
>>>>>>> by fixing test failures rather than searching and investigating all 
>>>>>>> calls to
>>>>>>> `Traverser.split()`.
>>>>>>> The following steps needed the labels to be added manually.
>>>>>>> `AggregateStep`
>>>>>>> `CollectingBarrierStep`
>>>>>>> `FilterStep`
>>>>>>> `SideEffectStep`
>>>>>>> `StartStep`
>>>>>>> `NoOpBarrierStep`
>>>>>>> Further seeing as `Step.getLabels()` already returns a unmodifiable 
>>>>>>> collection and
>>>>>>> `ImmutablePath` is well immutable there is no need for it to have its 
>>>>>>> own copy
>>>>>>> of the labels. I set the labels directly on the path as oppose to 
>>>>>>> making a copy.
>>>>>>> `TinkerGraphProcessComputerTest` passes.
>>>>>>> `TinkerGraphProcessStandardTest` passes.
>>>>>>> `TinkerGraphStructureStandardTest` has one failure.
>>>>>>> `SerializationTest$GryoTest.shouldSerializePathAsDetached` fails with
>>>>>>> `java.lang.IllegalArgumentException: Class is not registered:
>>>>>>> java.util.Collections$UnmodifiableSet`
>>>>>>> Let me know what you think.
>>>>>>> Thanks
>>>>>>> Pieter
>>>>>>> Excerpts from Ted Wilmes's message of August 7, 2016 12:16 :
>>>>>>>> Neat find Pieter.  With regards to the update to ImmutablePath, I think
>>>>>>>> that defensive copying of inbound collections is generally a good idea 
>>>>>>>> but
>>>>>>>> if we can target specific areas where we can reap big gains from not
>>>>>>>> creating new collections it may be worth it to relax that constraint,
>>>>>>>> especially if we're only talking about internal APIs.  If we do relax 
>>>>>>>> that
>>>>>>>> constraint though, we'll need to careful during code review and unit
>>>>>>>> testing because those bugs can be difficult to track down.  The other 
>>>>>>>> nice
>>>>>>>> thing that the defensive copy gets us, particularly in the case of the
>>>>>>>> ImmutablePath, is that it isolates ImmutablePath from whatever the 
>>>>>>>> subclass
>>>>>>>> of set was that the caller passed in.  I think that's what is causing 
>>>>>>>> the
>>>>>>>> serialization test failure in this case since the caller passed in an
>>>>>>>> unmodifiable set.
>>>>>>>> --Ted
>>>>>>>> On Fri, Aug 5, 2016, 2:31 PM Marko Rodriguez <[email protected]> 
>>>>>>>> wrote:
>>>>>>>>> Hello,
>>>>>>>>> This is cool. Check out also ImmutablePath.extend(labels) as that is
>>>>>>>>> ultimately what Traverser.addLabels() calls. We have a lot of set 
>>>>>>>>> copying
>>>>>>>>> and I don’t know if its needed (as you seem to be demonstrating). 
>>>>>>>>> What I
>>>>>>>>> don’t like about your solution is the explicit reference to the
>>>>>>>>> B_L_P…Traverser in AbstractStep. See if you can work your solution 
>>>>>>>>> without
>>>>>>>>> it.
>>>>>>>>> Good luck,
>>>>>>>>> Marko.
>>>>>>>>> http://markorodriguez.com
>>>>>>>>>> On Aug 5, 2016, at 12:44 PM, pieter-gmail <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>> Sorry forgot to add a rather important part.
>>>>>>>>>> 
>>>>>>>>>> I changed ImmutablePath's constructor to
>>>>>>>>>> 
>>>>>>>>>>  private ImmutablePath(final ImmutablePathImpl previousPath, final
>>>>>>>>>> Object currentObject, final Set<String> currentLabels) {
>>>>>>>>>>      this.previousPath = previousPath;
>>>>>>>>>>      this.currentObject = currentObject;
>>>>>>>>>>      this.currentLabels = currentLabels;
>>>>>>>>>> //        this.currentLabels.addAll(currentLabels);
>>>>>>>>>>  }
>>>>>>>>>> 
>>>>>>>>>> Setting the collection directly as oppose to `addAll`
>>>>>>>>>> 
>>>>>>>>>> Thanks
>>>>>>>>>> Pieter
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On 05/08/2016 20:40, pieter-gmail wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>> 
>>>>>>>>>>> I have been optimizing Sqlg of late and eventually arrived at 
>>>>>>>>>>> TinkerPop
>>>>>>>>>>> code.
>>>>>>>>>>> 
>>>>>>>>>>> The gremlin in particular that I am interested is path queries.
>>>>>>>>>>> 
>>>>>>>>>>> Here is the test that I am running in jmh.
>>>>>>>>>>> 
>>>>>>>>>>>      //@Setup
>>>>>>>>>>>      Vertex a = graph.addVertex(T.label, "A", "name", "a1");
>>>>>>>>>>>      for (int i = 1; i < 1_000_001; i++) {
>>>>>>>>>>>          Vertex b = graph.addVertex(T.label, "B", "name", "name_" +
>>>>>>>>> i);
>>>>>>>>>>>          a.addEdge("outB", b);
>>>>>>>>>>>          for (int j = 0; j < 1; j++) {
>>>>>>>>>>>              Vertex c = graph.addVertex(T.label, "C", "name", 
>>>>>>>>>>> "name_"
>>>>>>>>>>> + i + " " + j);
>>>>>>>>>>>              b.addEdge("outC", c);
>>>>>>>>>>>          }
>>>>>>>>>>>      }
>>>>>>>>>>> 
>>>>>>>>>>> And the query being benchmarked is
>>>>>>>>>>> 
>>>>>>>>>>>      GraphTraversal<Vertex, Path> traversal =
>>>>>>>>>>> g.V(a).as("a").out().as("b").out().as("c").path();
>>>>>>>>>>>      while (traversal.hasNext()) {
>>>>>>>>>>>          Path path = traversal.next();
>>>>>>>>>>>      }
>>>>>>>>>>> 
>>>>>>>>>>> Before the optimization, (as things are now)
>>>>>>>>>>> 
>>>>>>>>>>> Benchmark                                 Mode Cnt  Score Error
>>>>>>>>> Units
>>>>>>>>>>> GremlinPathBenchmark.g_path  avgt  100  1.086 ± 0.020   s/op
>>>>>>>>>>> 
>>>>>>>>>>> The optimization I did is in 
>>>>>>>>>>> AbstractStep.prepareTraversalForNextStep,
>>>>>>>>>>> to not call addLabels() for path gremlins as the labels are known 
>>>>>>>>>>> by the
>>>>>>>>>>> step and do not change again so there is not need to keep adding 
>>>>>>>>>>> them.
>>>>>>>>>>> 
>>>>>>>>>>>  private final Traverser.Admin<E> prepareTraversalForNextStep(final
>>>>>>>>>>> Traverser.Admin<E> traverser) {
>>>>>>>>>>>      if (!this.traverserStepIdAndLabelsSetByChild) {
>>>>>>>>>>>          traverser.setStepId(this.nextStep.getId());
>>>>>>>>>>>          if (traverser instanceof B_LP_O_P_S_SE_SL_Traverser) {
>>>>>>>>>>>          } else {
>>>>>>>>>>>              traverser.addLabels(this.labels);
>>>>>>>>>>>          }
>>>>>>>>>>>      }
>>>>>>>>>>>      return traverser;
>>>>>>>>>>>  }
>>>>>>>>>>> 
>>>>>>>>>>> After optimization,
>>>>>>>>>>> 
>>>>>>>>>>> Benchmark                                 Mode Cnt  Score Error
>>>>>>>>> Units
>>>>>>>>>>> GremlinPathBenchmark.g_path  avgt  100  0.680 ± 0.004   s/op
>>>>>>>>>>> 
>>>>>>>>>>> 1.086 vs 0.689 seconds for the traversal.
>>>>>>>>>>> 
>>>>>>>>>>> I ran the Structured and Process test suites. 2 tests are failing 
>>>>>>>>>>> with
>>>>>>>>>>> this optimization.
>>>>>>>>>>> 
>>>>>>>>>>> InjectTest.g_VX1X_out_name_injectXdanielX_asXaX_mapXlengthX_path 
>>>>>>>>>>> fails
>>>>>>>>> with
>>>>>>>>>>> "java.lang.IllegalArgumentException: The step with label a does not
>>>>>>>>> exist"
>>>>>>>>>>> and
>>>>>>>>>>> 
>>>>>>>>>>> SerializationTest.shouldSerializePathAsDetached fails with
>>>>>>>>>>> 
>>>>>>>>>>> "Caused by: java.lang.IllegalArgumentException: Class is not
>>>>>>>>> registered:
>>>>>>>>>>> java.util.Collections$UnmodifiableSet"
>>>>>>>>>>> 
>>>>>>>>>>> Before investigating the failures is this optimization worth 
>>>>>>>>>>> pursuing?
>>>>>>>>>>> 
>>>>>>>>>>> Thanks
>>>>>>>>>>> Pieter
>>>>>>>>>>> 
>

Re: path query optimization

Reply via email to