Re: path query optimization

Marko Rodriguez Tue, 01 Nov 2016 06:12:18 -0700

Hi Pieter,

I’m still really interested in your work in this area. Are you still doing this?


Marko.

http://markorodriguez.com



> On Aug 7, 2016, at 9:12 AM, Pieter Martin <[email protected]> wrote:
> 
> To avoid the collection logic alltogether. For most steps there is no need to
> check the labels as it is known that they are added, immutable and correct.
> 
> Also with the current strategy the `ImmutablePath.currentLabels` is exactly 
> the
> same collection as that of the step. It is not a copy.
> 
> `Traverser.addLabels()` is only called for the steps previously mentioned. 
> They
> too can be optimized as there is no need to create a new `ImmutablePath` just 
> to
> set the labels. There could be a `Traverer.split()` method that creates the
> initial `ImmutablePath` with the correct labels to start with. As things stand
> now the `ImmutablePath` is created just to be replaced 1 or 2 millis later. I
> have however not benchmarked any of those queries so am not touching that at
> the moment.
> 
> In some ways its a matter of style. The label logic in `AbstractStep` is not
> relevant to all of its subclasses and should be higher in the inheritance
> hierarchy.
> 
> Cheers
> Pieter
> 
> 
> Excerpts from Marko Rodriguez's message of August 7, 2016 3:54 :
>> Why not just check to see if the labels to be added already exist, if they 
>> do, don’t addLabels() and thus, don’t create a new collection.
>> Marko.
>> http://markorodriguez.com
>>> On Aug 7, 2016, at 6:07 AM, Pieter Martin <[email protected]> wrote:
>>> Here is what I have come up with so far.
>>> The idea is that `Traverser.split(r, step)` already copies the labels to the
>>> traverser so there is no need to call `Traverser.addLabels(labels)` again.
>>> I removed the `Traverser.addLabels(labels)` call from `AbstractStep`.
>>> For the traversers that do not call `Traverer.split(r, step)` I manually 
>>> added
>>> the `traverser.addLabels(labels)` call in `processNextStart()`. This was 
>>> done
>>> by fixing test failures rather than searching and investigating all calls to
>>> `Traverser.split()`.
>>> The following steps needed the labels to be added manually.
>>> `AggregateStep`
>>> `CollectingBarrierStep`
>>> `FilterStep`
>>> `SideEffectStep`
>>> `StartStep`
>>> `NoOpBarrierStep`
>>> Further seeing as `Step.getLabels()` already returns a unmodifiable 
>>> collection and
>>> `ImmutablePath` is well immutable there is no need for it to have its own 
>>> copy
>>> of the labels. I set the labels directly on the path as oppose to making a 
>>> copy.
>>> `TinkerGraphProcessComputerTest` passes.
>>> `TinkerGraphProcessStandardTest` passes.
>>> `TinkerGraphStructureStandardTest` has one failure.
>>> `SerializationTest$GryoTest.shouldSerializePathAsDetached` fails with
>>> `java.lang.IllegalArgumentException: Class is not registered:
>>> java.util.Collections$UnmodifiableSet`
>>> Let me know what you think.
>>> Thanks
>>> Pieter
>>> Excerpts from Ted Wilmes's message of August 7, 2016 12:16 :
>>>> Neat find Pieter.  With regards to the update to ImmutablePath, I think
>>>> that defensive copying of inbound collections is generally a good idea but
>>>> if we can target specific areas where we can reap big gains from not
>>>> creating new collections it may be worth it to relax that constraint,
>>>> especially if we're only talking about internal APIs.  If we do relax that
>>>> constraint though, we'll need to careful during code review and unit
>>>> testing because those bugs can be difficult to track down.  The other nice
>>>> thing that the defensive copy gets us, particularly in the case of the
>>>> ImmutablePath, is that it isolates ImmutablePath from whatever the subclass
>>>> of set was that the caller passed in.  I think that's what is causing the
>>>> serialization test failure in this case since the caller passed in an
>>>> unmodifiable set.
>>>> --Ted
>>>> On Fri, Aug 5, 2016, 2:31 PM Marko Rodriguez <[email protected]> wrote:
>>>>> Hello,
>>>>> This is cool. Check out also ImmutablePath.extend(labels) as that is
>>>>> ultimately what Traverser.addLabels() calls. We have a lot of set copying
>>>>> and I don’t know if its needed (as you seem to be demonstrating). What I
>>>>> don’t like about your solution is the explicit reference to the
>>>>> B_L_P…Traverser in AbstractStep. See if you can work your solution without
>>>>> it.
>>>>> Good luck,
>>>>> Marko.
>>>>> http://markorodriguez.com
>>>>> > On Aug 5, 2016, at 12:44 PM, pieter-gmail <[email protected]>
>>>>> wrote:
>>>>> >
>>>>> > Sorry forgot to add a rather important part.
>>>>> >
>>>>> > I changed ImmutablePath's constructor to
>>>>> >
>>>>> >    private ImmutablePath(final ImmutablePathImpl previousPath, final
>>>>> > Object currentObject, final Set<String> currentLabels) {
>>>>> >        this.previousPath = previousPath;
>>>>> >        this.currentObject = currentObject;
>>>>> >        this.currentLabels = currentLabels;
>>>>> > //        this.currentLabels.addAll(currentLabels);
>>>>> >    }
>>>>> >
>>>>> > Setting the collection directly as oppose to `addAll`
>>>>> >
>>>>> > Thanks
>>>>> > Pieter
>>>>> >
>>>>> >
>>>>> > On 05/08/2016 20:40, pieter-gmail wrote:
>>>>> >> Hi,
>>>>> >>
>>>>> >> I have been optimizing Sqlg of late and eventually arrived at TinkerPop
>>>>> >> code.
>>>>> >>
>>>>> >> The gremlin in particular that I am interested is path queries.
>>>>> >>
>>>>> >> Here is the test that I am running in jmh.
>>>>> >>
>>>>> >>        //@Setup
>>>>> >>        Vertex a = graph.addVertex(T.label, "A", "name", "a1");
>>>>> >>        for (int i = 1; i < 1_000_001; i++) {
>>>>> >>            Vertex b = graph.addVertex(T.label, "B", "name", "name_" +
>>>>> i);
>>>>> >>            a.addEdge("outB", b);
>>>>> >>            for (int j = 0; j < 1; j++) {
>>>>> >>                Vertex c = graph.addVertex(T.label, "C", "name", "name_"
>>>>> >> + i + " " + j);
>>>>> >>                b.addEdge("outC", c);
>>>>> >>            }
>>>>> >>        }
>>>>> >>
>>>>> >> And the query being benchmarked is
>>>>> >>
>>>>> >>        GraphTraversal<Vertex, Path> traversal =
>>>>> >> g.V(a).as("a").out().as("b").out().as("c").path();
>>>>> >>        while (traversal.hasNext()) {
>>>>> >>            Path path = traversal.next();
>>>>> >>        }
>>>>> >>
>>>>> >> Before the optimization, (as things are now)
>>>>> >>
>>>>> >> Benchmark                                 Mode Cnt  Score Error
>>>>> Units
>>>>> >> GremlinPathBenchmark.g_path  avgt  100  1.086 ± 0.020   s/op
>>>>> >>
>>>>> >> The optimization I did is in AbstractStep.prepareTraversalForNextStep,
>>>>> >> to not call addLabels() for path gremlins as the labels are known by 
>>>>> >> the
>>>>> >> step and do not change again so there is not need to keep adding them.
>>>>> >>
>>>>> >>    private final Traverser.Admin<E> prepareTraversalForNextStep(final
>>>>> >> Traverser.Admin<E> traverser) {
>>>>> >>        if (!this.traverserStepIdAndLabelsSetByChild) {
>>>>> >>            traverser.setStepId(this.nextStep.getId());
>>>>> >>            if (traverser instanceof B_LP_O_P_S_SE_SL_Traverser) {
>>>>> >>            } else {
>>>>> >>                traverser.addLabels(this.labels);
>>>>> >>            }
>>>>> >>        }
>>>>> >>        return traverser;
>>>>> >>    }
>>>>> >>
>>>>> >> After optimization,
>>>>> >>
>>>>> >> Benchmark                                 Mode Cnt  Score Error
>>>>> Units
>>>>> >> GremlinPathBenchmark.g_path  avgt  100  0.680 ± 0.004   s/op
>>>>> >>
>>>>> >> 1.086 vs 0.689 seconds for the traversal.
>>>>> >>
>>>>> >> I ran the Structured and Process test suites. 2 tests are failing with
>>>>> >> this optimization.
>>>>> >>
>>>>> >> InjectTest.g_VX1X_out_name_injectXdanielX_asXaX_mapXlengthX_path fails
>>>>> with
>>>>> >>
>>>>> >> "java.lang.IllegalArgumentException: The step with label a does not
>>>>> exist"
>>>>> >>
>>>>> >> and
>>>>> >>
>>>>> >> SerializationTest.shouldSerializePathAsDetached fails with
>>>>> >>
>>>>> >> "Caused by: java.lang.IllegalArgumentException: Class is not
>>>>> registered:
>>>>> >> java.util.Collections$UnmodifiableSet"
>>>>> >>
>>>>> >> Before investigating the failures is this optimization worth pursuing?
>>>>> >>
>>>>> >> Thanks
>>>>> >> Pieter
>>>>> >>
>>>>> >

Re: path query optimization

Reply via email to