[
https://issues.apache.org/jira/browse/CALCITE-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049849#comment-17049849
]
Julian Hyde edited comment on CALCITE-3827 at 4/8/20, 7:00 PM:
---------------------------------------------------------------
[~julianhyde] Thanks a lot for your effort for investigating the related code.
To evaluate the effects to all operations of the graph, we have added more
benchmarks to evaluate the performance of adding/removing/getting
edges/vertices in the graph. The results are as follows:
before
{noformat}
Benchmark Mode Cnt Score
Error Units
DefaultDirectedGraphBenchmark.addEdgeBenchmark avgt 5 0.110 ±
0.002 us/op
DefaultDirectedGraphBenchmark.addVertexBenchmark avgt 5 0.071 ±
0.002 us/op
DefaultDirectedGraphBenchmark.getEdgeBenchmark avgt 5 0.097 ±
0.001 us/op
DefaultDirectedGraphBenchmark.getInwardEdgesBenchmark avgt 5 26.604 ±
0.047 us/op
DefaultDirectedGraphBenchmark.getOutwardEdgesBenchmark avgt 5 0.204 ±
0.002 us/op
DefaultDirectedGraphBenchmark.removeAllVerticesBenchmark avgt 5 12.608 ±
0.062 us/op
DefaultDirectedGraphBenchmark.removeEdgeBenchmark avgt 5 0.149 ±
0.006 us/op
{noformat}
after
{noformat}
Benchmark Mode Cnt Score
Error Units
DefaultDirectedGraphBenchmark.addEdgeBenchmark avgt 5 0.113 ±
0.002 us/op
DefaultDirectedGraphBenchmark.addVertexBenchmark avgt 5 0.067 ±
0.012 us/op
DefaultDirectedGraphBenchmark.getEdgeBenchmark avgt 5 0.109 ±
0.002 us/op
DefaultDirectedGraphBenchmark.getInwardEdgesBenchmark avgt 5 0.203 ±
0.014 us/op
DefaultDirectedGraphBenchmark.getOutwardEdgesBenchmark avgt 5 0.212 ±
0.026 us/op
DefaultDirectedGraphBenchmark.removeAllVerticesBenchmark avgt 5 0.778 ±
0.017 us/op
DefaultDirectedGraphBenchmark.removeEdgeBenchmark avgt 5 0.160 ±
0.003 us/op
{noformat}
It can be observed that the performance of getting inward edges improves by
orders of magnitude, while the overheads for other operations are marginal.
One more thing to note is that removeAllVertices is another operation that
limits the scalability of the graph. This is because, our current
implementation involves traversing the whole graph, even if only a small number
of vertices needs to be removed.
We improve the removeAllVertices operation by traversing only vertices related
to vertices that will be removed. This leads to orders of magnitude performance
improvement. Please note that this improvement is only possible when both the
in-ward and out-ward edge tables are available.
was (Author: fan_li_ya):
[~julianhyde] Thanks a lot for your effort for investigating the related code.
To evaluate the effects to all operations of the graph, we have added more
benchmarks to evaluate the performance of adding/removing/getting
edges/vertices in the graph. The results are as follows:
before
Benchmark Mode Cnt Score
Error Units
DefaultDirectedGraphBenchmark.addEdgeBenchmark avgt 5 0.110 ±
0.002 us/op
DefaultDirectedGraphBenchmark.addVertexBenchmark avgt 5 0.071 ±
0.002 us/op
DefaultDirectedGraphBenchmark.getEdgeBenchmark avgt 5 0.097 ±
0.001 us/op
DefaultDirectedGraphBenchmark.getInwardEdgesBenchmark avgt 5 26.604 ±
0.047 us/op
DefaultDirectedGraphBenchmark.getOutwardEdgesBenchmark avgt 5 0.204 ±
0.002 us/op
DefaultDirectedGraphBenchmark.removeAllVerticesBenchmark avgt 5 12.608 ±
0.062 us/op
DefaultDirectedGraphBenchmark.removeEdgeBenchmark avgt 5 0.149 ±
0.006 us/op
after
Benchmark Mode Cnt Score
Error Units
DefaultDirectedGraphBenchmark.addEdgeBenchmark avgt 5 0.113 ±
0.002 us/op
DefaultDirectedGraphBenchmark.addVertexBenchmark avgt 5 0.067 ±
0.012 us/op
DefaultDirectedGraphBenchmark.getEdgeBenchmark avgt 5 0.109 ±
0.002 us/op
DefaultDirectedGraphBenchmark.getInwardEdgesBenchmark avgt 5 0.203 ±
0.014 us/op
DefaultDirectedGraphBenchmark.getOutwardEdgesBenchmark avgt 5 0.212 ±
0.026 us/op
DefaultDirectedGraphBenchmark.removeAllVerticesBenchmark avgt 5 0.778 ±
0.017 us/op
DefaultDirectedGraphBenchmark.removeEdgeBenchmark avgt 5 0.160 ±
0.003 us/op
It can be observed that the performance of getting inward edges improves by
orders of magnitude, while the overheads for other operations are marginal.
One more thing to note is that removeAllVertices is another operation that
limits the scalability of the graph. This is because, our current
implementation involves traversing the whole graph, even if only a small number
of vertices needs to be removed.
We improve the removeAllVertices operation by traversing only vertices related
to vertices that will be removed. This leads to orders of magnitude performance
improvement. Please note that this improvement is only possible when both the
in-ward and out-ward edge tables are available.
> Reduce the time complexity of finding in-edges of a vertex in the graph
> -----------------------------------------------------------------------
>
> Key: CALCITE-3827
> URL: https://issues.apache.org/jira/browse/CALCITE-3827
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Reporter: Liya Fan
> Assignee: Julian Hyde
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> In the current graph implementation, it takes O(1) time to find the out-edges
> of a vertex, but O(e) time (where e is the total number of edges in the
> graph) to find the in-edges of a vertex.
> In some scenarios (e.g. clearing cache in hep planner), this may have severe
> performance penalty. To solve this problem, we add a inward edge table, in
> addition to the existing outward edge table, to the graph. This helps to
> reduce the time complexity for finding in-edges to O(1).
> Please note that, there is extra performance overhead for maintaining the in
> edge table, when adding/deleting vertices to the graph. However, the added
> overhead only takes O(1) time.
> Finally, it should be noted that, some existing operations can benefit from
> this improvement (e.g. topological sort).
--
This message was sent by Atlassian Jira
(v8.3.4#803005)