[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances
[ https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17148358#comment-17148358 ] Ruben Q L commented on CALCITE-4049: Fixed via https://github.com/apache/calcite/commit/bd121aa3a6d3b0314785a4c9141028b2ef252ebf Thanks [~fan_li_ya] the patch! > Reduce the time complexity of getting shortest distances > > > Key: CALCITE-4049 > URL: https://issues.apache.org/jira/browse/CALCITE-4049 > Project: Calcite > Issue Type: Improvement > Components: core >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Fix For: 1.24.0 > > Time Spent: 5h 10m > Remaining Estimate: 0h > > Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths > between all pairs of nodes. For many scenarios, however, we do not need the > exact paths between nodes. Instead, we are only interested in the lengths of > the shortest paths. > To get the path length, we need to get the shortest path first, which is > returned as a {{List}}, then we call the {{List#size()}} method. According to > the current implementation, the returned list is of type {{ConsList}}. The > time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on > the path), which is inefficient. > In this issue, we revise the implementation of {{ConsList#size}} so that it > takes O(1) time. In addition, we also give a utiltiy to get the shortest > distances between nodes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances
[ https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136359#comment-17136359 ] Liya Fan commented on CALCITE-4049: --- [~julianhyde] Sounds reasonable. Thank you for the good point. > Reduce the time complexity of getting shortest distances > > > Key: CALCITE-4049 > URL: https://issues.apache.org/jira/browse/CALCITE-4049 > Project: Calcite > Issue Type: Improvement > Components: core >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Fix For: 1.24.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths > between all pairs of nodes. For many scenarios, however, we do not need the > exact paths between nodes. Instead, we are only interested in the lengths of > the shortest paths. > To get the path length, we need to get the shortest path first, which is > returned as a {{List}}, then we call the {{List#size()}} method. According to > the current implementation, the returned list is of type {{ConsList}}. The > time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on > the path), which is inefficient. > In this issue, we revise the implementation of {{ConsList#size}} so that it > takes O(1) time. In addition, we also give a utiltiy to get the shortest > distances between nodes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances
[ https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136344#comment-17136344 ] Liya Fan commented on CALCITE-4049: --- I have prepare another PR (https://github.com/apache/calcite/pull/2027/files) according to the above discussion. Maybe we can use that for further discussion. BTW, I have tried the Floyd-Warshall algorithm, but it did not perform well, as it lacks an early stopping methanism. > Reduce the time complexity of getting shortest distances > > > Key: CALCITE-4049 > URL: https://issues.apache.org/jira/browse/CALCITE-4049 > Project: Calcite > Issue Type: Improvement > Components: core >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Fix For: 1.24.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths > between all pairs of nodes. For many scenarios, however, we do not need the > exact paths between nodes. Instead, we are only interested in the lengths of > the shortest paths. > To get the path length, we need to get the shortest path first, which is > returned as a {{List}}, then we call the {{List#size()}} method. According to > the current implementation, the returned list is of type {{ConsList}}. The > time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on > the path), which is inefficient. > In this issue, we revise the implementation of {{ConsList#size}} so that it > takes O(1) time. In addition, we also give a utiltiy to get the shortest > distances between nodes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances
[ https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136342#comment-17136342 ] Julian Hyde commented on CALCITE-4049: -- If there are particular algorithms (e.g. graph diameter) that you want to have a particular running time, consider adding them to the API. Then everyone can benchmark them. It seems to me that graph diameter could be easily and efficiently computed during freeze. > Reduce the time complexity of getting shortest distances > > > Key: CALCITE-4049 > URL: https://issues.apache.org/jira/browse/CALCITE-4049 > Project: Calcite > Issue Type: Improvement > Components: core >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Fix For: 1.24.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths > between all pairs of nodes. For many scenarios, however, we do not need the > exact paths between nodes. Instead, we are only interested in the lengths of > the shortest paths. > To get the path length, we need to get the shortest path first, which is > returned as a {{List}}, then we call the {{List#size()}} method. According to > the current implementation, the returned list is of type {{ConsList}}. The > time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on > the path), which is inefficient. > In this issue, we revise the implementation of {{ConsList#size}} so that it > takes O(1) time. In addition, we also give a utiltiy to get the shortest > distances between nodes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances
[ https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136338#comment-17136338 ] Liya Fan commented on CALCITE-4049: --- [~xndai] Thanks for your clarification. In general, I agree with your point, as the concrete paths are not that important. Even if we need the shortest paths, we may not need all pairs of shortest paths. However, I am not sure if we need to keep the {{getShortestPath}} API, for the sake of backward compatibility. [~julianhyde] I agree with you that the shortest path lengths tend to be much smaller than the number of nodes in the graph. For cases when we need to get the shortest path lengths many times (e.g. calculating the graph diameter), the O(p) time complexity can be a bottleneck. Graphs#makeImmutable is another good example where the shortest path lengths are used repeatedly, as you indicated. > Reduce the time complexity of getting shortest distances > > > Key: CALCITE-4049 > URL: https://issues.apache.org/jira/browse/CALCITE-4049 > Project: Calcite > Issue Type: Improvement > Components: core >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Fix For: 1.24.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths > between all pairs of nodes. For many scenarios, however, we do not need the > exact paths between nodes. Instead, we are only interested in the lengths of > the shortest paths. > To get the path length, we need to get the shortest path first, which is > returned as a {{List}}, then we call the {{List#size()}} method. According to > the current implementation, the returned list is of type {{ConsList}}. The > time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on > the path), which is inefficient. > In this issue, we revise the implementation of {{ConsList#size}} so that it > takes O(1) time. In addition, we also give a utiltiy to get the shortest > distances between nodes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances
[ https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136119#comment-17136119 ] Julian Hyde commented on CALCITE-4049: -- [~fan_li_ya], Can you describe the use case where you were getting performance problems? If would be surprised if one call to find the length of a path was problematic. In a many typical graphs with N nodes I'd expect the length of the shortest path is much smaller than N - say O(sqrt(N)). So in a graph of 10,000 nodes, finding the length of a path might require 100 or so operations. For such graphs, even fairly large, asking for the length of one path would not be expensive. Now if you are getting lots of paths and asking each for its length, then you now have a more complex algorithm. There are several ways that algorithm could be optimized - say by computing the path lengths once and storing them; or having a way of asking for all paths of length k - but it depends on the algorithm. Or are you finding that the graph freezing process is slow? (I would not be surprised, given that it calls {{List.size()}} many times, but again, there are other ways of optimizing the freezing algorithm.) > Reduce the time complexity of getting shortest distances > > > Key: CALCITE-4049 > URL: https://issues.apache.org/jira/browse/CALCITE-4049 > Project: Calcite > Issue Type: Improvement > Components: core >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Fix For: 1.24.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths > between all pairs of nodes. For many scenarios, however, we do not need the > exact paths between nodes. Instead, we are only interested in the lengths of > the shortest paths. > To get the path length, we need to get the shortest path first, which is > returned as a {{List}}, then we call the {{List#size()}} method. According to > the current implementation, the returned list is of type {{ConsList}}. The > time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on > the path), which is inefficient. > In this issue, we revise the implementation of {{ConsList#size}} so that it > takes O(1) time. In addition, we also give a utiltiy to get the shortest > distances between nodes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances
[ https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17135989#comment-17135989 ] Xiening Dai commented on CALCITE-4049: -- My point is we don't even need shortest path. For converting conventions, we can get all path and then quickly do a sort to get shortest first (by the way, today's implementation doesn't even guarantee that, contradicting to the comment). This is not a big deal, since it only gets called once. > Reduce the time complexity of getting shortest distances > > > Key: CALCITE-4049 > URL: https://issues.apache.org/jira/browse/CALCITE-4049 > Project: Calcite > Issue Type: Improvement > Components: core >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Fix For: 1.24.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths > between all pairs of nodes. For many scenarios, however, we do not need the > exact paths between nodes. Instead, we are only interested in the lengths of > the shortest paths. > To get the path length, we need to get the shortest path first, which is > returned as a {{List}}, then we call the {{List#size()}} method. According to > the current implementation, the returned list is of type {{ConsList}}. The > time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on > the path), which is inefficient. > In this issue, we revise the implementation of {{ConsList#size}} so that it > takes O(1) time. In addition, we also give a utiltiy to get the shortest > distances between nodes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances
[ https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17135665#comment-17135665 ] Liya Fan commented on CALCITE-4049: --- Hi all, thank you for the fruitful discussion. I think we all agree that we should not modify the general ConsList data structure. In addition, it seems that the path itself is not that important, and is not frequently used. For the problem of high cost of getting path length, I think we have some other ways to solve it: 1. We can have a list implementation that wraps the ConsList and have a size field. 2. We can store the shortest path length along with the shortest path in {{Map, List> shortestPaths = new HashMap<>();}} 3. We can employ Floyd's algorithm to get the shortest path lengths, and then restore the path in O(p) time, whenever necessary. Method 3 was suggested by Julian in the above discussion. I am also in favor of it, because storing all the shortest paths can be expensive. I will prepare a PR ASAP, and we can have further discussions based on it. Thank you. > Reduce the time complexity of getting shortest distances > > > Key: CALCITE-4049 > URL: https://issues.apache.org/jira/browse/CALCITE-4049 > Project: Calcite > Issue Type: Improvement > Components: core >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Fix For: 1.24.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths > between all pairs of nodes. For many scenarios, however, we do not need the > exact paths between nodes. Instead, we are only interested in the lengths of > the shortest paths. > To get the path length, we need to get the shortest path first, which is > returned as a {{List}}, then we call the {{List#size()}} method. According to > the current implementation, the returned list is of type {{ConsList}}. The > time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on > the path), which is inefficient. > In this issue, we revise the implementation of {{ConsList#size}} so that it > takes O(1) time. In addition, we also give a utiltiy to get the shortest > distances between nodes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances
[ https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17135590#comment-17135590 ] Ruben Q L commented on CALCITE-4049: Thanks for your input. I'd suggest then: - Revert the current patch - Open separate ticket(s) to work on algorithmic solutions to achieve better performance in frozen graph computation What do you think [~fan_li_ya], [~julianhyde], [~xndai] ? > Reduce the time complexity of getting shortest distances > > > Key: CALCITE-4049 > URL: https://issues.apache.org/jira/browse/CALCITE-4049 > Project: Calcite > Issue Type: Improvement > Components: core >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Fix For: 1.24.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths > between all pairs of nodes. For many scenarios, however, we do not need the > exact paths between nodes. Instead, we are only interested in the lengths of > the shortest paths. > To get the path length, we need to get the shortest path first, which is > returned as a {{List}}, then we call the {{List#size()}} method. According to > the current implementation, the returned list is of type {{ConsList}}. The > time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on > the path), which is inefficient. > In this issue, we revise the implementation of {{ConsList#size}} so that it > takes O(1) time. In addition, we also give a utiltiy to get the shortest > distances between nodes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances
[ https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134931#comment-17134931 ] Xiening Dai commented on CALCITE-4049: -- Based on the usage pattern, I don't think the shortest path is even needed. What we need is - 1. A fast way to detect the accessibility between two vertexes. 2. Method to get all path between two vertexes. For #1, we can build and cache an accessibility matrix when frozen the graph. And #2 can be calculated on-demand using DFS, since it's not at a hot path. > Reduce the time complexity of getting shortest distances > > > Key: CALCITE-4049 > URL: https://issues.apache.org/jira/browse/CALCITE-4049 > Project: Calcite > Issue Type: Improvement > Components: core >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Fix For: 1.24.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths > between all pairs of nodes. For many scenarios, however, we do not need the > exact paths between nodes. Instead, we are only interested in the lengths of > the shortest paths. > To get the path length, we need to get the shortest path first, which is > returned as a {{List}}, then we call the {{List#size()}} method. According to > the current implementation, the returned list is of type {{ConsList}}. The > time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on > the path), which is inefficient. > In this issue, we revise the implementation of {{ConsList#size}} so that it > takes O(1) time. In addition, we also give a utiltiy to get the shortest > distances between nodes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances
[ https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134690#comment-17134690 ] Ruben Q L commented on CALCITE-4049: Looking back into it... I agree with [~julianhyde] and [~xndai]. Apart from all the other arguments: - Adding a {{size}} field to {{ConsList}} "violates" the contract of being a [cons|https://en.wikipedia.org/wiki/Cons] data structure. - If someone really requires an {{ImmutableList}} implementation with O(1) for size computation, they can still use e.g. guava's instead of {{ConsList}}. - As [~julianhyde], if the problem is the creation of a {{FrozenGraph}}, maybe that procedure can be re-worked to have a more efficient implementation. > Reduce the time complexity of getting shortest distances > > > Key: CALCITE-4049 > URL: https://issues.apache.org/jira/browse/CALCITE-4049 > Project: Calcite > Issue Type: Improvement > Components: core >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Fix For: 1.24.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths > between all pairs of nodes. For many scenarios, however, we do not need the > exact paths between nodes. Instead, we are only interested in the lengths of > the shortest paths. > To get the path length, we need to get the shortest path first, which is > returned as a {{List}}, then we call the {{List#size()}} method. According to > the current implementation, the returned list is of type {{ConsList}}. The > time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on > the path), which is inefficient. > In this issue, we revise the implementation of {{ConsList#size}} so that it > takes O(1) time. In addition, we also give a utiltiy to get the shortest > distances between nodes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances
[ https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134593#comment-17134593 ] Xiening Dai commented on CALCITE-4049: -- I should also mention that getShortestPath() is also used by materialized view to determine if a materialization is used. It simply checks getShortestPath(...) != null, which should have been getPath(...) != null. https://github.com/apache/calcite/blob/24dd26640db01114c6931d6615b90a63969ffc42/core/src/main/java/org/apache/calcite/plan/RelOptMaterializations.java#L257 > Reduce the time complexity of getting shortest distances > > > Key: CALCITE-4049 > URL: https://issues.apache.org/jira/browse/CALCITE-4049 > Project: Calcite > Issue Type: Improvement > Components: core >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Fix For: 1.24.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths > between all pairs of nodes. For many scenarios, however, we do not need the > exact paths between nodes. Instead, we are only interested in the lengths of > the shortest paths. > To get the path length, we need to get the shortest path first, which is > returned as a {{List}}, then we call the {{List#size()}} method. According to > the current implementation, the returned list is of type {{ConsList}}. The > time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on > the path), which is inefficient. > In this issue, we revise the implementation of {{ConsList#size}} so that it > takes O(1) time. In addition, we also give a utiltiy to get the shortest > distances between nodes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances
[ https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134589#comment-17134589 ] Xiening Dai commented on CALCITE-4049: -- After looking it deeper, I feel the shorted path might not needed at all. Right now, the getShortestPath() is used only by ConventionTraitDef.canConvert(). [1] But actually in ConventionTraitDef.convert(), it doesn't use the shortest path for conversion, instead it gets all possible paths and enumerate them one by one. [2] So it looks to me that the shortest path is not used at all. And even though the comment of Graph#getPaths() says it returns shortest path first, but actually it uses a DFS to find the path, and doesn't grantee shortest first today. So I believe the getShortedPath() can be removed. [1] https://github.com/apache/calcite/blob/52a57078ba081b24b9d086ed363c715485d1a519/core/src/main/java/org/apache/calcite/plan/ConventionTraitDef.java#L200 [2] https://github.com/apache/calcite/blob/52a57078ba081b24b9d086ed363c715485d1a519/core/src/main/java/org/apache/calcite/plan/ConventionTraitDef.java#L139 > Reduce the time complexity of getting shortest distances > > > Key: CALCITE-4049 > URL: https://issues.apache.org/jira/browse/CALCITE-4049 > Project: Calcite > Issue Type: Improvement > Components: core >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Fix For: 1.24.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths > between all pairs of nodes. For many scenarios, however, we do not need the > exact paths between nodes. Instead, we are only interested in the lengths of > the shortest paths. > To get the path length, we need to get the shortest path first, which is > returned as a {{List}}, then we call the {{List#size()}} method. According to > the current implementation, the returned list is of type {{ConsList}}. The > time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on > the path), which is inefficient. > In this issue, we revise the implementation of {{ConsList#size}} so that it > takes O(1) time. In addition, we also give a utiltiy to get the shortest > distances between nodes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances
[ https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134573#comment-17134573 ] Julian Hyde commented on CALCITE-4049: -- The advantage of ConsList is that lists can share tails. For example, the paths [a, d, e] and [a, b, c, d, e] share the tail [d, e]. Sharing tails saves some memory (but I've not done any experiments to see whether there's a significant advantage). A single-linked list could in principle do the same, but java's LinkedList class (or in fact any double-linked list) does not share links. The original complaint in this bug is that when you get a path from a frozen graph, calling List.size() is expensive. But when I look at the code, I see that the "freeze" process ({{Graph.makeImmutable}}) heavily relies on List.size() and therefore will perform badly. A better implementation of "freeze" would perhaps use Floyd-Warshall algorithm to compute and store the lengths of the shortest paths between any two nodes. It would only store the length, not the path. The path can be reconstructed fairly cheaply. > Reduce the time complexity of getting shortest distances > > > Key: CALCITE-4049 > URL: https://issues.apache.org/jira/browse/CALCITE-4049 > Project: Calcite > Issue Type: Improvement > Components: core >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Fix For: 1.24.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths > between all pairs of nodes. For many scenarios, however, we do not need the > exact paths between nodes. Instead, we are only interested in the lengths of > the shortest paths. > To get the path length, we need to get the shortest path first, which is > returned as a {{List}}, then we call the {{List#size()}} method. According to > the current implementation, the returned list is of type {{ConsList}}. The > time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on > the path), which is inefficient. > In this issue, we revise the implementation of {{ConsList#size}} so that it > takes O(1) time. In addition, we also give a utiltiy to get the shortest > distances between nodes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances
[ https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134529#comment-17134529 ] Xiening Dai commented on CALCITE-4049: -- I agree with [~julianhyde]. You cannot say there's no tradeoffs just because we have more memory. There are OOM issues we have seen, such as CALCITE-3784. In terms of the change itself, I wonder why we would need ConList in the first place. If we just use a LinkedList, we can contact the head node and also have O(1) complexity for size(). > Reduce the time complexity of getting shortest distances > > > Key: CALCITE-4049 > URL: https://issues.apache.org/jira/browse/CALCITE-4049 > Project: Calcite > Issue Type: Improvement > Components: core >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Fix For: 1.24.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths > between all pairs of nodes. For many scenarios, however, we do not need the > exact paths between nodes. Instead, we are only interested in the lengths of > the shortest paths. > To get the path length, we need to get the shortest path first, which is > returned as a {{List}}, then we call the {{List#size()}} method. According to > the current implementation, the returned list is of type {{ConsList}}. The > time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on > the path), which is inefficient. > In this issue, we revise the implementation of {{ConsList#size}} so that it > takes O(1) time. In addition, we also give a utiltiy to get the shortest > distances between nodes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances
[ https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134400#comment-17134400 ] Julian Hyde commented on CALCITE-4049: -- You don't seem to realize that it is a trade-off. A change that improves some things makes other things worse. It is glib of you to say 'computers have more memory now, so there's no trade-off'. Cons is a general-purpose data structure. You made it better for your purposes but you have not considered other purposes. The best practice with data structures is to be explicit about the cost for various operations. In Cons, the running time of size is O(n). People understand and accept that. > Reduce the time complexity of getting shortest distances > > > Key: CALCITE-4049 > URL: https://issues.apache.org/jira/browse/CALCITE-4049 > Project: Calcite > Issue Type: Improvement > Components: core >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Fix For: 1.24.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths > between all pairs of nodes. For many scenarios, however, we do not need the > exact paths between nodes. Instead, we are only interested in the lengths of > the shortest paths. > To get the path length, we need to get the shortest path first, which is > returned as a {{List}}, then we call the {{List#size()}} method. According to > the current implementation, the returned list is of type {{ConsList}}. The > time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on > the path), which is inefficient. > In this issue, we revise the implementation of {{ConsList#size}} so that it > takes O(1) time. In addition, we also give a utiltiy to get the shortest > distances between nodes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances
[ https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17132852#comment-17132852 ] Liya Fan commented on CALCITE-4049: --- [~julianhyde] Thanks a lot for your comments. The down-side of adding the size field is that it makes the object size 4 bytes larger (I am not sure if it is 50% larger, as an object has other meta-data to store). The up-side is reducing the time complexity of getting size from O(n) to O(1). IMO, a general trend for performance optimization is to reduce time complexity by taking more spaces, as memory/disk spaces are becoming less expensive today and modern computers tend to have larger memory/disk spaces. For example, we build indices for relation databases to reduce the query time, at the expense of extra memory/disk spaces. We can also find examples from Calcite code base. For example, in class [RexCall|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rex/RexCall.java], we have the {{nodeCount}} field, which introduces an extra 4 byte. However, since it reduces the time complexity of getting node count, it should be considered a good optimization. > Reduce the time complexity of getting shortest distances > > > Key: CALCITE-4049 > URL: https://issues.apache.org/jira/browse/CALCITE-4049 > Project: Calcite > Issue Type: Improvement > Components: core >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Fix For: 1.24.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths > between all pairs of nodes. For many scenarios, however, we do not need the > exact paths between nodes. Instead, we are only interested in the lengths of > the shortest paths. > To get the path length, we need to get the shortest path first, which is > returned as a {{List}}, then we call the {{List#size()}} method. According to > the current implementation, the returned list is of type {{ConsList}}. The > time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on > the path), which is inefficient. > In this issue, we revise the implementation of {{ConsList#size}} so that it > takes O(1) time. In addition, we also give a utiltiy to get the shortest > distances between nodes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances
[ https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17132562#comment-17132562 ] Julian Hyde commented on CALCITE-4049: -- This needed more discussion. You made a general-purpose data structure 50% larger. That may have a down-side. Also, I do not think the changes you made to the ASCII-art graphs are an improvement. The originals are concise and easy to read. Because they are DAGs arrows are implicitly left-to-right. > Reduce the time complexity of getting shortest distances > > > Key: CALCITE-4049 > URL: https://issues.apache.org/jira/browse/CALCITE-4049 > Project: Calcite > Issue Type: Improvement > Components: core >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Fix For: 1.24.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths > between all pairs of nodes. For many scenarios, however, we do not need the > exact paths between nodes. Instead, we are only interested in the lengths of > the shortest paths. > To get the path length, we need to get the shortest path first, which is > returned as a {{List}}, then we call the {{List#size()}} method. According to > the current implementation, the returned list is of type {{ConsList}}. The > time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on > the path), which is inefficient. > In this issue, we revise the implementation of {{ConsList#size}} so that it > takes O(1) time. In addition, we also give a utiltiy to get the shortest > distances between nodes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances
[ https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128997#comment-17128997 ] Liya Fan commented on CALCITE-4049: --- [~rubenql] Thanks a lot for your effort. > Reduce the time complexity of getting shortest distances > > > Key: CALCITE-4049 > URL: https://issues.apache.org/jira/browse/CALCITE-4049 > Project: Calcite > Issue Type: Improvement > Components: core >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Fix For: 1.24.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths > between all pairs of nodes. For many scenarios, however, we do not need the > exact paths between nodes. Instead, we are only interested in the lengths of > the shortest paths. > To get the path length, we need to get the shortest path first, which is > returned as a {{List}}, then we call the {{List#size()}} method. According to > the current implementation, the returned list is of type {{ConsList}}. The > time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on > the path), which is inefficient. > In this issue, we revise the implementation of {{ConsList#size}} so that it > takes O(1) time. In addition, we also give a utiltiy to get the shortest > distances between nodes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances
[ https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128971#comment-17128971 ] Ruben Q L commented on CALCITE-4049: Resolved via https://github.com/apache/calcite/commit/f9e8413b46f4c648736b529a55e25b57ab5dee74 Thanks [~fan_li_ya] for the PR! > Reduce the time complexity of getting shortest distances > > > Key: CALCITE-4049 > URL: https://issues.apache.org/jira/browse/CALCITE-4049 > Project: Calcite > Issue Type: Improvement > Components: core >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths > between all pairs of nodes. For many scenarios, however, we do not need the > exact paths between nodes. Instead, we are only interested in the lengths of > the shortest paths. > To get the path length, we need to get the shortest path first, which is > returned as a {{List}}, then we call the {{List#size()}} method. According to > the current implementation, the returned list is of type {{ConsList}}. The > time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on > the path), which is inefficient. > In this issue, we revise the implementation of {{ConsList#size}} so that it > takes O(1) time. In addition, we also give a utiltiy to get the shortest > distances between nodes. -- This message was sent by Atlassian Jira (v8.3.4#803005)