[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances

2020-06-30 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17148358#comment-17148358
 ] 

Ruben Q L commented on CALCITE-4049:


Fixed via 
https://github.com/apache/calcite/commit/bd121aa3a6d3b0314785a4c9141028b2ef252ebf
Thanks [~fan_li_ya] the patch!

> Reduce the time complexity of getting shortest distances
> 
>
> Key: CALCITE-4049
> URL: https://issues.apache.org/jira/browse/CALCITE-4049
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
> Fix For: 1.24.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths 
> between all pairs of nodes. For many scenarios, however, we do not need the 
> exact paths between nodes. Instead, we are only interested in the lengths of 
> the shortest paths. 
> To get the path length, we need to get the shortest path first, which is 
> returned as a {{List}}, then we call the {{List#size()}} method. According to 
> the current implementation, the returned list is of type {{ConsList}}. The 
> time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on 
> the path), which is inefficient. 
> In this issue, we revise the implementation of {{ConsList#size}} so that it 
> takes O(1) time. In addition, we also give a utiltiy to get the shortest 
> distances between nodes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances

2020-06-16 Thread Liya Fan (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136359#comment-17136359
 ] 

Liya Fan commented on CALCITE-4049:
---

[~julianhyde] Sounds reasonable. Thank you for the good point. 

> Reduce the time complexity of getting shortest distances
> 
>
> Key: CALCITE-4049
> URL: https://issues.apache.org/jira/browse/CALCITE-4049
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
> Fix For: 1.24.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths 
> between all pairs of nodes. For many scenarios, however, we do not need the 
> exact paths between nodes. Instead, we are only interested in the lengths of 
> the shortest paths. 
> To get the path length, we need to get the shortest path first, which is 
> returned as a {{List}}, then we call the {{List#size()}} method. According to 
> the current implementation, the returned list is of type {{ConsList}}. The 
> time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on 
> the path), which is inefficient. 
> In this issue, we revise the implementation of {{ConsList#size}} so that it 
> takes O(1) time. In addition, we also give a utiltiy to get the shortest 
> distances between nodes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances

2020-06-16 Thread Liya Fan (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136344#comment-17136344
 ] 

Liya Fan commented on CALCITE-4049:
---

I have prepare another PR (https://github.com/apache/calcite/pull/2027/files) 
according to the above discussion. Maybe we can use that for further discussion.

BTW, I have tried the Floyd-Warshall algorithm, but it did not perform well, as 
it lacks an early stopping methanism. 

> Reduce the time complexity of getting shortest distances
> 
>
> Key: CALCITE-4049
> URL: https://issues.apache.org/jira/browse/CALCITE-4049
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
> Fix For: 1.24.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths 
> between all pairs of nodes. For many scenarios, however, we do not need the 
> exact paths between nodes. Instead, we are only interested in the lengths of 
> the shortest paths. 
> To get the path length, we need to get the shortest path first, which is 
> returned as a {{List}}, then we call the {{List#size()}} method. According to 
> the current implementation, the returned list is of type {{ConsList}}. The 
> time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on 
> the path), which is inefficient. 
> In this issue, we revise the implementation of {{ConsList#size}} so that it 
> takes O(1) time. In addition, we also give a utiltiy to get the shortest 
> distances between nodes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances

2020-06-16 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136342#comment-17136342
 ] 

Julian Hyde commented on CALCITE-4049:
--

If there are particular algorithms (e.g. graph diameter) that you want to have 
a particular running time, consider adding them to the API. Then everyone can 
benchmark them. It seems to me that graph diameter could be easily and 
efficiently computed during freeze.

> Reduce the time complexity of getting shortest distances
> 
>
> Key: CALCITE-4049
> URL: https://issues.apache.org/jira/browse/CALCITE-4049
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
> Fix For: 1.24.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths 
> between all pairs of nodes. For many scenarios, however, we do not need the 
> exact paths between nodes. Instead, we are only interested in the lengths of 
> the shortest paths. 
> To get the path length, we need to get the shortest path first, which is 
> returned as a {{List}}, then we call the {{List#size()}} method. According to 
> the current implementation, the returned list is of type {{ConsList}}. The 
> time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on 
> the path), which is inefficient. 
> In this issue, we revise the implementation of {{ConsList#size}} so that it 
> takes O(1) time. In addition, we also give a utiltiy to get the shortest 
> distances between nodes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances

2020-06-16 Thread Liya Fan (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136338#comment-17136338
 ] 

Liya Fan commented on CALCITE-4049:
---

[~xndai] Thanks for your clarification. In general, I agree with your point, as 
the concrete paths are not that important. Even if we need the shortest paths, 
we may not need all pairs of shortest paths. However, I am not sure if we need 
to keep the {{getShortestPath}} API, for the sake of backward compatibility. 

[~julianhyde] I agree with you that the shortest path lengths tend to be much 
smaller than the number of nodes in the graph. For cases when we need to get 
the shortest path lengths many times (e.g. calculating the graph diameter), the 
O(p) time complexity can be a bottleneck. Graphs#makeImmutable is another good 
example where the shortest path lengths are used repeatedly, as you indicated. 

> Reduce the time complexity of getting shortest distances
> 
>
> Key: CALCITE-4049
> URL: https://issues.apache.org/jira/browse/CALCITE-4049
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
> Fix For: 1.24.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths 
> between all pairs of nodes. For many scenarios, however, we do not need the 
> exact paths between nodes. Instead, we are only interested in the lengths of 
> the shortest paths. 
> To get the path length, we need to get the shortest path first, which is 
> returned as a {{List}}, then we call the {{List#size()}} method. According to 
> the current implementation, the returned list is of type {{ConsList}}. The 
> time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on 
> the path), which is inefficient. 
> In this issue, we revise the implementation of {{ConsList#size}} so that it 
> takes O(1) time. In addition, we also give a utiltiy to get the shortest 
> distances between nodes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances

2020-06-15 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136119#comment-17136119
 ] 

Julian Hyde commented on CALCITE-4049:
--

[~fan_li_ya], Can you describe the use case where you were getting performance 
problems?

If would be surprised if one call to find the length of a path was problematic. 
In a many typical graphs with N nodes I'd expect the length of the shortest 
path is much smaller than N - say O(sqrt(N)). So in a graph of 10,000 nodes, 
finding the length of a path might require 100 or so operations. For such 
graphs, even fairly large, asking for the length of one path would not be 
expensive.

Now if you are getting lots of paths and asking each for its length, then you 
now have a more complex algorithm. There are several ways that algorithm could 
be optimized - say by computing the path lengths once and storing them; or 
having a way of asking for all paths of length k - but it depends on the 
algorithm.

Or are you finding that the graph freezing process is slow? (I would not be 
surprised, given that it calls {{List.size()}} many times, but again, there are 
other ways of optimizing the freezing algorithm.)

> Reduce the time complexity of getting shortest distances
> 
>
> Key: CALCITE-4049
> URL: https://issues.apache.org/jira/browse/CALCITE-4049
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
> Fix For: 1.24.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths 
> between all pairs of nodes. For many scenarios, however, we do not need the 
> exact paths between nodes. Instead, we are only interested in the lengths of 
> the shortest paths. 
> To get the path length, we need to get the shortest path first, which is 
> returned as a {{List}}, then we call the {{List#size()}} method. According to 
> the current implementation, the returned list is of type {{ConsList}}. The 
> time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on 
> the path), which is inefficient. 
> In this issue, we revise the implementation of {{ConsList#size}} so that it 
> takes O(1) time. In addition, we also give a utiltiy to get the shortest 
> distances between nodes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances

2020-06-15 Thread Xiening Dai (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17135989#comment-17135989
 ] 

Xiening Dai commented on CALCITE-4049:
--

My point is we don't even need shortest path. For converting conventions, we 
can get all path and then quickly do a sort to get shortest first (by the way, 
today's implementation doesn't even guarantee that, contradicting to the 
comment). This is not a big deal, since it only gets called once.

> Reduce the time complexity of getting shortest distances
> 
>
> Key: CALCITE-4049
> URL: https://issues.apache.org/jira/browse/CALCITE-4049
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
> Fix For: 1.24.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths 
> between all pairs of nodes. For many scenarios, however, we do not need the 
> exact paths between nodes. Instead, we are only interested in the lengths of 
> the shortest paths. 
> To get the path length, we need to get the shortest path first, which is 
> returned as a {{List}}, then we call the {{List#size()}} method. According to 
> the current implementation, the returned list is of type {{ConsList}}. The 
> time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on 
> the path), which is inefficient. 
> In this issue, we revise the implementation of {{ConsList#size}} so that it 
> takes O(1) time. In addition, we also give a utiltiy to get the shortest 
> distances between nodes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances

2020-06-15 Thread Liya Fan (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17135665#comment-17135665
 ] 

Liya Fan commented on CALCITE-4049:
---

Hi all, thank you for the fruitful discussion.

I think we all agree that we should not modify the general ConsList data 
structure. In addition, it seems that the path itself is not that important, 
and is not frequently used. 

For the problem of high cost of getting path length, I think we have some other 
ways to solve it:

1. We can have a list implementation that wraps the ConsList and have a size 
field.
2. We can store the shortest path length along with the shortest path in 

{{Map, List> shortestPaths = new HashMap<>();}}

3. We can employ Floyd's algorithm to get the shortest path lengths, and then 
restore the path in O(p) time, whenever necessary. 

Method 3 was suggested by Julian in the above discussion. I am also in favor of 
it, because storing all the shortest paths can be expensive.

I will prepare a PR ASAP, and we can have further discussions based on it. 
Thank you.



> Reduce the time complexity of getting shortest distances
> 
>
> Key: CALCITE-4049
> URL: https://issues.apache.org/jira/browse/CALCITE-4049
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
> Fix For: 1.24.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths 
> between all pairs of nodes. For many scenarios, however, we do not need the 
> exact paths between nodes. Instead, we are only interested in the lengths of 
> the shortest paths. 
> To get the path length, we need to get the shortest path first, which is 
> returned as a {{List}}, then we call the {{List#size()}} method. According to 
> the current implementation, the returned list is of type {{ConsList}}. The 
> time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on 
> the path), which is inefficient. 
> In this issue, we revise the implementation of {{ConsList#size}} so that it 
> takes O(1) time. In addition, we also give a utiltiy to get the shortest 
> distances between nodes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances

2020-06-15 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17135590#comment-17135590
 ] 

Ruben Q L commented on CALCITE-4049:


Thanks for your input.
I'd suggest then:
- Revert the current patch
- Open separate ticket(s) to work on algorithmic solutions to achieve better 
performance in frozen graph computation

What do you think [~fan_li_ya], [~julianhyde], [~xndai] ?

> Reduce the time complexity of getting shortest distances
> 
>
> Key: CALCITE-4049
> URL: https://issues.apache.org/jira/browse/CALCITE-4049
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
> Fix For: 1.24.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths 
> between all pairs of nodes. For many scenarios, however, we do not need the 
> exact paths between nodes. Instead, we are only interested in the lengths of 
> the shortest paths. 
> To get the path length, we need to get the shortest path first, which is 
> returned as a {{List}}, then we call the {{List#size()}} method. According to 
> the current implementation, the returned list is of type {{ConsList}}. The 
> time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on 
> the path), which is inefficient. 
> In this issue, we revise the implementation of {{ConsList#size}} so that it 
> takes O(1) time. In addition, we also give a utiltiy to get the shortest 
> distances between nodes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances

2020-06-13 Thread Xiening Dai (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134931#comment-17134931
 ] 

Xiening Dai commented on CALCITE-4049:
--

Based on the usage pattern, I don't think the shortest path is even needed. 
What we need is -

1. A fast way to detect the accessibility between two vertexes.
2. Method to get all path between two vertexes.

For #1, we can build and cache an accessibility matrix when frozen the graph. 
And #2 can be calculated on-demand using DFS, since it's not at a hot path.

> Reduce the time complexity of getting shortest distances
> 
>
> Key: CALCITE-4049
> URL: https://issues.apache.org/jira/browse/CALCITE-4049
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
> Fix For: 1.24.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths 
> between all pairs of nodes. For many scenarios, however, we do not need the 
> exact paths between nodes. Instead, we are only interested in the lengths of 
> the shortest paths. 
> To get the path length, we need to get the shortest path first, which is 
> returned as a {{List}}, then we call the {{List#size()}} method. According to 
> the current implementation, the returned list is of type {{ConsList}}. The 
> time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on 
> the path), which is inefficient. 
> In this issue, we revise the implementation of {{ConsList#size}} so that it 
> takes O(1) time. In addition, we also give a utiltiy to get the shortest 
> distances between nodes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances

2020-06-13 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134690#comment-17134690
 ] 

Ruben Q L commented on CALCITE-4049:


Looking back into it... I agree with [~julianhyde] and [~xndai]. Apart from all 
the other arguments:
- Adding a {{size}} field to {{ConsList}} "violates" the contract of being a 
[cons|https://en.wikipedia.org/wiki/Cons] data structure.
- If someone really requires an {{ImmutableList}} implementation with O(1) for 
size computation, they can still use e.g. guava's instead of {{ConsList}}.
- As [~julianhyde], if the problem is the creation of a {{FrozenGraph}}, maybe 
that procedure can be re-worked to have a more efficient implementation.


> Reduce the time complexity of getting shortest distances
> 
>
> Key: CALCITE-4049
> URL: https://issues.apache.org/jira/browse/CALCITE-4049
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
> Fix For: 1.24.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths 
> between all pairs of nodes. For many scenarios, however, we do not need the 
> exact paths between nodes. Instead, we are only interested in the lengths of 
> the shortest paths. 
> To get the path length, we need to get the shortest path first, which is 
> returned as a {{List}}, then we call the {{List#size()}} method. According to 
> the current implementation, the returned list is of type {{ConsList}}. The 
> time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on 
> the path), which is inefficient. 
> In this issue, we revise the implementation of {{ConsList#size}} so that it 
> takes O(1) time. In addition, we also give a utiltiy to get the shortest 
> distances between nodes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances

2020-06-12 Thread Xiening Dai (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134593#comment-17134593
 ] 

Xiening Dai commented on CALCITE-4049:
--

I should also mention that getShortestPath() is also used by materialized view 
to determine if a materialization is used. It simply checks 
getShortestPath(...) != null, which should have been getPath(...) != null.

https://github.com/apache/calcite/blob/24dd26640db01114c6931d6615b90a63969ffc42/core/src/main/java/org/apache/calcite/plan/RelOptMaterializations.java#L257

> Reduce the time complexity of getting shortest distances
> 
>
> Key: CALCITE-4049
> URL: https://issues.apache.org/jira/browse/CALCITE-4049
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
> Fix For: 1.24.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths 
> between all pairs of nodes. For many scenarios, however, we do not need the 
> exact paths between nodes. Instead, we are only interested in the lengths of 
> the shortest paths. 
> To get the path length, we need to get the shortest path first, which is 
> returned as a {{List}}, then we call the {{List#size()}} method. According to 
> the current implementation, the returned list is of type {{ConsList}}. The 
> time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on 
> the path), which is inefficient. 
> In this issue, we revise the implementation of {{ConsList#size}} so that it 
> takes O(1) time. In addition, we also give a utiltiy to get the shortest 
> distances between nodes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances

2020-06-12 Thread Xiening Dai (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134589#comment-17134589
 ] 

Xiening Dai commented on CALCITE-4049:
--

After looking it deeper, I feel the shorted path might not needed at all. 

Right now, the getShortestPath() is used only by 
ConventionTraitDef.canConvert(). [1] But actually in 
ConventionTraitDef.convert(), it doesn't use the shortest path for conversion, 
instead it gets all possible paths and enumerate them one by one. [2] So it 
looks to me that the shortest path is not used at all. And even though the 
comment of Graph#getPaths() says it returns shortest path first, but actually 
it uses a DFS to find the path, and doesn't grantee shortest first today. So I 
believe the getShortedPath() can be removed.

[1] 
https://github.com/apache/calcite/blob/52a57078ba081b24b9d086ed363c715485d1a519/core/src/main/java/org/apache/calcite/plan/ConventionTraitDef.java#L200
[2] 
https://github.com/apache/calcite/blob/52a57078ba081b24b9d086ed363c715485d1a519/core/src/main/java/org/apache/calcite/plan/ConventionTraitDef.java#L139

> Reduce the time complexity of getting shortest distances
> 
>
> Key: CALCITE-4049
> URL: https://issues.apache.org/jira/browse/CALCITE-4049
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
> Fix For: 1.24.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths 
> between all pairs of nodes. For many scenarios, however, we do not need the 
> exact paths between nodes. Instead, we are only interested in the lengths of 
> the shortest paths. 
> To get the path length, we need to get the shortest path first, which is 
> returned as a {{List}}, then we call the {{List#size()}} method. According to 
> the current implementation, the returned list is of type {{ConsList}}. The 
> time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on 
> the path), which is inefficient. 
> In this issue, we revise the implementation of {{ConsList#size}} so that it 
> takes O(1) time. In addition, we also give a utiltiy to get the shortest 
> distances between nodes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances

2020-06-12 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134573#comment-17134573
 ] 

Julian Hyde commented on CALCITE-4049:
--

The advantage of ConsList is that lists can share tails. For example, the paths 
[a, d, e] and [a, b, c, d, e] share the tail [d, e]. Sharing tails saves some 
memory (but I've not done any experiments to see whether there's a significant 
advantage).

A single-linked list could in principle do the same, but java's LinkedList 
class (or in fact any double-linked list) does not share links.

The original complaint in this bug is that when you get a path from a frozen 
graph, calling List.size() is expensive. But when I look at the code, I see 
that the "freeze" process ({{Graph.makeImmutable}}) heavily relies on 
List.size() and therefore will perform badly.

A better implementation of "freeze" would perhaps use Floyd-Warshall algorithm 
to compute and store the lengths of the shortest paths between any two nodes. 
It would only store the length, not the path. The path can be reconstructed 
fairly cheaply.

> Reduce the time complexity of getting shortest distances
> 
>
> Key: CALCITE-4049
> URL: https://issues.apache.org/jira/browse/CALCITE-4049
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
> Fix For: 1.24.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths 
> between all pairs of nodes. For many scenarios, however, we do not need the 
> exact paths between nodes. Instead, we are only interested in the lengths of 
> the shortest paths. 
> To get the path length, we need to get the shortest path first, which is 
> returned as a {{List}}, then we call the {{List#size()}} method. According to 
> the current implementation, the returned list is of type {{ConsList}}. The 
> time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on 
> the path), which is inefficient. 
> In this issue, we revise the implementation of {{ConsList#size}} so that it 
> takes O(1) time. In addition, we also give a utiltiy to get the shortest 
> distances between nodes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances

2020-06-12 Thread Xiening Dai (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134529#comment-17134529
 ] 

Xiening Dai commented on CALCITE-4049:
--

I agree with [~julianhyde]. You cannot say there's no tradeoffs just because we 
have more memory. There are OOM issues we have seen, such as CALCITE-3784. 

In terms of the change itself, I wonder why we would need ConList in the first 
place. If we just use a LinkedList, we can contact the head node and also have 
O(1) complexity for size().

> Reduce the time complexity of getting shortest distances
> 
>
> Key: CALCITE-4049
> URL: https://issues.apache.org/jira/browse/CALCITE-4049
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
> Fix For: 1.24.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths 
> between all pairs of nodes. For many scenarios, however, we do not need the 
> exact paths between nodes. Instead, we are only interested in the lengths of 
> the shortest paths. 
> To get the path length, we need to get the shortest path first, which is 
> returned as a {{List}}, then we call the {{List#size()}} method. According to 
> the current implementation, the returned list is of type {{ConsList}}. The 
> time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on 
> the path), which is inefficient. 
> In this issue, we revise the implementation of {{ConsList#size}} so that it 
> takes O(1) time. In addition, we also give a utiltiy to get the shortest 
> distances between nodes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances

2020-06-12 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134400#comment-17134400
 ] 

Julian Hyde commented on CALCITE-4049:
--

You don't seem to realize that it is a trade-off. A change that improves some 
things makes other things worse. It is glib of you to say 'computers have more 
memory now, so there's no trade-off'.

Cons is a general-purpose data structure. You made it better for your purposes 
but you have not considered other purposes.

The best practice with data structures is to be explicit about the cost for 
various operations. In Cons, the running time of size is O(n). People 
understand and accept that.

> Reduce the time complexity of getting shortest distances
> 
>
> Key: CALCITE-4049
> URL: https://issues.apache.org/jira/browse/CALCITE-4049
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
> Fix For: 1.24.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths 
> between all pairs of nodes. For many scenarios, however, we do not need the 
> exact paths between nodes. Instead, we are only interested in the lengths of 
> the shortest paths. 
> To get the path length, we need to get the shortest path first, which is 
> returned as a {{List}}, then we call the {{List#size()}} method. According to 
> the current implementation, the returned list is of type {{ConsList}}. The 
> time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on 
> the path), which is inefficient. 
> In this issue, we revise the implementation of {{ConsList#size}} so that it 
> takes O(1) time. In addition, we also give a utiltiy to get the shortest 
> distances between nodes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances

2020-06-10 Thread Liya Fan (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17132852#comment-17132852
 ] 

Liya Fan commented on CALCITE-4049:
---

[~julianhyde] Thanks a lot for your comments. 
The down-side of adding the size field is that it makes the object size 4 bytes 
larger (I am not sure if it is 50% larger, as an object has other meta-data to 
store). The up-side is reducing the time complexity of getting size from O(n) 
to O(1). 

IMO, a general trend for performance optimization is to reduce time complexity 
by taking more spaces, as memory/disk spaces are becoming less expensive today 
and modern computers tend to have larger memory/disk spaces. 

For example, we build indices for relation databases to reduce the query time, 
at the expense of extra memory/disk spaces. We can also find examples from 
Calcite code base. For example, in class 
[RexCall|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rex/RexCall.java],
 we have the {{nodeCount}} field, which introduces an extra 4 byte. However, 
since it reduces the time complexity of getting node count, it should be 
considered a good optimization. 


> Reduce the time complexity of getting shortest distances
> 
>
> Key: CALCITE-4049
> URL: https://issues.apache.org/jira/browse/CALCITE-4049
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
> Fix For: 1.24.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths 
> between all pairs of nodes. For many scenarios, however, we do not need the 
> exact paths between nodes. Instead, we are only interested in the lengths of 
> the shortest paths. 
> To get the path length, we need to get the shortest path first, which is 
> returned as a {{List}}, then we call the {{List#size()}} method. According to 
> the current implementation, the returned list is of type {{ConsList}}. The 
> time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on 
> the path), which is inefficient. 
> In this issue, we revise the implementation of {{ConsList#size}} so that it 
> takes O(1) time. In addition, we also give a utiltiy to get the shortest 
> distances between nodes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances

2020-06-10 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17132562#comment-17132562
 ] 

Julian Hyde commented on CALCITE-4049:
--

This needed more discussion.

You made a general-purpose data structure 50% larger. That may have a down-side.

Also, I do not think the changes you made to the ASCII-art graphs are an 
improvement. The originals are concise and easy to read. Because they are DAGs 
arrows are implicitly left-to-right.

> Reduce the time complexity of getting shortest distances
> 
>
> Key: CALCITE-4049
> URL: https://issues.apache.org/jira/browse/CALCITE-4049
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
> Fix For: 1.24.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths 
> between all pairs of nodes. For many scenarios, however, we do not need the 
> exact paths between nodes. Instead, we are only interested in the lengths of 
> the shortest paths. 
> To get the path length, we need to get the shortest path first, which is 
> returned as a {{List}}, then we call the {{List#size()}} method. According to 
> the current implementation, the returned list is of type {{ConsList}}. The 
> time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on 
> the path), which is inefficient. 
> In this issue, we revise the implementation of {{ConsList#size}} so that it 
> takes O(1) time. In addition, we also give a utiltiy to get the shortest 
> distances between nodes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances

2020-06-09 Thread Liya Fan (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128997#comment-17128997
 ] 

Liya Fan commented on CALCITE-4049:
---

[~rubenql] Thanks a lot for your effort. 

> Reduce the time complexity of getting shortest distances
> 
>
> Key: CALCITE-4049
> URL: https://issues.apache.org/jira/browse/CALCITE-4049
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
> Fix For: 1.24.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths 
> between all pairs of nodes. For many scenarios, however, we do not need the 
> exact paths between nodes. Instead, we are only interested in the lengths of 
> the shortest paths. 
> To get the path length, we need to get the shortest path first, which is 
> returned as a {{List}}, then we call the {{List#size()}} method. According to 
> the current implementation, the returned list is of type {{ConsList}}. The 
> time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on 
> the path), which is inefficient. 
> In this issue, we revise the implementation of {{ConsList#size}} so that it 
> takes O(1) time. In addition, we also give a utiltiy to get the shortest 
> distances between nodes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-4049) Reduce the time complexity of getting shortest distances

2020-06-09 Thread Ruben Q L (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128971#comment-17128971
 ] 

Ruben Q L commented on CALCITE-4049:


Resolved via 
https://github.com/apache/calcite/commit/f9e8413b46f4c648736b529a55e25b57ab5dee74
Thanks [~fan_li_ya] for the PR!

> Reduce the time complexity of getting shortest distances
> 
>
> Key: CALCITE-4049
> URL: https://issues.apache.org/jira/browse/CALCITE-4049
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths 
> between all pairs of nodes. For many scenarios, however, we do not need the 
> exact paths between nodes. Instead, we are only interested in the lengths of 
> the shortest paths. 
> To get the path length, we need to get the shortest path first, which is 
> returned as a {{List}}, then we call the {{List#size()}} method. According to 
> the current implementation, the returned list is of type {{ConsList}}. The 
> time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on 
> the path), which is inefficient. 
> In this issue, we revise the implementation of {{ConsList#size}} so that it 
> takes O(1) time. In addition, we also give a utiltiy to get the shortest 
> distances between nodes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)