[jira] [Comment Edited] (CALCITE-4049) Reduce the time complexity of getting shortest distances

2020-06-15 Thread Liya Fan (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17135665#comment-17135665
 ] 

Liya Fan edited comment on CALCITE-4049 at 6/15/20, 12:20 PM:
--

Hi all, thank you for the fruitful discussion.

I think we all agree that we should not modify the general ConsList data 
structure. In addition, it seems that the path itself is not that important, 
and is not frequently used. 

For the problem of high cost of getting path length, I think we have some other 
ways to solve it:

1. We can have a list implementation that wraps the ConsList and have a size 
field.
2. We can store the shortest path length along with the shortest path in 

{{Map, List> shortestPaths = new HashMap<>();}}

3. We can employ Floyd's algorithm to get the shortest path lengths, and then 
restore the path in O(p) time, whenever necessary. 

Method 3 was suggested by Julian in the above discussion. I am also in favor of 
it, because storing all the shortest paths can be expensive. In addition, it 
speeds up algorithm convergence. What do you think?

I will prepare a PR ASAP, and we can have further discussions based on it. 
Thank you.




was (Author: fan_li_ya):
Hi all, thank you for the fruitful discussion.

I think we all agree that we should not modify the general ConsList data 
structure. In addition, it seems that the path itself is not that important, 
and is not frequently used. 

For the problem of high cost of getting path length, I think we have some other 
ways to solve it:

1. We can have a list implementation that wraps the ConsList and have a size 
field.
2. We can store the shortest path length along with the shortest path in 

{{Map, List> shortestPaths = new HashMap<>();}}

3. We can employ Floyd's algorithm to get the shortest path lengths, and then 
restore the path in O(p) time, whenever necessary. 

Method 3 was suggested by Julian in the above discussion. I am also in favor of 
it, because storing all the shortest paths can be expensive.

I will prepare a PR ASAP, and we can have further discussions based on it. 
Thank you.



> Reduce the time complexity of getting shortest distances
> 
>
> Key: CALCITE-4049
> URL: https://issues.apache.org/jira/browse/CALCITE-4049
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
> Fix For: 1.24.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths 
> between all pairs of nodes. For many scenarios, however, we do not need the 
> exact paths between nodes. Instead, we are only interested in the lengths of 
> the shortest paths. 
> To get the path length, we need to get the shortest path first, which is 
> returned as a {{List}}, then we call the {{List#size()}} method. According to 
> the current implementation, the returned list is of type {{ConsList}}. The 
> time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on 
> the path), which is inefficient. 
> In this issue, we revise the implementation of {{ConsList#size}} so that it 
> takes O(1) time. In addition, we also give a utiltiy to get the shortest 
> distances between nodes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (CALCITE-4049) Reduce the time complexity of getting shortest distances

2020-06-12 Thread Xiening Dai (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134529#comment-17134529
 ] 

Xiening Dai edited comment on CALCITE-4049 at 6/12/20, 8:44 PM:


I agree with [~julianhyde]. You cannot say there's no tradeoffs just because we 
have more memory. There are OOM issues we have seen, such as CALCITE-3784. 

In terms of the change itself, I wonder why we would need ConList in the first 
place. If we just use a LinkedList, we can concact the head node and also have 
O(1) complexity for size().


was (Author: xndai):
I agree with [~julianhyde]. You cannot say there's no tradeoffs just because we 
have more memory. There are OOM issues we have seen, such as CALCITE-3784. 

In terms of the change itself, I wonder why we would need ConList in the first 
place. If we just use a LinkedList, we can contact the head node and also have 
O(1) complexity for size().

> Reduce the time complexity of getting shortest distances
> 
>
> Key: CALCITE-4049
> URL: https://issues.apache.org/jira/browse/CALCITE-4049
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
> Fix For: 1.24.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths 
> between all pairs of nodes. For many scenarios, however, we do not need the 
> exact paths between nodes. Instead, we are only interested in the lengths of 
> the shortest paths. 
> To get the path length, we need to get the shortest path first, which is 
> returned as a {{List}}, then we call the {{List#size()}} method. According to 
> the current implementation, the returned list is of type {{ConsList}}. The 
> time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on 
> the path), which is inefficient. 
> In this issue, we revise the implementation of {{ConsList#size}} so that it 
> takes O(1) time. In addition, we also give a utiltiy to get the shortest 
> distances between nodes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (CALCITE-4049) Reduce the time complexity of getting shortest distances

2020-06-12 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134400#comment-17134400
 ] 

Julian Hyde edited comment on CALCITE-4049 at 6/12/20, 5:18 PM:


You don't seem to realize that it is a trade-off. A change that improves some 
things makes other things worse. It is glib of you to say 'computers have more 
memory now, so there's no trade-off'.

Cons is a general-purpose data structure. You made it better for your purposes 
but you have not considered other purposes.

The best practice with data structures is to be explicit about the cost for 
various operations. In Cons, the running time of size is O\(n). People 
understand and accept that.


was (Author: julianhyde):
You don't seem to realize that it is a trade-off. A change that improves some 
things makes other things worse. It is glib of you to say 'computers have more 
memory now, so there's no trade-off'.

Cons is a general-purpose data structure. You made it better for your purposes 
but you have not considered other purposes.

The best practice with data structures is to be explicit about the cost for 
various operations. In Cons, the running time of size is O(n). People 
understand and accept that.

> Reduce the time complexity of getting shortest distances
> 
>
> Key: CALCITE-4049
> URL: https://issues.apache.org/jira/browse/CALCITE-4049
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
> Fix For: 1.24.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths 
> between all pairs of nodes. For many scenarios, however, we do not need the 
> exact paths between nodes. Instead, we are only interested in the lengths of 
> the shortest paths. 
> To get the path length, we need to get the shortest path first, which is 
> returned as a {{List}}, then we call the {{List#size()}} method. According to 
> the current implementation, the returned list is of type {{ConsList}}. The 
> time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on 
> the path), which is inefficient. 
> In this issue, we revise the implementation of {{ConsList#size}} so that it 
> takes O(1) time. In addition, we also give a utiltiy to get the shortest 
> distances between nodes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (CALCITE-4049) Reduce the time complexity of getting shortest distances

2020-06-10 Thread Liya Fan (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17132852#comment-17132852
 ] 

Liya Fan edited comment on CALCITE-4049 at 6/11/20, 2:54 AM:
-

[~julianhyde] Thanks a lot for your comments. 
The down-side of adding the size field is that it makes the object size 4 bytes 
larger (I am not sure if it is 50% larger, as an object has other meta-data to 
store). The up-side is reducing the time complexity of getting size from O( n ) 
to O(1). 

IMO, a general trend for performance optimization is to reduce time complexity 
by taking more spaces, as memory/disk spaces are becoming less expensive today 
and modern computers tend to have larger memory/disk spaces. 

For example, we build indices for relation databases to reduce the query time, 
at the expense of extra memory/disk spaces. We can also find examples from 
Calcite code base. For example, in class 
[RexCall|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rex/RexCall.java],
 we have the {{nodeCount}} field, which introduces an extra 4 byte. However, 
since it reduces the time complexity of getting node count, it should be 
considered a good optimization. 



was (Author: fan_li_ya):
[~julianhyde] Thanks a lot for your comments. 
The down-side of adding the size field is that it makes the object size 4 bytes 
larger (I am not sure if it is 50% larger, as an object has other meta-data to 
store). The up-side is reducing the time complexity of getting size from O(n) 
to O(1). 

IMO, a general trend for performance optimization is to reduce time complexity 
by taking more spaces, as memory/disk spaces are becoming less expensive today 
and modern computers tend to have larger memory/disk spaces. 

For example, we build indices for relation databases to reduce the query time, 
at the expense of extra memory/disk spaces. We can also find examples from 
Calcite code base. For example, in class 
[RexCall|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rex/RexCall.java],
 we have the {{nodeCount}} field, which introduces an extra 4 byte. However, 
since it reduces the time complexity of getting node count, it should be 
considered a good optimization. 


> Reduce the time complexity of getting shortest distances
> 
>
> Key: CALCITE-4049
> URL: https://issues.apache.org/jira/browse/CALCITE-4049
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
> Fix For: 1.24.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths 
> between all pairs of nodes. For many scenarios, however, we do not need the 
> exact paths between nodes. Instead, we are only interested in the lengths of 
> the shortest paths. 
> To get the path length, we need to get the shortest path first, which is 
> returned as a {{List}}, then we call the {{List#size()}} method. According to 
> the current implementation, the returned list is of type {{ConsList}}. The 
> time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on 
> the path), which is inefficient. 
> In this issue, we revise the implementation of {{ConsList#size}} so that it 
> takes O(1) time. In addition, we also give a utiltiy to get the shortest 
> distances between nodes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)