[jira] [Commented] (ARROW-12334) [Rust] [Ballista] Aggregate queries producing incorrect results

2021-04-17 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324288#comment-17324288
 ] 

Andy Grove commented on ARROW-12334:


I tracked this down and found two separate bugs:

1. The plan contains a RepartitionExec, which is not compatible with Ballista 
and explodes the number of partitions (and likely causes incorrect results).
2. The query itself works fine and the final sort produces 2 rows, but the 
returned results are built by reading all the intermediate results as well.
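
To illustrate the second bug, here is a minimal, self-contained sketch of how 
reading intermediate results alongside the final results yields duplicate 
group-by keys. It is not Ballista code: the names partial_aggregate, 
final_aggregate and has_duplicate_keys, the count aggregate, and the sample 
rows are all hypothetical, and it assumes a simple two-stage 
(partial, then final) aggregation over two partitions.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical group key modeled on query 1: (l_returnflag, l_linestatus).
type GroupKey = (char, char);

/// One output row of a count aggregate: the group key and its row count.
type Row = (GroupKey, u64);

/// Partial aggregation: count the rows per group key within one partition.
fn partial_aggregate(partition: &[GroupKey]) -> Vec<Row> {
    let mut counts: HashMap<GroupKey, u64> = HashMap::new();
    for key in partition {
        *counts.entry(*key).or_insert(0) += 1;
    }
    let mut rows: Vec<Row> = counts.into_iter().collect();
    rows.sort();
    rows
}

/// Final aggregation: merge the partial counts so each key appears once.
fn final_aggregate(partials: &[Vec<Row>]) -> Vec<Row> {
    let mut counts: HashMap<GroupKey, u64> = HashMap::new();
    for (key, count) in partials.iter().flatten() {
        *counts.entry(*key).or_insert(0) += *count;
    }
    let mut rows: Vec<Row> = counts.into_iter().collect();
    rows.sort();
    rows
}

/// Returns true if any group key occurs in more than one row.
fn has_duplicate_keys(rows: &[Row]) -> bool {
    let mut seen = HashSet::new();
    rows.iter().any(|(key, _)| !seen.insert(*key))
}

fn main() {
    // Two made-up input partitions of group keys.
    let partitions = vec![
        vec![('A', 'F'), ('A', 'F'), ('N', 'O')],
        vec![('A', 'F'), ('N', 'O'), ('R', 'F')],
    ];
    let partials: Vec<Vec<Row>> =
        partitions.iter().map(|p| partial_aggregate(p)).collect();
    let final_rows = final_aggregate(&partials);

    // Reading only the final stage output: each key appears once.
    println!("final only: {:?}", final_rows);
    println!("duplicates: {}", has_duplicate_keys(&final_rows));

    // Reading the intermediate outputs as well as the final output:
    // the same key now appears in several rows.
    let mut mixed: Vec<Row> = partials.iter().flatten().copied().collect();
    mixed.extend(final_rows.iter().copied());
    println!("intermediate + final: {:?}", mixed);
    println!("duplicates: {}", has_duplicate_keys(&mixed));
}
```

In this sketch the key ('A', 'F') shows up once per partition in the partial 
output and once more in the final output, which matches the symptom of several 
rows for the same l_returnflag and l_linestatus values.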

> [Rust] [Ballista] Aggregate queries producing incorrect results
> ---
>
> Key: ARROW-12334
> URL: https://issues.apache.org/jira/browse/ARROW-12334
> Project: Apache Arrow
> Issue Type: Bug
> Components: Rust - Ballista
> Reporter: Andy Grove
> Assignee: Andy Grove
> Priority: Major
>
> I just ran benchmarks for the first time in a while and I see duplicate 
> entries for group by keys.
>  
> For example, query 1 has "group by l_returnflag, l_linestatus" and I see 
> multiple results with l_returnflag = 'A' and l_linestatus = 'F'.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-12334) [Rust] [Ballista] Aggregate queries producing incorrect results

2021-04-11 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318973#comment-17318973
 ] 

Andy Grove commented on ARROW-12334:


I'm now very confused about this issue. I have been debugging it and it is 
suddenly working, so I don't know whether the bug is intermittent. When the 
query works correctly, it returns 4 rows and takes ~13 seconds for me. When it 
does not, it returns many times more rows and takes 3x as long.

It would be good to get a second pair of eyes on this.

> [Rust] [Ballista] Aggregate queries producing incorrect results
> ---
>
> Key: ARROW-12334
> URL: https://issues.apache.org/jira/browse/ARROW-12334
> Project: Apache Arrow
> Issue Type: Bug
> Components: Rust - Ballista
> Reporter: Andy Grove
> Assignee: Andy Grove
> Priority: Major
> Fix For: 4.0.0
>
>
> I just ran benchmarks for the first time in a while and I see duplicate 
> entries for group by keys.
>  
> For example, query 1 has "group by l_returnflag, l_linestatus" and I see 
> multiple results with l_returnflag = 'A' and l_linestatus = 'F'.





[jira] [Commented] (ARROW-12334) [Rust] [Ballista] Aggregate queries producing incorrect results

2021-04-11 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318951#comment-17318951
 ] 

Andy Grove commented on ARROW-12334:


I tracked down the PR that introduced the regression in the original repo and 
it was [https://github.com/ballista-compute/ballista/pull/574]



