[ 
https://issues.apache.org/jira/browse/DRILL-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541254#comment-17541254
 ] 

manabu nagamine commented on DRILL-8231:
----------------------------------------

I tried this with Drill 1.19 and confirmed that the results are the same as in Drill 1.18.
{code:java}
Apache Drill 1.19.0
"Two things are infinite: the universe and Drill; and I'm not sure about the 
universe."
apache drill> select SUM(CAST(val11 as BIGINT)+CAST(val12 as BIGINT)) COL6408, 
COUNT(DISTINCT val2) COL4452 from 
dfs.root.`/drill/data/*/log_15872_R_79_*.parquet` WHERE 1 = 1  and ( ( dir0 
between '01' and '10' )  ) and ( LOG_DATE >= '2022-04-01 00:00:00.000000' and 
LOG_DATE <= '2022-04-30 23:59:59.000000');
+------------+---------+
|  COL6408   | COL4452 |
+------------+---------+
| 9169057876 | 4       |
+------------+---------+
1 row selected (5.454 seconds)
apache drill> select COUNT(DISTINCT val2) COL4452, SUM(CAST(val11 as 
BIGINT)+CAST(val12 as BIGINT)) COL6408 from 
dfs.root.`/drill/data/*/log_15872_R_79_*.parquet` WHERE 1 = 1  and ( ( dir0 
between '01' and '10' )  ) and ( LOG_DATE >= '2022-04-01 00:00:00.000000' and 
LOG_DATE <= '2022-04-30 23:59:59.000000');
+---------+------------+
| COL4452 |  COL6408   |
+---------+------------+
| 2       | 9169057876 |
+---------+------------+
1 row selected (0.812 seconds)
apache drill> select val2, val3, COUNT(DISTINCT val2) COL4452 from 
dfs.root.`/drill/data/*/log_15872_R_79_*.parquet` WHERE 1 = 1  and ( ( dir0 
between '01' and '10' )  ) and ( LOG_DATE >= '2022-04-01 00:00:00.000000' and 
LOG_DATE <= '2022-04-30 23:59:59.000000') group by val2, val3;
+------------------+-------+---------+
|       val2       | val3  | COL4452 |
+------------------+-------+---------+
| 4db387ff6ebcbe4d | HTV33 | 1       |
| c4b06a20f25edb91 | SHG01 | 1       |
+------------------+-------+---------+
2 rows selected (0.76 seconds)
apache drill> {code}
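
A possible cross-check, given only as a sketch that has not been run against the attached data: the distinct count can be computed without COUNT(DISTINCT) by grouping on val2 in a subquery and counting the groups. The alias t and the added "val2 is not null" filter are assumptions introduced here; the filter mirrors the fact that COUNT(DISTINCT) ignores NULL values. The path and date range are copied from the queries above.
{code:sql}
-- Count distinct val2 values by grouping first, then counting the groups.
-- If val2 has no NULLs, this should agree with the number of distinct
-- val2 values listed by the "group by val2, val3" query above.
select COUNT(*) COL4452
from (
  select val2
  from dfs.root.`/drill/data/*/log_15872_R_79_*.parquet`
  where dir0 between '01' and '10'
    and LOG_DATE >= '2022-04-01 00:00:00.000000'
    and LOG_DATE <= '2022-04-30 23:59:59.000000'
    and val2 is not null
  group by val2
) t;
{code}
If this query and the COUNT-first query agree, that would point at the SUM-first plan as the one producing the unexpected value.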

> Wrong result in the COUNT function position.
> --------------------------------------------
>
>                 Key: DRILL-8231
>                 URL: https://issues.apache.org/jira/browse/DRILL-8231
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.18.0
>            Reporter: manabu nagamine
>            Priority: Major
>         Attachments: drill.zip
>
>
> Hi Team.
> We are using Drill 1.18.
> The count value of COL4452 differs between the results of the following two 
> queries.
> The only difference between them is that the positions of COL4452 and COL6408 
> are swapped.
> {code:java}
> 1. 
> select COUNT(DISTINCT val2) COL4452, SUM(CAST(val11 as BIGINT)+CAST(val12 as 
> BIGINT)) COL6408 from dfs.root.`/drill/data/*/log_15872_R_79_*.parquet` WHERE 
> 1 = 1  and ( ( dir0 between '01' and '10' )  ) and ( LOG_DATE >= '2022-04-01 
> 00:00:00.000000' and LOG_DATE <= '2022-04-30 23:59:59.000000'); 
> 2.
> select SUM(CAST(val11 as BIGINT)+CAST(val12 as BIGINT)) COL6408, 
> COUNT(DISTINCT val2) COL4452 from 
> dfs.root.`/drill/data/*/log_15872_R_79_*.parquet` WHERE 1 = 1  and ( ( dir0 
> between '01' and '10' )  ) and ( LOG_DATE >= '2022-04-01 00:00:00.000000' and 
> LOG_DATE <= '2022-04-30 23:59:59.000000');{code}
> According to the actual data, the count returned by query 1, where COL4452 
> comes first, is correct.
> I am having trouble understanding the cause of this behavior.
> Can anybody help me? Thanks in advance.
> I have attached the Parquet log file.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
