[ https://issues.apache.org/jira/browse/DRILL-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541254#comment-17541254 ]
manabu nagamine commented on DRILL-8231:
----------------------------------------
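I tried Drill 1.19 and confirmed that the results are the same as in Drill 1.18: COL4452 is 4 when the SUM column comes first and 2 when the COUNT(DISTINCT) column comes first.
The GROUP BY query below returns only two distinct val2 values, so 2 is the correct count.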
{code:java}
Apache Drill 1.19.0
"Two things are infinite: the universe and Drill; and I'm not sure about the
universe."
apache drill> select SUM(CAST(val11 as BIGINT)+CAST(val12 as BIGINT)) COL6408,
COUNT(DISTINCT val2) COL4452 from
dfs.root.`/drill/data/*/log_15872_R_79_*.parquet` WHERE 1 = 1 and ( ( dir0
between '01
' and '10' ) ) and ( LOG_DATE >= '2022-04-01 00:00:00.000000' and LOG_DATE <=
'2022-04-30 23:59:59.000000');
+------------+---------+
| COL6408 | COL4452 |
+------------+---------+
| 9169057876 | 4 |
+------------+---------+
1 row selected (5.454 seconds)
apache drill> select COUNT(DISTINCT val2) COL4452, SUM(CAST(val11 as
BIGINT)+CAST(val12 as BIGINT)) COL6408 from
dfs.root.`/drill/data/*/log_15872_R_79_*.parquet` WHERE 1 = 1 and ( ( dir0
between '01
' and '10' ) ) and ( LOG_DATE >= '2022-04-01 00:00:00.000000' and LOG_DATE <=
'2022-04-30 23:59:59.000000');
+---------+------------+
| COL4452 | COL6408 |
+---------+------------+
| 2 | 9169057876 |
+---------+------------+
1 row selected (0.812 seconds)
apache drill> select val2, val3, COUNT(DISTINCT val2) COL4452 from
dfs.root.`/drill/data/*/log_15872_R_79_*.parquet` WHERE 1 = 1 and ( ( dir0
between '01' and '10' ) ) and ( LOG_DATE >= '2022-04-01
00:00:00.000000' and LOG_DATE <= '2022-04-30 23:59:59.000000') group by val2,
val3;
+------------------+-------+---------+
| val2 | val3 | COL4452 |
+------------------+-------+---------+
| 4db387ff6ebcbe4d | HTV33 | 1 |
| c4b06a20f25edb91 | SHG01 | 1 |
+------------------+-------+---------+
2 rows selected (0.76 seconds)
apache drill> {code}
> Wrong result in the COUNT function position.
> --------------------------------------------
>
> Key: DRILL-8231
> URL: https://issues.apache.org/jira/browse/DRILL-8231
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.18.0
> Reporter: manabu nagamine
> Priority: Major
> Attachments: drill.zip
>
>
> Hi Team.
> We are using Drill 1.18.
> The value of COL4452 differs between the results of the following two
> queries. The only difference between them is that the positions of COL4452
> and COL6408 in the select list are swapped.
> {code:java}
> 1.
> select COUNT(DISTINCT val2) COL4452, SUM(CAST(val11 as BIGINT)+CAST(val12 as
> BIGINT)) COL6408 from dfs.root.`/drill/data/*/log_15872_R_79_*.parquet` WHERE
> 1 = 1 and ( ( dir0 between '01' and '10' ) ) and ( LOG_DATE >= '2022-04-01
> 00:00:00.000000' and LOG_DATE <= '2022-04-30 23:59:59.000000');
> 2.
> select SUM(CAST(val11 as BIGINT)+CAST(val12 as BIGINT)) COL6408,
> COUNT(DISTINCT val2) COL4452 from
> dfs.root.`/drill/data/*/log_15872_R_79_*.parquet` WHERE 1 = 1 and ( ( dir0
> between '01' and '10' ) ) and ( LOG_DATE >= '2022-04-01 00:00:00.000000' and
> LOG_DATE <= '2022-04-30 23:59:59.000000');{code}
> Checked against the actual data, the count from query 1 (with COL4452 first)
> is correct.
> I cannot work out the cause of this behavior.
> Can anybody help me? Thanks in advance.
> I have attached the Parquet log file.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)