korowa commented on issue #11850:
URL: https://github.com/apache/datafusion/issues/11850#issuecomment-2272059661
The query is
```
WITH ssci AS (
SELECT
ss_customer_sk customer_sk,
ss_item_sk item_sk
FROM store_sales, date_dim
WHERE ss_sold_date_sk = d_date_sk
AND d_month_seq BETWEEN 1200 AND 1200 + 11
GROUP BY ss_customer_sk, ss_item_sk),
csci AS (
SELECT
cs_bill_customer_sk customer_sk,
cs_item_sk item_sk
FROM catalog_sales, date_dim
WHERE cs_sold_date_sk = d_date_sk
AND d_month_seq BETWEEN 1200 AND 1200 + 11
GROUP BY cs_bill_customer_sk, cs_item_sk)
SELECT
sum(CASE WHEN ssci.customer_sk IS NOT NULL AND csci.customer_sk IS NULL
THEN 1 ELSE 0 END) store_only,
sum(CASE WHEN ssci.customer_sk IS NULL AND csci.customer_sk IS NOT NULL
THEN 1 ELSE 0 END) catalog_only,
sum(CASE WHEN ssci.customer_sk IS NOT NULL AND csci.customer_sk IS NOT
NULL THEN 1 ELSE 0 END) store_and_catalog
FROM ssci
FULL OUTER JOIN csci ON (ssci.customer_sk = csci.customer_sk
AND ssci.item_sk = csci.item_sk)
LIMIT 100
```
The last aggregation with `SUM(CASE WHEN ...)` is not a row hash aggregate
and should not trigger this feature, and the other two don't use any
accumulators, so, probably, the issue is not related to accumulator expressions
(I'll recheck it anyway).
@andygrove is there a chance that Comet may use datafusion `Partial`
aggregation results without finalizing the aggregation? In this case
partial-aggs in both CTEs may produce duplicates, which should result in
increased count values in the resultset.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]