korowa commented on issue #11850:
URL: https://github.com/apache/datafusion/issues/11850#issuecomment-2272059661

   The query is
   ```
   WITH ssci AS (
     SELECT
       ss_customer_sk customer_sk,
       ss_item_sk item_sk
     FROM store_sales, date_dim
     WHERE ss_sold_date_sk = d_date_sk
       AND d_month_seq BETWEEN 1200 AND 1200 + 11
     GROUP BY ss_customer_sk, ss_item_sk),
       csci AS (
       SELECT
         cs_bill_customer_sk customer_sk,
         cs_item_sk item_sk
       FROM catalog_sales, date_dim
       WHERE cs_sold_date_sk = d_date_sk
         AND d_month_seq BETWEEN 1200 AND 1200 + 11
       GROUP BY cs_bill_customer_sk, cs_item_sk)
   SELECT
     sum(CASE WHEN ssci.customer_sk IS NOT NULL AND csci.customer_sk IS NULL 
THEN 1 ELSE 0 END) store_only,
     sum(CASE WHEN ssci.customer_sk IS NULL AND csci.customer_sk IS NOT NULL 
THEN 1 ELSE 0 END) catalog_only,
     sum(CASE WHEN ssci.customer_sk IS NOT NULL AND csci.customer_sk IS NOT 
NULL THEN 1 ELSE 0 END) store_and_catalog
   FROM ssci
     FULL OUTER JOIN csci ON (ssci.customer_sk = csci.customer_sk
       AND ssci.item_sk = csci.item_sk)
   LIMIT 100
   ```
   
   The last aggregation with `SUM(CASE WHEN ...)` is not a row hash aggregate 
and should not trigger this feature, and the other two don't use any 
accumulators, so, probably, the issue is not related to accumulator expressions 
(I'll recheck it anyway).
   
   @andygrove is there a chance that Comet may use datafusion `Partial` 
aggregation results without finalizing the aggregation? In this case 
partial-aggs in both CTEs may produce duplicates, which should result in 
increased count values in the resultset.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to