[GitHub] [arrow-datafusion] Dandandan commented on issue #6892: Introduce `Partitioned` aggregation mode

via GitHub Thu, 13 Jul 2023 14:40:20 -0700


Dandandan commented on issue #6892:
URL: 
https://github.com/apache/arrow-datafusion/issues/6892#issuecomment-1634956646


   Yes, that is the idea.
   * I found out that the `Single` aggregation mode already does what we want 
to do (perform the aggregation in one go), so there is no need to create a new mode.
   
   * I ran some experiments that skip the `Partial` stage based on a heuristic (e.g. for 
tables up to a number of columns), but the results are mixed:
   ```
   ┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ Query        ┃ fast_gby_hash ┃ aggregate_partition_mode ┃        Change ┃
   ┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
   │ QQuery 1     │      194.12ms │                 194.38ms │     no change │
   │ QQuery 2     │       36.59ms │                  30.54ms │ +1.20x faster │
   │ QQuery 3     │       44.99ms │                  45.69ms │     no change │
   │ QQuery 4     │       37.34ms │                  37.88ms │     no change │
   │ QQuery 5     │       91.69ms │                  90.49ms │     no change │
   │ QQuery 6     │       10.27ms │                  10.25ms │     no change │
   │ QQuery 7     │      193.05ms │                 190.58ms │     no change │
   │ QQuery 8     │       69.37ms │                  69.39ms │     no change │
   │ QQuery 9     │      132.29ms │                 132.95ms │     no change │
   │ QQuery 10    │       91.51ms │                  90.86ms │     no change │
   │ QQuery 11    │       40.53ms │                  39.73ms │     no change │
   │ QQuery 12    │       67.70ms │                  66.71ms │     no change │
   │ QQuery 13    │      130.96ms │                 132.62ms │     no change │
   │ QQuery 14    │       11.87ms │                  12.10ms │     no change │
   │ QQuery 15    │       14.80ms │                  19.92ms │  1.35x slower │
   │ QQuery 16    │       37.79ms │                  37.07ms │     no change │
   │ QQuery 17    │      210.67ms │                 209.54ms │     no change │
   │ QQuery 18    │      315.60ms │                 381.18ms │  1.21x slower │
   │ QQuery 19    │       57.40ms │                  57.59ms │     no change │
   │ QQuery 20    │       70.88ms │                  58.71ms │ +1.21x faster │
   │ QQuery 21    │      248.35ms │                 252.92ms │     no change │
   │ QQuery 22    │       28.11ms │                  27.87ms │     no change │
   └──────────────┴───────────────┴──────────────────────────┴───────────────┘
   ```
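
For readers following along, here is a rough sketch of the difference between a two-phase plan (a `Partial` step per partition followed by a final merge) and a one-pass `Single`-style aggregation. It is a toy model in plain Python, not DataFusion code: the function names, the `SUM` aggregate, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def partial_aggregate(partition):
    """Phase 1 (hypothetical): pre-aggregate one partition into key -> partial sum."""
    acc = defaultdict(int)
    for key, value in partition:
        acc[key] += value
    return dict(acc)


def final_aggregate(partials):
    """Phase 2 (hypothetical): merge the partial states from all partitions."""
    acc = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            acc[key] += value
    return dict(acc)


def single_aggregate(partitions):
    """One pass (hypothetical): aggregate every row directly, with no partial step."""
    acc = defaultdict(int)
    for partition in partitions:
        for key, value in partition:
            acc[key] += value
    return dict(acc)


# Made-up input: two partitions of (group key, value) rows.
partitions = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("b", 4), ("c", 5)],
]

two_phase = final_aggregate(partial_aggregate(p) for p in partitions)
one_pass = single_aggregate(partitions)
```

Both paths produce the same groups and sums. The partial step only pays off when it shrinks the data noticeably; when nearly every row has its own group key, it mostly adds a second hash-table pass, which is one plausible reason a skip heuristic helps some queries and hurts others.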
   
   I have higher hopes for https://github.com/apache/arrow-datafusion/issues/6937 
which I think might be similar to the "adaptive partial aggregation" in 
Snowflake / Teradata.


