alamb commented on issue #1708:
URL:
https://github.com/apache/arrow-datafusion/issues/1708#issuecomment-1028906123
💯 with what @Dandandan and @houqp said; Thank you for writing this up
@yjshen ❤️
> I am wondering if for certain operations, e.g. hash aggregate, I feel fixed
size input the data is stored better in a columnar format (mutable array,
with offsets),
I agree with @Dandandan that for HashAggregate this would be super helpful,
as the group keys and aggregates could be computed "in place" (so producing
the output would be free).
Sorting is indeed different because the sort key is different from what
appears in the output. For example `SELECT a, b, c ... ORDER BY a+b` needs to
compare on `a+b`, but still produce tuples of `(a, b, c)`.
For grouping, by contrast, the grouping values themselves are produced. For
example `SELECT a+b, sum(c) ... GROUP BY a+b` produces tuples of `(a+b, sum)`.
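
To make the contrast concrete, here is a minimal sketch in plain Rust (no DataFusion or Arrow APIs). The `Row` alias and both function names are hypothetical, and the columns are assumed to be `i64`. It only illustrates that the sort key is discarded after comparison while the group key becomes an output column.

```rust
use std::collections::HashMap;

/// One input row with columns (a, b, c). Hypothetical type for this sketch.
type Row = (i64, i64, i64);

/// `SELECT a, b, c ... ORDER BY a+b`: the key `a+b` is only used for
/// comparison; the output still carries the original (a, b, c) columns.
fn order_by_a_plus_b(mut rows: Vec<Row>) -> Vec<Row> {
    // `sort_by_key` is a stable sort, so rows with equal keys keep their order.
    rows.sort_by_key(|&(a, b, _)| a + b);
    rows
}

/// `SELECT a+b, sum(c) ... GROUP BY a+b`: the key `a+b` is itself part of the
/// output, so the hash table entries already are the result tuples.
fn group_by_a_plus_b(rows: &[Row]) -> Vec<(i64, i64)> {
    let mut groups: HashMap<i64, i64> = HashMap::new();
    for &(a, b, c) in rows {
        // Update the running sum for this group key in place.
        *groups.entry(a + b).or_insert(0) += c;
    }
    let mut out: Vec<(i64, i64)> = groups.into_iter().collect();
    out.sort(); // only to make the output order deterministic
    out
}
```

With rows `(3, 1, 10)`, `(1, 1, 20)`, `(0, 4, 30)`, the sort returns all three columns reordered by `a+b`, while the aggregate returns one `(a+b, sum)` pair per distinct key.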
P.S. For what it is worth, I think DuckDB has a short string optimization, so
the key may look something more like:
```text
Table A (bool a, char b, int c, string d) row_value (true, 'W', 59, "XYZ")
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│ 0F │ 1  │ W  │ 00 │ 00 │ 00 │ 3B │ 03 │ 00 │ 00 │ 00 │ 00 │ X  │ Y  │ Z  │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘
                                     8
Table A (bool a, char b, int c, string d) row_value (true, 'W', 59, "XYZXYZXYZ")
┌────┬────┬────┬────┬────┬────┬────┬─────────────────────────────────────────────┐
│ 0F │ 1  │ W  │ 00 │ 00 │ 00 │ 3B │                     PTR                     │
└────┴────┴────┴────┴────┴────┴────┴─────────────────────────────────────────────┘
                                                          │
                                     8                    └───┐
                                                              ▼
                                                         "XYZXYZXYZ"
```
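
A rough sketch of the idea in the diagrams, in plain Rust: short strings are stored inline in the fixed-width slot, longer ones behind a pointer. `StrSlot`, `INLINE_CAP`, and the 8-byte inline capacity are assumptions made for illustration only; this is not DuckDB's actual string layout, nor anything that exists in DataFusion.

```rust
/// Inline capacity in bytes. The value 8 is an assumption for this sketch.
const INLINE_CAP: usize = 8;

/// A string slot that stores short strings inline and long ones on the heap.
#[derive(Debug, PartialEq)]
enum StrSlot {
    /// Length plus zero-padded bytes stored directly in the row.
    Inline { len: u8, bytes: [u8; INLINE_CAP] },
    /// Pointer to out-of-row storage (the "PTR" case in the diagram).
    Heap(Box<str>),
}

impl StrSlot {
    fn new(s: &str) -> Self {
        let raw = s.as_bytes();
        if raw.len() <= INLINE_CAP {
            let mut bytes = [0u8; INLINE_CAP];
            bytes[..raw.len()].copy_from_slice(raw);
            StrSlot::Inline { len: raw.len() as u8, bytes }
        } else {
            StrSlot::Heap(s.into())
        }
    }

    fn as_str(&self) -> &str {
        match self {
            // The inline bytes were copied from a whole `&str`, so they are valid UTF-8.
            StrSlot::Inline { len, bytes } => std::str::from_utf8(&bytes[..*len as usize])
                .expect("inline bytes come from a valid &str"),
            StrSlot::Heap(s) => &**s,
        }
    }

    fn is_inline(&self) -> bool {
        matches!(self, StrSlot::Inline { .. })
    }
}
```

Under these assumptions `StrSlot::new("XYZ")` stays inline, so comparing or hashing it never leaves the row, while `StrSlot::new("XYZXYZXYZ")` (9 bytes) falls back to the pointer form.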