yjshen commented on issue #1708:
URL: https://github.com/apache/arrow-datafusion/issues/1708#issuecomment-1028963254


   Thank you all for your comments and for fixing my mistakes ❤️
   
   I agree that for a small hashtable, i.e., one whose whole data structure is CPU cache resident or only a few times larger than the cache, the columnar organization gives us all the benefits at no cost.
   
   However, as the group-by key cardinality grows, the bottleneck of hash aggregation or hash-join row concatenation shifts toward the memory access pattern. A columnar-structured hashtable would incur N times the cache misses/loads, since we constantly access one tuple at a time while its N attributes live in N separate columns.
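   To make the access-pattern argument concrete, here is a minimal Rust sketch (the type and function names are illustrative, not DataFusion APIs): with columnar state, updating one group touches a separate allocation per attribute, while row-wise state keeps a group's attributes on the same cache line.

   ```rust
   // Columnar state: each attribute of a group lives in its own vector,
   // so updating one group means one load per attribute vector.
   struct ColumnarGroups {
       keys: Vec<u64>,
       sums: Vec<i64>,
       counts: Vec<i64>,
   }

   // Row-wise state: one contiguous record per group, so updating a
   // group typically touches a single cache line.
   struct RowGroup {
       key: u64,
       sum: i64,
       count: i64,
   }

   fn update_columnar(g: &mut ColumnarGroups, idx: usize, v: i64) {
       g.sums[idx] += v;   // access into allocation #1
       g.counts[idx] += 1; // access into allocation #2
   }

   fn update_row(rows: &mut [RowGroup], idx: usize, v: i64) {
       let r = &mut rows[idx]; // both fields sit in one record
       r.sum += v;
       r.count += 1;
   }
   ```

   With random group indices (high cardinality), each columnar update risks a cache miss per attribute; the row-wise update amortizes them into one.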
   
   The problem is known as [the memory wall](https://dl.acm.org/doi/10.1145/1409360.1409380). As a result, quite a lot of systems use a row-wise hashtable even though they employ a vectorized execution model: [Vertica](https://15721.courses.cs.cmu.edu/spring2020/papers/13-execution/shrinivas-icde2013.pdf), [Typer and Tectorwise](https://www.vldb.org/pvldb/vol11/p2209-kersten.pdf), [Operator Fusion](http://www.vldb.org/pvldb/vol11/p1-menon.pdf), and also the latest [DuckDB](https://github.com/duckdb/duckdb/blob/master/src/common/types/row_layout.cpp#L32).
 
   
   For sort, the problem is similar: re-ordering causes a random memory access pattern for each column, and with many payload columns this becomes slow.
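   A small Rust sketch of what that looks like (the `gather` helper is illustrative, not an existing API): after the sort produces a permutation of row indices, a columnar layout must gather every payload column through that permutation, which is one random-access pass per column.

   ```rust
   // Materialize one column of sorted output by gathering through a
   // permutation of row indices. Each call is a random-access pass
   // over `values`; with k payload columns we make k such passes,
   // whereas a row-wise buffer would be permuted once.
   fn gather<T: Copy>(values: &[T], perm: &[usize]) -> Vec<T> {
       perm.iter().map(|&i| values[i]).collect()
   }
   ```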
   
   The row <-> columnar conversion comes at a price: extra computation. But since we always need to buffer intermediate data in memory anyway, and can compose/decompose while the data is cache resident, the conversion could be quite efficient.
   
   I'm not proposing to change all these data structures to row-wise at once, but rather to point out the existing theory and practice of using a row-wise organization for these pipeline breakers' states. As we did in the past, we should always implement and benchmark before deciding on performance-critical code like this.
   
   For the row structure, I'd like a combination of all three systems:
   
   - a byte-aligned null bit set
   - all attributes stored sequentially
     - for data columns, no padding
     - for aggregate state columns, padding to keep them aligned, which keeps the CPU happy when we repeatedly check and update
     - for var-len attributes, we store (offset + length) in the attribute's slot and append the actual bytes at the end of each row
       - we could later add a short-string representation for space efficiency, at the cost of extra effort
   
   The rationale for the proposed structure: each attribute's offset can be determined from the schema alone, and we pay the bill of extra padding space for better computational performance.
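   A minimal sketch of that offset computation in Rust (the `FieldType` enum and `slot_offsets` helper are hypothetical names for illustration): one null bit per field rounded up to whole bytes, then fixed-width slots laid out sequentially, with each var-len attribute occupying a fixed-size (offset, length) slot so every offset is schema-derived.

   ```rust
   #[derive(Clone, Copy)]
   enum FieldType {
       Fixed(usize), // width in bytes, e.g. 8 for u64/i64/f64
       VarLen,       // slot holds (u32 offset, u32 length) = 8 bytes
   }

   /// Compute each field's slot offset within a row, plus the total
   /// fixed-width prefix length. Var-len payload bytes would be
   /// appended after this prefix; the prefix layout depends only on
   /// the schema, not on any row's data.
   fn slot_offsets(fields: &[FieldType]) -> (Vec<usize>, usize) {
       // one null bit per field, rounded up to whole bytes
       let null_bytes = (fields.len() + 7) / 8;
       let mut offsets = Vec::with_capacity(fields.len());
       let mut cursor = null_bytes;
       for f in fields {
           offsets.push(cursor);
           cursor += match f {
               FieldType::Fixed(w) => *w,
               FieldType::VarLen => 8,
           };
       }
       (offsets, cursor)
   }
   ```

   This sketch omits the alignment padding for aggregate state slots; adding it would only change the `cursor` bump to round up to the slot's alignment.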
   

