[Question] Allocations along 64 byte cache lines

Jorge Cardoso Leitão Mon, 06 Sep 2021 10:09:56 -0700

Hi,

We have a whole section on buffer alignment and padding
(https://arrow.apache.org/docs/format/Columnar.html#buffer-alignment-and-padding)
that recommends 64-byte alignment and refers to Intel's manual.
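
For reference, the padding half of that recommendation amounts to rounding
each buffer length up to a multiple of 64 bytes. A minimal sketch in Rust
(`padded_len` is a hypothetical helper name, not an arrow-rs API):

```rust
/// Round `len` up to the next multiple of `alignment`.
/// Hypothetical helper; assumes `alignment` is a power of two (e.g. 64).
fn padded_len(len: usize, alignment: usize) -> usize {
    assert!(alignment.is_power_of_two(), "alignment must be a power of two");
    // Adding (alignment - 1) and clearing the low bits rounds up.
    (len + alignment - 1) & !(alignment - 1)
}

fn main() {
    // 1000 i32 values occupy 4000 bytes; padded to 64 bytes that becomes 4032.
    let raw = 1000 * std::mem::size_of::<i32>();
    println!("{} -> {}", raw, padded_len(raw, 64));
}
```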


Do we have evidence that this alignment helps (besides intel claims)?

I am asking because, going through arrow-rs, I noticed that we use an
alignment of 128 bytes (following Intel's stream prefetch recommendation [1]).

I recently experimented with changing it to 64 bytes, and also to the native
alignment (e.g. an i32 aligned to 4 bytes), and observed no difference in
performance when compiled for "skylake-avx512".
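
To make the three variants concrete, here is a rough sketch of an allocation
with a configurable alignment using only `std::alloc`. `AlignedBuffer` is a
hypothetical type for illustration; it does not reproduce arrow-rs's actual
buffer implementation:

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};

/// Hypothetical zero-initialised byte buffer whose start address is a
/// multiple of `align` (e.g. 128, 64, or align_of::<i32>() == 4).
struct AlignedBuffer {
    ptr: *mut u8,
    layout: Layout,
}

impl AlignedBuffer {
    fn new(len: usize, align: usize) -> Self {
        // from_size_align fails if `align` is not a power of two;
        // max(1) avoids a zero-sized allocation, which alloc does not allow.
        let layout = Layout::from_size_align(len.max(1), align).expect("invalid layout");
        // SAFETY: the layout has a non-zero size.
        let ptr = unsafe { alloc_zeroed(layout) };
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        Self { ptr, layout }
    }

    fn address(&self) -> usize {
        self.ptr as usize
    }
}

impl Drop for AlignedBuffer {
    fn drop(&mut self) {
        // SAFETY: `ptr` was allocated above with exactly this layout.
        unsafe { dealloc(self.ptr, self.layout) };
    }
}

fn main() {
    // The three alignments compared: 128 bytes, 64 bytes, and native i32 alignment.
    for align in [128, 64, std::mem::align_of::<i32>()] {
        let buf = AlignedBuffer::new(4096, align);
        assert_eq!(buf.address() % align, 0);
        println!("align {:>3}: start address {:#x}", align, buf.address());
    }
}
```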

Specifically, I performed two types of tests: a "random sum", where we
compute the sum of the values taken at random indices, and a "sum", where we
sum all values of the array (buffer[1] of the primitive array), both for
arrays ranging from 2^10 to 2^25 elements. I expected prefetching to help, at
least in the latter, but I do not observe any difference.
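
Roughly, the two kernels look like the sketch below. It is not the exact
benchmark code: `random_indices` is a hypothetical xorshift-based index
generator so the sketch needs no external crates, and values are summed into
i64 to avoid overflow. Note that a plain `Vec<i32>` only guarantees native
alignment; comparing 64 vs 128 bytes would require backing the slice with an
explicitly over-aligned allocation.

```rust
use std::hint::black_box;
use std::time::Instant;

/// "sum": one sequential pass over every value, the access pattern where
/// hardware prefetching would be expected to help most.
fn sum(values: &[i32]) -> i64 {
    values.iter().map(|&v| v as i64).sum()
}

/// "random sum": sum of the values at pre-generated random indices,
/// which defeats sequential prefetching.
fn random_sum(values: &[i32], indices: &[usize]) -> i64 {
    indices.iter().map(|&i| values[i] as i64).sum()
}

/// Hypothetical index generator: xorshift64 (shifts 13, 7, 17).
/// `| 1` keeps the state non-zero, since a zero state would stay zero forever.
fn random_indices(n: usize, len: usize, seed: u64) -> Vec<usize> {
    let mut state = seed | 1;
    (0..n)
        .map(|_| {
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            (state % len as u64) as usize
        })
        .collect()
}

fn main() {
    // Smaller sizes than the original 2^10..2^25 range, to keep the sketch quick.
    for exp in [10u32, 15, 20] {
        let len = 1usize << exp;
        let values: Vec<i32> = (0..len as i32).collect();
        let indices = random_indices(len, len, 0x2545_F491_4F6C_DD1D);

        let t = Instant::now();
        let s = black_box(sum(black_box(values.as_slice())));
        let t_sum = t.elapsed();

        let t = Instant::now();
        let r = black_box(random_sum(black_box(values.as_slice()), black_box(indices.as_slice())));
        let t_rand = t.elapsed();

        println!("2^{exp}: sum={s} ({t_sum:?}), random sum={r} ({t_rand:?})");
    }
}
```

A single `Instant` measurement is noisy; a meaningful comparison would repeat
each run many times under a benchmarking harness and extend the sizes up to
2^25 elements, as in the tests described above.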

I was wondering if anyone:

* has observed equivalent behavior,
* knows of a good benchmark where these things matter, or
* has an explanation.

Thanks a lot!

Best,
Jorge

[1] Intel 64 and IA-32 Architectures Optimization Reference Manual,
sec. 3.7.3, p. 162:
https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
