Hi,

We have a whole section on byte alignment (https://arrow.apache.org/docs/format/Columnar.html#buffer-alignment-and-padding) that recommends 64-byte alignment and refers to Intel's manual.
Do we have evidence that this alignment helps (besides Intel's claims)? I am asking because in arrow-rs we use an alignment of 128 bytes (following the stream prefetch recommendation from Intel [1]). I recently experimented with changing it to 64 bytes and also to the native alignment (i.e. i32 is aligned to 4 bytes), and I observed no difference in performance when compiled for "skylake-avx512".

Specifically, I performed two types of tests: a "random sum", where we compute the sum of the values taken at random indices, and a "sum", where we sum all values of the array (buffer[1] of the primitive array), both for arrays ranging from 2^10 to 2^25 elements. I was expecting that, at least in the latter, prefetching would help, but I do not observe any difference.

I was wondering if anyone:

* has observed equivalent behavior,
* knows a good benchmark where these things matter, or
* has an explanation.

Thanks a lot!

Best,
Jorge

[1] https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf, sec. 3.7.3, page 162
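
For concreteness, the two tests could be sketched roughly as below. This is an illustrative sketch under stated assumptions, not the actual arrow-rs benchmark: `AlignedBuf`, `sum`, and `random_sum` are hypothetical names, the buffer is allocated directly with `std::alloc` rather than through arrow-rs, and the random indices come from a simple LCG chosen only to keep the example free of dependencies.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical helper: an i32 buffer whose start address is aligned to at
/// least `align` bytes. The allocator may return a stricter alignment than
/// requested, so a 4-byte request can still land on a 64-byte boundary.
struct AlignedBuf {
    ptr: *mut i32,
    len: usize,
    layout: Layout,
}

impl AlignedBuf {
    fn new(len: usize, align: usize) -> Self {
        assert!(len > 0, "zero-sized allocations are not allowed here");
        assert!(align >= std::mem::align_of::<i32>());
        // `align` must be a power of two, otherwise this returns an error.
        let layout = Layout::from_size_align(len * std::mem::size_of::<i32>(), align)
            .expect("invalid size/alignment");
        let ptr = unsafe { alloc(layout) as *mut i32 };
        assert!(!ptr.is_null(), "allocation failed");
        // Initialise every element (0, 1, 2, ...) before exposing a slice.
        for i in 0..len {
            unsafe { ptr.add(i).write(i as i32) };
        }
        AlignedBuf { ptr, len, layout }
    }

    fn as_slice(&self) -> &[i32] {
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        unsafe { dealloc(self.ptr as *mut u8, self.layout) };
    }
}

/// "sum": sequential pass over all values, accumulated in i64 to avoid overflow.
fn sum(values: &[i32]) -> i64 {
    values.iter().map(|&v| v as i64).sum()
}

/// "random sum": sum of `n` values taken at pseudo-random indices (simple LCG).
fn random_sum(values: &[i32], n: usize, seed: u64) -> i64 {
    let mut state = seed;
    let mut acc = 0i64;
    for _ in 0..n {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        let idx = (state >> 33) as usize % values.len();
        acc += values[idx] as i64;
    }
    acc
}

fn main() {
    for &align in &[4usize, 64, 128] {
        for &len in &[1usize << 10, 1 << 20] {
            let buf = AlignedBuf::new(len, align);

            let t = Instant::now();
            let s = black_box(sum(black_box(buf.as_slice())));
            let sum_time = t.elapsed();

            let t = Instant::now();
            let r = black_box(random_sum(black_box(buf.as_slice()), len, 42));
            let random_time = t.elapsed();

            println!(
                "align={align:>3} len={len:>8} sum={s} ({sum_time:?}) random_sum={r} ({random_time:?})"
            );
        }
    }
}
```

A single `Instant` measurement like this is noisy; for real numbers one would build with `RUSTFLAGS="-C target-cpu=skylake-avx512"` in release mode and use a proper harness such as criterion with repeated runs.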