On 03/08/2022 at 18:29, Jorge Cardoso Leitão wrote:
Hi Antoine,
Thanks a lot for your answer.
So, if I understand correctly (I may not have), we do not impose alignment
restrictions on the data when we get the pointer, only when we read from it.
Doesn't this require checking for alignment at runtime?
Only if you do things that are alignment-sensitive.
That said, while unaligned data is formally allowed AFAIK, it probably occurs
rarely, so potential issues (if any) have probably not surfaced.
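In Rust, the runtime check can be done once per buffer rather than on every read. A minimal sketch, assuming `float32` data in the host's native byte order: borrow the bytes as `&[f32]` when the address is suitably aligned, and otherwise fall back to copying them into an owned `Vec<f32>`. The function name `f32_values` is hypothetical.

```rust
use std::borrow::Cow;
use std::mem::{align_of, size_of};

/// Views `bytes` as f32 values (hypothetical helper). Zero-copy when the buffer
/// is aligned for f32; otherwise the values are decoded into an owned Vec.
/// Assumes native byte order.
pub fn f32_values(bytes: &[u8]) -> Cow<'_, [f32]> {
    assert!(bytes.len() % size_of::<f32>() == 0, "length is not a multiple of 4 bytes");
    if (bytes.as_ptr() as usize) % align_of::<f32>() == 0 {
        // SAFETY: the pointer is aligned for f32, the length is a whole number of
        // f32 values, every bit pattern is a valid f32, and the slice borrows `bytes`.
        let values = unsafe {
            std::slice::from_raw_parts(bytes.as_ptr() as *const f32, bytes.len() / size_of::<f32>())
        };
        Cow::Borrowed(values)
    } else {
        // Unaligned: decode each 4-byte chunk by value, which needs no alignment.
        let values: Vec<f32> = bytes
            .chunks_exact(size_of::<f32>())
            .map(|chunk| {
                let mut word = [0u8; 4];
                word.copy_from_slice(chunk);
                f32::from_ne_bytes(word)
            })
            .collect();
        Cow::Owned(values)
    }
}
```

Callers see a `&[f32]` either way (via `Deref`); only the rare unaligned buffer pays for a copy.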
Best regards
Antoine.
Best,
Jorge
On Tue, Aug 2, 2022 at 6:59 PM Antoine Pitrou <anto...@python.org> wrote:
Hi Jorge,
So there are two aspects to the answer:
- ideally, the C++ implementation also works on non-aligned data (though
this is poorly tested, if at all)
- when mmap'ing a file, you should get a page-aligned address
As for int128 and int256, these usually don't exist at the hardware
level anyway, so implementing those reads as a combination of 64-bit
reads shouldn't hurt performance-wise.
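For instance, a 128-bit integer can be assembled from two 64-bit reads. A sketch, assuming little-endian two's-complement data with the low word first; `read_i128_le` is a hypothetical helper. Because it decodes byte copies, it has no alignment requirement at all (in Rust, `i128::from_le_bytes` would do the same in one call).

```rust
/// Reads a little-endian i128 at `offset` as two 64-bit reads (hypothetical helper).
/// Panics if `bytes` holds fewer than 16 bytes from `offset`.
pub fn read_i128_le(bytes: &[u8], offset: usize) -> i128 {
    let mut lo = [0u8; 8];
    let mut hi = [0u8; 8];
    lo.copy_from_slice(&bytes[offset..offset + 8]);
    hi.copy_from_slice(&bytes[offset + 8..offset + 16]);
    let lo = u64::from_le_bytes(lo); // low word: zero-extended below
    let hi = i64::from_le_bytes(hi); // high word: carries the sign
    ((hi as i128) << 64) | (lo as i128)
}
```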
More generally, I don't know about Rust, but in C++ unaligned access can
be made UB-safe by using the memcpy trick, which production compilers
optimize correctly:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/ubsan.h#L55-L69
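The closest Rust analogue of the memcpy trick is `std::ptr::read_unaligned`, which copies the bytes into a properly aligned value instead of dereferencing the unaligned pointer. A sketch under stated assumptions: `Plain` and `load_unaligned` are hypothetical names, the marker trait limits loads to types for which any bit pattern is valid, and values are read in native byte order.

```rust
use std::mem::size_of;
use std::ptr;

/// Hypothetical marker: every bit pattern of the type is a valid value.
pub unsafe trait Plain: Copy {}
unsafe impl Plain for i32 {}
unsafe impl Plain for i64 {}
unsafe impl Plain for i128 {}
unsafe impl Plain for f32 {}
unsafe impl Plain for f64 {}

/// Loads a `T` from `bytes` at `offset`, whatever the address alignment.
pub fn load_unaligned<T: Plain>(bytes: &[u8], offset: usize) -> T {
    let end = offset.checked_add(size_of::<T>()).expect("offset overflow");
    let src = &bytes[offset..end]; // bounds-checked: exactly size_of::<T>() bytes
    // SAFETY: `src` has size_of::<T>() readable bytes, read_unaligned has no
    // alignment requirement, and `T: Plain` makes any bit pattern valid.
    unsafe { ptr::read_unaligned(src.as_ptr() as *const T) }
}
```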
Regards
Antoine.
On 01/08/2022 at 18:55, Jorge Cardoso Leitão wrote:
Hi,
I am trying to follow the C++ implementation [1] of memory-mapping IPC files
and reading them zero-copy, with the goal of reproducing it in Rust.
My understanding from reading the source code is that we essentially:
* identify the memory regions (offset and length) of each of the buffers,
via the `Buffer` entries of the IPC flatbuffer metadata.
* cast the uint8 pointer to the corresponding type based on the datatype
(e.g. f32 for float32)
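The first step amounts to slicing the mapped bytes, and as long as only `&[u8]` is produced, no alignment question arises yet. A sketch in Rust; `BufferSpec` and `buffer_bytes` are hypothetical names standing in for the offset/length pairs described above, taken relative to the message body.

```rust
/// Hypothetical (offset, length) pair for one buffer, relative to the message body.
#[derive(Clone, Copy, Debug)]
pub struct BufferSpec {
    pub offset: usize,
    pub length: usize,
}

/// Returns the bytes of `spec` inside `body` (e.g. a memory-mapped message body)
/// without copying, or `None` if the region does not fit inside `body`.
pub fn buffer_bytes(body: &[u8], spec: BufferSpec) -> Option<&[u8]> {
    let end = spec.offset.checked_add(spec.length)?; // reject overflowing regions
    body.get(spec.offset..end)
}
```

Whether the resulting slice can then be reinterpreted as typed values depends on its address, which is the question raised next.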
I am struggling to understand how we ensure that the pointer is aligned
[2,3] to the type (e.g. f32) so that the uint8 pointer can be safely cast
to it.
In other words, I would expect mmap to work when:
* the file's buffers are padded to 64 bits
* the target type is <= 64 bits
However,
* we have types with more than 64 bits (int128 and int256)
* a file can be 8-bit aligned
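These conditions reduce to address arithmetic: with a page-aligned mmap base, a buffer can be read directly as `T` only if its offset from the start of the mapping is a multiple of `T`'s alignment. A sketch with a hypothetical `is_aligned` helper; in practice `align` would come from `std::mem::align_of::<T>()`, whose value for `i128` may differ between targets and compiler versions.

```rust
/// Returns true if `base + offset` is a multiple of `align` (a power of two).
/// Hypothetical helper; wrapping_add is safe here because `align` divides 2^64,
/// so the wrapped sum keeps the same remainder modulo `align`.
pub fn is_aligned(base: usize, offset: usize, align: usize) -> bool {
    assert!(align.is_power_of_two());
    (base.wrapping_add(offset) & (align - 1)) == 0
}
```

With a base of 16 * 4096, an offset of 8 passes for 8-byte alignment but fails for 16-byte alignment, and an offset of 1 (an 8-bit-aligned file) fails even for a 4-byte `f32`.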
The background is that Rust requires pointers to be aligned to the type
for safe casting (reading through an unaligned pointer is UB), and the
above naturally poses a challenge when reading i128, i256 and 8-bit padded
files.
Best,
Jorge
[1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.cc
[2] https://en.wikipedia.org/wiki/Data_structure_alignment
[3] https://stackoverflow.com/a/4322950/931303