Hi Jorge,

So there are two aspects to the answer:

- ideally, the C++ implementation also works on non-aligned data (though this is poorly tested, if at all)

- when mmap'ing a file, you should get a page-aligned address

As for int128 and int256, these usually don't exist at the hardware level anyway, so implementing those reads as a combination of 64-bit reads shouldn't hurt performance-wise.
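To illustrate the point about composing wide reads from 64-bit ones, here is a minimal C++ sketch. The names Int128Parts and LoadInt128 are hypothetical and not part of Arrow; the sketch assumes the 16 bytes are laid out low half first and are in the host's byte order. Each half is copied with std::memcpy so the read is valid whatever the alignment of the source address.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical representation of a 128-bit integer as two 64-bit halves.
struct Int128Parts {
  std::uint64_t low;   // least significant 64 bits
  std::int64_t high;   // most significant 64 bits (carries the sign)
};

// Read 16 bytes from `src` as two 64-bit loads.
// Assumption: the low half is stored first and the bytes are in host order.
// `src` needs no particular alignment because the bytes are copied.
Int128Parts LoadInt128(const std::uint8_t* src) {
  Int128Parts value;
  std::memcpy(&value.low, src, sizeof(value.low));
  std::memcpy(&value.high, src + sizeof(value.low), sizeof(value.high));
  return value;
}
```

The same pattern would extend to a 256-bit value with four 64-bit halves.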

More generally, I don't know about Rust, but in C++ unaligned access is made UB-safe by using the memcpy trick, which production compilers optimize correctly:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/ubsan.h#L55-L69
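For readers who do not want to follow the link, the memcpy trick can be sketched roughly as follows. This is a simplified illustration, not a copy of the linked header: LoadUnaligned and ReadFloatAtOffset1 are hypothetical names chosen for this example.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper (not Arrow's API): load a T from an address that
// may not be aligned for T. Copying the bytes avoids the undefined
// behaviour of dereferencing a misaligned T*.
template <typename T>
T LoadUnaligned(const std::uint8_t* src) {
  static_assert(std::is_trivially_copyable<T>::value,
                "T must be trivially copyable");
  T value;
  std::memcpy(&value, src, sizeof(T));
  return value;
}

// Example: store a float at an odd offset in a byte buffer, then read it back.
float ReadFloatAtOffset1() {
  alignas(8) std::uint8_t buffer[16] = {};
  const float expected = 1.5f;
  std::memcpy(buffer + 1, &expected, sizeof(expected));  // deliberately misaligned
  return LoadUnaligned<float>(buffer + 1);
}
```

With optimizations enabled, mainstream compilers typically turn the fixed-size memcpy into a plain load instruction, so the helper should cost nothing on platforms that allow unaligned loads.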

Regards

Antoine.


On 01/08/2022 at 18:55, Jorge Cardoso Leitão wrote:
Hi,

I am trying to follow the C++ implementation with respect to mmap IPC files
and reading them zero-copy, in the context of reproducing it in Rust.

My understanding from reading the source code is that we essentially:
* identify the memory regions (offset and length) of each of the buffers,
via IPC's flatbuffer "Node".
* cast the uint8 pointer to the corresponding type based on the datatype
(e.g. f32 for float32)

I am struggling to understand how we ensure that the pointer is aligned
[2,3] to the type (e.g. f32) so that the uint8 pointer can be safely cast
to it.

In other words, I would expect mmap to work when:
* the file's bit padding is 64 bits
* the target type is <= 64 bits
However,
* we have types with more than 64 bits (int128 and int256)
* a file can be 8-bit aligned

The background is that Rust requires pointers to be aligned to the type for
safe casting (it is UB to read unaligned pointers), and the above naturally
poses a challenge when reading i128, i256 and 8-bit padded files.

Best,
Jorge

[1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.cc
[2] https://en.wikipedia.org/wiki/Data_structure_alignment
[3] https://stackoverflow.com/a/4322950/931303
