Hi Antoine,

Thanks a lot for your answer.

So, if I understand correctly (I may not), we do not impose restrictions on the alignment of the data when we get the pointer, only when we read from it. Doesn't this require checking for alignment at runtime?

Best,
Jorge

On Tue, Aug 2, 2022 at 6:59 PM Antoine Pitrou <anto...@python.org> wrote:

>
> Hi Jorge,
>
> So there are two aspects to the answer:
>
> - ideally, the C++ implementation also works on non-aligned data (though
>   this is poorly tested, if at all)
>
> - when mmap'ing a file, you should get a page-aligned address
>
> As for int128 and int256, these usually don't exist at the hardware
> level anyway, so implementing those reads as a combination of 64-bit
> reads shouldn't hurt performance-wise.
>
> More generally, I don't know about Rust but in C++ unaligned access
> would be made UB-safe by using the memcpy trick, which is correctly
> optimized by production compilers:
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/ubsan.h#L55-L69
>
> Regards
>
> Antoine.
>
>
> On 01/08/2022 at 18:55, Jorge Cardoso Leitão wrote:
> > Hi,
> >
> > I am trying to follow the C++ implementation with respect to mmap IPC files
> > and reading them zero-copy, in the context of reproducing it in Rust.
> >
> > My understanding from reading the source code is that we essentially:
> > * identify the memory regions (offset and length) of each of the buffers,
> >   via IPC's flatbuffer "Node".
> > * cast the uint8 pointer to the corresponding type based on the datatype
> >   (e.g. f32 for float32)
> >
> > I am struggling to understand how we ensure that the pointer is aligned
> > [2,3] to the type (e.g. f32) so that the uint8 pointer can be safely cast
> > to it.
> >
> > In other words, I would expect mmap to work when:
> > * the files' bit padding is 64 bits
> > * the target type is <= 64 bits
> > However,
> > * we have types with more than 64 bits (int128 and int256)
> > * a file can be 8-bit aligned
> >
> > The background is that Rust requires pointers to be aligned to the type for
> > safe casting (it is UB to read unaligned pointers), and the above naturally
> > poses a challenge when reading i128, i256 and 8-bit padded files.
> >
> > Best,
> > Jorge
> >
> > [1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.cc
> > [2] https://en.wikipedia.org/wiki/Data_structure_alignment
> > [3] https://stackoverflow.com/a/4322950/931303
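
To make the runtime-check question at the top of this message concrete, here is a minimal Rust sketch of one way it could look. It is not taken from the C++ implementation or from any Arrow crate: the helper names (`try_cast_f32`, `read_f32`, `read_i128_le`) are hypothetical, and it assumes the buffers hold little-endian values. The idea is to check alignment once per buffer, reinterpret the bytes in place when the check passes, and otherwise fall back to a byte copy, which is the Rust counterpart of the memcpy trick mentioned in the quoted reply.

```rust
use std::borrow::Cow;
use std::mem::{align_of, size_of};

/// Zero-copy view of `bytes` as `&[f32]`, or `None` when the cast would be unsound.
/// Hypothetical helper for illustration only.
pub fn try_cast_f32(bytes: &[u8]) -> Option<&[f32]> {
    let ptr = bytes.as_ptr();
    let aligned = (ptr as usize) % align_of::<f32>() == 0;
    let whole = bytes.len() % size_of::<f32>() == 0;
    // Assumption: the buffer is little-endian, so reinterpreting the bytes in
    // place is only meaningful on little-endian hosts.
    if !(aligned && whole && cfg!(target_endian = "little")) {
        return None;
    }
    // SAFETY: the pointer comes from a live slice, it is aligned for f32
    // (checked above), the element count stays within the slice, and every
    // bit pattern is a valid f32.
    Some(unsafe {
        std::slice::from_raw_parts(ptr as *const f32, bytes.len() / size_of::<f32>())
    })
}

/// Borrows when the one-off alignment check passes, copies otherwise.
/// In the copying path, trailing bytes that do not form a full f32 are ignored.
pub fn read_f32(bytes: &[u8]) -> Cow<[f32]> {
    match try_cast_f32(bytes) {
        Some(values) => Cow::Borrowed(values),
        None => Cow::Owned(
            bytes
                .chunks_exact(size_of::<f32>())
                .map(|chunk| {
                    // Copy into a local array first: no alignment requirement.
                    let mut raw = [0u8; 4];
                    raw.copy_from_slice(chunk);
                    f32::from_le_bytes(raw)
                })
                .collect(),
        ),
    }
}

/// Reads the `index`-th little-endian i128 without requiring any alignment.
/// Panics if the slice is too short.
pub fn read_i128_le(bytes: &[u8], index: usize) -> i128 {
    let start = index * size_of::<i128>();
    let mut raw = [0u8; 16];
    raw.copy_from_slice(&bytes[start..start + 16]);
    i128::from_le_bytes(raw)
}
```

With this shape the check runs once per buffer rather than once per value, so its cost should be negligible next to the read itself. Whether to copy, return an error, or reject the file when the check fails is a design choice that the sketch does not settle.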