Hi Antoine,

Thanks a lot for your answer.

So, if I understand correctly (I may not have), we do not impose
restrictions on the alignment of the data when we get the pointer; only
when we read from it. Doesn't this require checking for alignment at runtime?
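
To make the question concrete, here is a minimal sketch in Rust of what I
imagine such a runtime check could look like. The helper name `read_f32s` is
hypothetical (it is not an existing arrow API), and I am assuming the buffer
holds native-endian f32 values. When the pointer happens to be aligned, the
bytes are borrowed zero-copy; otherwise the values are copied out byte-wise,
which I believe is roughly the safe Rust counterpart of the memcpy trick:

```rust
use std::borrow::Cow;
use std::mem::{align_of, size_of};

/// Hypothetical helper: view `bytes` as `f32` values.
/// Borrows zero-copy when the pointer is aligned, copies otherwise.
/// Trailing bytes that do not form a whole `f32` are ignored.
fn read_f32s(bytes: &[u8]) -> Cow<'_, [f32]> {
    let len = bytes.len() / size_of::<f32>();
    // The runtime alignment check: is the address a multiple of 4?
    if (bytes.as_ptr() as usize) % align_of::<f32>() == 0 {
        // SAFETY: the pointer is non-null and aligned for f32 (checked
        // above), `len * 4` bytes lie within `bytes`, every bit pattern
        // is a valid f32, and the returned borrow is tied to `bytes`.
        let floats =
            unsafe { std::slice::from_raw_parts(bytes.as_ptr() as *const f32, len) };
        Cow::Borrowed(floats)
    } else {
        // Misaligned: assemble each value from its bytes (native-endian).
        // This never dereferences an unaligned `*const f32`.
        Cow::Owned(
            bytes
                .chunks_exact(size_of::<f32>())
                .map(|c| f32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
                .collect(),
        )
    }
}
```

With something like this, the check costs one branch per buffer rather than
per element, and only the misaligned case pays for a copy.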

Best,
Jorge



On Tue, Aug 2, 2022 at 6:59 PM Antoine Pitrou <anto...@python.org> wrote:

>
> Hi Jorge,
>
> So there are two aspects to the answer:
>
> - ideally, the C++ implementation also works on non-aligned data (though
> this is poorly tested, if at all)
>
> - when mmap'ing a file, you should get a page-aligned address
>
> As for int128 and int256, these usually don't exist at the hardware
> level anyway, so implementing those reads as a combination of 64-bit
> reads shouldn't hurt performance-wise.
>
> More generally, I don't know about Rust but in C++ unaligned access
> would be made UB-safe by using the memcpy trick, which is correctly
> optimized by production compilers:
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/ubsan.h#L55-L69
>
> Regards
>
> Antoine.
>
>
> On 01/08/2022 at 18:55, Jorge Cardoso Leitão wrote:
> > Hi,
> >
> > I am trying to follow the C++ implementation with respect to mmap IPC
> > files and reading them zero-copy, in the context of reproducing it in Rust.
> >
> > My understanding from reading the source code is that we essentially:
> > * identify the memory regions (offset and length) of each of the buffers,
> > via IPC's flatbuffer "Node".
> > * cast the uint8 pointer to the corresponding type based on the datatype
> > (e.g. f32 for float32)
> >
> > I am struggling to understand how we ensure that the pointer is aligned
> > [2,3] to the type (e.g. f32) so that the uint8 pointer can be safely cast
> > to it.
> >
> > In other words, I would expect mmap to work when:
> > * the files' bit padding is 64 bits
> > * the target type is <= 64 bits
> > However,
> > * we have types with more than 64 bits (int128 and int256)
> > * a file can be 8-bit aligned
> >
> > The background is that Rust requires pointers to be aligned to the type
> > for safe casting (it is UB to read unaligned pointers), and the above
> > naturally poses a challenge when reading i128, i256 and 8-bit padded files.
> >
> > Best,
> > Jorge
> >
> > [1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.cc
> > [2] https://en.wikipedia.org/wiki/Data_structure_alignment
> > [3] https://stackoverflow.com/a/4322950/931303
> >
>
