Re: [Discuss] IPC Specification, flatbuffers and unaligned memory accesses

Wes McKinney Thu, 18 Jul 2019 20:58:26 -0700

To be clear, we could make a patch 0.14.x release that includes the
necessary compatibility changes. I presume Spark will be able to upgrade to
a new patch release (I'd be surprised if not, otherwise how can you get
security fixes)?


On Thu, Jul 18, 2019, 10:52 PM Bryan Cutler <[email protected]> wrote:

> Hey Wes,
> I understand we don't want to burden 1.0 by maintaining compatibility and
> that is fine with me. I'm just try to figure out how to best handle this
> situation so Spark users won't get a cryptic error message. It sounds like
> it will need to be handled on the Spark side to not allow mixing 1.0 and
> pre-1.0 versions. I'm not too sure how much a 0.15.0 release with
> compatibility would help, it might depend on when things get released but
> we can discuss that in another thread.
>
> On Thu, Jul 18, 2019 at 12:03 PM Wes McKinney <[email protected]> wrote:
>
> > hi Bryan -- well, the reason for the current 0.x version is precisely
> > to avoid a situation where we are making decisions on the basis of
> > maintaining forward / backward compatibility.
> >
> > One possible way forward on this is to make a 0.15.0 (0.14.2, so there
> > is less trouble for Spark to upgrade) release that supports reading
> > _both_ old and new variants of the protocol.
> >
> > On Thu, Jul 18, 2019 at 1:20 PM Bryan Cutler <[email protected]> wrote:
> > >
> > > Are we going to say that Arrow 1.0 is not compatible with any version
> > > before?  My concern is that Spark 2.4.x might get stuck on Arrow Java
> > > 0.14.1 and a lot of users will install PyArrow 1.0.0, which will not
> > work.
> > > In Spark 3.0.0, though it will be no problem to update both Java and
> > Python
> > > to 1.0. Having a compatibility mode so that new readers/writers can
> work
> > > with old readers using a 4-byte prefix would solve the problem, but if
> we
> > > don't want to do this will pyarrow be able to raise an error that
> clearly
> > > the new version does not support the old protocol?  For example, would
> a
> > > pyarrow reader see the 0xFFFFFFFF and raise something like "PyArrow
> > > detected an old protocol and cannot continue, please use a version <
> > 1.0.0"?
> > >
> > > On Thu, Jul 11, 2019 at 12:39 PM Wes McKinney <[email protected]>
> > wrote:
> > >
> > > > Hi Francois -- copying the metadata into memory isn't the end of the
> > world
> > > > but it's a pretty ugly wart. This affects every IPC protocol message
> > > > everywhere.
> > > >
> > > > We have an opportunity to address the wart now but such a fix
> > post-1.0.0
> > > > will be much more difficult.
> > > >
> > > > On Thu, Jul 11, 2019, 2:05 PM Francois Saint-Jacques <
> > > > [email protected]> wrote:
> > > >
> > > > > If the data buffers are still aligned, then I don't think we should
> > > > > add a breaking change just for avoiding the copy on the metadata?
> I'd
> > > > > expect said metadata to be small enough that zero-copy doesn't
> really
> > > > > affect performance.
> > > > >
> > > > > François
> > > > >
> > > > > On Sun, Jun 30, 2019 at 4:01 AM Micah Kornfield <
> > [email protected]>
> > > > > wrote:
> > > > > >
> > > > > > While working on trying to fix undefined behavior for unaligned
> > memory
> > > > > > accesses [1], I ran into an issue with the IPC specification [2]
> > which
> > > > > > prevents us from ever achieving zero-copy memory mapping and
> having
> > > > > aligned
> > > > > > accesses (i.e. clean UBSan runs).
> > > > > >
> > > > > > Flatbuffer metadata needs 8-byte alignment to guarantee aligned
> > > > accesses.
> > > > > >
> > > > > > In the IPC format we align each message to 8-byte boundaries.  We
> > then
> > > > > > write a int32_t integer to to denote the size of flat buffer
> > metadata,
> > > > > > followed immediately  by the flatbuffer metadata.  This means the
> > > > > > flatbuffer metadata will never be 8 byte aligned.
> > > > > >
> > > > > > Do people care?  A simple fix  would be to use int64_t instead of
> > > > int32_t
> > > > > > for length.  However, any fix essentially breaks all previous
> > client
> > > > > > library versions or incurs a memory copy.
> > > > > >
> > > > > > [1] https://github.com/apache/arrow/pull/4757
> > > > > > [2] https://arrow.apache.org/docs/ipc.html
> > > > >
> > > >
> >
>

Re: [Discuss] IPC Specification, flatbuffers and unaligned memory accesses

Reply via email to