Hey Wes,
I understand we don't want to burden 1.0 by maintaining compatibility, and
that is fine with me. I'm just trying to figure out how best to handle this
situation so Spark users won't get a cryptic error message. It sounds like
it will need to be handled on the Spark side by not allowing mixing of 1.0
and pre-1.0 versions. I'm not sure how much a 0.15.0 release with
compatibility would help; it might depend on when things get released, but
we can discuss that in another thread.
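
For example, a rough sketch of the kind of guard I mean on the Spark side
(illustrative only; the cutoff version and message are assumptions, similar
in spirit to Spark's existing minimum pyarrow version check):

    from distutils.version import LooseVersion

    import pyarrow

    # Assumed cutoff: this Spark build only speaks the pre-1.0 IPC protocol.
    if LooseVersion(pyarrow.__version__) >= LooseVersion("1.0.0"):
        raise ImportError(
            "This Spark version uses the pre-1.0 Arrow IPC protocol; "
            "pyarrow %s is not supported, please install pyarrow < 1.0.0"
            % pyarrow.__version__)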

On Thu, Jul 18, 2019 at 12:03 PM Wes McKinney <wesmck...@gmail.com> wrote:

> hi Bryan -- well, the reason for the current 0.x version is precisely
> to avoid a situation where we are making decisions on the basis of
> maintaining forward / backward compatibility.
>
> One possible way forward on this is to make a 0.15.0 (or a 0.14.2, so
> there is less trouble for Spark to upgrade) release that supports reading
> _both_ old and new variants of the protocol.
>
> On Thu, Jul 18, 2019 at 1:20 PM Bryan Cutler <cutl...@gmail.com> wrote:
> >
> > Are we going to say that Arrow 1.0 is not compatible with any earlier
> > version?  My concern is that Spark 2.4.x might get stuck on Arrow Java
> > 0.14.1 while a lot of users install PyArrow 1.0.0, which will not work.
> > In Spark 3.0.0, though, it will be no problem to update both Java and
> > Python to 1.0.  Having a compatibility mode so that new readers/writers
> > can work with old readers using a 4-byte prefix would solve the problem,
> > but if we don't want to do this, will pyarrow be able to raise an error
> > clearly stating that the new version does not support the old protocol?
> > For example, would a pyarrow reader notice that the 0xFFFFFFFF
> > continuation marker is missing and raise something like "PyArrow
> > detected an old protocol and cannot continue, please use a version <
> > 1.0.0"?
> >
> > On Thu, Jul 11, 2019 at 12:39 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > > Hi Francois -- copying the metadata into memory isn't the end of the
> > > world, but it's a pretty ugly wart. This affects every IPC protocol
> > > message everywhere.
> > >
> > > We have an opportunity to address the wart now, but such a fix
> > > post-1.0.0 will be much more difficult.
> > >
> > > On Thu, Jul 11, 2019, 2:05 PM Francois Saint-Jacques <fsaintjacq...@gmail.com> wrote:
> > >
> > > > If the data buffers are still aligned, then I don't think we should
> > > > add a breaking change just to avoid copying the metadata. I'd expect
> > > > said metadata to be small enough that losing zero-copy doesn't really
> > > > affect performance.
> > > >
> > > > François
> > > >
> > > > On Sun, Jun 30, 2019 at 4:01 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > > > >
> > > > > While working on trying to fix undefined behavior for unaligned
> > > > > memory accesses [1], I ran into an issue with the IPC specification
> > > > > [2] which prevents us from ever achieving zero-copy memory mapping
> > > > > with aligned accesses (i.e. clean UBSan runs).
> > > > >
> > > > > Flatbuffer metadata needs 8-byte alignment to guarantee aligned
> > > > > accesses.
> > > > >
> > > > > In the IPC format we align each message to an 8-byte boundary.  We
> > > > > then write an int32_t to denote the size of the flatbuffer
> > > > > metadata, followed immediately by the flatbuffer metadata.  This
> > > > > means the flatbuffer metadata will never be 8-byte aligned.
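> > > > >
> > > > > To make the misalignment concrete, a small sketch of the layout
> > > > > arithmetic (illustrative only; offsets are relative to the 8-byte
> > > > > aligned message start):
> > > > >
> > > > >     import struct
> > > > >
> > > > >     metadata = b'\x00' * 16   # stand-in for the flatbuffer bytes
> > > > >
> > > > >     # Today: int32_t length prefix, so the flatbuffer starts at
> > > > >     # offset 4, and 4 % 8 != 0 -- always misaligned.
> > > > >     old = struct.pack('<i', len(metadata)) + metadata
> > > > >
> > > > >     # With an int64_t length prefix the flatbuffer would start at
> > > > >     # offset 8, and 8 % 8 == 0 -- always aligned.
> > > > >     new = struct.pack('<q', len(metadata)) + metadata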
> > > > >
> > > > > Do people care?  A simple fix would be to use int64_t instead of
> > > > > int32_t for the length.  However, any fix essentially breaks all
> > > > > previous client library versions or incurs a memory copy.
> > > > >
> > > > > [1] https://github.com/apache/arrow/pull/4757
> > > > > [2] https://arrow.apache.org/docs/ipc.html
> > > >
> > >
>
