To be clear, we could make a patch 0.14.x release that includes the necessary compatibility changes. I presume Spark will be able to upgrade to a new patch release (I'd be surprised if not, otherwise how can you get security fixes)?
On Thu, Jul 18, 2019, 10:52 PM Bryan Cutler <cutl...@gmail.com> wrote: > Hey Wes, > I understand we don't want to burden 1.0 by maintaining compatibility and > that is fine with me. I'm just try to figure out how to best handle this > situation so Spark users won't get a cryptic error message. It sounds like > it will need to be handled on the Spark side to not allow mixing 1.0 and > pre-1.0 versions. I'm not too sure how much a 0.15.0 release with > compatibility would help, it might depend on when things get released but > we can discuss that in another thread. > > On Thu, Jul 18, 2019 at 12:03 PM Wes McKinney <wesmck...@gmail.com> wrote: > > > hi Bryan -- well, the reason for the current 0.x version is precisely > > to avoid a situation where we are making decisions on the basis of > > maintaining forward / backward compatibility. > > > > One possible way forward on this is to make a 0.15.0 (0.14.2, so there > > is less trouble for Spark to upgrade) release that supports reading > > _both_ old and new variants of the protocol. > > > > On Thu, Jul 18, 2019 at 1:20 PM Bryan Cutler <cutl...@gmail.com> wrote: > > > > > > Are we going to say that Arrow 1.0 is not compatible with any version > > > before? My concern is that Spark 2.4.x might get stuck on Arrow Java > > > 0.14.1 and a lot of users will install PyArrow 1.0.0, which will not > > work. > > > In Spark 3.0.0, though it will be no problem to update both Java and > > Python > > > to 1.0. Having a compatibility mode so that new readers/writers can > work > > > with old readers using a 4-byte prefix would solve the problem, but if > we > > > don't want to do this will pyarrow be able to raise an error that > clearly > > > the new version does not support the old protocol? For example, would > a > > > pyarrow reader see the 0xFFFFFFFF and raise something like "PyArrow > > > detected an old protocol and cannot continue, please use a version < > > 1.0.0"? > > > > > > On Thu, Jul 11, 2019 at 12:39 PM Wes McKinney <wesmck...@gmail.com> > > wrote: > > > > > > > Hi Francois -- copying the metadata into memory isn't the end of the > > world > > > > but it's a pretty ugly wart. This affects every IPC protocol message > > > > everywhere. > > > > > > > > We have an opportunity to address the wart now but such a fix > > post-1.0.0 > > > > will be much more difficult. > > > > > > > > On Thu, Jul 11, 2019, 2:05 PM Francois Saint-Jacques < > > > > fsaintjacq...@gmail.com> wrote: > > > > > > > > > If the data buffers are still aligned, then I don't think we should > > > > > add a breaking change just for avoiding the copy on the metadata? > I'd > > > > > expect said metadata to be small enough that zero-copy doesn't > really > > > > > affect performance. > > > > > > > > > > François > > > > > > > > > > On Sun, Jun 30, 2019 at 4:01 AM Micah Kornfield < > > emkornfi...@gmail.com> > > > > > wrote: > > > > > > > > > > > > While working on trying to fix undefined behavior for unaligned > > memory > > > > > > accesses [1], I ran into an issue with the IPC specification [2] > > which > > > > > > prevents us from ever achieving zero-copy memory mapping and > having > > > > > aligned > > > > > > accesses (i.e. clean UBSan runs). > > > > > > > > > > > > Flatbuffer metadata needs 8-byte alignment to guarantee aligned > > > > accesses. > > > > > > > > > > > > In the IPC format we align each message to 8-byte boundaries. We > > then > > > > > > write a int32_t integer to to denote the size of flat buffer > > metadata, > > > > > > followed immediately by the flatbuffer metadata. This means the > > > > > > flatbuffer metadata will never be 8 byte aligned. > > > > > > > > > > > > Do people care? A simple fix would be to use int64_t instead of > > > > int32_t > > > > > > for length. However, any fix essentially breaks all previous > > client > > > > > > library versions or incurs a memory copy. > > > > > > > > > > > > [1] https://github.com/apache/arrow/pull/4757 > > > > > > [2] https://arrow.apache.org/docs/ipc.html > > > > > > > > > > > >