I'm canceling the vote until we can refine the proposal a bit more.

I think that having an 8-byte EOS as 0x0000000000000000 is OK -- I
think my concern about backwards compatibility is unwarranted.

We also previously added the EOS to the File Format

https://github.com/apache/arrow/blob/master/docs/source/format/IPC.rst#file-format

I think we should formalize this decision with this vote. Note that
making the EOS 8 bytes has the effect of aligning the file footer,
which otherwise would cause possible undefined behavior in C++.

I'll wait a day or two for more opinions to percolate and then call a new vote

On Wed, Aug 7, 2019 at 9:35 AM Antoine Pitrou <anto...@python.org> wrote:
>
>
> Hmm... not sure about that.  IMHO, if the old format is detected, then a
> 4-byte EOS marker should be used.  If the new format is detected, then a
> 8-byte EOS marker should be used.
>
> Regards
>
> Antoine.
>
>
> Le 07/08/2019 à 16:16, Wes McKinney a écrit :
> > You make a good point. For backward compatibility reasons, bytes 5
> > through 8 would need to be unspecified padding bytes, I think. Does
> > that sound right?
> >
> > On Wed, Aug 7, 2019 at 9:02 AM Antoine Pitrou <anto...@python.org> wrote:
> >>
> >>
> >> This may be coming a bit late, but I realize we could take the
> >> opportunity to *also* make the end-of-stream marker a 8-bytes marker
> >> (rather than 4-bytes).  What do you think?
> >>
> >> Regards
> >>
> >> Antoine.
> >>
> >>
> >> Le 06/08/2019 à 22:15, Wes McKinney a écrit :
> >>> hi all,
> >>>
> >>> As we've been discussing for the last 5 weeks or so [1], there is a
> >>> need to introduce 4 bytes of padding into the preamble of the
> >>> "encapsulated IPC message" format to ensure that the Flatbuffers
> >>> metadata payload begins on an 8-byte aligned memory offset. The
> >>> alternative to this would be for Arrow implementations where alignment
> >>> is important (e.g. C or C++) to copy the metadata (which is not always
> >>> small) into memory when it is unaligned.
> >>>
> >>> Micah has proposed to address this by adding a 4-byte "continuation"
> >>> value at the beginning of the payload having the value 0xFFFFFFFF. The
> >>> reason to do it this way is that old clients will see an invalid
> >>> length (what is currently the first 4 bytes of the message -- a 32-bit
> >>> little endian signed integer indicating the metadata length) rather
> >>> than potentially crashing on a valid length.
> >>>
> >>> This would be a backwards incompatible protocol change, so older Arrow
> >>> libraries would not be able to read these new messages. Maintaining
> >>> forward compatibility (reading data produced by older libraries) would
> >>> be possible as we can reason that a value other than the continuation
> >>> value was produced by an older library (and then validate the
> >>> Flatbuffer message of course). Arrow implementations could offer a
> >>> backward compatibility mode for the sake of old readers if they desire
> >>> (this may also assist with testing).
> >>>
> >>> The PR making these changes to the IPC documentation is here
> >>>
> >>> https://github.com/apache/arrow/pull/4951
> >>>
> >>> Please vote to accept this change. This vote will be open for at least 72 
> >>> hours
> >>>
> >>> [ ] +1 Adopt the Arrow protocol change
> >>> [ ] +0
> >>> [ ] -1 I disagree because...
> >>>
> >>> Here is my vote: +1
> >>>
> >>> Thanks,
> >>> Wes
> >>>
> >>> [1]: 
> >>> https://lists.apache.org/thread.html/8440be572c49b7b2ffb76b63e6d935ada9efd9c1c2021369b6d27786@%3Cdev.arrow.apache.org%3E
> >>>

Reply via email to