> Could we detect the 4-byte length, incur a penalty copying the memory to
> an aligned buffer, then continue consuming the stream?

I think that is the plan (or at least would be my plan) if we go ahead with
the change.

> (It's probably
> fine if we only write the 8-byte length, since consumers on older
> versions of Arrow could slice from the 4th byte before passing a buffer
> to the reader).

I'm not sure I understand this suggestion:
1. Wouldn't this cause old readers to miss the last 4 bytes of the buffer
   (and provide meaningless bytes at the beginning)?
2. The current proposal on the other thread is to have the pattern be
   <0xffffffff><buffer length><buffer data>

Thanks,
Micah

On Tue, Jul 23, 2019 at 11:43 AM Paul Taylor <ptaylor.apa...@gmail.com> wrote:

> +1 for a 0.15.0 before 1.0 if we go ahead with this.
>
> I'm curious to hear others' thoughts about compatibility. I think we
> should avoid breaking backwards compatibility if possible. It's common
> for apps/libs to be pinned on specific Arrow versions, and I worry it'd
> cause a lot of work for downstream devs to audit their tool suite for
> full Arrow binary compatibility (and/or require their customers to do
> the same).
>
> Could we detect the 4-byte length, incur a penalty copying the memory to
> an aligned buffer, then continue consuming the stream? (It's probably
> fine if we only write the 8-byte length, since consumers on older
> versions of Arrow could slice from the 4th byte before passing a buffer
> to the reader).
>
> I've always understood the metadata to be a few dozen/hundred KB, a
> small percentage of the total message size. I could be underestimating
> the ratios though -- is it common to have tables w/ 1000+ columns? I've
> seen a few reports like that in cuDF, but I'm curious to hear
> Jacques'/Dremio's experience too.
>
> If copying is feasible, it doesn't seem so bad a trade-off to maintain
> backwards-compatibility. As libraries and consumers upgrade their Arrow
> dependencies, the 4-byte length will be less and less common, and
> they'll be less likely to pay the cost.
>
> On 7/23/19 2:22 AM, Uwe L. Korn wrote:
> > It is also a good way to test the change in public. We don't want to
> > adjust something like this anymore in a 1.0.0 release. Already doing this
> > in 0.15.0 and then maybe doing adjustments due to issues that appear "in
> > the wild" is psychologically the easier way. There is a lot of thinking of
> > users bound with the magic 1.0, thus I would plan to minimize what is
> > changed between 1.0 and pre-1.0. This also should save us maintainers some
> > time as I would expect different behaviour in bug reports between 1.0 and
> > pre-1.0 issues.
> >
> > Uwe
> >
> > On Tue, Jul 23, 2019, at 7:52 AM, Micah Kornfield wrote:
> >> I think the main reason to do a release before 1.0.0 is if we want to make
> >> the change that would give a good error message for forward incompatibility
> >> (I think this could be done as 0.14.2 since it would just be clarifying an
> >> error message). Otherwise, I think including it in 1.0.0 would be fine
> >> (it's still not clear to me if there is consensus to fix the issue).
> >>
> >> Thanks,
> >> Micah
> >>
> >> On Monday, July 22, 2019, Wes McKinney <wesmck...@gmail.com> wrote:
> >>
> >>> I'd be satisfied with fixing the Flatbuffer alignment issue either in
> >>> a 0.15.0 or 1.0.0. In the interest of expediency, though, making a
> >>> 0.15.0 with this change sooner rather than later might be prudent.
> >>>
> >>> On Mon, Jul 22, 2019 at 12:35 PM Antoine Pitrou <anto...@python.org>
> >>> wrote:
> >>>>
> >>>> Hello,
> >>>>
> >>>> Recently we've discussed breaking the IPC format to fix a long-standing
> >>>> alignment issue. See this discussion:
> >>>>
> >>>> https://lists.apache.org/thread.html/8cea56f2069710ac128ff9129c744f0ef96a3e33a4d79d7e820019af@%3Cdev.arrow.apache.org%3E
> >>>>
> >>>> Should we first do a 0.15.0 in order to get those format fixes right?
> >>>> Once that is fine and settled we can move to the 1.0.0 release?
> >>>>
> >>>> Regards
> >>>>
> >>>> Antoine.
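
For anyone following along, here is a rough sketch of the detect-and-continue idea from this thread: a reader that accepts both the old 4-byte length prefix and the proposed <0xffffffff><buffer length><buffer data> pattern. It is an illustration under assumptions, not the Arrow implementation. The thread does not settle the width or byte order of the length field, so a 4-byte little-endian unsigned length is assumed here, and the names read_message, frame_new and frame_legacy are hypothetical. Padding and the aligned-buffer copy itself are not modeled.

```python
import io
import struct

# Marker taken from the "<0xffffffff><buffer length><buffer data>" pattern in the thread.
CONTINUATION = 0xFFFFFFFF


def _read_exact(stream, n):
    """Read exactly n bytes or raise if the stream ends early."""
    data = stream.read(n)
    if len(data) < n:
        raise EOFError("stream ended inside a message")
    return data


def read_message(stream):
    """Return (payload, is_legacy) for the next message, or None at end of stream.

    ASSUMPTION: lengths are 4-byte little-endian unsigned integers.
    """
    prefix = stream.read(4)
    if len(prefix) == 0:
        return None  # clean end of stream
    if len(prefix) < 4:
        raise EOFError("stream ended inside a length prefix")
    (first,) = struct.unpack("<I", prefix)
    is_legacy = first != CONTINUATION
    if is_legacy:
        # Old layout: the first 4 bytes are already the length.
        length = first
    else:
        # New layout: the length follows the marker.
        (length,) = struct.unpack("<I", _read_exact(stream, 4))
    # A real reader would copy a legacy payload into an aligned buffer here;
    # read() already returns a fresh bytes object, so alignment is not modeled.
    payload = _read_exact(stream, length)
    return payload, is_legacy


def frame_new(payload):
    """Hypothetical writer for the marker-prefixed layout."""
    return struct.pack("<II", CONTINUATION, len(payload)) + payload


def frame_legacy(payload):
    """Hypothetical writer for the old 4-byte-length layout."""
    return struct.pack("<I", len(payload)) + payload


if __name__ == "__main__":
    stream = io.BytesIO(frame_legacy(b"old") + frame_new(b"newer"))
    while (msg := read_message(stream)) is not None:
        print(msg)
```

The detection relies on 0xffffffff never being a plausible length in the old layout, which is what lets a new reader tell the two apart from the first 4 bytes alone. An old reader has no such escape hatch, which is the concern raised in point 1 above.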