That said, the protocol data now produced by `RecordBatchStreamWriter`
should be readable in 1.0.0 and beyond. `pyarrow.serialize` is only
intended for transient storage. We should add some language to the
docstring for this function to explain that it is distinct from the
Arrow IPC format (which has a well-defined structure and compatibility
guarantees).

https://issues.apache.org/jira/browse/ARROW-6336
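
For illustration, a minimal sketch of the IPC stream round trip with
pyarrow; the file name and table contents below are arbitrary examples:

    import pyarrow as pa

    table = pa.Table.from_pydict({"a": [1, 2, 3], "b": ["x", "y", "z"]})

    # Write with the IPC stream format; data written this way should
    # remain readable by 1.0.0 and later releases.
    sink = pa.OSFile("data.arrows", "wb")
    writer = pa.RecordBatchStreamWriter(sink, table.schema)
    writer.write_table(table)
    writer.close()
    sink.close()

    # Read the stream back with the matching reader.
    source = pa.OSFile("data.arrows", "rb")
    result = pa.RecordBatchStreamReader(source).read_all()
    source.close()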

On Fri, Aug 23, 2019 at 3:05 AM Antoine Pitrou <anto...@python.org> wrote:
>
>
> Hi Yevgeni,
>
> I don't think we have ever promised binary stability of the
> pyarrow.serialize() protocol.  Binary compatibility starting from 1.0.0
> is about the Arrow in-memory format and the Arrow IPC format (i.e. how
> Arrow arrays, tables... are laid out and how their metadata is encoded
> on the wire).
>
> So I would not recommend using pa.serialize() for storage.  If you want
> to store data, you should use a well-known file format (or a combination
> thereof), such as Parquet.
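>
> For example, a minimal sketch of writing a table to Parquet with
> pyarrow (the file name here is illustrative):
>
>     import pyarrow as pa
>     import pyarrow.parquet as pq
>
>     table = pa.Table.from_pydict({"a": [1, 2, 3], "b": ["x", "y", "z"]})
>
>     # Parquet is a well-known on-disk format with stability guarantees
>     # of its own, so it is suitable for long-term storage.
>     pq.write_table(table, "data.parquet")
>     restored = pq.read_table("data.parquet")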
>
> Regards
>
> Antoine.
>
>
> Le 23/08/2019 à 07:25, Yevgeni Litvin a écrit :
> > In our system we are using Arrow serialization, as it showed excellent
> > deserialization speed. However, it seems that we made a mistake by
> > persisting the streams into long-term storage, as the serialized data
> > appears to be incompatible between versions. According to the release
> > notes of 0.14.0, it appears that binary compatibility will be
> > maintained starting with 1.0.0. My question is whether pyarrow.serialize
> > is also guaranteed to maintain binary compatibility starting with Arrow
> > 1.0, and whether it would then be safe to persist its output (or maybe
> > even starting now, with 0.14)?
> >
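> > For reference, the round trip we rely on looks roughly like this
> > (values illustrative):
> >
> >     import pyarrow as pa
> >
> >     # Serialize an arbitrary Python object to an Arrow buffer ...
> >     buf = pa.serialize({"a": [1, 2, 3]}).to_buffer()
> >     # ... and reconstruct it.
> >     restored = pa.deserialize(buf)
> >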
> > (From my quick test, 0.13 is not compatible with 0.12 and earlier,
> > while it is compatible with 0.14.)
> >
> > Thank you,
> >
> > - Yevgeni
> >
