[RESULT] [VOTE] Adopt Arrow in-process C Data Interface specification
Hello,

The vote succeeds with 3 +1 (binding) and 2 +1 (non-binding).

I'll soon open a JIRA for the specification and the C++ implementation,
so that we can merge them in a timely manner.

Regards

Antoine.


On Tue, 11 Feb 2020 20:06:33 +0100
Antoine Pitrou wrote:
> Hello,
>
> We have been discussing the creation of a minimalist C-based data
> interface for applications to exchange Arrow columnar data structures
> with each other. Some notable features of this interface include:
>
> * A small amount of header-only C code can be copied independently into
> third-party libraries and downstream applications, no dependencies are
> needed even on Arrow C++ itself (notably, it is not required to use
> Flatbuffers, though there are trade-offs resulting from this).
>
> * Low development investment (in other words: limited-scope use cases
> can be accomplished with little code), so as to enable C or C++
> libraries to export Arrow columnar data with minimal code.
>
> * Data lifetime management hooks so as to properly handle non-trivial
> data sharing (for example passing Arrow columnar data to an async
> processing consumer).
>
> This "C Data Interface" serves different use cases from the
> language-independent IPC protocol and trades away a number of features
> in the interest of minimalism / simplicity. It is not a replacement for
> the IPC protocol and will only be used to interchange in-process data at
> C or C++ call sites.
>
> The PR providing the specification is here:
> https://github.com/apache/arrow/pull/5442
>
> In particular, you can read the spec document here:
> https://github.com/pitrou/arrow/blob/doc-c-data-interface2/docs/source/format/CDataInterface.rst
>
> A fairly comprehensive C++ implementation of this demonstrating its
> use is found here:
> https://github.com/apache/arrow/pull/5608
>
> (note that other applications implementing the interface may choose to
> only support a few features and thus have far less code to write)
>
> Please vote to adopt the SPECIFICATION (GitHub PR #5442).
>
> This vote will be open for at least 72 hours
>
> [ ] +1 Adopt C Data Interface specification
> [ ] +0
> [ ] -1 Do not adopt because...
>
> Thank you
>
> Regards
>
> Antoine.
>
>
> (PS: yes, this is in large part a copy/paste of Wes's previous vote
> email :-))
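For readers who have not opened the spec, the "header-only C code" and "data lifetime management hooks" mentioned in the proposal can be sketched roughly as follows. The struct layout is the one in the Arrow C Data Interface documentation as later published; the draft under vote may have differed in details. The helper names export_int32, release_int32 and sum_and_release are hypothetical and exist only for this illustration, and the two-buffer layout (validity bitmap, then values) is assumed for a primitive int32 array with no nulls.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Array struct as given in the published Arrow C Data Interface docs. */
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

/* Hypothetical producer-side release hook: frees what export_int32
   allocated, then marks the struct as released by clearing the callback. */
static void release_int32(struct ArrowArray* array) {
  free((void*)array->buffers[1]); /* values buffer */
  free((void*)array->buffers);    /* table of buffer pointers */
  array->release = NULL;
}

/* Hypothetical producer: copies n values into a non-nullable int32 array.
   Returns 0 on success, -1 on allocation failure. */
static int export_int32(const int32_t* values, int64_t n,
                        struct ArrowArray* out) {
  const void** buffers = malloc(2 * sizeof(*buffers));
  int32_t* data = malloc((size_t)(n > 0 ? n : 1) * sizeof(*data));
  if (buffers == NULL || data == NULL) {
    free(buffers);
    free(data);
    return -1;
  }
  if (n > 0) {
    memcpy(data, values, (size_t)n * sizeof(*data));
  }
  buffers[0] = NULL; /* validity bitmap omitted because there are no nulls */
  buffers[1] = data;
  memset(out, 0, sizeof(*out));
  out->length = n;
  out->n_buffers = 2;
  out->buffers = buffers;
  out->release = release_int32;
  return 0;
}

/* Hypothetical consumer: sums the values, then hands the memory back to
   the producer by calling the release hook exactly once. */
static int64_t sum_and_release(struct ArrowArray* array) {
  assert(array->release != NULL && array->n_buffers == 2);
  const int32_t* data = (const int32_t*)array->buffers[1];
  int64_t total = 0;
  for (int64_t i = 0; i < array->length; ++i) {
    total += data[array->offset + i];
  }
  array->release(array);
  return total;
}
```

A caller would declare a struct ArrowArray on the stack, fill it with export_int32, and pass its address to sum_and_release. Neither side needs to link against Arrow C++; the only contract is the struct layout and the rule that the consumer calls release when it is done.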
Re: [VOTE] Adopt Arrow in-process C Data Interface specification
A week has passed, I would say we should move forward with merging
patches related to this. Any last words (in the next 12 hours or so)?

On Tue, Feb 18, 2020 at 7:48 AM Krisztián Szűcs wrote:
>
> +1 (binding)
Re: [VOTE] Adopt Arrow in-process C Data Interface specification
+1 (binding)

On Tue, Feb 18, 2020 at 10:47 AM Antoine Pitrou wrote:
>
> There has also been interest from DuckDB:
> https://github.com/cwida/duckdb/issues/151
>
> Regards
>
> Antoine.
Re: [VOTE] Adopt Arrow in-process C Data Interface specification
There has also been interest from DuckDB:
https://github.com/cwida/duckdb/issues/151

Regards

Antoine.


On Tue, 18 Feb 2020 02:37:43 -0600
Wes McKinney wrote:
> As I recall TFX developers weighed in that this would be helpful for
> TensorFlow-related use cases where there are concerns about C++ ABI
> compatibility. Since this project has been ongoing for about 5 months
> (see also related discussion around implementation guidelines for
> third parties [1]) there has been a lot of time for people to have a
> look
>
> [1]:
> https://lists.apache.org/thread.html/b7c2094ac4e11ffce46914b603e16b6bba8f235bc6465f3ab6d320d5%40%3Cdev.arrow.apache.org%3E
Re: [VOTE] Adopt Arrow in-process C Data Interface specification
As I recall TFX developers weighed in that this would be helpful for
TensorFlow-related use cases where there are concerns about C++ ABI
compatibility. Since this project has been ongoing for about 5 months
(see also related discussion around implementation guidelines for
third parties [1]) there has been a lot of time for people to have a
look.

[1]: https://lists.apache.org/thread.html/b7c2094ac4e11ffce46914b603e16b6bba8f235bc6465f3ab6d320d5%40%3Cdev.arrow.apache.org%3E

On Mon, Feb 17, 2020 at 11:19 PM Micah Kornfield wrote:
>
> I reviewed the spec again (not the implementation). I'm +1 on this.
>
> I was wondering if we shared/received feedback on this with any other
> communities?
>
> Thanks,
> Micah
Re: [VOTE] Adopt Arrow in-process C Data Interface specification
I reviewed the spec again (not the implementation). I'm +1 on this.

I was wondering if we shared/received feedback on this with any other
communities?

Thanks,
Micah

On Sun, Feb 16, 2020 at 8:13 PM Micah Kornfield wrote:
> I will try to review tomorrow and cast a vote.
Re: [VOTE] Adopt Arrow in-process C Data Interface specification
I will try to review tomorrow and cast a vote.

On Fri, Feb 14, 2020 at 5:41 AM Wes McKinney wrote:
> There is only 1 binding +1 vote so far, we should probably wait for
> three before closing the vote (it's possible that lazy consensus could
> be employed here but not much harm in waiting a few more days)
Re: [VOTE] Adopt Arrow in-process C Data Interface specification
There is only 1 binding +1 vote so far, we should probably wait for
three before closing the vote (it's possible that lazy consensus could
be employed here but not much harm in waiting a few more days)

On Thu, Feb 13, 2020 at 8:15 PM Francois Saint-Jacques wrote:
>
> +1
Re: [VOTE] Adopt Arrow in-process C Data Interface specification
+1
Re: [VOTE] Adopt Arrow in-process C Data Interface specification
+1 (binding)
Re: [VOTE] Adopt Arrow in-process C Data Interface specification
+1 (binding)
Re: [VOTE] Adopt Arrow in-process C Data Interface specification
Ah, you're right, it's PR 6040: https://github.com/apache/arrow/pull/6040 Similarly, the C++ implementation is at PR 6026: https://github.com/apache/arrow/pull/6026 Regards Antoine.
Re: [VOTE] Adopt Arrow in-process C Data Interface specification
hi Antoine, PR 5442 seems to no longer be the right one. Which open PR contains the specification now?
[VOTE] Adopt Arrow in-process C Data Interface specification
Hello, We have been discussing the creation of a minimalist C-based data interface for applications to exchange Arrow columnar data structures with each other. Some notable features of this interface include: * A small amount of header-only C code can be copied independently into third-party libraries and downstream applications, no dependencies are needed even on Arrow C++ itself (notably, it is not required to use Flatbuffers, though there are trade-offs resulting from this). * Low development investment (in other words: limited-scope use cases can be accomplished with little code), so as to enable C or C++ libraries to export Arrow columnar data with minimal code. * Data lifetime management hooks so as to properly handle non-trivial data sharing (for example passing Arrow columnar data to an async processing consumer). This "C Data Interface" serves different use cases from the language-independent IPC protocol and trades away a number of features in the interest of minimalism / simplicity. It is not a replacement for the IPC protocol and will only be used to interchange in-process data at C or C++ call sites. The PR providing the specification is here: https://github.com/apache/arrow/pull/5442 In particular, you can read the spec document here: https://github.com/pitrou/arrow/blob/doc-c-data-interface2/docs/source/format/CDataInterface.rst A fairly comprehensive C++ implementation of this demonstrating its use is found here: https://github.com/apache/arrow/pull/5608 (note that other applications implementing the interface may choose to only support a few features and thus have far less code to write) Please vote to adopt the SPECIFICATION (GitHub PR #5442). This vote will be open for at least 72 hours [ ] +1 Adopt C Data Interface specification [ ] +0 [ ] -1 Do not adopt because... Thank you Regards Antoine. (PS: yes, this is in large part a copy/paste of Wes's previous vote email :-))
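To make the "data lifetime management hooks" point concrete, here is a minimal sketch in C of how a producer-supplied release callback could work. The `DemoArray` structure and the `demo_*` functions are hypothetical names invented for this illustration; they are not the layout defined in the specification under vote, which should be read from the linked PR.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified structure for illustration only; the real
   layout is defined by the specification under vote. */
struct DemoArray {
  int64_t length;
  const void* data;                   /* buffer owned by the producer */
  void (*release)(struct DemoArray*); /* lifetime hook set by the producer */
  void* private_data;                 /* opaque producer state */
};

/* Producer-side release callback: frees the buffer and marks the
   structure as released by clearing the callback pointer. */
static void demo_release(struct DemoArray* array) {
  assert(array->release != NULL);
  free(array->private_data);
  array->data = NULL;
  array->private_data = NULL;
  array->release = NULL;
}

/* Producer: export the int32 values 0..length-1. Returns 0 on success. */
static int demo_export(int64_t length, struct DemoArray* out) {
  int32_t* values = malloc((size_t)length * sizeof(int32_t));
  if (values == NULL) return -1;
  for (int64_t i = 0; i < length; ++i) values[i] = (int32_t)i;
  out->length = length;
  out->data = values;
  out->release = demo_release;
  out->private_data = values;
  return 0;
}

/* Consumer: sum the values, then give the data back through the hook.
   The consumer never calls free() itself, so it does not need to know
   how the producer allocated the memory. */
static int64_t demo_consume(struct DemoArray* array) {
  const int32_t* values = array->data;
  int64_t sum = 0;
  for (int64_t i = 0; i < array->length; ++i) sum += values[i];
  array->release(array);
  return sum;
}
```

In this sketch the consumer can hold on to the structure for as long as it needs (for example on another thread) and only invokes `release` when done, which is the kind of non-trivial sharing the email alludes to.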
Re: [VOTE] Adopt Arrow in-process C Data Interface specification
Right, I'll give it a try in a few days. Best regards Antoine.
Re: [VOTE] Adopt Arrow in-process C Data Interface specification
While it's unfortunate to have to re-examine some basic design issues at this stage, I agree with Jacques's point that it would be nice if we can accommodate (without great hardship) the use case where a stream/pipeline of record batches are passed in C that does not require the called function to have to parse or validate the schema each time. Gandiva uses its own data structure [1] for passing a schemaless record batch across JNI and in theory this could be replaced by the C data structure [1]: https://github.com/apache/arrow/blob/master/cpp/src/gandiva/eval_batch.h
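The pipelined use case raised in this part of the thread (validate the schema once, then pass a stream of schemaless record batches) could be sketched roughly as follows. All types and functions here (`DemoSchema`, `DemoBatch`, `DemoPipeline`, `demo_pipeline_*`) and the `"int32"` format string are assumptions made up for the example; they are neither Gandiva's data structure nor the structures in the proposed specification.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical types, for illustration only. */
struct DemoSchema {
  const char* format; /* assumed type tag, e.g. "int32" */
};

struct DemoBatch {
  int64_t length;
  const int32_t* values; /* data only, no schema attached */
};

struct DemoPipeline {
  int schema_checks; /* how many times the schema was validated */
  int64_t total;     /* running sum over all batches seen so far */
};

/* Validate the schema once, when the pipeline is opened.
   Returns 0 if the schema is supported, -1 otherwise. */
static int demo_pipeline_open(struct DemoPipeline* p,
                              const struct DemoSchema* schema) {
  p->schema_checks = 1;
  p->total = 0;
  return strcmp(schema->format, "int32") == 0 ? 0 : -1;
}

/* Process one batch. The schema is not parsed or validated again;
   the function relies on the check done in demo_pipeline_open(). */
static void demo_pipeline_push(struct DemoPipeline* p,
                               const struct DemoBatch* batch) {
  for (int64_t i = 0; i < batch->length; ++i) {
    p->total += batch->values[i];
  }
}
```

The design point is only that the per-batch call takes data without a schema, so its cost does not grow with schema complexity.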
Re: [VOTE] Adopt Arrow in-process C Data Interface specification
+1, as this is useful IMO. Best, Liya Fan
Re: [VOTE] Adopt Arrow in-process C Data Interface specification
-1 (binding) I'm voting -1 on this. I posted the thinking why on the PR. The high-level is that I think it needs to better address the pipelined use case as right now it fails to support that at all and has too much weight to ignore that use case. I actually would have posted it here but totally missed this vote thread until just now (I'm traveling atm). My -1 is not an indefinite -1, I'm simply asking for some small changes to the approach to also support the pipelined usage pattern.
Re: [VOTE] Adopt Arrow in-process C Data Interface specification
Hello, Could more PMC members take a look at this work? Thank you
Re: [VOTE] Adopt Arrow in-process C Data Interface specification
+1 (non-binding)

On Tue, Dec 3, 2019 at 10:56 AM Wes McKinney wrote:
> +1 (binding)
>
> On Tue, Dec 3, 2019 at 12:54 PM Wes McKinney wrote:
> >
> > hello,
> >
> > We have been discussing the creation of a minimalist C-based data
> > interface for applications to exchange Arrow columnar data structures
> > with each other. Some notable features of this interface include:
> >
> > * A small amount of header-only C code can be copied into downstream
> > applications, no external dependencies are needed (notably, it is not
> > required to use Flatbuffers, though there are trade-offs resulting
> > from this)
> > * Low development investment (in other words: limited-scope use cases
> > can be accomplished with little code). Enable C libraries to export
> > Arrow columnar data at C call sites with minimal code
> >
> > This "C Data Interface" serves different use cases from the
> > language-independent IPC protocol and trades away a number of features
> > (such as forward/backward compatibility) in the interest of minimalism
> > / simplicity. It is not a replacement for the IPC protocol and will
> > only be used to interchange in-process data at C call sites.
> >
> > The PR providing the specification is here:
> >
> > https://github.com/apache/arrow/pull/5442
> >
> > A fairly comprehensive C++ implementation of this demonstrating its
> > use is found here:
> >
> > https://github.com/apache/arrow/pull/5608
> >
> > (note that other applications implementing the interface may choose to
> > only support a few features and thus have far less code to write)
> >
> > Please vote to adopt the SPECIFICATION (GitHub PR #5442).
> >
> > This vote will be open for at least 72 hours
> >
> > [ ] +1 Adopt C Data Interface specification
> > [ ] +0
> > [ ] -1 Do not adopt because...
> >
> > Thank you
>
Re: [VOTE] Adopt Arrow in-process C Data Interface specification
+1 (binding)

On Tue, Dec 3, 2019 at 12:54 PM Wes McKinney wrote:
>
> hello,
>
> We have been discussing the creation of a minimalist C-based data
> interface for applications to exchange Arrow columnar data structures
> with each other. Some notable features of this interface include:
>
> * A small amount of header-only C code can be copied into downstream
> applications, no external dependencies are needed (notably, it is not
> required to use Flatbuffers, though there are trade-offs resulting
> from this)
> * Low development investment (in other words: limited-scope use cases
> can be accomplished with little code). Enable C libraries to export
> Arrow columnar data at C call sites with minimal code
>
> This "C Data Interface" serves different use cases from the
> language-independent IPC protocol and trades away a number of features
> (such as forward/backward compatibility) in the interest of minimalism
> / simplicity. It is not a replacement for the IPC protocol and will
> only be used to interchange in-process data at C call sites.
>
> The PR providing the specification is here:
>
> https://github.com/apache/arrow/pull/5442
>
> A fairly comprehensive C++ implementation of this demonstrating its
> use is found here:
>
> https://github.com/apache/arrow/pull/5608
>
> (note that other applications implementing the interface may choose to
> only support a few features and thus have far less code to write)
>
> Please vote to adopt the SPECIFICATION (GitHub PR #5442).
>
> This vote will be open for at least 72 hours
>
> [ ] +1 Adopt C Data Interface specification
> [ ] +0
> [ ] -1 Do not adopt because...
>
> Thank you
[VOTE] Adopt Arrow in-process C Data Interface specification
hello,

We have been discussing the creation of a minimalist C-based data
interface for applications to exchange Arrow columnar data structures
with each other. Some notable features of this interface include:

* A small amount of header-only C code can be copied into downstream
applications, no external dependencies are needed (notably, it is not
required to use Flatbuffers, though there are trade-offs resulting
from this)
* Low development investment (in other words: limited-scope use cases
can be accomplished with little code). Enable C libraries to export
Arrow columnar data at C call sites with minimal code

This "C Data Interface" serves different use cases from the
language-independent IPC protocol and trades away a number of features
(such as forward/backward compatibility) in the interest of minimalism
/ simplicity. It is not a replacement for the IPC protocol and will
only be used to interchange in-process data at C call sites.

The PR providing the specification is here:

https://github.com/apache/arrow/pull/5442

A fairly comprehensive C++ implementation of this demonstrating its
use is found here:

https://github.com/apache/arrow/pull/5608

(note that other applications implementing the interface may choose to
only support a few features and thus have far less code to write)

Please vote to adopt the SPECIFICATION (GitHub PR #5442).

This vote will be open for at least 72 hours

[ ] +1 Adopt C Data Interface specification
[ ] +0
[ ] -1 Do not adopt because...

Thank you
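
Editorial illustration (not part of the original message): to give a
rough idea of what "export Arrow columnar data at C call sites with
minimal code" can look like, here is a minimal sketch. It assumes the
struct layout of the Arrow C Data Interface as later published; the
draft in PR #5442 under vote here may have differed. The helper names
export_int32_array and sum_int32_array are hypothetical, and the type
description (an int32 format string on the schema side) is omitted for
brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* ASSUMPTION: layout taken from the later published C Data Interface;
   the draft specification being voted on may not match exactly. */
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

/* Producer-side release callback: frees what the producer allocated and
   marks the struct as released by clearing the callback. */
static void release_int32_array(struct ArrowArray* array) {
  free((void*)array->buffers[1]); /* data buffer */
  free((void*)array->buffers);    /* buffer pointer table */
  array->release = NULL;
}

/* Hypothetical producer: copies n int32 values (no nulls) into a newly
   allocated buffer and fills *out. Returns 0 on success, -1 on failure. */
int export_int32_array(const int32_t* values, int64_t n,
                       struct ArrowArray* out) {
  const void** buffers = malloc(2 * sizeof(*buffers));
  int32_t* data = malloc((size_t)(n > 0 ? n : 1) * sizeof(int32_t));
  if (buffers == NULL || data == NULL) {
    free((void*)buffers);
    free(data);
    return -1;
  }
  if (n > 0) memcpy(data, values, (size_t)n * sizeof(int32_t));
  buffers[0] = NULL; /* no validity bitmap: the array has no nulls */
  buffers[1] = data;
  memset(out, 0, sizeof(*out));
  out->length = n;
  out->n_buffers = 2;
  out->buffers = buffers;
  out->release = release_int32_array;
  return 0;
}

/* Hypothetical consumer: sums the values, then releases the array. */
int64_t sum_int32_array(struct ArrowArray* array) {
  assert(array->release != NULL); /* must not already be released */
  const int32_t* data = (const int32_t*)array->buffers[1];
  int64_t total = 0;
  for (int64_t i = 0; i < array->length; ++i) {
    total += data[array->offset + i];
  }
  array->release(array);
  return total;
}
```

The producer and the consumer share nothing but the struct definition,
which is the part that would be copied into each project; neither side
needs to link against Arrow C++ or Flatbuffers.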