While it's unfortunate to have to re-examine some basic design issues at this stage, I agree with Jacques's point that it would be nice if we could accommodate (without great hardship) the use case where a stream/pipeline of record batches is passed in C without requiring the called function to parse or validate the schema each time. Gandiva uses its own data structure [1] for passing a schemaless record batch across JNI, and in theory this could be replaced by the C data structure.
[1]: https://github.com/apache/arrow/blob/master/cpp/src/gandiva/eval_batch.h

On Sun, Dec 8, 2019 at 8:09 PM Fan Liya <liya.fa...@gmail.com> wrote:
>
> +1, as this is useful IMO.
>
> Best,
> Liya Fan
>
> On Sat, Dec 7, 2019 at 12:21 PM Jacques Nadeau <jacq...@apache.org> wrote:
>
> > -1 (binding)
> >
> > I'm voting -1 on this. I posted the thinking why on the PR. The high-level
> > is that I think it needs to better address the pipelined use case as right
> > now it fails to support that at all and has too much weight to ignore that
> > use case.
> >
> > I actually would have posted it here but totally missed this vote thread
> > until just now (I'm traveling atm). My -1 is not an indefinite -1, I'm
> > simply asking for some small changes to the approach to also support the
> > pipelined usage pattern.
> >
> > On Sat, Dec 7, 2019 at 3:09 AM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > > Hello,
> > >
> > > Could more PMC members take a look at this work?
> > >
> > > Thank you
> > >
> > > On Tue, Dec 3, 2019 at 1:50 PM Neal Richardson
> > > <neal.p.richard...@gmail.com> wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > On Tue, Dec 3, 2019 at 10:56 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > On Tue, Dec 3, 2019 at 12:54 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > > > > >
> > > > > > hello,
> > > > > >
> > > > > > We have been discussing the creation of a minimalist C-based data
> > > > > > interface for applications to exchange Arrow columnar data structures
> > > > > > with each other. Some notable features of this interface include:
> > > > > >
> > > > > > * A small amount of header-only C code can be copied into downstream
> > > > > > applications, no external dependencies are needed (notable, it is not
> > > > > > required to use Flatbuffers, though there are trade-offs resulting
> > > > > > from this)
> > > > > > * Low development investment (in other words: limited-scope use cases
> > > > > > can be accomplished with little code). Enable C libraries to export
> > > > > > Arrow columnar data at C call sites with minimal code
> > > > > >
> > > > > > This "C Data Interface" serves different use cases from the
> > > > > > language-independent IPC protocol and trades away a number of features
> > > > > > (such as forward/backward compatibility) in the interest of minimalism
> > > > > > / simplicity. It is not a replacement for the IPC protocol and will
> > > > > > only be used to interchange in-process data at C call sites.
> > > > > >
> > > > > > The PR providing the specification is here
> > > > > >
> > > > > > https://github.com/apache/arrow/pull/5442
> > > > > >
> > > > > > A fairly comprehensive C++ implementation of this demonstrating its
> > > > > > use is found here
> > > > > >
> > > > > > https://github.com/apache/arrow/pull/5608
> > > > > >
> > > > > > (note that other applications implementing the interface may choose to
> > > > > > only support a few features and thus have far less code to write)
> > > > > >
> > > > > > Please vote to adopt the SPECIFICATION (GitHub PR #5442).
> > > > > >
> > > > > > This vote will be open for at least 72 hours
> > > > > >
> > > > > > [ ] +1 Adopt C Data Interface specification
> > > > > > [ ] +0
> > > > > > [ ] -1 Do not adopt because...
> > > > > >
> > > > > > Thank you