I'm also supportive of having a small vendorable C/C++ "Arrow
middleware" that provides:

* Schemas and types
* Columnar data structures and minimal APIs to build them and iterate over them
* C data interface
* Minimal validation (at the level of Validate but not ValidateFull)
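
To make that scope concrete, here is a minimal sketch in C, not a proposed API. The `ArrowArray` struct is copied from the C data interface specification; `release_borrowed` and `minimal_validate_int32` are hypothetical names invented for this illustration. The checks are constant-time and never read the values, which is the assumed meaning of "Validate but not ValidateFull" here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ABI-stable struct copied from the Arrow C data interface specification. */
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

/* Hypothetical release callback for arrays whose memory the caller owns:
 * nothing to free, just mark the struct as released. */
static void release_borrowed(struct ArrowArray* array) {
  assert(array != NULL);
  array->release = NULL;
}

/* Hypothetical helper (not an existing Arrow API): cheap structural checks
 * for an int32 array. Returns 0 if the layout looks sane, -1 otherwise.
 * It does not scan the data, so it cannot catch content-level problems. */
static int minimal_validate_int32(const struct ArrowArray* array) {
  if (array == NULL || array->release == NULL) return -1; /* missing or released */
  if (array->length < 0 || array->offset < 0) return -1;
  /* A null_count of -1 means "not computed yet" and is allowed. */
  if (array->null_count < -1 || array->null_count > array->length) return -1;
  /* int32 layout: buffers[0] is the validity bitmap, buffers[1] the values. */
  if (array->n_buffers != 2 || array->buffers == NULL) return -1;
  if (array->n_children != 0 || array->dictionary != NULL) return -1;
  /* The values buffer may only be missing when the array is empty. */
  if (array->buffers[1] == NULL && array->length > 0) return -1;
  /* The validity bitmap may only be missing when no nulls are counted. */
  if (array->buffers[0] == NULL && array->null_count > 0) return -1;
  return 0;
}
```

A caller would wrap its own values in an `ArrowArray`, set `release` to something like `release_borrowed`, and run the check before handing the struct to a consumer.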

I don't think it's going to be practical to refactor parts of the
existing Arrow C++ core to be vendorable, since these C++ classes
carry many features and requirements (e.g. an extensible buffer and
device API) that aren't needed in this limited-feature middleware
library.

This also relates to the "Improving Arrow's database support" project
that David Li raised some time ago [1]. If we want to encourage
database driver libraries to add new APIs that emit the Arrow C
interface, we need to make it easier to generate the C interface
without requiring a new library dependency.
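
As a rough sketch of what "no new library dependency" could look like for a driver, the C below copies the `ArrowSchema` struct and the `ARROW_FLAG_NULLABLE` value from the C data interface specification and fills in the schema for a single nullable int32 column. The helper names `export_int32_schema` and `release_int32_schema` are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Flag value and struct copied from the Arrow C data interface specification. */
#define ARROW_FLAG_NULLABLE 2

struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

/* Hypothetical release callback: frees the name copied by
 * export_int32_schema() and marks the struct as released. */
static void release_int32_schema(struct ArrowSchema* schema) {
  assert(schema != NULL);
  free(schema->private_data);
  schema->release = NULL;
}

/* Hypothetical helper: describe one nullable int32 column.
 * Returns 0 on success, -1 if the name could not be copied. */
static int export_int32_schema(const char* column_name, struct ArrowSchema* out) {
  size_t n = strlen(column_name) + 1;
  char* name_copy = (char*)malloc(n);
  if (name_copy == NULL) return -1;
  memcpy(name_copy, column_name, n);

  out->format = "i";               /* format string for int32 */
  out->name = name_copy;
  out->metadata = NULL;
  out->flags = ARROW_FLAG_NULLABLE;
  out->n_children = 0;
  out->children = NULL;
  out->dictionary = NULL;
  out->release = release_int32_schema;
  out->private_data = name_copy;   /* owned by the schema until release */
  return 0;
}
```

The consumer calls `schema.release(&schema)` when it is done, so producer and consumer only need to agree on the struct layout.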

[1]: https://lists.apache.org/thread/gnz1kz2rj3rb8rh8qz7l0mv8lvzq254w

On Mon, May 30, 2022 at 11:31 AM Jonathan Keane <jke...@gmail.com> wrote:
>
> Thanks for working on this. I've heard people asking about something
> like this from a number of different fronts on top of the obvious use
> case in geoarrow | other geospatial libraries. I think a minimal piece
> of Arrow that other packages could depend on without needing to bring
> in all of arrow would be super valuable in building the bridges we
> want across other systems.
>
> Do you have any (design) documentation that describes the scope of
> what you're thinking? I know there have been others floating around
> [1] [2] that were in a similar spirit.
>
> A few more questions I hope will spark more conversation: How do the
> header files you linked in [3] overlap with these other efforts? Are
> those headers something we could|should "just" PR into apache/arrow
> and write up how to use them? If not, what work would be needed to get
> them to that point (the answer could of course be to design something
> else entirely and PR that!)?
>
> [1] https://github.com/paleolimbot/narrow
> [2] https://paleolimbot.github.io/narrow/articles/why-narrow.html
> [3] https://github.com/paleolimbot/geoarrow-cpp/tree/main/src/geoarrow/internal/arrow-hpp
>
> -Jon
>
>
> On Wed, May 25, 2022 at 9:29 AM Dewey Dunnington <de...@voltrondata.com> wrote:
> >
> > I'm writing to gauge interest in a set of helpers in C and/or C++ for
> > reading/exporting Arrow C Data interface structures. My use-case is
> > building Arrow geospatial support in R [1], and while the set of helpers
> > I've been using [2] has served the purpose of me writing about the
> > opportunities for Arrow + geospatial [3], I would like to rewrite the
> > prototype based on something developed by/with the Arrow community.
> >
> > Does a set of C/C++ helpers for Arrow C Data interface structures already
> > exist? *Should* it exist?
> >
> > If it doesn't, what should the name/scope of that library be? The names
> > 'nanoarrow', 'narrow', 'sparrow', and 'arrow-hpp' have all surfaced in my
> > limited discussion of this so far. For the purpose of starting the
> > discussion, I'll posit that the library should include helpers to
> > allocate/destroy C Data interface structures, a schema metadata
> > encoder/decoder, validation of a schema/array pair, and something like the
> > ArrayBuilder C++ class.
> >
> > [1] https://lists.apache.org/thread/yb7p9wpg3k128njskhwj9j788opb67g7
> > [2] https://github.com/paleolimbot/geoarrow-cpp/tree/main/src/geoarrow/internal/arrow-hpp
> > [3] https://docs.google.com/document/d/1A6e3XCerjhXVFHBDaoAlBBNFb2HG4RB9SVRpuBru7E4/edit?usp=sharing
