Hi all,

Thanks for all the feedback so far! I've opened up two more draft PRs
implementing [1] an API for owning buffers (precursor to creating struct
ArrowArrays) and [2] an API for creating ArrowSchema objects for all Arrow
types. All comments welcome!
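
To illustrate what creating an ArrowSchema involves, below is a minimal
sketch of a childless schema with a release callback. It is not the API
proposed in either PR: example_schema_init, example_schema_release, and
example_strdup are hypothetical names used only for illustration. Only the
struct layout and the ARROW_FLAG_NULLABLE value are taken from the Arrow C
Data Interface specification.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Struct layout as given in the Arrow C Data Interface specification. */
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

#define ARROW_FLAG_NULLABLE 2

/* Hypothetical helper: copy a C string onto the heap (NULL on failure). */
static char* example_strdup(const char* s) {
  size_t n = strlen(s) + 1;
  char* out = (char*)malloc(n);
  if (out != NULL) memcpy(out, s, n);
  return out;
}

/* Release callback: frees the strings owned by the schema, then marks the
 * structure as released by setting release to NULL. */
static void example_schema_release(struct ArrowSchema* schema) {
  free((void*)schema->format);
  free((void*)schema->name);
  schema->format = NULL;
  schema->name = NULL;
  schema->release = NULL;
}

/* Hypothetical helper: initialize a nullable, childless schema that owns
 * copies of format and name (both must be non-NULL in this sketch).
 * Returns 0 on success, -1 if an allocation fails. */
int example_schema_init(struct ArrowSchema* schema, const char* format,
                        const char* name) {
  assert(schema != NULL && format != NULL && name != NULL);

  schema->metadata = NULL;
  schema->flags = ARROW_FLAG_NULLABLE;
  schema->n_children = 0;
  schema->children = NULL;
  schema->dictionary = NULL;
  schema->private_data = NULL;
  schema->format = example_strdup(format);
  schema->name = example_strdup(name);

  if (schema->format == NULL || schema->name == NULL) {
    /* Clean up the partial allocation and leave the schema released. */
    example_schema_release(schema);
    return -1;
  }

  schema->release = &example_schema_release;
  return 0;
}
```

A caller would pass a format string from the specification (for example
"i" for int32) and is responsible for eventually calling
schema.release(&schema) exactly once.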

-dewey

[1] https://github.com/paleolimbot/nanoarrow/pull/9
[2] https://github.com/paleolimbot/nanoarrow/pull/10

On Wed, Jun 15, 2022 at 12:18 AM Dewey Dunnington <de...@voltrondata.com>
wrote:

> Hi all,
>
> I opened a second PR [1] drafting a design for storing parsed information
> obtained from a struct ArrowSchema (i.e., parsing the format string into
> usable C structures). There are some unsolved problems that could use a
> fresh perspective...all comments welcome!
>
> [1] https://github.com/paleolimbot/arrow-c/pull/5
>
> On Fri, Jun 10, 2022 at 12:27 PM Dewey Dunnington <de...@voltrondata.com>
> wrote:
>
>> Hi all,
>>
>> As promised, I converted the design document [1] into an initial PR [2].
>> Rather than draft the whole header, I started with README + implementations
>> + testing for error handling and schema allocation (depending on feedback,
>> next week I will draft another reviewable chunk).
>>
>> Also feel free to suggest another place to put this if one exists (the
>> choice to put it in its own repo was based on informal feedback that
>> perhaps that might be the best way to go).
>>
>> [1] https://docs.google.com/document/d/11n7ICVZO8exZ-z3GRlI26VLzKPXlYlEz5xjLl1y0ujU/edit?usp=sharing
>> [2] https://github.com/paleolimbot/arrow-c/pull/1/files
>>
>> On Fri, Jun 3, 2022 at 12:41 PM Dewey Dunnington <de...@voltrondata.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> Based on the points raised above and a few adventures implementing some
>>> of this in related projects, I put together a brief design document
>>> proposing a scope and structure to perhaps solidify a few of these
>>> discussions:
>>> https://docs.google.com/document/d/11n7ICVZO8exZ-z3GRlI26VLzKPXlYlEz5xjLl1y0ujU/edit?usp=sharing
>>> .
>>>
>>> Any and all should feel free to add, rewrite, or propose a new
>>> structure...I wrote many of the pieces for argument's sake or because
>>> that's how I'd implemented them before.
>>>
>>> Next week I will phrase it as a skeleton header (like the one in the
>>> excellent ADBC design discussions) depending on feedback to keep the
>>> discussion going!
>>>
>>> Cheers,
>>>
>>> -dewey
>>>
>>> On Fri, Jun 3, 2022 at 9:57 AM Hannes Mühleisen <han...@duckdblabs.com>
>>> wrote:
>>>
>>>> Hello List,
>>>>
>>>> We at DuckDB are happy users of the Arrow C Data Interface and use it to
>>>> feed SQL queries and also use it to provide query results in Arrow
>>>> format again. It is particularly appealing to us that the interface is
>>>> merely a (C) header file that we just ship with our source code [1].
>>>> Internally, our implementation then constructs DuckDB internal vectors
>>>> from the Arrow format [2] or vice versa [3].
>>>>
>>>> As you can see from [2, 3], there is some complexity in getting the
>>>> conversion right, especially for more complex data types like nested
>>>> types (list, strings). A lightweight, dependency-free library to help
>>>> construct those would certainly be appreciated. What would also help a
>>>> lot is validation code: Arrow structures are very delicate and one wrong
>>>> pointer can lead to disaster (which is then blamed on us), so a way to
>>>> verify the structures in said lightweight library would be very helpful.
>>>>
>>>> Best from Amsterdam, and Quack
>>>>
>>>> Hannes
>>>>
>>>> [1] https://github.com/duckdb/duckdb/blob/master/src/include/duckdb/common/arrow.hpp
>>>> [2] https://github.com/duckdb/duckdb/blob/master/src/function/table/arrow.cpp
>>>> [3] https://github.com/duckdb/duckdb/blob/master/src/common/types/data_chunk.cpp
>>>>
>>>>
>>>> On Fri, Jun 03, 2022 at 15:34:42, Jonathan Keane <jke...@gmail.com>
>>>> wrote:
>>>>
>>>> > cc Hannes Mühleisen from DuckDB Labs
>>>> >
>>>> > -Jon
>>>> >
>>>> >
>>>> > On Tue, May 31, 2022 at 5:03 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>>> >
>>>> > I'm also supportive of having a small vendorable C/C++ "Arrow
>>>> > middleware" that provides:
>>>> >
>>>> > * Schemas and types
>>>> > * Columnar data structures and minimal APIs to build them and iterate
>>>> >   over them
>>>> > * C data interface
>>>> > * Minimal validation (at the level of Validate but not ValidateFull)
>>>> >
>>>> > I don't think it's going to be practical to try to refactor parts of
>>>> > the existing Arrow C++ core to be vendorable since there are many
>>>> > features / requirements (e.g. an extensible buffer and device API)
>>>> > that these C++ classes include that aren't needed in this
>>>> > limited-feature middleware library.
>>>> >
>>>> > This also relates to the "Improving Arrow's database support" project
>>>> > that David Li raised some time ago [1]. If we want to encourage
>>>> > database driver libraries to add new APIs that emit the Arrow C
>>>> > interface, we need to make it easier to generate the C interface
>>>> > without requiring a new library dependency.
>>>> >
>>>> > [1]: https://lists.apache.org/thread/gnz1kz2rj3rb8rh8qz7l0mv8lvzq254w
>>>> >
>>>> > On Mon, May 30, 2022 at 11:31 AM Jonathan Keane <jke...@gmail.com> wrote:
>>>> > >
>>>> > > Thanks for working on this. I've heard people asking about something
>>>> > > like this from a number of different fronts on top of the obvious use
>>>> > > case in geoarrow | other geospatial libraries. I think a minimal piece
>>>> > > of Arrow that other packages could depend on without needing to bring
>>>> > > in all of arrow would be super valuable in building the bridges we
>>>> > > want across other systems.
>>>> > >
>>>> > > Do you have any (design) documentation that describes the scope of
>>>> > > what you're thinking? I know there have been others floating around
>>>> > > [1] [2] that were in a similar spirit.
>>>> > >
>>>> > > A few more questions I hope will spark more conversation: How do the
>>>> > > header files you linked in [3] overlap with these other efforts? Are
>>>> > > those headers something we could|should "just" PR into apache/arrow
>>>> > > and write up how to use them? If not, what is the work to make them so
>>>> > > that they could be (the answer of course could be design something
>>>> > > else entirely and PR that!)?
>>>> > >
>>>> > > [1] https://github.com/paleolimbot/narrow
>>>> > > [2] https://paleolimbot.github.io/narrow/articles/why-narrow.html
>>>> > > [3] https://github.com/paleolimbot/geoarrow-cpp/tree/main/src/geoarrow/internal/arrow-hpp
>>>> > >
>>>> > > -Jon
>>>> > >
>>>> > >
>>>> > > On Wed, May 25, 2022 at 9:29 AM Dewey Dunnington <de...@voltrondata.com> wrote:
>>>> > > >
>>>> > > > I'm writing to gauge interest in a set of helpers in C and/or C++ for
>>>> > > > reading/exporting Arrow C Data interface structures. My use-case is
>>>> > > > building Arrow geospatial support in R [1], and while the set of
>>>> > > > helpers I've been using [2] has served the purpose of me writing about
>>>> > > > the opportunities for Arrow + geospatial [3], I would like to rewrite
>>>> > > > the prototype based on something developed by/with the Arrow community.
>>>> > > >
>>>> > > > Does a set of C/C++ helpers for Arrow C Data interface structures
>>>> > > > already exist? *Should* it exist?
>>>> > > >
>>>> > > > If it doesn't, what should the name/scope of that library be? The
>>>> > > > names 'nanoarrow', 'narrow', 'sparrow', and 'arrow-hpp' have all
>>>> > > > surfaced in my limited discussion of this so far. For the purpose of
>>>> > > > starting the discussion, I'll posit that the library should include
>>>> > > > helpers to allocate/destroy C Data interface structures, a schema
>>>> > > > metadata encoder/decoder, validation of a schema/array pair, and
>>>> > > > something like the ArrayBuilder C++ class.
>>>> > > >
>>>> > > > [1] https://lists.apache.org/thread/yb7p9wpg3k128njskhwj9j788opb67g7
>>>> > > > [2] https://github.com/paleolimbot/geoarrow-cpp/tree/main/src/geoarrow/internal/arrow-hpp
>>>> > > > [3] https://docs.google.com/document/d/1A6e3XCerjhXVFHBDaoAlBBNFb2HG4RB9SVRpuBru7E4/edit?usp=sharing
>>>> >
>>>> >
>>>>
>>>
