Hi, I'm the author of Arrow GLib.

I agree with the Pros/Cons you summarized.

If we have many development resources for the C# bindings, it may be
better to implement bindings to the C++ library directly, like PyArrow
does. If we don't, it may be better to use Arrow GLib so that we can
share development resources with GLib/Ruby developers like me. If we
don't have many development resources for the C# bindings but we also
don't need many bindings, it may still be better to implement bindings
to the C++ library directly.

> * There's no need to distribute a native binary with NuGet packages,
> and NuGet packages aren't bloated by builds for architectures that
> aren't used
> * Users need to separately install the Arrow GLib libraries in order
> to use some Arrow NuGet packages, and this might complicate build and
> deployment processes compared to just adding a NuGet package reference
> to a project

We may want to publish a NuGet package that includes the Arrow GLib
libraries, like ParquetSharp includes ParquetSharpNative.* that are
statically linked to Arrow/Parquet C++.

We may want to create a C# library in addition to the code that is
auto-generated from GObject Introspection. This is the approach used
by Ruby. The auto-generated code alone may be difficult to use from
C#.

For example, both of the following Ruby snippets read a table:

  # With a Ruby library
  table = Arrow::Table.load("data.arrow")

  # Without a Ruby library (Use only auto generated API)
  input = Arrow::MemoryMappedInputStream.new("data.arrow")
  reader = Arrow::RecordBatchFileReader.new(input)
  table = reader.read_all

> I was worried about whether it's possible to use GObject to implement
> bindings for some of the more complex parts of the Dataset API, like
> providing a .NET implementation of a KmsClientFactory, which would be
> required for reading encrypted Parquet data.

We can use GObject for that case, as you did. I can open a PR for it,
or I can review your implementation if you open a PR with your work.

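To make the convenience-layer idea above a bit more concrete, here is a
minimal sketch of how a hand-written wrapper can sit on top of
auto-generated classes. The GeneratedArrow and ConvenientArrow modules
are hypothetical stand-ins invented for this illustration so that the
snippet runs without Arrow installed; they are not the real Arrow GLib
or Red Arrow API, and a C# library would presumably follow the same
shape with its own generated classes.

```ruby
# Hypothetical stand-ins for auto-generated low-level classes.
# They only mimic the shape of a generated API; they are NOT Arrow GLib.
module GeneratedArrow
  class MemoryMappedInputStream
    attr_reader :path

    def initialize(path)
      @path = path
    end
  end

  class RecordBatchFileReader
    def initialize(input)
      @input = input
    end

    # Returns a fake "table" (a plain Hash) instead of reading a file.
    def read_all
      { source: @input.path, rows: [] }
    end
  end
end

# Hypothetical hand-written layer on top of the generated classes.
module ConvenientArrow
  class Table
    # One call hides the stream and reader plumbing from the user.
    def self.load(path)
      input = GeneratedArrow::MemoryMappedInputStream.new(path)
      reader = GeneratedArrow::RecordBatchFileReader.new(input)
      reader.read_all
    end
  end
end

table = ConvenientArrow::Table.load("data.arrow")
puts table[:source] # prints "data.arrow"
```

The point of the design is that users only see the one-line load call,
while the generated classes stay an implementation detail that can
change without breaking user code.
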
Thanks,
--
kou

In <cagzxcpddxnkez7jyokrfgygijlnqow7ikch1+5fmy1c_hzi...@mail.gmail.com>
  "[DISCUSS][C#][GLib] Formalize use of the GLib libraries for native library bindings" on Tue, 7 May 2024 12:32:40 +1200,
  Adam Reeve <adre...@gmail.com> wrote:

> Hi everyone,
>
> The .NET/C# Apache Arrow library currently only contains managed code,
> but the addition of the C Data Interface implementation opens up the
> ability to easily add bindings to the C++ Arrow library to add more
> capabilities. There is currently a draft PR open to add bindings to
> the Acero library for example [1], and I'm interested in adding .NET
> bindings to the dataset library.
>
> The Acero bindings PR uses the Arrow GLib library, but I couldn't find
> any official guidance on whether this is the recommended approach for
> adding new native library bindings. As far as I can tell the GLib
> libraries are currently only used for the Ruby Arrow library, and can
> be used via GObject introspection by other languages like Lua. So I'd
> like to start a discussion to see if there's consensus on whether
> using the GLib libraries should be the standard way to add new native
> library bindings for .NET. Standardising on one way of wrapping the
> C++ libraries in .NET would help keep things simpler for both users
> and developers.
>
> For context, I'm a member of the open-source team at G-Research and a
> maintainer of ParquetSharp, a .NET library that wraps the Arrow C++
> Parquet library. In ParquetSharp, we build our own native library with
> a C ABI that uses the C++ Arrow library from vcpkg internally, and
> bundle pre-built native libraries inside the ParquetSharp Nuget
> package for each OS and architecture combination supported.
>
> My thoughts on the advantages and disadvantages of using GLib over a
> custom native wrapper library are:
>
> Pros
> * We can use the existing GLib Arrow libraries rather than having to
> write custom C wrappers, and any improvements made there to support
> .NET can also benefit users of other languages, and vice versa
> (although this would only be Ruby and .NET initially, and anyone using
> the library directly via GObject introspection)
> * We can take advantage of the tooling built around GLib/GObject to avoid
> needing to implement a lot of boilerplate binding code manually. For
> example, we could use the GapiCodegen tool from GtkSharp [2] to help
> generate binding code
> * There's no need to distribute a native binary with NuGet packages,
> and NuGet packages aren't bloated by builds for architectures that
> aren't used
>
> Cons
> * Users need to separately install the Arrow GLib libraries in order
> to use some Arrow NuGet packages, and this might complicate build and
> deployment processes compared to just adding a NuGet package reference
> to a project
> * GLib code can be a lot more complicated than plain C binding code
> that is only going to be consumed by .NET
> * Automatically generating .NET bindings for GObject libraries is not
> as well supported as for some other languages/runtimes
>     * As far as I can tell it's expected that most .NET GLib library
>     bindings live inside one of the many forks of GtkSharp so all of the
>     tooling is internal to these repositories rather than being
>     distributed as standalone tools designed to be used by other projects
>     * You can manually write code to use a GLib library, as in the
>     Acero C# PR, but for more complex APIs I think it would make sense to
>     take advantage of the automated tooling available
>
> I was worried about whether it's possible to use GObject to implement
> bindings for some of the more complex parts of the Dataset API, like
> providing a .NET implementation of a KmsClientFactory, which would be
> required for reading encrypted Parquet data. I recently added bindings
> for this to ParquetSharp [3], so thought it would be a good test case
> to try to implement something similar with GObject. Following the GTK
> interface docs [4] and GtkSharp interface binding docs [5], and using
> the GapiCodegen library, I was able to implement something like a
> KmsClientFactory in C# and use it from GObject code in a C library, so
> it doesn't look like using GObject would be too limiting. It did take
> me a while to get this working though and I had a few missteps along
> the way, like trying to get gapi-parser working before giving up and
> writing an API XML file manually.
>
> I do think that if we use GapiCodegen we might want to avoid publicly
> exposing classes that inherit from GLib.Object in order to keep the
> API simple and provide more flexibility to change things in backwards
> compatible ways as the library evolves.
>
> Does anyone have any opinions or thoughts on this?
>
> Thanks,
> Adam
>
> [1] https://github.com/apache/arrow/pull/37544
> [2] https://github.com/GtkSharp/GtkSharp
> [3] https://github.com/G-Research/ParquetSharp/pull/426
> [4]
> https://docs.gtk.org/gobject/tutorial.html#how-to-define-and-implement-interfaces
> [5] https://www.mono-project.com/docs/gui/gtksharp/implementing-ginterfaces/