Hi,

I'm the author of Arrow GLib.

I agree with the Pros/Cons you summarized.

If we have many development resources for the C# bindings,
it may be better to implement the C++ bindings directly,
like PyArrow does. If we don't, it may be better to use
Arrow GLib so that we can share development resources with
GLib/Ruby developers like me.

If we don't have many development resources for the C#
bindings but we also don't need many bindings, it may still
be better to implement the C++ bindings directly.

> * There's no need to distribute a native binary with NuGet packages,
> and NuGet packages aren't bloated by builds for architectures that
> aren't used

> * Users need to separately install the Arrow GLib libraries in order
> to use some Arrow NuGet packages, and this might complicate build and
> deployment processes compared to just adding a NuGet package reference
> to a project

We may want to publish a NuGet package that includes the
Arrow GLib libraries, just as ParquetSharp includes
ParquetSharpNative.* libraries that are statically linked to
Arrow/Parquet C++.


We may want to create a C# library in addition to the
auto-generated code based on GObject Introspection. This is
the approach used by Ruby. The auto-generated code alone may
be difficult to use from C#.

For example, both of the following Ruby snippets read a table:

# With a Ruby library
table = Arrow::Table.load("data.arrow")

# Without a Ruby library (Use only auto generated API)
input = Arrow::MemoryMappedInputStream.new("data.arrow")
reader = Arrow::RecordBatchFileReader.new(input)
table = reader.read_all
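
To illustrate the layering, here is a minimal sketch of a
hand-written convenience layer on top of a low-level API.
RawBindings and Friendly are hypothetical stand-ins invented
for this example; they are not the real Arrow GLib or Red
Arrow classes, and "loading" just returns a string.

```ruby
# Hypothetical stand-ins for an auto-generated, low-level API.
# These are NOT the real Arrow GLib bindings.
module RawBindings
  class InputStream
    attr_reader :path

    def initialize(path)
      @path = path
    end
  end

  class FileReader
    def initialize(input)
      @input = input
    end

    # Pretend to read everything; returns a description string.
    def read_all
      "table loaded from #{@input.path}"
    end
  end
end

# Hypothetical hand-written layer that hides the multi-step setup.
module Friendly
  class Table
    def self.load(path)
      input = RawBindings::InputStream.new(path)
      RawBindings::FileReader.new(input).read_all
    end
  end
end

# Both calls produce the same result; the second needs one line.
low_level = RawBindings::FileReader.new(
  RawBindings::InputStream.new("data.arrow")
).read_all
high_level = Friendly::Table.load("data.arrow")
```

A C# library could presumably follow the same pattern: keep
the generated bindings internal and expose a small
hand-written API on top of them.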


> I was worried about whether it's possible to use GObject to implement
> bindings for some of the more complex parts of the Dataset API, like
> providing a .NET implementation of a KmsClientFactory, which would be
> required for reading encrypted Parquet data.

We can use GObject for that case, as you did. I can open a
PR for it, or I can review your implementation if you open
a PR for your work.



Thanks,
-- 
kou

In <cagzxcpddxnkez7jyokrfgygijlnqow7ikch1+5fmy1c_hzi...@mail.gmail.com>
  "[DISCUSS][C#][GLib] Formalize use of the GLib libraries for native library bindings" on Tue, 7 May 2024 12:32:40 +1200,
  Adam Reeve <adre...@gmail.com> wrote:

> Hi everyone,
> 
> The .NET/C# Apache Arrow library currently only contains managed code,
> but the addition of the C Data Interface implementation opens up the
> ability to easily add bindings to the C++ Arrow library to add more
> capabilities. There is currently a draft PR open to add bindings to
> the Acero library for example [1], and I'm interested in adding .NET
> bindings to the dataset library.
> 
> The Acero bindings PR uses the Arrow GLib library, but I couldn't find
> any official guidance on whether this is the recommended approach for
> adding new native library bindings. As far as I can tell the GLib
> libraries are currently only used for the Ruby Arrow library, and can
> be used via GObject introspection by other languages like Lua. So I'd
> like to start a discussion to see if there's consensus on whether
> using the GLib libraries should be the standard way to add new native
> library bindings for .NET. Standardising on one way of wrapping the
> C++ libraries in .NET would help keep things simpler for both users
> and developers.
> 
> For context, I'm a member of the open-source team at G-Research and a
> maintainer of ParquetSharp, a .NET library that wraps the Arrow C++
> Parquet library. In ParquetSharp, we build our own native library with
> a C ABI that uses the C++ Arrow library from vcpkg internally, and
> bundle pre-built native libraries inside the ParquetSharp Nuget
> package for each OS and architecture combination supported.
> 
> My thoughts on the advantages and disadvantages of using GLib over a
> custom native wrapper library are:
> Pros
> * We can use the existing GLib Arrow libraries rather than having to
> write custom C wrappers, and any improvements made there to support
> .NET can also benefit users of other languages, and vice versa
> (although this would only be Ruby and .NET initially, and anyone using
> the library directly via GObject introspection)
> * We can take advantage of the tooling built around GLib/GObject to avoid
> needing to implement a lot of boilerplate binding code manually. For
> example, we could use the GapiCodegen tool from GtkSharp [2] to help
> generate binding code
> * There's no need to distribute a native binary with NuGet packages,
> and NuGet packages aren't bloated by builds for architectures that
> aren't used
> Cons
> * Users need to separately install the Arrow GLib libraries in order
> to use some Arrow NuGet packages, and this might complicate build and
> deployment processes compared to just adding a NuGet package reference
> to a project
> * GLib code can be a lot more complicated than plain C binding code
> that is only going to be consumed by .NET
> * Automatically generating .NET bindings for GObject libraries is not
> as well supported as for some other languages/runtimes
>     * As far as I can tell it's expected that most .NET GLib library
> bindings live inside one of the many forks of GtkSharp so all of the
> tooling is internal to these repositories rather than being
> distributed as standalone tools designed to be used by other projects
>     * You can manually write code to use a GLib library, as in the
> Acero C# PR, but for more complex APIs I think it would make sense to
> take advantage of the automated tooling available
> 
> I was worried about whether it's possible to use GObject to implement
> bindings for some of the more complex parts of the Dataset API, like
> providing a .NET implementation of a KmsClientFactory, which would be
> required for reading encrypted Parquet data. I recently added bindings
> for this to ParquetSharp [3], so thought it would be a good test case
> to try to implement something similar with GObject. Following the GTK
> interface docs [4] and GtkSharp interface binding docs [5], and using
> the GapiCodegen library, I was able to implement something like a
> KmsClientFactory in C# and use it from GObject code in a C library, so
> it doesn't look like using GObject would be too limiting. It did take
> me a while to get this working though and I had a few missteps along
> the way, like trying to get gapi-parser working before giving up and
> writing an API XML file manually.
> 
> I do think that if we use GapiCodegen we might want to avoid publicly
> exposing classes that inherit from GLib.Object in order to keep the
> API simple and provide more flexibility to change things in backwards
> compatible ways as the library evolves.
> 
> Does anyone have any opinions or thoughts on this?
> 
> Thanks,
> Adam
> 
> [1] https://github.com/apache/arrow/pull/37544
> [2] https://github.com/GtkSharp/GtkSharp
> [3] https://github.com/G-Research/ParquetSharp/pull/426
> [4] https://docs.gtk.org/gobject/tutorial.html#how-to-define-and-implement-interfaces
> [5] https://www.mono-project.com/docs/gui/gtksharp/implementing-ginterfaces/
