As a ParquetSharp user, I think this is a great idea. I agree with Kou that
the best user experience includes distributing the native parts, but I
believe these can be separated out into individual "runtime" NuGet packages
and referenced in a way that only the required ones are downloaded. See
runtime.native.System.Security.Cryptography.OpenSsl 4.3.3 on the NuGet Gallery
<https://www.nuget.org/packages/runtime.native.System.Security.Cryptography.OpenSsl/#dependencies-body-tab>
for an example of this.

On Tue, May 7, 2024 at 1:58 PM Adam Reeve <adre...@gmail.com> wrote:

> Hi Kou, thanks for your insight
>
> > If we have many development resources for the C# bindings,
> > it may be better that we implement the C++ bindings directly
> > like PyArrow does. If we don't, it may be better that we
> > use Arrow GLib to combine development resources with
> > GLib/Ruby developers like me.
>
> I think it's fair to say there isn't a lot of developer time dedicated
> to the C# library and bindings, but I can see there being demand for
> bindings to the full dataset and compute APIs at least, so from that
> perspective it sounds like using the GLib libraries would make sense.
>
> > We may want to publish a NuGet package that includes Arrow
> > GLib libraries like ParquetSharp includes
> > ParquetSharpNative.* that are linked to Arrow/Parquet C++
> > statically.
>
> Good point, that would definitely help simplify things for end users.
>
> > We may want to create a C# library in addition to the auto
> > generated code based on GObject Introspection. It's an
> > approach used by Ruby. The auto generated code may be
> > difficult to use from C#.
>
> Right, yes this is similar to what I meant by not publicly exposing
> the GLib.GObject based classes, although we could do something closer
> to this where we make the GObject classes public but in a separate
> namespace, and provide a cleaner API built on top of the generated
> code but allow users to access the lower level GObject API if needed.
>
> > > I was worried about whether it's possible to use GObject to implement
> > > bindings for some of the more complex parts of the Dataset API, like
> > > providing a .NET implementation of a KmsClientFactory, which would be
> > > required for reading encrypted Parquet data.
> >
> > We can use GObject for the case as you did. I can open a PR
> > for it or I can review your implementation. (If you open a
> > PR of your work.)
>
> The code I have is more like a prototype of a simplified version of
> the KMS API, so it's not useful as is, but I'll look into expanding
> this to implement the full API and make a PR.
>
> Cheers,
> Adam
>
> On Tue, 7 May 2024 at 20:11, Sutou Kouhei <k...@clear-code.com> wrote:
> >
> > Hi,
> >
> > I'm the author of Arrow GLib.
> >
> > I agree with the pros/cons you summarized.
> >
> > If we have many development resources for the C# bindings,
> > it may be better that we implement the C++ bindings directly
> > like PyArrow does. If we don't, it may be better that we
> > use Arrow GLib to combine development resources with
> > GLib/Ruby developers like me.
> >
> > If we don't have many development resources for the C#
> > bindings but we don't need many bindings, it may be better
> > that we implement the C++ bindings directly.
> >
> > > * There's no need to distribute a native binary with NuGet packages,
> > > and NuGet packages aren't bloated by builds for architectures that
> > > aren't used
> >
> > > * Users need to separately install the Arrow GLib libraries in order
> > > to use some Arrow NuGet packages, and this might complicate build and
> > > deployment processes compared to just adding a NuGet package reference
> > > to a project
> >
> > We may want to publish a NuGet package that includes Arrow
> > GLib libraries like ParquetSharp includes
> > ParquetSharpNative.* that are linked to Arrow/Parquet C++
> > statically.
> >
> >
> > We may want to create a C# library in addition to the auto
> > generated code based on GObject Introspection. It's an
> > approach used by Ruby. The auto generated code may be
> > difficult to use from C#.
> >
> > For example, both of the following Ruby snippets read a table:
> >
> > # With a Ruby library
> > table = Arrow::Table.load("data.arrow")
> >
> > # Without a Ruby library (Use only auto generated API)
> > input = Arrow::MemoryMappedInputStream.new("data.arrow")
> > reader = Arrow::RecordBatchFileReader.new(input)
> > table = reader.read_all
> >
> >
> > > I was worried about whether it's possible to use GObject to implement
> > > bindings for some of the more complex parts of the Dataset API, like
> > > providing a .NET implementation of a KmsClientFactory, which would be
> > > required for reading encrypted Parquet data.
> >
> > We can use GObject for the case as you did. I can open a PR
> > for it or I can review your implementation. (If you open a
> > PR of your work.)
> >
> >
> >
> > Thanks,
> > --
> > kou
> >
> > In <cagzxcpddxnkez7jyokrfgygijlnqow7ikch1+5fmy1c_hzi...@mail.gmail.com>
> >   "[DISCUSS][C#][GLib] Formalize use of the GLib libraries for native
> >   library bindings" on Tue, 7 May 2024 12:32:40 +1200,
> >   Adam Reeve <adre...@gmail.com> wrote:
> >
> > > Hi everyone,
> > >
> > > The .NET/C# Apache Arrow library currently only contains managed code,
> > > but the addition of the C Data Interface implementation opens up the
> > > ability to easily add bindings to the C++ Arrow library to add more
> > > capabilities. There is currently a draft PR open to add bindings to
> > > the Acero library for example [1], and I'm interested in adding .NET
> > > bindings to the dataset library.
> > >
> > > The Acero bindings PR uses the Arrow GLib library, but I couldn't find
> > > any official guidance on whether this is the recommended approach for
> > > adding new native library bindings. As far as I can tell the GLib
> > > libraries are currently only used for the Ruby Arrow library, and can
> > > be used via GObject introspection by other languages like Lua. So I'd
> > > like to start a discussion to see if there's consensus on whether
> > > using the GLib libraries should be the standard way to add new native
> > > library bindings for .NET. Standardising on one way of wrapping the
> > > C++ libraries in .NET would help keep things simpler for both users
> > > and developers.
> > >
> > > For context, I'm a member of the open-source team at G-Research and a
> > > maintainer of ParquetSharp, a .NET library that wraps the Arrow C++
> > > Parquet library. In ParquetSharp, we build our own native library with
> > > a C ABI that uses the C++ Arrow library from vcpkg internally, and
> > > bundle pre-built native libraries inside the ParquetSharp NuGet
> > > package for each OS and architecture combination supported.
> > >
> > > My thoughts on the advantages and disadvantages of using GLib over a
> > > custom native wrapper library are:
> > > Pros
> > > * We can use the existing GLib Arrow libraries rather than having to
> > > write custom C wrappers, and any improvements made there to support
> > > .NET can also benefit users of other languages, and vice versa
> > > (although this would only be Ruby and .NET initially, and anyone using
> > > the library directly via GObject introspection)
> > > * We can take advantage of the tooling built around GLib/GObject to avoid
> > > needing to implement a lot of boilerplate binding code manually. For
> > > example, we could use the GapiCodegen tool from GtkSharp [2] to help
> > > generate binding code
> > > * There's no need to distribute a native binary with NuGet packages,
> > > and NuGet packages aren't bloated by builds for architectures that
> > > aren't used
> > > Cons
> > > * Users need to separately install the Arrow GLib libraries in order
> > > to use some Arrow NuGet packages, and this might complicate build and
> > > deployment processes compared to just adding a NuGet package reference
> > > to a project
> > > * GLib code can be a lot more complicated than plain C binding code
> > > that is only going to be consumed by .NET
> > > * Automatically generating .NET bindings for GObject libraries is not
> > > as well supported as for some other languages/runtimes
> > >     * As far as I can tell it's expected that most .NET GLib library
> > > bindings live inside one of the many forks of GtkSharp so all of the
> > > tooling is internal to these repositories rather than being
> > > distributed as standalone tools designed to be used by other projects
> > >     * You can manually write code to use a GLib library, as in the
> > > Acero C# PR, but for more complex APIs I think it would make sense to
> > > take advantage of the automated tooling available
> > >
> > > I was worried about whether it's possible to use GObject to implement
> > > bindings for some of the more complex parts of the Dataset API, like
> > > providing a .NET implementation of a KmsClientFactory, which would be
> > > required for reading encrypted Parquet data. I recently added bindings
> > > for this to ParquetSharp [3], so thought it would be a good test case
> > > to try to implement something similar with GObject. Following the GTK
> > > interface docs [4] and GtkSharp interface binding docs [5], and using
> > > the GapiCodegen library, I was able to implement something like a
> > > KmsClientFactory in C# and use it from GObject code in a C library, so
> > > it doesn't look like using GObject would be too limiting. It did take
> > > me a while to get this working though and I had a few missteps along
> > > the way, like trying to get gapi-parser working before giving up and
> > > writing an API XML file manually.
> > >
> > > I do think that if we use GapiCodegen we might want to avoid publicly
> > > exposing classes that inherit from GLib.Object in order to keep the
> > > API simple and provide more flexibility to change things in backwards
> > > compatible ways as the library evolves.
> > >
> > > Does anyone have any opinions or thoughts on this?
> > >
> > > Thanks,
> > > Adam
> > >
> > > [1] https://github.com/apache/arrow/pull/37544
> > > [2] https://github.com/GtkSharp/GtkSharp
> > > [3] https://github.com/G-Research/ParquetSharp/pull/426
> > > [4] https://docs.gtk.org/gobject/tutorial.html#how-to-define-and-implement-interfaces
> > > [5] https://www.mono-project.com/docs/gui/gtksharp/implementing-ginterfaces/
>
