Re: [EXTERNAL] Re: .NET support for Arrow
Hi Anthony,

On 12/07/2020 20:43, anthony.ab...@gmail.com wrote:
> It appears that fragmentation is already a problem (ie private forks)

I should point out that the private fork currently used by my organisation is a "minimal divergence" from the upstream project, and it was my intention from the outset that all our changes would be submitted back. In fact, once I get my current set of PRs past Eric (apologies Eric - I will address your feedback shortly!) and into a release, we will have no need for our private fork any longer.

I too would prefer to avoid fragmentation. Without knowing the full details of your use cases for using the Arrow format, the impression I got from your summary so far is that our reasons for diverging from upstream are quite different. If your use cases are sufficiently different that the Arrow .NET library wasn't for you, would you consider raising tickets for the bugs/enhancements/performance issues you encountered that are important to you?

Many thanks,
--
Adam Szmigin
Re: .NET support for Arrow
Hi Yash,

My organisation is using the C# library for a product we are working on. However, we are using a fork which includes a number of bug-fixes for issues that would have otherwise blocked us. I've raised a few PRs to fix these upstream.

I think it's fair to say that the C# library is at an early stage of development at the moment. The more people who are able to test and contribute back, the better.

Kind regards,
--
Adam Szmigin

On 10/07/2020 04:05, Yash Ganthe wrote:
> Hi,
> The first paragraph of docs at https://arrow.apache.org/ says it supports C#. However there is no library for C# listed anywhere else in the documentation. Is .NET supported at all?
> Regards,
> Yash
Re: [DISCUSS] Move JIRA notifications to separate mailing list?
Hi Neal,

On 08/06/2020 19:43, Neal Richardson wrote:
> I've noticed that some other Apache projects have a separate mailing list for JIRA notifications (Spark, for example, has iss...@spark.apache.org). The result is that the dev@ mailing list is focused on actual discussion threads (like this!), votes, and other official business. Would we be interested in doing the same?

I have been lazy and not set up any anti-JIRA filters in the few weeks that I have been a member of this mailing list. Deleting JIRA notifications has fast become the most popular activity that my email client sees :-).

So from the perspective of a new member of the community, I can see how some might find this a turn-off, and maybe even be dissuaded from participation - obviously not something anyone here would want. I'd certainly support a dedicated list for JIRA notifications.

--
Adam Szmigin
[jira] [Created] (ARROW-8886) [C#] Decide and implement appropriate behaviour for Array builder resize to negative size
Adam Szmigin created ARROW-8886:
---

Summary: [C#] Decide and implement appropriate behaviour for Array builder resize to negative size
Key: ARROW-8886
URL: https://issues.apache.org/jira/browse/ARROW-8886
Project: Apache Arrow
Issue Type: Improvement
Components: C#
Affects Versions: 0.17.1
Reporter: Adam Szmigin

h1. Summary

Currently, the {{ArrowBuffer.Builder}} class accepts a negative value to the {{Resize()}} method, and treats it as though the caller passed zero. This was implemented deliberately, as there is an explicit unit test to verify the behaviour. However, it is also unusual. By way of comparison:

* The {{System.Array.Resize()}} method throws {{ArgumentOutOfRangeException}} if a negative value is passed: https://docs.microsoft.com/en-us/dotnet/api/system.array.resize?view=netcore-3.1
* The Arrow C++ implementation will refuse to accept a negative length: https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/builder_base.h#L194

h1. Acceptance Criteria

* The behaviour when receiving a negative length to a {{Resize()}} method _must_ be agreed upon.
* Appropriate changes _must_ be made to the codebase in accordance with the outcome of the above agreement.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
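For illustration only, here is a minimal sketch of the stricter of the two behaviours under discussion in ARROW-8886: rejecting a negative length in the same way {{System.Array.Resize()}} does. The class {{SketchBufferBuilder}} is a hypothetical stand-in written for this example; it is not the Apache.Arrow {{ArrowBuffer.Builder}} API, and it does not prejudge which behaviour should be agreed upon.

```csharp
using System;

// Hypothetical stand-in for a buffer builder, written for illustration.
// This is NOT the Apache.Arrow ArrowBuffer.Builder API.
public sealed class SketchBufferBuilder
{
    private byte[] _buffer = Array.Empty<byte>();

    // Number of bytes currently considered part of the buffer.
    public int Length { get; private set; }

    // Sets the logical length, growing the backing array when required.
    // A negative length is rejected rather than being treated as zero.
    public SketchBufferBuilder Resize(int length)
    {
        if (length < 0)
        {
            throw new ArgumentOutOfRangeException(
                nameof(length), length, "Length must be non-negative.");
        }

        if (length > _buffer.Length)
        {
            Array.Resize(ref _buffer, length);
        }

        Length = length;
        return this;
    }
}
```

With this approach a call such as {{new SketchBufferBuilder().Resize(-1)}} fails immediately, and the builder's existing length is left unchanged.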
[jira] [Created] (ARROW-8788) [C#] Array builders to use bit-packed buffer builder rather than boolean array builder for validity map
Adam Szmigin created ARROW-8788:
---

Summary: [C#] Array builders to use bit-packed buffer builder rather than boolean array builder for validity map
Key: ARROW-8788
URL: https://issues.apache.org/jira/browse/ARROW-8788
Project: Apache Arrow
Issue Type: Improvement
Components: C#
Affects Versions: 0.17.0
Reporter: Adam Szmigin

The C# array builders were recently enhanced to have support for adding nullable values easily, under [PR #7032|https://github.com/apache/arrow/pull/7032]. However, the builders internally referenced {{BooleanArray.Builder}}, which itself then had logic "baked-in" for efficient bit-packing of boolean values into a byte buffer.

It would be cleaner for there to be a general-purpose bit-packed buffer builder, and for all array builders to use that for their validity map. The boolean array builder would use it twice: once for values, once for validity.
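As a rough sketch of what a general-purpose bit-packed builder could look like, the class below packs boolean values into bytes, least-significant bit first, which is the bit order used by Arrow validity bitmaps. The name {{BitPackedBuilder}} and all of its members are hypothetical and chosen for this example; nothing here is taken from the Apache.Arrow codebase or from the eventual fix.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not an existing Apache.Arrow type.
public sealed class BitPackedBuilder
{
    private readonly List<byte> _bytes = new List<byte>();

    // Number of bits appended so far.
    public int Length { get; private set; }

    // Number of appended bits whose value was true.
    public int SetBitCount { get; private set; }

    // Appends one bit, starting a new byte after every eight bits.
    public BitPackedBuilder Append(bool value)
    {
        int bitIndex = Length % 8;
        if (bitIndex == 0)
        {
            _bytes.Add(0);
        }

        if (value)
        {
            // Bits are packed least-significant-bit first within each byte.
            _bytes[_bytes.Count - 1] |= (byte)(1 << bitIndex);
            SetBitCount++;
        }

        Length++;
        return this;
    }

    // Reads back the bit at the given position.
    public bool Get(int index)
    {
        if (index < 0 || index >= Length)
        {
            throw new ArgumentOutOfRangeException(nameof(index));
        }

        return (_bytes[index / 8] & (1 << (index % 8))) != 0;
    }

    // Copies the packed bytes; unused trailing bits in the last byte are zero.
    public byte[] ToArray() => _bytes.ToArray();
}
```

Under this design an array builder would hold one instance for its validity map, and a boolean array builder would hold two: one for values and one for validity, as the ticket suggests.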
C# - Appetite for breaking changes to public API?
Dear team,

I am keen to work on a number of the tickets relating to the C# implementation for Apache Arrow. Quite a few of the open tickets relate to making breaking changes to the public API (e.g. ARROW-7757, ARROW-8581, likely ARROW-6603 as well).

What is the general appetite for making breaking changes to the C# code in its present state? The README.md hints at the C# implementation being alpha-grade at present, so I assume breaking changes are acceptable, but I would like to check opinions from the devs before I embark on any PRs.

Many thanks,
--
Adam Szmigin
[jira] [Created] (ARROW-8581) [C#] Date32/64Array write & read back introduces off-by-one error
Adam Szmigin created ARROW-8581:
---

Summary: [C#] Date32/64Array write & read back introduces off-by-one error
Key: ARROW-8581
URL: https://issues.apache.org/jira/browse/ARROW-8581
Project: Apache Arrow
Issue Type: Bug
Components: C#
Affects Versions: 0.17.0
Environment: Windows 10 x64
Reporter: Adam Szmigin

h1. Summary

Writing a Date value using either a {{Date32Array.Builder}} or {{Date64Array.Builder}} and then reading back the result from the built array introduces an off-by-one error in the value. The following minimal code illustrates:

{code:c#}
namespace Date32ArrayReadWriteBug {
    using Apache.Arrow;
    using Apache.Arrow.Memory;
    using System;

    internal static class Program {
        public static void Main(string[] args) {
            var allocator = new NativeMemoryAllocator();
            var builder = new Date32Array.Builder();

            // Append a single date and print what was written.
            var date = new DateTime(2020, 4, 24);
            Console.WriteLine($"Appending date {date:yyyy-MM-dd}");
            builder.Append(date);

            // Build the array and read the same element back.
            var array = builder.Build(allocator);
            var dateAgain = array.GetDate(0);
            Console.WriteLine($"Read date {dateAgain:yyyy-MM-dd}");
        }
    }
}
{code}

h2. Expected Output

{noformat}
Appending date 2020-04-24
Read date 2020-04-24
{noformat}

h2. Actual Output

{noformat}
Appending date 2020-04-24
Read date 2020-04-23
{noformat}
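The ticket does not identify the cause of the off-by-one error, so the following is only a sketch of what a lossless round trip looks like, not a description of the library's code or of the eventual fix. It assumes the Date32 representation of a date as a whole number of days since the UNIX epoch, and it uses plain calendar arithmetic so that no time-zone offset can shift the result. {{DateRoundTrip}} and its methods are hypothetical names introduced for this example.

```csharp
using System;

// Hypothetical helper for illustration; not part of Apache.Arrow.
public static class DateRoundTrip
{
    private static readonly DateTime Epoch = new DateTime(1970, 1, 1);

    // Whole days between the UNIX epoch and the calendar date of `value`.
    // The time of day is discarded first, and DateTime subtraction ignores
    // DateTimeKind, so no time-zone offset is applied.
    public static int ToDaysSinceEpoch(DateTime value) => (value.Date - Epoch).Days;

    // Inverse of ToDaysSinceEpoch: midnight on the corresponding calendar date.
    public static DateTime FromDaysSinceEpoch(int days) => Epoch.AddDays(days);
}
```

Converting a date with {{ToDaysSinceEpoch}} and back with {{FromDaysSinceEpoch}} returns the same calendar date regardless of the machine's local time zone, which is the property the expected output in the ticket relies on.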
[jira] [Created] (ARROW-8344) [C#] StringArray.Builder.Clear() corrupts subsequent array contents
Adam Szmigin created ARROW-8344:
---

Summary: [C#] StringArray.Builder.Clear() corrupts subsequent array contents
Key: ARROW-8344
URL: https://issues.apache.org/jira/browse/ARROW-8344
Project: Apache Arrow
Issue Type: Bug
Components: C#
Affects Versions: 0.16.0
Environment: Windows 10 x64
Reporter: Adam Szmigin

h1. Summary

Using the {{Clear()}} method on a {{StringArray.Builder}} class causes all subsequent built arrays to contain strings consisting solely of whitespace. The minimal example below illustrates:

{code:java}
namespace ArrowStringArrayBuilderBug {
    using Apache.Arrow;
    using Apache.Arrow.Memory;

    public class Program {
        private static readonly NativeMemoryAllocator Allocator = new NativeMemoryAllocator();

        public static void Main() {
            var builder = new StringArray.Builder();
            AppendBuildPrint(builder, "Hello", "World");

            // Reuse the same builder after clearing it.
            builder.Clear();
            AppendBuildPrint(builder, "Foo", "Bar");
        }

        // Appends the strings, builds an array and prints its contents.
        private static void AppendBuildPrint(
            StringArray.Builder builder, params string[] strings) {
            foreach (var elem in strings)
                builder.Append(elem);

            var arr = builder.Build(Allocator);
            System.Console.Write("Array contents: [");
            for (var i = 0; i < arr.Length; i++) {
                if (i > 0) System.Console.Write(", ");
                System.Console.Write($"'{arr.GetString(i)}'");
            }
            System.Console.WriteLine("]");
        }
    }
}
{code}

h2. Expected Output

{noformat}
Array contents: ['Hello', 'World']
Array contents: ['Foo', 'Bar']
{noformat}

h2. Actual Output

{noformat}
Array contents: ['Hello', 'World']
Array contents: [' ', ' ']
{noformat}

h1. Workaround

The bug can be trivially worked around by constructing a new {{StringArray.Builder}} instead of calling {{Clear()}}.

The issue ARROW-7040 mentions other issues with string arrays in C#, but I'm not sure if this is related or not.