Re: [EXTERNAL] Re: .NET support for Arrow

2020-07-12 Thread Adam Szmigin

Hi Anthony,

On 12/07/2020 20:43, anthony.ab...@gmail.com wrote:


It appears that fragmentation is already a problem (i.e. private forks)
I should point out that the private fork currently used by my 
organisation is a "minimal divergence" from the upstream project, and it 
was my intention from the outset that all our changes would be submitted 
back.  In fact, once I get my current set of PRs past Eric (apologies 
Eric - I will address your feedback shortly!) and into a release, we 
will have no need for our private fork any longer.  I too would prefer 
to avoid fragmentation.


Without knowing the full details of your use cases for the Arrow 
format, my impression from your summary so far is that our 
reasons for diverging from upstream are quite different.


If your use cases are sufficiently different that the Arrow .NET library 
wasn't right for you, would you consider raising tickets for any 
bugs, enhancements or performance issues you encountered that matter 
to you?


Many thanks,

--
Adam Szmigin



Re: .NET support for Arrow

2020-07-10 Thread Adam Szmigin

Hi Yash,

My organisation is using the C# library for a product we are working 
on.  However, we are using a fork which includes a number of bug fixes 
for issues that would otherwise have blocked us. I've raised a few PRs 
to fix these upstream.


I think it's fair to say that the C# library is at an early stage of 
development at the moment.  The more people who are able to test and 
contribute back, the better.


Kind regards,


--
Adam Szmigin

On 10/07/2020 04:05, Yash Ganthe wrote:

Hi,

The first paragraph of the docs at https://arrow.apache.org/ says it supports
C#. However, there is no C# library listed anywhere else in the
documentation. Is .NET supported at all?

Regards,
Yash



Re: [DISCUSS] Move JIRA notifications to separate mailing list?

2020-06-08 Thread Adam Szmigin

Hi Neal,

On 08/06/2020 19:43, Neal Richardson wrote:

I've noticed that some other Apache projects have a separate mailing list
for JIRA notifications (Spark, for example, has iss...@spark.apache.org).
The result is that the dev@ mailing list is focused on actual discussion
threads (like this!), votes, and other official business. Would we be
interested in doing the same?


I have been lazy and have not set up any anti-JIRA filters in the few weeks 
that I have been a member of this mailing list. Deleting JIRA 
notifications has fast become the most popular activity that my email 
client sees :-).


So from the perspective of a new member of the community, I can see how 
some might find this a turn-off, and maybe even be dissuaded from 
participation - obviously not something anyone here would want.


I'd certainly support a dedicated list for JIRA notifications.

--
Adam Szmigin



[jira] [Created] (ARROW-8886) [C#] Decide and implement appropriate behaviour for Array builder resize to negative size

2020-05-21 Thread Adam Szmigin (Jira)
Adam Szmigin created ARROW-8886:
---

         Summary: [C#] Decide and implement appropriate behaviour for Array builder resize to negative size
             Key: ARROW-8886
             URL: https://issues.apache.org/jira/browse/ARROW-8886
         Project: Apache Arrow
      Issue Type: Improvement
      Components: C#
Affects Versions: 0.17.1
        Reporter: Adam Szmigin


h1. Summary

Currently, the {{ArrowBuffer.Builder}} class accepts a negative value passed to 
the {{Resize()}} method and treats it as though the caller had passed zero.  This 
was implemented deliberately: there is an explicit unit test verifying the 
behaviour.

However, it is also unusual.  By way of comparison:

* The {{System.Array.Resize()}} method throws 
{{ArgumentOutOfRangeException}} if a negative value is passed: 
https://docs.microsoft.com/en-us/dotnet/api/system.array.resize?view=netcore-3.1
* The Arrow C++ implementation will refuse to accept a negative length: 
https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/builder_base.h#L194
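
To make the two candidate behaviours concrete, here is a minimal sketch that operates on a plain {{byte[]}} rather than an {{ArrowBuffer.Builder}}. {{ResizePolicySketch}}, {{ResizeClamping()}} and {{ResizeStrict()}} are hypothetical names for illustration only and are not part of Apache.Arrow. The first mirrors the current clamp-to-zero behaviour described above; the second follows the {{System.Array.Resize()}} contract and throws {{ArgumentOutOfRangeException}}.

```csharp
using System;

// Hypothetical helpers contrasting two policies for a negative resize
// request. Not part of Apache.Arrow; they work on a plain byte[].
public static class ResizePolicySketch
{
    // Current Arrow C# behaviour as described in the ticket: a negative
    // length is treated as though the caller passed zero.
    public static byte[] ResizeClamping(byte[] buffer, int length)
    {
        int newLength = Math.Max(length, 0);
        Array.Resize(ref buffer, newLength);
        return buffer;
    }

    // Alternative matching System.Array.Resize(): reject a negative
    // length up front instead of silently correcting it.
    public static byte[] ResizeStrict(byte[] buffer, int length)
    {
        if (length < 0)
        {
            throw new ArgumentOutOfRangeException(
                nameof(length), length, "Length must be non-negative.");
        }
        Array.Resize(ref buffer, length);
        return buffer;
    }
}

public static class Program
{
    public static void Main()
    {
        var data = new byte[] { 1, 2, 3 };

        // Clamping: the request succeeds and yields an empty buffer.
        Console.WriteLine(ResizePolicySketch.ResizeClamping(data, -5).Length);

        // Strict: the request is rejected.
        try
        {
            ResizePolicySketch.ResizeStrict(data, -5);
        }
        catch (ArgumentOutOfRangeException ex)
        {
            Console.WriteLine($"Rejected: {ex.ParamName}");
        }
    }
}
```

Either policy is trivial to implement; the real question is which contract the C# API should offer. Failing fast surfaces caller bugs early, whereas clamping hides them, and that trade-off is what the acceptance criteria below leave open.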

h1. Acceptance Criteria

* The behaviour when receiving a negative length to a {{Resize()}} method 
_must_ be agreed upon.
* Appropriate changes _must_ be made to the codebase in accordance with the 
outcome of the above agreement.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8788) [C#] Array builders to use bit-packed buffer builder rather than boolean array builder for validity map

2020-05-13 Thread Adam Szmigin (Jira)
Adam Szmigin created ARROW-8788:
---

         Summary: [C#] Array builders to use bit-packed buffer builder rather than boolean array builder for validity map
             Key: ARROW-8788
             URL: https://issues.apache.org/jira/browse/ARROW-8788
         Project: Apache Arrow
      Issue Type: Improvement
      Components: C#
Affects Versions: 0.17.0
        Reporter: Adam Szmigin


The C# array builders were recently enhanced to support appending nullable 
values easily, in [PR 
#7032|https://github.com/apache/arrow/pull/7032].

However, the builders internally reference {{BooleanArray.Builder}}, which 
itself has logic "baked in" for efficiently bit-packing boolean values 
into a byte buffer.

It would be cleaner to have a general-purpose bit-packed buffer 
builder, and for all array builders to use it for their validity map.  The 
boolean array builder would use it twice: once for values, once for validity.
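
A minimal sketch of the proposed structure, assuming a simple {{List<byte>}}-backed design. {{BitPackedBufferBuilder}}, {{NullableInt32Builder}} and {{NullableBooleanBuilder}} are hypothetical classes written for illustration, not the Apache.Arrow API. Bits are packed least-significant-bit first, which is the bit order the Arrow columnar format uses for validity bitmaps; Arrow's buffer padding is omitted for brevity.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical general-purpose bit-packed buffer builder (NOT the
// Apache.Arrow API). Bit i lives in byte i / 8 at position i % 8,
// least-significant bit first. Padding/alignment is omitted.
public sealed class BitPackedBufferBuilder
{
    private readonly List<byte> _bytes = new List<byte>();

    // Number of bits appended so far.
    public int Length { get; private set; }

    public BitPackedBufferBuilder Append(bool bit)
    {
        int byteIndex = Length / 8;
        if (byteIndex == _bytes.Count)
        {
            _bytes.Add(0); // start a fresh, zeroed byte
        }
        if (bit)
        {
            _bytes[byteIndex] |= (byte)(1 << (Length % 8));
        }
        Length++;
        return this;
    }

    public byte[] Build() => _bytes.ToArray();
}

// Hypothetical nullable Int32 builder: its validity map is delegated to the
// general-purpose bit builder rather than to a boolean array builder.
public sealed class NullableInt32Builder
{
    private readonly List<int> _values = new List<int>();
    private readonly BitPackedBufferBuilder _validity = new BitPackedBufferBuilder();

    public NullableInt32Builder Append(int? value)
    {
        _values.Add(value ?? 0);          // a slot is still written for nulls
        _validity.Append(value.HasValue); // 1 = valid, 0 = null
        return this;
    }

    public (int[] Values, byte[] Validity) Build() =>
        (_values.ToArray(), _validity.Build());
}

// Hypothetical nullable boolean builder: the same bit builder is used twice,
// once for the values and once for the validity map.
public sealed class NullableBooleanBuilder
{
    private readonly BitPackedBufferBuilder _values = new BitPackedBufferBuilder();
    private readonly BitPackedBufferBuilder _validity = new BitPackedBufferBuilder();

    public NullableBooleanBuilder Append(bool? value)
    {
        _values.Append(value ?? false);
        _validity.Append(value.HasValue);
        return this;
    }

    public (byte[] Values, byte[] Validity) Build() =>
        (_values.Build(), _validity.Build());
}

public static class Program
{
    public static void Main()
    {
        var (values, validity) = new NullableInt32Builder()
            .Append(1).Append(null).Append(3).Build();
        // Elements 0 and 2 are valid and element 1 is null, so this prints "101".
        Console.WriteLine(Convert.ToString(validity[0], 2));
        Console.WriteLine(string.Join(", ", values));
    }
}
```

With the bit-packing logic in one place, the boolean builder becomes an ordinary consumer of that logic rather than a dependency of every other builder.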





C# - Appetite for breaking changes to public API?

2020-04-27 Thread Adam Szmigin

Dear team,

I am keen to work on a number of the tickets relating to the C# 
implementation of Apache Arrow.


Quite a few of the open tickets relate to making breaking changes to the 
public API (e.g. ARROW-7757, ARROW-8581, likely ARROW-6603 as well).  
What is the general appetite for making breaking changes to the C# code 
in its present state?


The README.md hints at the C# implementation being alpha-grade at 
present, so I assume this is acceptable, but I would like to check the 
devs' opinions before I embark on any PRs.


Many thanks,

--
Adam Szmigin



[jira] [Created] (ARROW-8581) [C#] Date32/64Array write & read back introduces off-by-one error

2020-04-24 Thread Adam Szmigin (Jira)
Adam Szmigin created ARROW-8581:
---

         Summary: [C#] Date32/64Array write & read back introduces off-by-one error
             Key: ARROW-8581
             URL: https://issues.apache.org/jira/browse/ARROW-8581
         Project: Apache Arrow
      Issue Type: Bug
      Components: C#
Affects Versions: 0.17.0
     Environment: Windows 10 x64
        Reporter: Adam Szmigin


h1. Summary

Writing a date value using either a {{Date32Array.Builder}} or 
{{Date64Array.Builder}} and then reading back the result from the built array 
introduces an off-by-one error in the value: in the example below, the date 
read back is one day earlier than the date written.  The following minimal 
program illustrates this:
{code:c#}
namespace Date32ArrayReadWriteBug
{
    using Apache.Arrow;
    using Apache.Arrow.Memory;
    using System;

    internal static class Program
    {
        public static void Main(string[] args)
        {
            var allocator = new NativeMemoryAllocator();
            var builder = new Date32Array.Builder();

            // Write a single date into the builder.
            var date = new DateTime(2020, 4, 24);
            Console.WriteLine($"Appending date {date:yyyy-MM-dd}");
            builder.Append(date);

            // Build the array and read the same element back.
            var array = builder.Build(allocator);
            var dateAgain = array.GetDate(0);
            Console.WriteLine($"Read date {dateAgain:yyyy-MM-dd}");
        }
    }
}
{code}
h2. Expected Output
{noformat}
Appending date 2020-04-24
Read date 2020-04-24 {noformat}
h2. Actual Output
{noformat}
Appending date 2020-04-24
Read date 2020-04-23 {noformat}
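
The ticket does not identify the cause. For reference, Arrow's Date32 type stores days since 1970-01-01 and Date64 stores milliseconds since 1970-01-01, so a lossless round trip only needs the calendar date. Below is a minimal sketch of such a conversion, assuming it is done from {{DateTime.Date}} with no time-zone offset applied. {{ArrowDateSketch}} is a hypothetical helper, not an Apache.Arrow class. A conversion that goes through local time (for example, a {{DateTimeOffset}} carrying the machine's UTC offset) is one plausible way to lose a day on a machine east of UTC, but that is an assumption, not a confirmed diagnosis.

```csharp
using System;

// Hypothetical conversion helpers (NOT the Apache.Arrow API).
// Date32 = days since 1970-01-01; Date64 = milliseconds since 1970-01-01.
// DateTime subtraction compares ticks only and ignores DateTime.Kind, so no
// local-time offset can shift the result.
public static class ArrowDateSketch
{
    private static readonly DateTime UnixEpoch = new DateTime(1970, 1, 1);

    // Drop any time-of-day component, then count whole days from the epoch.
    public static int ToDate32(DateTime value) => (value.Date - UnixEpoch).Days;

    public static DateTime FromDate32(int days) => UnixEpoch.AddDays(days);

    // Same idea at millisecond resolution; the result is always a whole
    // number of days' worth of milliseconds.
    public static long ToDate64(DateTime value) =>
        (value.Date - UnixEpoch).Ticks / TimeSpan.TicksPerMillisecond;

    public static DateTime FromDate64(long milliseconds) =>
        UnixEpoch.AddTicks(milliseconds * TimeSpan.TicksPerMillisecond);
}

public static class Program
{
    public static void Main()
    {
        var date = new DateTime(2020, 4, 24);

        int days = ArrowDateSketch.ToDate32(date);
        Console.WriteLine($"Date32 {days} -> {ArrowDateSketch.FromDate32(days):yyyy-MM-dd}");

        long ms = ArrowDateSketch.ToDate64(date);
        Console.WriteLine($"Date64 {ms} -> {ArrowDateSketch.FromDate64(ms):yyyy-MM-dd}");
    }
}
```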
 





[jira] [Created] (ARROW-8344) [C#] StringArray.Builder.Clear() corrupts subsequent array contents

2020-04-06 Thread Adam Szmigin (Jira)
Adam Szmigin created ARROW-8344:
---

         Summary: [C#] StringArray.Builder.Clear() corrupts subsequent array contents
             Key: ARROW-8344
             URL: https://issues.apache.org/jira/browse/ARROW-8344
         Project: Apache Arrow
      Issue Type: Bug
      Components: C#
Affects Versions: 0.16.0
     Environment: Windows 10 x64
        Reporter: Adam Szmigin


h1. Summary

Calling the {{Clear()}} method on a {{StringArray.Builder}} instance causes all 
subsequently built arrays to contain strings consisting solely of whitespace.  
The minimal example below illustrates this:
{code:c#}
namespace ArrowStringArrayBuilderBug
{
    using Apache.Arrow;
    using Apache.Arrow.Memory;

    public class Program
    {
        private static readonly NativeMemoryAllocator Allocator
            = new NativeMemoryAllocator();

        public static void Main()
        {
            var builder = new StringArray.Builder();
            AppendBuildPrint(builder, "Hello", "World");

            // Reuse the same builder after clearing it.
            builder.Clear();
            AppendBuildPrint(builder, "Foo", "Bar");
        }

        // Appends the given strings, builds an array and prints its contents.
        private static void AppendBuildPrint(
            StringArray.Builder builder, params string[] strings)
        {
            foreach (var elem in strings)
                builder.Append(elem);

            var arr = builder.Build(Allocator);
            System.Console.Write("Array contents: [");
            for (var i = 0; i < arr.Length; i++)
            {
                if (i > 0) System.Console.Write(", ");
                System.Console.Write($"'{arr.GetString(i)}'");
            }
            System.Console.WriteLine("]");
        }
    }
}
{code}
h2. Expected Output
{noformat}
Array contents: ['Hello', 'World']
Array contents: ['Foo', 'Bar']
{noformat}
h2. Actual Output
{noformat}
Array contents: ['Hello', 'World']
Array contents: ['   ', '   '] {noformat}
h1. Workaround

The bug can be trivially worked around by constructing a new 
{{StringArray.Builder}} instead of calling {{Clear()}}.

ARROW-7040 mentions other issues with string arrays in C#, but I'm not sure 
whether this one is related.
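
The ticket does not pin down the root cause. For context on what {{Clear()}} has to reset: in the Arrow format, a string array is an int32 offsets buffer with n + 1 entries plus one contiguous UTF-8 data buffer. The sketch below is a hypothetical {{MiniStringArrayBuilder}}, not the Apache.Arrow class, showing a {{Clear()}} that resets both buffers and restores the leading zero offset so the builder can be reused safely.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical minimal string-array builder (NOT the Apache.Arrow class),
// laid out as in the Arrow format: n + 1 int32 offsets plus one contiguous
// UTF-8 data buffer.
public sealed class MiniStringArrayBuilder
{
    private readonly List<int> _offsets = new List<int> { 0 };
    private readonly List<byte> _data = new List<byte>();

    public int Length => _offsets.Count - 1;

    public MiniStringArrayBuilder Append(string value)
    {
        _data.AddRange(Encoding.UTF8.GetBytes(value));
        _offsets.Add(_data.Count); // end offset of this element
        return this;
    }

    // Reset both buffers together and restore the leading zero offset;
    // resetting only one would leave offsets and data out of step.
    public MiniStringArrayBuilder Clear()
    {
        _offsets.Clear();
        _offsets.Add(0);
        _data.Clear();
        return this;
    }

    // Decode each element from its [start, end) slice of the data buffer.
    public string[] Build()
    {
        var bytes = _data.ToArray();
        var result = new string[Length];
        for (int i = 0; i < Length; i++)
        {
            int start = _offsets[i];
            result[i] = Encoding.UTF8.GetString(bytes, start, _offsets[i + 1] - start);
        }
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        var builder = new MiniStringArrayBuilder().Append("Hello").Append("World");
        Console.WriteLine(string.Join(", ", builder.Build()));

        builder.Clear().Append("Foo").Append("Bar");
        Console.WriteLine(string.Join(", ", builder.Build()));
    }
}
```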


