[jira] [Created] (ARROW-9048) [C#] Support Float16

2020-06-06 Thread Eric Erhardt (Jira)
Eric Erhardt created ARROW-9048:
---

 Summary: [C#] Support Float16
 Key: ARROW-9048
 URL: https://issues.apache.org/jira/browse/ARROW-9048
 Project: Apache Arrow
  Issue Type: Bug
  Components: C#
Reporter: Eric Erhardt


With [https://github.com/dotnet/runtime/issues/936], .NET is getting a 
`System.Half` type, which is a 16-bit floating point number. Once that type 
lands in .NET we can implement support for the Float16 type in Arrow.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8953) [C#] Update to .NET SDK 3.1

2020-05-26 Thread Eric Erhardt (Jira)
Eric Erhardt created ARROW-8953:
---

 Summary: [C#] Update to .NET SDK 3.1
 Key: ARROW-8953
 URL: https://issues.apache.org/jira/browse/ARROW-8953
 Project: Apache Arrow
  Issue Type: Bug
  Components: C#
Reporter: Eric Erhardt


We should update our tools to the latest .NET SDK - 3.1. This will enable new 
tooling features, such as the code style rules package that will enforce coding 
style:

[https://github.com/apache/arrow/pull/7246#issuecomment-634206767]

 

There are 3 places that I know of that need updating:

 

[https://github.com/apache/arrow/blob/master/.github/workflows/csharp.yml]

[https://github.com/apache/arrow/blob/f16f76ab7693ae085e82f4269a0a0bc23770bef9/.github/workflows/dev.yml#L132]

[https://github.com/apache/arrow/blob/f16f76ab7693ae085e82f4269a0a0bc23770bef9/dev/release/verify-release-candidate.sh#L327]

 





[jira] [Created] (ARROW-8882) [C#] Add .editorconfig to C# code

2020-05-21 Thread Eric Erhardt (Jira)
Eric Erhardt created ARROW-8882:
---

 Summary: [C#] Add .editorconfig to C# code
 Key: ARROW-8882
 URL: https://issues.apache.org/jira/browse/ARROW-8882
 Project: Apache Arrow
  Issue Type: Bug
  Components: C#
Reporter: Eric Erhardt


This allows for a consistent code format throughout the C# code in the repo. 
That way when a new contributor submits a change, the editors will 
automatically format the code to be in the same format as the current code base.





[jira] [Created] (ARROW-7516) [C#] .NET Benchmarks are broken

2020-01-08 Thread Eric Erhardt (Jira)
Eric Erhardt created ARROW-7516:
---

 Summary: [C#] .NET Benchmarks are broken
 Key: ARROW-7516
 URL: https://issues.apache.org/jira/browse/ARROW-7516
 Project: Apache Arrow
  Issue Type: Bug
  Components: C#
Reporter: Eric Erhardt


See [https://github.com/apache/arrow/pull/6030#issuecomment-571877721]

 

It looks like the issue is that in the Benchmarks, `Length` is specified as 
`1_000_000`, and there have only been ~730,000 days since `DateTime.MinValue`, so 
this line fails:

https://github.com/apache/arrow/blob/4634c89fc77f70fb5b5d035d6172263a4604da82/csharp/test/Apache.Arrow.Tests/TestData.cs#L130

A simple fix would be to cap what we pass into `AddDays` to some number like 
`100_000`, or so.





[jira] [Created] (ARROW-6795) [C#] Reading large Arrow files in C# results in an exception

2019-10-04 Thread Eric Erhardt (Jira)
Eric Erhardt created ARROW-6795:
---

 Summary: [C#] Reading large Arrow files in C# results in an 
exception
 Key: ARROW-6795
 URL: https://issues.apache.org/jira/browse/ARROW-6795
 Project: Apache Arrow
  Issue Type: Bug
  Components: C#
Reporter: Eric Erhardt


If you try to read a large Arrow file (2GB+) using the C# reader, you get an 
exception because it is casting the file position (a 64-bit long) to a 32-bit 
integer, which overflows when the file size is large.

 

See [https://github.com/apache/arrow/pull/5412]
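
To illustrate the failure mode, here is a minimal sketch of what happens when a 64-bit position above `int.MaxValue` is narrowed to `int`. It is not the reader's actual code; the `PositionCast` class and its method names are hypothetical.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class PositionCast
{
    // Unchecked narrowing silently wraps around for positions past int.MaxValue.
    public static int Truncate(long position) => unchecked((int)position);

    // Checked narrowing throws OverflowException instead of wrapping.
    public static bool FitsInInt32(long position)
    {
        try
        {
            _ = checked((int)position);
            return true;
        }
        catch (OverflowException)
        {
            return false;
        }
    }
}
```

Keeping positions and lengths as `long` throughout the reader avoids both outcomes.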





[jira] [Created] (ARROW-6728) [C#] Support reading and writing Date32 and Date64 arrays

2019-09-27 Thread Eric Erhardt (Jira)
Eric Erhardt created ARROW-6728:
---

 Summary: [C#] Support reading and writing Date32 and Date64 arrays
 Key: ARROW-6728
 URL: https://issues.apache.org/jira/browse/ARROW-6728
 Project: Apache Arrow
  Issue Type: Bug
  Components: C#
Reporter: Eric Erhardt


The C# implementation doesn't support reading and writing Date32 and Date64 
arrays. We need to add support and some tests.

It looks like it is only a couple of lines to get this enabled. See 
[https://github.com/apache/arrow/pull/5413].





[jira] [Created] (ARROW-6643) [C#] Write no IPC buffer metadata for NullType

2019-09-20 Thread Eric Erhardt (Jira)
Eric Erhardt created ARROW-6643:
---

 Summary: [C#] Write no IPC buffer metadata for NullType
 Key: ARROW-6643
 URL: https://issues.apache.org/jira/browse/ARROW-6643
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Reporter: Eric Erhardt


We need to align the C# writer (and test the reader) for NullType. See 
[https://github.com/apache/arrow/pull/5287] and ARROW-6379.

 

>The C++ implementation has been writing 2 {{Buffer}} Flatbuffer struct values 
>with length 0 for NullType. Rather than having dummy/placeholder Buffer I 
>think it is more consistent to write no metadata for this type.





[jira] [Created] (ARROW-6603) [C#] ArrayBuilder API to support writing nulls

2019-09-18 Thread Eric Erhardt (Jira)
Eric Erhardt created ARROW-6603:
---

 Summary: [C#] ArrayBuilder API to support writing nulls
 Key: ARROW-6603
 URL: https://issues.apache.org/jira/browse/ARROW-6603
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Reporter: Eric Erhardt


There is currently no API in the PrimitiveArrayBuilder class to support writing 
nulls. See this TODO: 
[https://github.com/apache/arrow/blob/1515fe10c039fb6685df2e282e2e888b773caa86/csharp/src/Apache.Arrow/Arrays/PrimitiveArrayBuilder.cs#L101].

 

Also see [https://github.com/apache/arrow/issues/5381].

 

We should add some APIs to support writing nulls.





[jira] [Created] (ARROW-6553) [C#] Decide how to read message lengths - little-endian or machine dependent

2019-09-12 Thread Eric Erhardt (Jira)
Eric Erhardt created ARROW-6553:
---

 Summary: [C#] Decide how to read message lengths - little-endian 
or machine dependent
 Key: ARROW-6553
 URL: https://issues.apache.org/jira/browse/ARROW-6553
 Project: Apache Arrow
  Issue Type: Bug
  Components: C#
Reporter: Eric Erhardt


See the discussion 
[here|https://github.com/apache/arrow/pull/5280#discussion_r323896532]. We 
are currently reading message lengths using machine-dependent endianness. 
Should this be changed to little-endian all the time?

It appears the C++ implementation does the same thing: it uses machine-dependent 
endianness.
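
As a rough sketch of the two options, assuming a 4-byte length prefix (the `MessageLength` class and method names are hypothetical, not the reader's code), the difference comes down to which BCL call decodes the bytes:

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper, for illustration only.
public static class MessageLength
{
    // Always interprets the 4 bytes as little-endian, regardless of the host CPU.
    public static int ReadLittleEndian(ReadOnlySpan<byte> bytes) =>
        BinaryPrimitives.ReadInt32LittleEndian(bytes);

    // Uses the host's byte order; agrees with the method above only on
    // little-endian machines (see BitConverter.IsLittleEndian).
    public static int ReadMachineDependent(byte[] bytes) =>
        BitConverter.ToInt32(bytes, 0);
}
```

On a little-endian host both methods return the same value, so the difference only shows up on big-endian hardware.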





[jira] [Created] (ARROW-6322) [C#] Implement a plasma client

2019-08-22 Thread Eric Erhardt (Jira)
Eric Erhardt created ARROW-6322:
---

 Summary: [C#] Implement a plasma client
 Key: ARROW-6322
 URL: https://issues.apache.org/jira/browse/ARROW-6322
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C#
Reporter: Eric Erhardt
Assignee: Eric Erhardt


We should create a C# plasma client, so .NET code can get and put objects into 
the plasma store.

An easy-ish way of implementing this would be to build on the c_glib C APIs 
already exposed for the plasma client. Unfortunately, I haven't found a decent 
C# GObject generator, so I think the C bindings will need to be written by 
hand, but there aren't too many of them.





[jira] [Created] (ARROW-5908) [C#] ArrowStreamWriter doesn't align buffers to 8 bytes

2019-07-10 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-5908:
---

 Summary: [C#] ArrowStreamWriter doesn't align buffers to 8 bytes
 Key: ARROW-5908
 URL: https://issues.apache.org/jira/browse/ARROW-5908
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Reporter: Eric Erhardt
Assignee: Eric Erhardt


When writing RecordBatches using ArrowStreamWriter, if the ArrowBuffers being 
written aren't all 8-byte aligned, the serialized RecordBatch won't conform to 
the Arrow specification. This causes other languages' readers to throw an 
error when reading Arrow streams written by the C# writer.

For example, if reading the stream from Python or C++, an error is raised here: 

[https://github.com/apache/arrow/blob/f77c3427ca801597b572fb197b92b0133269049b/cpp/src/arrow/ipc/reader.cc#L107-L110]

A similar error is raised when Java tries to read the stream.

We should be ensuring that the buffers being written to the stream are padded 
to 8 bytes, no matter their length, as specified in 
[https://arrow.apache.org/docs/format/Layout.html#requirements-goals-and-non-goals]

 
{quote} * It is required to have all the contiguous memory buffers in an IPC 
payload aligned at 8-byte boundaries. In other words, each buffer must start at 
an aligned 8-byte offset. Additionally, each buffer should be padded to a 
multiple of 8 bytes.{quote}
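
The padding arithmetic itself is small. A minimal sketch, assuming the writer appends zero bytes after each buffer (the `Alignment` class and its method names are hypothetical):

```csharp
// Hypothetical helper, for illustration only.
public static class Alignment
{
    // Rounds a buffer length up to the next multiple of 8.
    public static long PadTo8(long length) => (length + 7) & ~7L;

    // Number of zero bytes to append after a buffer of the given length.
    public static int PaddingFor(long length) => (int)(PadTo8(length) - length);
}
```

For example, a 5-byte buffer would be followed by 3 padding bytes, so the next buffer starts at an 8-byte offset.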





[jira] [Created] (ARROW-5896) [C#] Array Builders should take an initial capacity in their constructors

2019-07-09 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-5896:
---

 Summary: [C#] Array Builders should take an initial capacity in 
their constructors
 Key: ARROW-5896
 URL: https://issues.apache.org/jira/browse/ARROW-5896
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Reporter: Eric Erhardt


When using the Fluent Array Builder API, we should take in an initial capacity 
in the constructor, so we can avoid allocating unnecessary memory.

Today, if you create a builder and then call `.Reserve(length)` on it, the initial 
byte[] that was created in the constructor is wasted.





[jira] [Created] (ARROW-5887) [C#] ArrowStreamWriter writes FieldNodes in wrong order

2019-07-09 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-5887:
---

 Summary: [C#] ArrowStreamWriter writes FieldNodes in wrong order
 Key: ARROW-5887
 URL: https://issues.apache.org/jira/browse/ARROW-5887
 Project: Apache Arrow
  Issue Type: Bug
  Components: C#
Reporter: Eric Erhardt
Assignee: Eric Erhardt


When ArrowStreamWriter is writing a {{RecordBatch}} with {{null}}s in it, it is 
mixing up the column's {{NullCount}}.

You can see here:

[https://github.com/apache/arrow/blob/90affbd2c41e80aa8c3fac1e4dbff60aafb415d3/csharp/src/Apache.Arrow/Ipc/ArrowStreamWriter.cs#L195-L200]

It is writing the fields from {{0}} -> {{fieldCount}} order. But then 
[lower|https://github.com/apache/arrow/blob/90affbd2c41e80aa8c3fac1e4dbff60aafb415d3/csharp/src/Apache.Arrow/Ipc/ArrowStreamWriter.cs#L216-L220],
 it is writing the fields from {{fieldCount}} -> {{0}}.

Looking at the [Java 
implementation|https://github.com/apache/arrow/blob/7b2d68570b4336308c52081a0349675e488caf11/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/FBSerializables.java#L36-L44]
 it says
{quote}// struct vectors have to be created in reverse order
{quote}
 

A simple test of roundtripping the following RecordBatch shows the issue:

 
{code:java}
var result = new RecordBatch(
    new Schema.Builder()
        .Field(f => f.Name("age").DataType(Int32Type.Default))
        .Field(f => f.Name("CharCount").DataType(Int32Type.Default))
        .Build(),
    new IArrowArray[]
    {
        // "age": one value whose validity bit is cleared, so it is null.
        new Int32Array(
            new ArrowBuffer.Builder<int>().Append(0).Build(),
            new ArrowBuffer.Builder<byte>().Append(0).Build(),
            length: 1,
            nullCount: 1,
            offset: 0),
        // "CharCount": one non-null value (7) with no validity bitmap.
        new Int32Array(
            new ArrowBuffer.Builder<int>().Append(7).Build(),
            ArrowBuffer.Empty,
            length: 1,
            nullCount: 0,
            offset: 0)
    },
    length: 1);
{code}
Here, the "age" column should have a `null` in it. However, when you write and 
read this RecordBatch back, you see that the "CharCount" column has `NullCount` 
== 1 and "age" column has `NullCount` == 0.





[jira] [Created] (ARROW-5708) [C#] Null support for BooleanArray

2019-06-24 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-5708:
---

 Summary: [C#] Null support for BooleanArray
 Key: ARROW-5708
 URL: https://issues.apache.org/jira/browse/ARROW-5708
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Reporter: Eric Erhardt


See the conversation 
[here|https://github.com/apache/arrow/pull/4640#discussion_r296417726] and 
[here|https://github.com/apache/arrow/pull/3574#discussion_r262662083].

We should add null support for BooleanArray.





[jira] [Created] (ARROW-5546) [C#] Remove IArrowArray and use Array base class.

2019-06-10 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-5546:
---

 Summary: [C#] Remove IArrowArray and use Array base class.
 Key: ARROW-5546
 URL: https://issues.apache.org/jira/browse/ARROW-5546
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Affects Versions: 0.13.0
Reporter: Eric Erhardt


In .NET libraries, we have historically favored classes (abstract or otherwise) 
over interfaces. See [Choosing Between Classes and 
Interfaces|https://docs.microsoft.com/en-us/previous-versions/dotnet/netframework-4.0/ms229013(v%3dvs.100)].
 The main reasoning is that you can add members to a class over time, but once 
you ship an interface, it can never be changed. You can only add new interfaces.

 In light of this, we should remove the IArrowArray interface, and instead just 
use the base `Array` class as the abstraction for all Arrow Arrays.

As part of this, we should also consider renaming `Array` because it conflicts 
with the System.Array type. Instead we should consider naming it `ArrowArray` 
to make it unique from the very common System.Array type in .NET.





[jira] [Created] (ARROW-5278) [C#] ArrowBuffer should either implement IEquatable correctly or not at all

2019-05-07 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-5278:
---

 Summary: [C#] ArrowBuffer should either implement IEquatable 
correctly or not at all
 Key: ARROW-5278
 URL: https://issues.apache.org/jira/browse/ARROW-5278
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Eric Erhardt


See the discussion 
[here|https://github.com/apache/arrow/pull/3925/#discussion_r281378027].

ArrowBuffer currently implements `IEquatable<ArrowBuffer>`, but doesn't override `GetHashCode`.

We should either implement IEquatable correctly by overriding Equals and 
GetHashCode, or remove IEquatable altogether.

Looking at ArrowBuffer's [Equals 
implementation|https://github.com/apache/arrow/blob/08829248fd540b7e3bd96b980e357f8a4db7970e/csharp/src/Apache.Arrow/ArrowBuffer.cs#L66-L69],
 it compares each value in the buffer, which is not very efficient. Also, this 
implementation is not consistent with how `Memory<T>` implements IEquatable - 
[https://source.dot.net/#System.Private.CoreLib/shared/System/Memory.cs,500].

If we continue implementing IEquatable on ArrowBuffer, we should consider 
implementing it in the same fashion as `Memory<T>` does.
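
A sketch of what a consistent implementation could look like, assuming the buffer wraps a `ReadOnlyMemory<byte>` and delegates equality to it. `BufferHandle` is a hypothetical stand-in for illustration, not ArrowBuffer's actual definition.

```csharp
using System;

// Hypothetical stand-in for ArrowBuffer, for illustration only.
public readonly struct BufferHandle : IEquatable<BufferHandle>
{
    private readonly ReadOnlyMemory<byte> _memory;

    public BufferHandle(ReadOnlyMemory<byte> memory) => _memory = memory;

    // Like Memory<T>, two handles are equal when they refer to the same
    // region of the same underlying object; contents are not compared.
    public bool Equals(BufferHandle other) => _memory.Equals(other._memory);

    public override bool Equals(object obj) =>
        obj is BufferHandle other && Equals(other);

    // Overridden together with Equals so equal values hash identically.
    public override int GetHashCode() => _memory.GetHashCode();
}
```

With this approach, two handles over the same array compare equal in constant time, while two distinct arrays with identical contents do not.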





[jira] [Created] (ARROW-5277) [C#] MemoryAllocator.Allocate(length: 0) should not return null

2019-05-07 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-5277:
---

 Summary: [C#] MemoryAllocator.Allocate(length: 0) should not 
return null
 Key: ARROW-5277
 URL: https://issues.apache.org/jira/browse/ARROW-5277
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Reporter: Eric Erhardt


See the conversation 
[here|https://github.com/apache/arrow/pull/3925#discussion_r281187184].

We should change MemoryAllocator to not return `null` when the requested memory 
length is `0`. Instead, we should create a cached "NullObject" `IMemoryOwner<byte>` 
that has a no-op `Dispose` method, and always returns `Memory<byte>.Empty`.

This way consuming code doesn't need to check for `null` being returned from 
MemoryAllocator.Allocate.
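
A minimal sketch of such a null object, assuming the allocator returns `IMemoryOwner<byte>`. The `EmptyMemoryOwner` name is hypothetical.

```csharp
using System;
using System.Buffers;

// Hypothetical name; a cached "null object" owner for zero-length requests.
public sealed class EmptyMemoryOwner : IMemoryOwner<byte>
{
    // Single shared instance, so zero-length requests allocate nothing.
    public static readonly EmptyMemoryOwner Instance = new EmptyMemoryOwner();

    private EmptyMemoryOwner() { }

    // Always hands out an empty block.
    public Memory<byte> Memory => System.Memory<byte>.Empty;

    // Nothing was allocated, so there is nothing to release.
    public void Dispose() { }
}
```

An allocator could then return `EmptyMemoryOwner.Instance` whenever the requested length is `0`, and callers can dispose it like any other owner.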





[jira] [Created] (ARROW-5276) [C#] NativeMemoryAllocator expose an option for clearing allocated memory

2019-05-07 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-5276:
---

 Summary: [C#] NativeMemoryAllocator expose an option for clearing 
allocated memory
 Key: ARROW-5276
 URL: https://issues.apache.org/jira/browse/ARROW-5276
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Reporter: Eric Erhardt


See the discussion 
[here|https://github.com/apache/arrow/pull/3925#discussion_r281192698].

We should expose an option on NativeMemoryAllocator for controlling whether the 
allocated memory is cleared or not.

Maybe we should make the default not clear the memory, that way it is the best 
performing by default.





[jira] [Created] (ARROW-5092) [C#] Source Link doesn't work with the C# release script

2019-04-02 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-5092:
---

 Summary: [C#] Source Link doesn't work with the C# release script
 Key: ARROW-5092
 URL: https://issues.apache.org/jira/browse/ARROW-5092
 Project: Apache Arrow
  Issue Type: Bug
  Components: C#
Affects Versions: 0.13.0
Reporter: Eric Erhardt


With the 0.13.0 C# NuGet package, [Source 
Link|https://docs.microsoft.com/en-us/dotnet/standard/library-guidance/sourcelink]
 doesn't work. The symbols can be downloaded from nuget.org correctly, but when 
Visual Studio tries to download the code, it cannot find the correct files.

The following is why it doesn't work:

The .NET tooling expects the build of an official release to happen in the 
context of a {{git}} repository. This does 2 things to the produced assets:
 # In the {{.nupkg}} file that is generated, the .NET tooling will encode the 
current git commit's SHA hash into both the {{Apache.Arrow.nuspec}} file, and 
into the compiled {{Apache.Arrow.dll}} assembly. Looking at the released 
version that was published over the weekend: 
[https://www.nuget.org/packages/Apache.Arrow/0.13.0], this information made it 
into the {{.nuspec}} and the {{.dll}}:

{code}
[assembly: AssemblyInformationalVersion("0.13.0+57de5c3adffe526f37366bb15c3ff0d4a2e84655")]

<repository type="git" url="https://github.com/apache/arrow" commit="57de5c3adffe526f37366bb15c3ff0d4a2e84655" />
{code}

However, I don't see how the [C# release 
script|https://github.com/apache/arrow/blob/master/dev/release/post-06-csharp.sh]
 could have done that. 

 # Also, .NET has a feature called "Source Link", which allows for the source 
code to be automatically downloaded from GitHub when debugging into this 
library. The way the tooling works today, it requires that the git repository's 
{{origin}} remote is set to [https://github.com/apache/arrow.git]. The tooling 
uses the `origin` git remote to encode the GitHub URL into the symbols 
file in the {{.snupkg}} file.

This, however, doesn't work with the 0.13.0 release that occurred over the 
weekend. I tried using the Source Link feature, and it didn't automatically 
download the source files from GitHub.

Looking into the symbols file, I see the Source Link information that was 
embedded:


{code}
1: 
'/home/kou/work/cpp/arrow.kou/apache-arrow-0.13.0/csharp/src/Apache.Arrow/Flatbuf/FlatBuffers/ByteBuffer.cs'
 (#19c)C# (#3)   SHA-1 (#2) 
04-64-A0-48-82-EA-F5-B5-50-EC-CA-9F-85-75-E2-95-A4-EC-AB-B3 (#1b7)   
2: 
'/home/kou/work/cpp/arrow.kou/apache-arrow-0.13.0/csharp/src/Apache.Arrow/Flatbuf/FlatBuffers/ByteBufferUtil.cs'
 (#68f)C# (#3)   SHA-1 (#2) 
F0-4F-28-53-88-A4-E0-6E-F1-1F-17-F6-CD-FE-0E-64-AB-0B-C2-95 (#6aa)   
{code}

{code:json}
{
  "documents": {
    "/home/kou/work/cpp/arrow.kou/*": "https://raw.githubusercontent.com/kou/arrow/57de5c3adffe526f37366bb15c3ff0d4a2e84655/*",
    "/home/kou/work/cpp/arrow.kou/cpp/submodules/parquet-testing/*": "https://raw.githubusercontent.com/apache/parquet-testing/bb7b6abbb3fbeff845646364a4286142127be04c/*"
  }
}
{code}

Here it appears the {{origin}} remote was set to {{kou/arrow}}, and not 
{{apache/arrow}}. Also, it appears the {{apache-arrow-0.13.0}} folder was under 
a git repository, and so the sources aren't matched up with the git repository. 
(Basically that folder shouldn't have appeared in the Documents list that has 
the {{.cs}} file path.) I think this explains how (1) above happened - the 
build was under a git repository - but this script downloaded an extra copy of 
the sources into that git repository.

I'm wondering how we can fix either this script, or the .NET Tooling, or both, 
to make this experience better for the next release. I think we need to ensure 
two things:
 # The git commit information is set correctly in the {{.nuspec}} and the 
{{.dll}} when the release build is run. I think it just happened by pure luck 
this time. It just so happened that the script was executed in an already 
established repo, and it just so happened to be on the right commit (or maybe 
it wasn't the right commit?).
 # The source link information is set correctly in the symbols file.

[~wesmckinn] [~kou]





[jira] [Created] (ARROW-5035) [C#] ArrowBuffer.Builder<bool> is broken

2019-03-27 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-5035:
---

 Summary: [C#] ArrowBuffer.Builder<bool> is broken
 Key: ARROW-5035
 URL: https://issues.apache.org/jira/browse/ARROW-5035
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Reporter: Eric Erhardt


If someone creates and uses `ArrowBuffer.Builder<bool>` in their code to create 
an ArrowBuffer filled with Boolean values, it is currently producing the wrong 
results.

The reason it is producing the wrong results is because it is taking the 
`sizeof(bool)` (which is 1) and using that for how many bytes to write into the 
backing buffer for each element being added to the builder. However, in Arrow, 
Boolean values are stored in a bit-wise fashion allowing for 8 Boolean values 
in a single byte. Thus, when I add 4 `true` values to the buffer, I expect to 
get a buffer with 1 byte in it with the value 0x0F. However, I am getting a 
buffer with 4 bytes in it, each with value 0x01.

One way to fix this would be to throw in `ArrowBuffer.Builder<T>`'s constructor 
if `T` == `bool` and instead create a new class `ArrowBuffer.BooleanBuilder`, 
which will create Boolean buffers correctly. Looking at the current 
implementation, I think it would be rather hard to special case `typeof(bool)` 
all over in the `Builder` class, but if someone wanted to take that approach 
and made it work, that would be great too.
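
The bit-wise layout described above can be sketched as follows. This is only an illustration of the packing, not the proposed `ArrowBuffer.BooleanBuilder`; the `BooleanPacking` class is hypothetical.

```csharp
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class BooleanPacking
{
    // Packs values least-significant-bit first, 8 per byte, which is the
    // layout Arrow uses for Boolean data.
    public static byte[] Pack(IReadOnlyList<bool> values)
    {
        var bytes = new byte[(values.Count + 7) / 8];
        for (int i = 0; i < values.Count; i++)
        {
            if (values[i])
            {
                bytes[i / 8] |= (byte)(1 << (i % 8));
            }
        }
        return bytes;
    }
}
```

Packing four `true` values this way yields a single byte with the value 0x0F, matching the expectation stated above.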





[jira] [Created] (ARROW-5034) [C#] ArrowStreamWriter should expose synchronous Write methods

2019-03-27 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-5034:
---

 Summary: [C#] ArrowStreamWriter should expose synchronous Write 
methods
 Key: ARROW-5034
 URL: https://issues.apache.org/jira/browse/ARROW-5034
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Reporter: Eric Erhardt


There are times when callers are in a synchronous method and need to write an 
Arrow stream. However, ArrowStreamWriter (and ArrowFileWriter) only expose 
WriteAsync methods, which means the caller needs to call the Async method, and 
then block on the resulting Task.

Instead, we should also expose Write methods that complete in a synchronous 
fashion, so the callers are free to choose the sync or async methods as they 
need.





[jira] [Created] (ARROW-5019) [C#] ArrowStreamWriter doesn't work on a non-seekable stream

2019-03-26 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-5019:
---

 Summary: [C#] ArrowStreamWriter doesn't work on a non-seekable 
stream
 Key: ARROW-5019
 URL: https://issues.apache.org/jira/browse/ARROW-5019
 Project: Apache Arrow
  Issue Type: Bug
  Components: C#
Reporter: Eric Erhardt
Assignee: Eric Erhardt


When writing to a non-seekable .NET Stream (like a network/socket stream), 
ArrowStreamWriter will throw an exception:

 
{code:java}
Exception thrown: 'System.NotSupportedException' in System.Net.Sockets.dll
This stream does not support seek operations.
{code}
This throws because we are using `BaseStream.Position` in the 
writer to calculate the number of bytes that we've written to the stream. We 
don't need to use the Position in order to calculate the lengths. We should be 
able to write an Arrow RecordBatch to a NetworkStream directly. Today, we need 
to write to a MemoryStream, and then copy the MemoryStream to the NetworkStream.
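
One way to avoid `Position`, sketched under the assumption that all writes go through a single helper (the `CountingWriter` class is hypothetical, not part of ArrowStreamWriter), is to count bytes as they are written:

```csharp
using System.IO;

// Hypothetical helper: tracks bytes written itself instead of asking the stream.
public sealed class CountingWriter
{
    private readonly Stream _stream;

    public CountingWriter(Stream stream) => _stream = stream;

    // Total bytes written through this helper; never touches Stream.Position,
    // so it also works on non-seekable streams.
    public long BytesWritten { get; private set; }

    public void Write(byte[] buffer, int offset, int count)
    {
        _stream.Write(buffer, offset, count);
        BytesWritten += count;
    }
}
```

The length of a section is then the difference between `BytesWritten` before and after writing it.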

 





[jira] [Created] (ARROW-4997) [C#] ArrowStreamReader doesn't consume whole stream and doesn't implement sync read

2019-03-22 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-4997:
---

 Summary: [C#] ArrowStreamReader doesn't consume whole stream and 
doesn't implement sync read
 Key: ARROW-4997
 URL: https://issues.apache.org/jira/browse/ARROW-4997
 Project: Apache Arrow
  Issue Type: Bug
  Components: C#
Reporter: Eric Erhardt
Assignee: Eric Erhardt


There are 2 major issues with the ArrowStreamReader that are blocking me from 
using it.
 # When it reads a batch from a .NET Stream that doesn't return the whole chunk 
of memory in one "Read" call (like a socket/network stream), it only calls Read 
once, and then continues on. This is an issue because it has "garbage" at the 
end of its buffer (which was never written to by the stream), and when 
attempting to read the next batch, it is in the middle of the previous batch 
from the .NET Stream. This causes all sorts of issues because it assumes the 
next 4 bytes are the message length, which it obviously isn't. See [the reading 
code|https://github.com/apache/arrow/blob/13fd813445b4738cbebbd137490fe3c02071c04b/csharp/src/Apache.Arrow/Ipc/ArrowStreamReaderImplementation.cs#L90-L97]
 for where it only calls Read once - it should be in a loop.
 # ArrowStreamReader has a synchronous ReadNextRecordBatch() method, but it 
throws NotImplementedException. A working synchronous method is necessary 
because a caller that isn't in an async method can't/shouldn't call the async API.
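
For the first issue, the read loop could look roughly like this. It is a sketch only; the `StreamReading.ReadFully` helper is a hypothetical name.

```csharp
using System.IO;

// Hypothetical helper, for illustration only.
public static class StreamReading
{
    // Keeps calling Read until `count` bytes have arrived or the stream ends.
    // Returns the number of bytes actually read, which is less than `count`
    // only if the stream ended early.
    public static int ReadFully(Stream stream, byte[] buffer, int offset, int count)
    {
        int total = 0;
        while (total < count)
        {
            int read = stream.Read(buffer, offset + total, count - total);
            if (read == 0)
            {
                break; // end of stream
            }
            total += read;
        }
        return total;
    }
}
```

A caller can compare the return value with the requested count to detect a truncated message instead of parsing leftover bytes.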





[jira] [Created] (ARROW-4839) [C#] Add NuGet support

2019-03-12 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-4839:
---

 Summary: [C#] Add NuGet support
 Key: ARROW-4839
 URL: https://issues.apache.org/jira/browse/ARROW-4839
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Reporter: Eric Erhardt
Assignee: Eric Erhardt


We should add the metadata to the .csproj so we can create a NuGet package 
without changing any source code.

Also, we should add any scripts and documentation on how to create the NuGet 
package to allow ease of creation at release time.





[jira] [Created] (ARROW-4737) [C#] tests are not running in CI

2019-03-01 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-4737:
---

 Summary: [C#] tests are not running in CI
 Key: ARROW-4737
 URL: https://issues.apache.org/jira/browse/ARROW-4737
 Project: Apache Arrow
  Issue Type: Bug
  Components: C#
Reporter: Eric Erhardt
Assignee: Eric Erhardt


 The C# tests are not running in CI because the filtering logic needs to be 
updated.

For example see 
https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/22671460/job/nk1nn59k5njie720

{quote}Build started
git clone -q https://github.com/apache/arrow.git C:\projects\arrow
git fetch -q origin +refs/pull/3662/merge:
git checkout -qf FETCH_HEAD
Running Install scripts
python ci\detect-changes.py > generated_changes.bat
Affected files: [u'csharp/src/Apache.Arrow/Field.Builder.cs', 
u'csharp/src/Apache.Arrow/Schema.Builder.cs', 
u'csharp/test/Apache.Arrow.Tests/SchemaBuilderTests.cs', 
u'csharp/test/Apache.Arrow.Tests/TypeTests.cs']
Affected topics:
{'c_glib': False,
 'cpp': False,
 'dev': False,
 'docs': False,
 'go': False,
 'integration': False,
 'java': False,
 'js': False,
 'python': False,
 'r': False,
 'ruby': False,
 'rust': False,
 'site': False}
call generated_changes.bat
call ci\appveyor-filter-changes.bat
===
=== No C++ or Python changes, exiting job
===
Build was forcibly terminated
Build success{quote}






[jira] [Created] (ARROW-4717) [C#] Consider exposing ValueTask instead of Task

2019-02-28 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-4717:
---

 Summary: [C#] Consider exposing ValueTask instead of Task
 Key: ARROW-4717
 URL: https://issues.apache.org/jira/browse/ARROW-4717
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Reporter: Eric Erhardt


See [https://github.com/apache/arrow/pull/3736#pullrequestreview-207169204] for 
the discussion and 
[https://devblogs.microsoft.com/dotnet/understanding-the-whys-whats-and-whens-of-valuetask/]
 for the reasoning.

Using `Task` in a public API generally requires that a new Task instance be allocated on 
every call. When returning synchronously, using ValueTask allows the method 
to avoid that allocation.

In order to do this, we will need to take a new dependency on the 
{{System.Threading.Tasks.Extensions}} NuGet package.
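
A small sketch of the pattern, using a hypothetical `CachedLengthReader` class rather than any Arrow type, shows where the allocation is avoided:

```csharp
using System.Threading.Tasks;

// Hypothetical reader used only to illustrate the pattern.
public sealed class CachedLengthReader
{
    private int? _cached;

    public ValueTask<int> ReadLengthAsync()
    {
        // Synchronous path: wrap the value directly, no Task<int> is allocated.
        if (_cached.HasValue)
        {
            return new ValueTask<int>(_cached.Value);
        }

        // Asynchronous path: fall back to a real Task<int>.
        return new ValueTask<int>(LoadAsync());
    }

    private async Task<int> LoadAsync()
    {
        await Task.Yield(); // stands in for real asynchronous I/O
        _cached = 42;
        return 42;
    }
}
```

Callers still `await` the result as before; only the return type of the public method changes.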





[jira] [Created] (ARROW-4571) [Format] Tensor.fbs file has multiple root_type declarations

2019-02-14 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-4571:
---

 Summary: [Format] Tensor.fbs file has multiple root_type 
declarations
 Key: ARROW-4571
 URL: https://issues.apache.org/jira/browse/ARROW-4571
 Project: Apache Arrow
  Issue Type: Bug
  Components: Format
Reporter: Eric Erhardt


Looking at [the flatbuffers 
doc|https://google.github.io/flatbuffers/flatbuffers_guide_tutorial.html], it 
appears there should only be one `root_type` declaration in an fbs file:
{code:java}
The last part of the schema is the root_type. The root type declares what will 
be the root table for the serialized data. In our case, the root type is our 
Monster table.{code}
However, the Tensor.fbs file has multiple `root_type` declarations:

[https://github.com/apache/arrow/blob/69d595ae4c61902b3f2778e536fca6675350c88c/format/Tensor.fbs#L53]

[https://github.com/apache/arrow/blob/69d595ae4c61902b3f2778e536fca6675350c88c/format/Tensor.fbs#L146]

 

See the discussion here: 
https://github.com/apache/arrow/pull/2546#discussion_r256549256





[jira] [Created] (ARROW-4543) [C#] Update Flat Buffers code to latest version

2019-02-12 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-4543:
---

 Summary: [C#] Update Flat Buffers code to latest version
 Key: ARROW-4543
 URL: https://issues.apache.org/jira/browse/ARROW-4543
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Reporter: Eric Erhardt
Assignee: Eric Erhardt


In order to support zero-copy reads, we should update to the latest Google Flat 
Buffers code. A recent change now allows C# support for directly reading and 
writing to memory other than `byte[]` 
([https://github.com/google/flatbuffers/pull/4886]), which will make reading 
native memory using `Memory<byte>` possible.

Along with this update, we should mark the flat buffers types as `internal`, 
since they are an implementation detail of the library. From an API 
perspective, it is confusing to see multiple public types named "Schema", 
"Field", "RecordBatch" etc.





[jira] [Created] (ARROW-4503) [C#] ArrowStreamReader allocates and copies data excessively

2019-02-07 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-4503:
---

 Summary: [C#] ArrowStreamReader allocates and copies data 
excessively
 Key: ARROW-4503
 URL: https://issues.apache.org/jira/browse/ARROW-4503
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Reporter: Eric Erhardt


When reading `RecordBatch` instances using the `ArrowStreamReader` class, it is 
currently allocating and copying memory 3 times for the data.
 # It is allocating memory in order to [read the data from the 
Stream|https://github.com/apache/arrow/blob/044b418fa108a57f0b4e2e887546cc3e68271397/csharp/src/Apache.Arrow/Ipc/ArrowStreamReader.cs#L72-L74],
 and then reading from the Stream.  (This should be the only allocation that is 
necessary.)
 # It then [creates a new 
`ArrowBuffer.Builder`|https://github.com/apache/arrow/blob/044b418fa108a57f0b4e2e887546cc3e68271397/csharp/src/Apache.Arrow/Ipc/ArrowStreamReader.cs#L227-L228],
 which allocates another `byte[]`, and calls `Append` on it, which copies the 
values to the new `byte[]`.
 # Finally, it then calls `.Build()` on the `ArrowBuffer.Builder`, which 
[allocates memory from the MemoryPool, and then copies the intermediate 
buffer|https://github.com/apache/arrow/blob/044b418fa108a57f0b4e2e887546cc3e68271397/csharp/src/Apache.Arrow/ArrowBuffer.Builder.cs#L112-L121]
 into it.

 

We should reduce this overhead to only allocating a single time (from the 
MemoryPool), and not copying the data more times than necessary.





[jira] [Created] (ARROW-4502) [C#] Add support for zero-copy reads

2019-02-07 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-4502:
---

 Summary: [C#] Add support for zero-copy reads
 Key: ARROW-4502
 URL: https://issues.apache.org/jira/browse/ARROW-4502
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Reporter: Eric Erhardt
Assignee: Eric Erhardt


In the Python (and C++) API, you can create a `RecordBatchStreamReader`, and if 
you give it an `InputStream` that supports zero-copy reads, you can get back 
`RecordBatch` objects without allocating new memory and copying all the data.

There is currently no way to read Arrow RecordBatch instances without 
allocating new memory and copying all the data. We should enable this scenario 
in the C# API.

 

My proposal is to create a new `class ArrowRecordBatchReader : IArrowReader`. 
Its constructor will take a `ReadOnlyMemory<byte> data` parameter, and it will 
be able to read `RecordBatch` instances just like the existing 
`ArrowStreamReader`. As part of this new class, we will refactor any common 
code out of `ArrowStreamReader` in order for the parsing logic to be shared, 
where necessary.
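
To show why `ReadOnlyMemory<byte>` makes this possible, here is a minimal sketch of viewing part of a caller-supplied block without copying it. The `ZeroCopyView` class and its methods are hypothetical and are not the proposed reader.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper, for illustration only.
public static class ZeroCopyView
{
    // Returns a view over part of the caller's memory; no bytes are copied.
    public static ReadOnlyMemory<byte> SliceBuffer(ReadOnlyMemory<byte> data, int offset, int length) =>
        data.Slice(offset, length);

    // Reinterprets the bytes as 32-bit integers in place (host byte order).
    public static ReadOnlySpan<int> AsInt32(ReadOnlyMemory<byte> buffer) =>
        MemoryMarshal.Cast<byte, int>(buffer.Span);
}
```

Because both calls only create views, each array in a `RecordBatch` could point directly into the original block of memory.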





[jira] [Created] (ARROW-4435) [C#] Add .sln file and minor .csproj fix ups

2019-01-30 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-4435:
---

 Summary: [C#] Add .sln file and minor .csproj fix ups
 Key: ARROW-4435
 URL: https://issues.apache.org/jira/browse/ARROW-4435
 Project: Apache Arrow
  Issue Type: Task
  Components: C#
Reporter: Eric Erhardt


There is currently no .sln file in the repo, which makes it hard to use the src 
and test code at the same time.

 

Also, there are some settings in the .csproj that can be moved up to the outer 
PropertyGroup, and not under a "Configuration|Platform" conditional, like they 
were in the old .csproj format.


