Eric Erhardt created ARROW-5887:
-----------------------------------

             Summary: [C#] ArrowStreamWriter writes FieldNodes in wrong order
                 Key: ARROW-5887
                 URL: https://issues.apache.org/jira/browse/ARROW-5887
             Project: Apache Arrow
          Issue Type: Bug
          Components: C#
            Reporter: Eric Erhardt
            Assignee: Eric Erhardt

When {{ArrowStreamWriter}} writes a {{RecordBatch}} that contains {{null}}s, it mixes up the columns' {{NullCount}} values.

You can see here:

[https://github.com/apache/arrow/blob/90affbd2c41e80aa8c3fac1e4dbff60aafb415d3/csharp/src/Apache.Arrow/Ipc/ArrowStreamWriter.cs#L195-L200]

It is writing the fields in {{0}} -> {{fieldCount}} order. But then [lower|https://github.com/apache/arrow/blob/90affbd2c41e80aa8c3fac1e4dbff60aafb415d3/csharp/src/Apache.Arrow/Ipc/ArrowStreamWriter.cs#L216-L220], it is writing the fields in {{fieldCount}} -> {{0}} order.

Looking at the [Java implementation|https://github.com/apache/arrow/blob/7b2d68570b4336308c52081a0349675e488caf11/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/FBSerializables.java#L36-L44], it says
{quote}// struct vectors have to be created in reverse order
{quote}
This suggests that the first loop, the one writing in {{0}} -> {{fieldCount}} order, is the one that is wrong.

A simple test that round-trips the following {{RecordBatch}} shows the issue:
{code:java}
var result = new RecordBatch(
    new Schema.Builder()
        .Field(f => f.Name("age").DataType(Int32Type.Default))
        .Field(f => f.Name("CharCount").DataType(Int32Type.Default))
        .Build(),
    new IArrowArray[]
    {
        // "age": one value, marked null by the validity bitmap
        new Int32Array(
            new ArrowBuffer.Builder<int>().Append(0).Build(),
            new ArrowBuffer.Builder<byte>().Append(0).Build(),
            length: 1,
            nullCount: 1,
            offset: 0),
        // "CharCount": one non-null value, no validity bitmap
        new Int32Array(
            new ArrowBuffer.Builder<int>().Append(7).Build(),
            ArrowBuffer.Empty,
            length: 1,
            nullCount: 0,
            offset: 0)
    },
    length: 1);
{code}
Here, the "age" column should have a {{null}} in it. However, when you write this {{RecordBatch}} and read it back, the "CharCount" column has {{NullCount}} == 1 and the "age" column has {{NullCount}} == 0.
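
The ordering problem described above can be illustrated without Arrow or FlatBuffers at all. The following is a minimal sketch, assuming that the underlying builder places each newly written struct in front of the ones already written (which is what the Java comment about reverse order implies). {{PrependingVectorBuilder}} and {{FieldNodeOrderDemo}} are hypothetical names invented for this illustration; they are not part of the Arrow or FlatBuffers APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a back-to-front vector builder.
// This is NOT the FlatBuffers API, only a model of its ordering behavior.
public sealed class PrependingVectorBuilder<T>
{
    private readonly LinkedList<T> _items = new LinkedList<T>();

    // Each new element lands in front of everything written so far.
    public void Add(T item) => _items.AddFirst(item);

    // The order a reader sees when walking the finished vector.
    public T[] ToReadOrder() => _items.ToArray();
}

public static class FieldNodeOrderDemo
{
    // NullCounts from the example batch: "age" = 1, "CharCount" = 0.
    public static readonly int[] NullCounts = { 1, 0 };

    // Writes in 0 -> fieldCount order, like the first loop in the report.
    public static int[] WriteForward()
    {
        var builder = new PrependingVectorBuilder<int>();
        for (int i = 0; i < NullCounts.Length; i++)
        {
            builder.Add(NullCounts[i]);
        }
        return builder.ToReadOrder();
    }

    // Writes in fieldCount -> 0 order, as the Java comment requires.
    public static int[] WriteReversed()
    {
        var builder = new PrependingVectorBuilder<int>();
        for (int i = NullCounts.Length - 1; i >= 0; i--)
        {
            builder.Add(NullCounts[i]);
        }
        return builder.ToReadOrder();
    }

    public static void Main()
    {
        // Forward writing swaps the counts: "age" reads 0, "CharCount" reads 1.
        Console.WriteLine(string.Join(", ", WriteForward()));
        // Reverse writing keeps them aligned: "age" reads 1, "CharCount" reads 0.
        Console.WriteLine(string.Join(", ", WriteReversed()));
    }
}
```

With two columns, the forward loop reproduces the swapped {{NullCount}} values seen in the round-trip test, while the reversed loop keeps each count attached to its own column.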