paleolimbot commented on PR #39302:
URL: https://github.com/apache/arrow/pull/39302#issuecomment-1934467394
A few outstanding issues:
I'm not sure how to skip test cases based on type in Archery. nanoarrow doesn't
support the new types (run-end encoded, binary view, list view) yet, so those
cases need to be skipped in integration testing. There must be an example of
this but I can't seem to find it!
```
FAILED TEST: run_end_encoded C++ producing, nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed:
Unsupported Type name: 'runendencoded'
FAILED TEST: binary_view C++ producing, nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed:
Unsupported Type name: 'binaryview'
FAILED TEST: list_view C++ producing, nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed:
Unsupported Type name: 'listview'
```
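One way this kind of skip could be expressed is to tag each generated case
with the testers that cannot handle it and to filter on that tag when building
the producer/consumer matrix. The sketch below is a standalone model of that
idea, not Archery's actual code: `GeneratedCase`, `skip_tester`, and
`should_run` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GeneratedCase:
    """Hypothetical stand-in for one generated integration file."""

    name: str
    skipped_testers: set = field(default_factory=set)

    def skip_tester(self, tester_name):
        # Return self so that skips can be chained when declaring cases.
        self.skipped_testers.add(tester_name)
        return self

    def should_run(self, producer, consumer):
        # Skip the case if either side of the pair has opted out of it.
        return not ({producer, consumer} & self.skipped_testers)


cases = [
    GeneratedCase("primitive"),
    GeneratedCase("run_end_encoded").skip_tester("nanoarrow"),
    GeneratedCase("binary_view").skip_tester("nanoarrow"),
    GeneratedCase("list_view").skip_tester("nanoarrow"),
]

# Only the cases that both C++ and nanoarrow can handle remain.
runnable = [c.name for c in cases if c.should_run("C++", "nanoarrow")]
```

With this shape the skip is declared next to the case itself, so removing it
once nanoarrow supports the type is a one-line change.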
C# appears to export record batches as *nullable* structs, whereas nanoarrow
expects them to be *non-nullable*. In nanoarrow I should probably just ignore
the difference for a top-level "batch".
```
FAILED TEST: recursive_nested C# producing, nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Found 1
differences:
Path:
- .flags: 2
+ .flags: 0
```
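For reference, the value 2 in the diff above is `ARROW_FLAG_NULLABLE` in the
Arrow C Data Interface. A minimal sketch of a relaxed comparison, assuming the
comparison has access to the raw flags and knows whether it is looking at the
top-level schema (`flags_equal` and `top_level` are hypothetical names, not
nanoarrow API):

```python
# Flag values defined by the Arrow C Data Interface specification.
ARROW_FLAG_DICTIONARY_ORDERED = 1
ARROW_FLAG_NULLABLE = 2
ARROW_FLAG_MAP_KEYS_SORTED = 4


def flags_equal(actual, expected, top_level=False):
    """Compare ArrowSchema flags, ignoring nullability for a top-level batch."""
    if top_level:
        # Mask out the nullable bit on both sides before comparing.
        mask = ~ARROW_FLAG_NULLABLE
        return (actual & mask) == (expected & mask)
    return actual == expected
```

Nested fields would still be compared strictly, so a real nullability mismatch
inside the batch is still reported.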
Java's metadata probably goes through a hash map or something because it
looks like the order is not always maintained. We can relax the comparison to
consider all keys/values of the metadata as a whole:
```
FAILED TEST: custom_metadata Java producing, nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Found 2
differences:
Path: .children[1]
- {"name": "lots_of_meta", "nullable": true, "type": {"name": "int",
"bitWidth": 8, "isSigned": true}, "children": [], "metadata": [{"key": "..",
"value": "{}"}, {"key": "a", "value": "{}"}, {"key": "b", "value": "{}"},
{"key": "c", "value": "{}"}, {"key": "d", "value": "{}"}, {"key": "w", "value":
"{}"}, {"key": "x", "value": "{}"}, {"key": "y", "value": "{}"}, {"key": "z",
"value": "{}"}]}
+ {"name": "lots_of_meta", "nullable": true, "type": {"name": "int",
"bitWidth": 8, "isSigned": true}, "children": [], "metadata": [{"key": "a",
"value": "{}"}, {"key": "b", "value": "{}"}, {"key": "c", "value": "{}"},
{"key": "d", "value": "{}"}, {"key": "..", "value": "{}"}, {"key": "w",
"value": "{}"}, {"key": "x", "value": "{}"}, {"key": "y", "value": "{}"},
{"key": "z", "value": "{}"}]}
```
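A relaxed comparison along these lines could treat the metadata as an
unordered collection of key/value pairs. This is only a sketch over the JSON
representation shown in the diff; `metadata_equal` is a hypothetical helper,
and a `Counter` is used instead of a `dict` on the assumption that repeated
keys, if they occur, should still be counted:

```python
from collections import Counter


def metadata_equal(actual, expected):
    """Compare lists of {"key": ..., "value": ...} dicts, ignoring order."""

    def as_multiset(metadata):
        return Counter((kv["key"], kv["value"]) for kv in metadata)

    return as_multiset(actual) == as_multiset(expected)


# Same pairs in a different order, as in the Java failure above.
java = [{"key": "..", "value": "{}"}, {"key": "a", "value": "{}"}]
nano = [{"key": "a", "value": "{}"}, {"key": "..", "value": "{}"}]
```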
It looks like Go exports zero-length metadata (i.e., b"\x00\x00\x00\x00")
instead of NULL metadata. We can relax that check in the comparison.
```
FAILED TEST: primitive_no_batches Go producing, nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Found 31
differences:
Path: .children[0]
- {"name": "bool_nullable", "nullable": true, "type": {"name": "bool"},
"children": [], "metadata": []}
+ {"name": "bool_nullable", "nullable": true, "type": {"name": "bool"},
"children": []}
```
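The four zero bytes are the C Data Interface encoding of "zero key/value
pairs": an int32 pair count followed by length-prefixed keys and values in
native byte order. A sketch of a decoder that maps both NULL and zero-length
metadata to the same empty result, so the two compare equal
(`decode_metadata` is a hypothetical helper, not nanoarrow API):

```python
import struct


def decode_metadata(buf):
    """Decode C Data Interface metadata; None and zero pairs both give []."""
    if buf is None:
        return []

    offset = 0

    def read_int32():
        nonlocal offset
        # "=i" is a native-endian, standard-size (4 byte) signed integer.
        (value,) = struct.unpack_from("=i", buf, offset)
        offset += 4
        return value

    def read_bytes():
        nonlocal offset
        length = read_int32()
        data = bytes(buf[offset:offset + length])
        offset += length
        return data

    n_pairs = read_int32()
    pairs = []
    for _ in range(n_pairs):
        key = read_bytes()
        value = read_bytes()
        pairs.append((key, value))
    return pairs
```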
It looks like C# has some issues with the arrays produced by nanoarrow (or
at least by the JSON reader):
```
FAILED TEST: nested_dictionary nanoarrow producing, C# consuming
<class 'Xunit.Sdk.TrueException'>: Validity buffers do not match.
at Xunit.Assert.True(Nullable`1 condition, String userMessage) in
/_/src/xunit.assert/Asserts/BooleanAsserts.cs:line 146
at
Apache.Arrow.Tests.ArrowReaderVerifier.ArrayComparer.CompareValidityBuffer(Int32
nullCount, Int32 arrayLength, ArrowBuffer expectedValidityBuffer, ArrowBuffer
actualValidityBuffer) in
/arrow/csharp/test/Apache.Arrow.Tests/ArrowReaderVerifier.cs:line 435
```
...and C# reports a memory leak. I would have expected a memory leak to show
up consistently across languages, so I'm puzzled by this one.
```
FAILED TEST: extension nanoarrow producing, C# consuming
<class 'RuntimeError'>: Memory was not released correctly after roundtrip:
before = 694, after = 734 (should have been equal)
```