Todd West created ARROW-17466:
---------------------------------

             Summary: valid metadata results in Invalid metadata$r warning from 
read_feather()
                 Key: ARROW-17466
                 URL: https://issues.apache.org/jira/browse/ARROW-17466
             Project: Apache Arrow
          Issue Type: Bug
          Components: C#, R
    Affects Versions: 9.0.0
            Reporter: Todd West
             Fix For: 9.0.1


I have some C# code using the Arrow 9.0.0 nuget to create record batches like

{{Dictionary<string, string> metadata = new()}}
{{{}}
{{    { "resourceUnit", "foo" },}}
{{    // other keys...}}
{{};}}
{{Schema schema = new(fields, metadata);}}

For some reason, using the key "resourceUnit" results in arrow::read_feather() 
in R failing in 
[.deserialize_arrow_r_metadata()|https://github.com/apache/arrow/blob/master/r/R/metadata.R],
 triggering the warning

{{Warning message:}}
{{Invalid metadata$r }}

There are at least five issues here:

1) .deserialize_arrow_r_metadata()'s error handler swallows the actual error, 
leaving the caller without any information as to what's breaking

2) The error handler converts the error to a warning without giving the caller any control.

3) It's unclear why there's an R metadata deserialization path when 
`read_feather(as_data_frame = FALSE)` deserializes the metadata without issue 
to $metadata.

4) The warning is confusing as the deserialized fragment goes in $r_metadata, 
not $r.

5) "resourceUnit" should be a perfectly valid UTF-8 string and deserialize 
without issue. Probing shows the "resource" bit is the problem: if I change 
it to something like "_esourceUnit", no error or warning occurs on 
deserialization. I also have C# generating other feather files with 
"resourceUnit" as the first metadata key, and those files deserialize without 
the error or warning in R. This suggests the root issue might be something in 
the direction of alignment fragility in the schema portion of the stream.
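
To illustrate issues 1 and 2, here is a minimal base R sketch of a handler that keeps the underlying error message and lets the caller choose how failures surface. The function name parse_metadata and its on_error parameter are hypothetical, and this is not arrow's actual implementation. It only assumes the metadata is an ASCII-serialized R object passed as a string.

```r
# Hypothetical stand-in for a metadata parser; NOT arrow's actual code.
# Assumes `x` is a string holding an ASCII-serialized R object.
parse_metadata <- function(x, on_error = c("warn", "stop", "ignore")) {
  on_error <- match.arg(on_error)
  tryCatch(
    unserialize(charToRaw(x)),
    error = function(e) {
      # Keep the original message instead of discarding it.
      msg <- paste0("Invalid metadata$r: ", conditionMessage(e))
      # Let the caller decide whether this is an error, a warning, or silent.
      switch(on_error,
        stop = stop(msg, call. = FALSE),
        warn = warning(msg, call. = FALSE),
        ignore = NULL
      )
      NULL
    }
  )
}

# Round trip with valid input.
ok <- rawToChar(serialize(list(a = 1), NULL, ascii = TRUE))
str(parse_metadata(ok))

# Invalid input: the warning text now includes the underlying cause.
res <- parse_metadata("garbage")
```

With on_error = "stop" the same failure is raised as an error, and with on_error = "ignore" the function returns NULL silently.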

I can't share the file publicly and the code hasn't been pushed to GitHub yet, 
but both should be available by the time someone's ready to look at this. Just 
bump the issue and let me know.

(I think this is a normal priority issue but normal isn't available in the 
priority dropdown.)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
