[ 
https://issues.apache.org/jira/browse/ARROW-17466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646176#comment-17646176
 ] 

Dewey Dunnington commented on ARROW-17466:
------------------------------------------

Thank you for reporting this, and I'm sorry that it slipped through our issue 
triage! This is definitely something that needs to be fixed.

If you haven't arrived at this yourself yet, you can wipe the metadata before 
converting to a data.frame as a workaround:

{code:R}
library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for
#> more information.

feather_file <- tempfile()
batch_that_will_contain_metadata <- as_arrow_table(
  data.frame(a = structure("this one thing", some_attr = "some value"))
)
batch_that_will_contain_metadata$metadata
#> $r
#> $r$attributes
#> $r$attributes$class
#> [1] "data.frame"
#> 
#> 
#> $r$columns
#> $r$columns$a
#> $r$columns$a$attributes
#> $r$columns$a$attributes$some_attr
#> [1] "some value"
#> 
#> 
#> $r$columns$a$columns
#> NULL

write_feather(batch_that_will_contain_metadata, feather_file)
df <- read_feather(feather_file)
str(df)
#> 'data.frame':    1 obs. of  1 variable:
#>  $ a: chr "this one thing"
#>   ..- attr(*, "some_attr")= chr "some value"

# if the metadata is impacting the data frame conversion, you can
# wipe it before converting
batch <- read_feather(feather_file, as_data_frame = FALSE)
batch$metadata <- NULL
df <- as.data.frame(batch)
str(df)
#> tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
#>  $ a: chr "this one thing"
{code}
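
One possible reason wiping the metadata sidesteps the warning (an assumption, not something established in this thread): R's {{$}} operator partially matches list names, while {{[[}} matches exactly by default. The sketch below uses plain lists as hypothetical stand-ins for the schema metadata and does not call any arrow internals.

{code:R}
# Hypothetical stand-in for schema metadata written by a non-R producer:
# there is no "r" key, only one key that happens to start with "r"
metadata <- list(resourceUnit = "foo", other_key = "bar")

# `$` partially matches names, so this picks up the "resourceUnit" value
partial <- metadata$r
partial
#> [1] "foo"

# `[[` matches exactly by default, so the missing "r" key yields NULL
exact <- metadata[["r"]]
exact
#> NULL

# With two keys starting with "r" the partial match is ambiguous and
# `$` returns NULL as well
ambiguous <- list(resourceUnit = "foo", region = "bar")$r
ambiguous
#> NULL
{code}

If something along these lines is happening, it would be consistent with "_esourceUnit" not triggering the warning, but that remains a guess until it is checked against an affected file.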


> valid metadata results in Invalid metadata$r warning from read_feather()
> ------------------------------------------------------------------------
>
>                 Key: ARROW-17466
>                 URL: https://issues.apache.org/jira/browse/ARROW-17466
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C#, R
>    Affects Versions: 9.0.0
>            Reporter: Todd West
>            Priority: Major
>             Fix For: 11.0.0
>
>
> I have some C# code using the Arrow 9.0.0 nuget to create record batches like
> {{Dictionary<string, string> metadata = new()}}
> {{\{}}
> {{    \{ "resourceUnit", "foo" },}}
> {{    // other keys...}}
> {{};}}
> {{Schema schema = new(fields, metadata);}}
> For some reason using the key "resourceUnit" results in  
> arrow::read_feather() in R failing in 
> [.deserialize_arrow_r_metadata()|https://github.com/apache/arrow/blob/master/r/R/metadata.R],
>  triggering the warning
> {{Warning message:}}
> {{Invalid metadata$r }}
> There are at least five issues here:
> 1) .deserialize_arrow_r_metadata()'s error handler swallows the actual error, 
> leaving the caller without any information as to what's breaking
> 2) The error handler commutes the error to a warning without any caller 
> control.
> 3) It's unclear why there's an R metadata deserialization path when 
> `read_feather(as_data_frame = FALSE)` deserializes the metadata without issue 
> to $metadata.
> 4) The warning is confusing as the deserialized fragment goes in $r_metadata, 
> not $r.
> 5) "resourceUnit" should be a perfectly valid UTF8 string and deserialize 
> without issue. Probing shows the "resource" bit is the problem and, if I 
> change it to something like "_esourceUnit" no error/warning occurs on 
> deserialization. I also have C# generating other feather files with 
> "resourceUnit" as the first metadata key and those files deserialize without 
> the error/warning in R. This suggests the root issue might be something in the 
> direction of alignment fragility in the schema portion of the stream.
> I can't share the file publicly and the code hasn't pushed to github yet but 
> both should be available by the time someone's ready to look at this. Just 
> bump the issue and let me know.
> (I think this is a normal priority issue but normal isn't available in the 
> priority dropdown.)
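
Points 1 and 2 in the description above can be illustrated in base R. This is a minimal sketch, not the code in metadata.R: {{parse_metadata()}}, {{quiet_handler()}} and {{verbose_handler()}} are hypothetical names, and the failing parser only stands in for whatever actually goes wrong during deserialization.

{code:R}
# Hypothetical parser that always fails, standing in for the real
# deserialization step
parse_metadata <- function(x) stop("could not unserialize: ", x)

# Pattern described in the report: the original error is discarded and
# replaced by a fixed warning
quiet_handler <- function(x) {
  tryCatch(
    parse_metadata(x),
    error = function(e) {
      warning("Invalid metadata$r", call. = FALSE)
      NULL
    }
  )
}

# Alternative: still downgrade to a warning, but carry the original
# error message along so the caller can see what failed
verbose_handler <- function(x) {
  tryCatch(
    parse_metadata(x),
    error = function(e) {
      warning("Invalid metadata$r: ", conditionMessage(e), call. = FALSE)
      NULL
    }
  )
}
{code}

Calling {{verbose_handler("foo")}} still returns {{NULL}} with a warning, but the warning text now includes the underlying message. Whether the error should be downgraded at all (point 2) is a separate design choice that this sketch does not address.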



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
