[ 
https://issues.apache.org/jira/browse/ARROW-15471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17486679#comment-17486679
 ] 

Dewey Dunnington commented on ARROW-15471:
------------------------------------------

A related issue that came up in geoarrow is that it isn't possible to restore 
field-level metadata, which we could use to make sure things like the coordinate 
reference system stay with the column when it goes through the compute engine. 
It looks like field metadata is specifically ignored here:

https://github.com/apache/arrow/blob/master/r/R/field.R#L60

> [R] ExtensionType support in R
> ------------------------------
>
>                 Key: ARROW-15471
>                 URL: https://issues.apache.org/jira/browse/ARROW-15471
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Dewey Dunnington
>            Priority: Major
>
> In Python there is support for extension types that consists of a 
> registration step that defines functions to handle metadata serialization and 
> deserialization. In R, any extension name or metadata at the top level is 
> currently obliterated on import. To implement geometry reading and writing to 
> Parquet, IPC, and/or Feather, we will need to at the very least have the 
> extension name and metadata preserved (in R), and at best provide a 
> registration step to customize the behaviour of the resulting Array/DataType.
> Reprex for R:
> {code:R}
> # remotes::install_github("paleolimbot/narrow")
> library(narrow)
> carray <- as_narrow_array(1:5)
> carray$schema$metadata[["ARROW:extension:name"]] <- "extension name!"
> carray$schema$metadata[["ARROW:extension:metadata"]] <- "bananas"
> carray$schema$metadata[["something else"]] <- "more bananas"
> array <- from_narrow_array(carray, arrow::Array)
> carray2 <- as_narrow_array(array)
> carray2$schema$metadata[["ARROW:extension:name"]]
> #> NULL
> carray2$schema$metadata[["ARROW:extension:metadata"]]
> #> NULL
> carray2$schema$metadata[["something else"]]
> #> NULL
> {code}
> There is some discussion of a registration step as a solution to ARROW-14378, 
> including an example of how pandas implements the 'interval' extension type 
> (example contributed by [~jorisvandenbossche]).
> For the Interval example, there are some different parts living in different 
> places:
> - The Arrow Extension Type definition for pandas' interval type: 
> https://github.com/pandas-dev/pandas/blob/fc6b441ba527ca32b460ae4f4f5a6802335497f9/pandas/core/arrays/_arrow_utils.py#L88-L136
> - The __arrow_array__ implementation (doing the conversion to arrow): 
> https://github.com/pandas-dev/pandas/blob/fc6b441ba527ca32b460ae4f4f5a6802335497f9/pandas/core/arrays/interval.py#L1405-L1455
> - The __from_arrow__ implementation (conversion arrow -> pandas): 
> https://github.com/pandas-dev/pandas/blob/fc6b441ba527ca32b460ae4f4f5a6802335497f9/pandas/core/dtypes/dtypes.py#L1227-L1255
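
The registration step described in the quoted issue could look roughly like the plain-Python model below. It is only a sketch of the idea, not the pyarrow or arrow R API: ExtensionRegistry, register, import_field, and ImportedField are hypothetical names. The only parts taken from the issue are the two metadata keys and the values used in the reprex. The model shows both requested behaviours: the extension name and metadata survive import even when nothing is registered, and a registered deserializer customizes the result.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Metadata keys used in the reprex from the issue description.
EXT_NAME_KEY = "ARROW:extension:name"
EXT_META_KEY = "ARROW:extension:metadata"


@dataclass
class ImportedField:
    """Hypothetical result of importing one field's metadata."""
    extension_name: Optional[str]
    # Deserialized metadata if a deserializer is registered, else the raw string.
    extension_value: object
    other_metadata: Dict[str, str] = field(default_factory=dict)


class ExtensionRegistry:
    """Hypothetical registry mapping extension names to deserializers."""

    def __init__(self) -> None:
        self._deserializers: Dict[str, Callable[[str], object]] = {}

    def register(self, name: str, deserialize: Callable[[str], object]) -> None:
        # The "registration step": associate an extension name with a function
        # that turns the serialized metadata into something richer.
        self._deserializers[name] = deserialize

    def import_field(self, metadata: Dict[str, str]) -> ImportedField:
        name = metadata.get(EXT_NAME_KEY)
        raw = metadata.get(EXT_META_KEY)
        # Keep every non-extension key instead of dropping it on import.
        other = {
            k: v for k, v in metadata.items()
            if k not in (EXT_NAME_KEY, EXT_META_KEY)
        }
        deserialize = self._deserializers.get(name)
        if deserialize is not None and raw is not None:
            value = deserialize(raw)
        else:
            # Minimum requirement from the issue: preserve name and metadata
            # even when no handler is registered.
            value = raw
        return ImportedField(name, value, other)


registry = ExtensionRegistry()
metadata = {
    EXT_NAME_KEY: "extension name!",
    EXT_META_KEY: "bananas",
    "something else": "more bananas",
}

# Without registration, the name and raw metadata are preserved as-is.
unregistered = registry.import_field(metadata)

# With registration, the deserializer (here just str.upper as a stand-in)
# customizes the imported value.
registry.register("extension name!", str.upper)
registered = registry.import_field(metadata)
```

In this model the unregistered import is the "at the very least" case from the issue, and the registered import is the "at best" case; a real implementation would return a customized Array/DataType rather than a plain value.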



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
