[ https://issues.apache.org/jira/browse/ARROW-10386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Petr Bouchal updated ARROW-10386:
---------------------------------
    Description: 
Hi all - thanks for the improvement addressed in ARROW-9271.

In arrow 2.0, spatial data (class sf) now retains metadata at the column level, 
but still does not roundtrip correctly because metadata (attributes) are lost 
at the level of individual elements of the list-columns. At least I think that 
is the problem, as that is where I can see changes in the metadata. Is this 
something that is addressable?

See reprex below on what happens + what attributes exist at the element level.

FWIW a workaround with spatial data using sf would be to convert to WKT before 
writing it out (sf::st_as_text()). It might be useful to note this somewhere in 
the docs.
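
A rough sketch of that WKT workaround, in case it helps. This is only an illustration under my own assumptions, not anything arrow does for you: the column name `geometry_wkt` and the temporary file path are made up for the example, and `digits = 15` is passed because WKT text output keeps fewer significant digits by default, so coordinates may still not match the originals exactly.

```r
library(arrow)
library(sf)

fname <- system.file("shape/nc.shp", package = "sf")
df_spatial <- st_read(fname, quiet = TRUE)

# Keep the CRS separately, since plain WKT strings do not carry it
crs <- st_crs(df_spatial)

# Replace the sfc geometry list-column with a plain character column of WKT
# ("geometry_wkt" is a hypothetical column name chosen for this sketch)
df_plain <- st_drop_geometry(df_spatial)
df_plain$geometry_wkt <- st_as_text(st_geometry(df_spatial), digits = 15)

path <- tempfile(fileext = ".parquet")
write_parquet(df_plain, path)

# Read back and rebuild the geometry column from the WKT strings
restored <- read_parquet(path)
restored$geometry <- st_as_sfc(restored$geometry_wkt, crs = crs)
restored$geometry_wkt <- NULL
restored <- st_as_sf(restored)

class(restored$geometry[[1]])
```

With this approach the individual elements are rebuilt by sf itself on read, so they carry the sfg classes again instead of the arrow list classes.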

This is using arrow 2.0 and sf 0.9-6.

Reproducible example:

{code:R}

 library(arrow)
 #> 
 #> Attaching package: 'arrow'
 #> The following object is masked from 'package:utils':
 #> 
 #> timestamp
 library(sf)
 #> Linking to GEOS 3.8.1, GDAL 3.1.1, PROJ 6.3.1

fname <- system.file("shape/nc.shp", package="sf")
 df_spatial <- st_read(fname)
 #> Reading layer `nc' from data source `/Users/petr/Library/R/4.0/library/sf/shape/nc.shp' using driver `ESRI Shapefile'
 #> Simple feature collection with 100 features and 14 fields
 #> geometry type: MULTIPOLYGON
 #> dimension: XY
 #> bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
 #> geographic CRS: NAD27

write_parquet(df_spatial, "spatial.parquet")
 roundtripped <- read_parquet("spatial.parquet")
 roundtripped
 #> Simple feature collection with 100 features and 14 fields
 #> geometry type: MULTIPOLYGON
 #> dimension: arrow_list
 #> bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
 #> geographic CRS: NAD27
 #> First 10 features:
 #> Error in vapply(lst, class, rep(NA_character_, 3)): values must be length 3,
 #> but FUN(X[[1]]) result is length 1

attributes(roundtripped$geometry[[1]])
 #> $class
 #> [1] "arrow_list" "vctrs_list_of" "vctrs_vctr" "list" 
 #> 
 #> $ptype
 #> <list<double>[0]>

attributes(df_spatial$geometry[[1]])
 #> $class
 #> [1] "XY" "MULTIPOLYGON" "sfg"
{code}

  was:
Hi all - thanks for the improvement addressed in ARROW-9271.

In arrow 2.0, spatial data (class sf) now retains metadata at the column level, 
but still does not roundtrip correctly because metadata (attributes) are lost 
at the level of individual elements of the list-columns. At least I think that 
is the problem, as that is where I can see changes in the metadata. Is this 
something that is addressable?

See reprex below on what happens + what attributes exist at the element level.

FWIW a workaround with spatial data using sf would be to convert to WKT before 
writing it out (sf::st_as_text()). It might be useful to note this somewhere in 
the docs.

This is using arrow 2.0 and sf 0.9-6.

Reproducible example:

``` r
 library(arrow)
 #> 
 #> Attaching package: 'arrow'
 #> The following object is masked from 'package:utils':
 #> 
 #> timestamp
 library(sf)
 #> Linking to GEOS 3.8.1, GDAL 3.1.1, PROJ 6.3.1

fname <- system.file("shape/nc.shp", package="sf")
 df_spatial <- st_read(fname)
 #> Reading layer `nc' from data source `/Users/petr/Library/R/4.0/library/sf/shape/nc.shp' using driver `ESRI Shapefile'
 #> Simple feature collection with 100 features and 14 fields
 #> geometry type: MULTIPOLYGON
 #> dimension: XY
 #> bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
 #> geographic CRS: NAD27

write_parquet(df_spatial, "spatial.parquet")
 roundtripped <- read_parquet("spatial.parquet")
 roundtripped
 #> Simple feature collection with 100 features and 14 fields
 #> geometry type: MULTIPOLYGON
 #> dimension: arrow_list
 #> bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
 #> geographic CRS: NAD27
 #> First 10 features:
 #> Error in vapply(lst, class, rep(NA_character_, 3)): values must be length 3,
 #> but FUN(X[[1]]) result is length 1

attributes(roundtripped$geometry[[1]])
 #> $class
 #> [1] "arrow_list" "vctrs_list_of" "vctrs_vctr" "list" 
 #> 
 #> $ptype
 #> <list<double>[0]>

attributes(df_spatial$geometry[[1]])
 #> $class
 #> [1] "XY" "MULTIPOLYGON" "sfg"
 ```


> Spatial (sf geometry) data does not roundtrip correctly (attributes lost)
> -------------------------------------------------------------------------
>
>                 Key: ARROW-10386
>                 URL: https://issues.apache.org/jira/browse/ARROW-10386
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 2.0.0
>         Environment: Mac OS 10.15.7
> R 4.0.2
> arrow 2.0
> sf 0.9-6
>            Reporter: Petr Bouchal
>            Priority: Major
>
> Hi all - thanks for the improvement addressed in ARROW-9271.
> In arrow 2.0, spatial data (class sf) now retains metadata at the column 
> level, but still does not roundtrip correctly because metadata (attributes) 
> are lost at the level of individual elements of the list-columns. At least I 
> think that is the problem, as that is where I can see changes in the 
> metadata. Is this something that is addressable?
> See reprex below on what happens + what attributes exist at the element level.
> FWIW a workaround with spatial data using sf would be to convert to WKT 
> before writing it out (sf::st_as_text()). It might be useful to note this 
> somewhere in the docs.
> This is using arrow 2.0 and sf 0.9-6.
> Reproducible example:
> {code:R}
>  library(arrow)
>  #> 
>  #> Attaching package: 'arrow'
>  #> The following object is masked from 'package:utils':
>  #> 
>  #> timestamp
>  library(sf)
>  #> Linking to GEOS 3.8.1, GDAL 3.1.1, PROJ 6.3.1
> fname <- system.file("shape/nc.shp", package="sf")
>  df_spatial <- st_read(fname)
>  #> Reading layer `nc' from data source `/Users/petr/Library/R/4.0/library/sf/shape/nc.shp' using driver `ESRI Shapefile'
>  #> Simple feature collection with 100 features and 14 fields
>  #> geometry type: MULTIPOLYGON
>  #> dimension: XY
>  #> bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
>  #> geographic CRS: NAD27
> write_parquet(df_spatial, "spatial.parquet")
>  roundtripped <- read_parquet("spatial.parquet")
>  roundtripped
>  #> Simple feature collection with 100 features and 14 fields
>  #> geometry type: MULTIPOLYGON
>  #> dimension: arrow_list
>  #> bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
>  #> geographic CRS: NAD27
>  #> First 10 features:
>  #> Error in vapply(lst, class, rep(NA_character_, 3)): values must be length 3,
>  #> but FUN(X[[1]]) result is length 1
> attributes(roundtripped$geometry[[1]])
>  #> $class
>  #> [1] "arrow_list" "vctrs_list_of" "vctrs_vctr" "list" 
>  #> 
>  #> $ptype
>  #> <list<double>[0]>
> attributes(df_spatial$geometry[[1]])
>  #> $class
>  #> [1] "XY" "MULTIPOLYGON" "sfg"
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
