[jira] [Created] (ARROW-10386) spatial (sf geometry) data does not roundtrip correctly (attributes lost)

2020-10-24 Thread Petr Bouchal (Jira)
Petr Bouchal created ARROW-10386:


 Summary: spatial (sf geometry) data does not roundtrip correctly (attributes lost)
 Key: ARROW-10386
 URL: https://issues.apache.org/jira/browse/ARROW-10386
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Affects Versions: 2.0.0
 Environment: Mac OS 10.15.7
R 4.0.2
arrow 2.0
sf 0.9-6
Reporter: Petr Bouchal


Hi all - thanks for the improvement addressed in ARROW-9271.

In arrow 2.0, spatial data (class sf) now retains metadata at the column level, 
but still does not roundtrip correctly because metadata (attributes) are lost 
at the level of individual elements of the list-columns. (At least I think that 
is the problem, as that is where I can see changes in the metadata.) Is this 
something that is addressable?

See the reprex below for what happens and which attributes exist at the element 
level.

FWIW a workaround with spatial data using sf would be to convert to WKT before 
writing it out (sf::st_as_text()). It might be useful to note this somewhere in 
the docs.

This is using arrow 2.0 and sf 0.9-6.

Reproducible example:

{{
 ``` r
 library(arrow)
 #> 
 #> Attaching package: 'arrow'
 #> The following object is masked from 'package:utils':
 #> 
 #> timestamp
 library(sf)
 #> Linking to GEOS 3.8.1, GDAL 3.1.1, PROJ 6.3.1

fname <- system.file("shape/nc.shp", package="sf")
 df_spatial <- st_read(fname)
 #> Reading layer `nc' from data source `/Users/petr/Library/R/4.0/library/sf/shape/nc.shp' using driver `ESRI Shapefile'
 #> Simple feature collection with 100 features and 14 fields
 #> geometry type: MULTIPOLYGON
 #> dimension: XY
 #> bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
 #> geographic CRS: NAD27

write_parquet(df_spatial, "spatial.parquet")
 roundtripped <- read_parquet("spatial.parquet")
 roundtripped
 #> Simple feature collection with 100 features and 14 fields
 #> geometry type: MULTIPOLYGON
 #> dimension: arrow_list
 #> bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
 #> geographic CRS: NAD27
 #> First 10 features:
 #> Error in vapply(lst, class, rep(NA_character_, 3)): values must be length 3,
 #> but FUN(X[[1]]) result is length 1

attributes(roundtripped$geometry[[1]])
 #> $class
 #> [1] "arrow_list" "vctrs_list_of" "vctrs_vctr" "list" 
 #> 
 #> $ptype
 #> [0]>

attributes(df_spatial$geometry[[1]])
 #> $class
 #> [1] "XY" "MULTIPOLYGON" "sfg"
 ```
 }}

Created on 2020-10-24 by the [reprex package](https://reprex.tidyverse.org) 
(v0.3.0)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10385) [C++][Gandiva] Add support for LLVM 11

2020-10-24 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-10385:


 Summary: [C++][Gandiva] Add support for LLVM 11
 Key: ARROW-10385
 URL: https://issues.apache.org/jira/browse/ARROW-10385
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Gandiva
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Created] (ARROW-10384) [c++] Fix typos and spelling

2020-10-24 Thread Kazuaki Ishizaki (Jira)
Kazuaki Ishizaki created ARROW-10384:


 Summary: [c++] Fix typos and spelling
 Key: ARROW-10384
 URL: https://issues.apache.org/jira/browse/ARROW-10384
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 3.0.0
Reporter: Kazuaki Ishizaki


Fix typos and spelling errors under the {{cpp}} directory.





[jira] [Created] (ARROW-10383) [docs] Fix typos and spelling

2020-10-24 Thread Kazuaki Ishizaki (Jira)
Kazuaki Ishizaki created ARROW-10383:


 Summary: [docs] Fix typos and spelling
 Key: ARROW-10383
 URL: https://issues.apache.org/jira/browse/ARROW-10383
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 3.0.0
Reporter: Kazuaki Ishizaki


Fix typos and spelling errors under the {{docs}} directory.





[jira] [Created] (ARROW-10382) [rust] Fix typos and spelling

2020-10-24 Thread Kazuaki Ishizaki (Jira)
Kazuaki Ishizaki created ARROW-10382:


 Summary: [rust] Fix typos and spelling
 Key: ARROW-10382
 URL: https://issues.apache.org/jira/browse/ARROW-10382
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Affects Versions: 3.0.0
Reporter: Kazuaki Ishizaki


Fix typos and spelling errors under the {{rust}} directory.





[jira] [Created] (ARROW-10381) [Rust] Generalize Arrow to support MergeSort

2020-10-24 Thread Jorge Leitão (Jira)
Jorge Leitão created ARROW-10381:


 Summary: [Rust] Generalize Arrow to support MergeSort
 Key: ARROW-10381
 URL: https://issues.apache.org/jira/browse/ARROW-10381
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Affects Versions: 3.0.0
Reporter: Jorge Leitão
Assignee: Jorge Leitão


Currently, the sorting code is centered around creating a single array that can 
be sorted. This is useful for intra-array comparison, but does not allow things 
like `merge-sort`, which requires comparing elements across two arrays (of the 
same data type).

The goal of this issue is to generalize the current code to support both.
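
To illustrate the kind of cross-array comparison meant here, the following is a 
minimal sketch of the merge step of a merge sort. It uses plain Rust slices, 
not Arrow arrays, and the function name `merge_indices` and its return shape 
are hypothetical; it is not the existing or proposed Arrow API.

```rust
use std::cmp::Ordering;

/// Merge step of merge sort over two slices that are each already sorted.
///
/// Hypothetical helper for illustration only: returns `(source, index)` pairs,
/// where `source` is 0 for `left` and 1 for `right`, in merged order.
fn merge_indices<T: Ord>(left: &[T], right: &[T]) -> Vec<(usize, usize)> {
    let mut out = Vec::with_capacity(left.len() + right.len());
    let (mut i, mut j) = (0, 0);

    while i < left.len() && j < right.len() {
        // The comparison is between elements of two different arrays,
        // which a purely intra-array comparator cannot express.
        if left[i].cmp(&right[j]) != Ordering::Greater {
            out.push((0, i)); // ties take the left element first (stable)
            i += 1;
        } else {
            out.push((1, j));
            j += 1;
        }
    }

    // One input is exhausted; append the remainder of the other.
    out.extend((i..left.len()).map(|k| (0, k)));
    out.extend((j..right.len()).map(|k| (1, k)));
    out
}
```

Returning indices rather than values keeps the sketch independent of how the 
merged output would be materialized, which is an assumption of this example 
and not something the issue specifies.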


