paleolimbot opened a new issue, #66: URL: https://github.com/apache/arrow-nanoarrow/issues/66
After #65 we have built-in conversions for most Arrow types, including arbitrarily recursive nested struct and list types. There are a few rough edges remaining:

- Converting streams with a fixed size (used in GDAL layer conversion, where the number of features is frequently known in advance) will fail for custom `to` targets (see reprex below).
- Extension types are not really supported: the extension type is stripped and the storage is converted instead. This probably needs a registration step.
- Converting streams with an unknown size currently falls back on a very slow "collect + rbind" approach. There should be a way to implement either growables or ALTREP + chunking to avoid two copies of the data plus the slow `rbind()` call.

Reprex for extension types and S3 `convert_array()` methods:

``` r
library(nanoarrow)

# Extension types are not really supported
ext_array <- as_nanoarrow_array(
  arrow::vctrs_extension_array(1:5)
)
convert_array(ext_array)
#> Warning in convert_array.default(ext_array): Converting unknown extension
#> arrow.r.vctrs{int32} as storage type
#> [1] 1 2 3 4 5

# Extensible targets are supported almost everywhere
convert_array.some_custom_vctr <- function(array, to, ...) {
  vctrs::new_vctr(convert_array(array), class = "some_custom_vctr")
}

some_custom_vctr <- function() {
  vctrs::new_vctr(integer(), class = "some_custom_vctr")
}

array <- as_nanoarrow_array(1:10)
struct_array <- as_nanoarrow_array(data.frame(x = 1:10))

convert_array(array, some_custom_vctr())
#> <some_custom_vctr[10]>
#>  [1]  1  2  3  4  5  6  7  8  9 10

convert_array(struct_array, tibble::tibble(x = some_custom_vctr()))
#> # A tibble: 10 × 1
#>             x
#>    <sm_cstm_>
#>  1          1
#>  2          2
#>  3          3
#>  4          4
#>  5          5
#>  6          6
#>  7          7
#>  8          8
#>  9          9
#> 10         10

convert_array_stream(
  as_nanoarrow_array_stream(data.frame(x = 1:10)),
  tibble::tibble(x = some_custom_vctr())
)
#> # A tibble: 10 × 1
#>             x
#>    <sm_cstm_>
#>  1          1
#>  2          2
#>  3          3
#>  4          4
#>  5          5
#>  6          6
#>  7          7
#>  8          8
#>  9          9
#> 10         10

# ...except the version that materializes a stream to a known size
convert_array_stream(
  as_nanoarrow_array_stream(data.frame(x = 1:10)),
  tibble::tibble(x = some_custom_vctr()),
  size = 10
)
#> Error in convert_array_stream(as_nanoarrow_array_stream(data.frame(x = 1:10)), : Expected to materialize 10 values in batch 1 but materialized 0
```

Reprex for a stream conversion that would benefit from a better approach than `rbind()`:

``` r
library(nanoarrow)

reader <- arrow::RecordBatchReader$create(
  arrow::record_batch(x = letters),
  arrow::record_batch(x = LETTERS)
)

str(convert_array_stream(as_nanoarrow_array_stream(reader)))
#> 'data.frame':    52 obs. of  1 variable:
#>  $ x: chr  "a" "b" "c" "d" ...
```
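
For reference, the "growables" idea mentioned above for unknown-size streams could look roughly like the following. This is only a base R sketch of the general technique (a buffer whose capacity doubles when it runs out), not how nanoarrow does or would implement it. The names `new_growable()`, `growable_append()`, and `growable_finish()` are hypothetical, and a plain list of character vectors stands in for the batches of a stream.

``` r
# Hypothetical helpers for illustration only; none of these are nanoarrow functions.

# Create a growable character buffer backed by an environment so that
# appends can modify it in place.
new_growable <- function(capacity = 16L) {
  g <- new.env(parent = emptyenv())
  g$data <- character(capacity)
  g$size <- 0L
  g
}

# Append a chunk, at least doubling the capacity when the buffer is full so
# that the total cost of n appended values is amortized O(n).
growable_append <- function(g, chunk) {
  needed <- g$size + length(chunk)
  if (needed > length(g$data)) {
    data <- g$data
    length(data) <- max(needed, 2L * length(data))  # new slots are NA
    g$data <- data
  }
  if (length(chunk) > 0L) {
    g$data[seq.int(g$size + 1L, needed)] <- chunk
  }
  g$size <- needed
  invisible(g)
}

# Trim the unused capacity and return the finished vector.
growable_finish <- function(g) {
  g$data[seq_len(g$size)]
}

# Stand-in for the batches of a stream whose total length is not known up front
chunks <- list(letters, LETTERS)

g <- new_growable(capacity = 4L)
for (chunk in chunks) {
  growable_append(g, chunk)
}

result <- data.frame(x = growable_finish(g))
nrow(result)
#> [1] 52
```

The final trim in `growable_finish()` can still copy once, which is the part an ALTREP-based approach might avoid.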