Kirill Müller created ARROW-17886:
-------------------------------------

             Summary: [R] Convert schema to the corresponding ptype (zero-row data frame)?
                 Key: ARROW-17886
                 URL: https://issues.apache.org/jira/browse/ARROW-17886
             Project: Apache Arrow
          Issue Type: Improvement
          Components: R
            Reporter: Kirill Müller
When fetching data, e.g. from a RecordBatchReader, I would like to know, ahead of time, what the data will look like after it is converted to a data frame. I have found a way using utils::head(0), but I'm not sure if it's efficient in all scenarios.

My use case is the Arrow extension to DBI, in particular the default implementation for drivers that don't speak Arrow yet. I'd like to know which types the columns should have in the database. I can already infer this from the corresponding R types, but those existing drivers don't know about Arrow types.

Should we support as.data.frame() for schema objects? The semantics would be to return a zero-row data frame with the correct column names and types.

library(arrow)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
#>
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#>
#>     timestamp

data <- data.frame(
  a = 1:3,
  b = 2.5,
  c = "three",
  stringsAsFactors = FALSE
)
data$d <- blob::blob(as.raw(1:10))

tbl <- arrow::as_arrow_table(data)
rbr <- arrow::as_record_batch_reader(tbl)

tibble::as_tibble(head(rbr, 0))
#> # A tibble: 0 × 4
#> # … with 4 variables: a <int>, b <dbl>, c <chr>, d <blob>

rbr$read_table()
#> Table
#> 3 rows x 4 columns
#> $a <int32>
#> $b <double>
#> $c <string>
#> $d <<blob[0]>>
#>
#> See $metadata for additional Schema metadata

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
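For reference, the head(0) workaround from the reprex can be packaged as a small helper; schema_ptype() is a hypothetical name, not part of the arrow package:

```r
# Hypothetical helper (not in arrow): derive the ptype, i.e. the
# zero-row data frame a RecordBatchReader would produce, by reading
# zero rows through the existing head() method and converting the
# resulting zero-row object to a tibble.
schema_ptype <- function(rbr) {
  tibble::as_tibble(utils::head(rbr, 0))
}
```

Note that this still requires a reader (and may consume it, depending on how head() is implemented for RecordBatchReader), not just a Schema object, which is why a conversion defined directly on schema objects would be preferable.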