Kirill Müller created ARROW-17886:
-------------------------------------

             Summary: [R] Convert schema to the corresponding ptype (zero-row data frame)?
                 Key: ARROW-17886
                 URL: https://issues.apache.org/jira/browse/ARROW-17886
             Project: Apache Arrow
          Issue Type: Improvement
          Components: R
            Reporter: Kirill Müller


When fetching data, e.g. from a RecordBatchReader, I would like to know ahead 
of time what the data will look like after it is converted to a data frame. I 
have found a way using utils::head() with n = 0, but I'm not sure it is 
efficient in all scenarios.

My use case is the Arrow extension to DBI, in particular the default 
implementation for drivers that don't speak Arrow yet. I'd like to know which 
types the columns should have on the database side. I can already infer this 
from the corresponding R types, but those existing drivers don't know about 
Arrow types.

Should we support as.data.frame() for schema objects? The semantics would be to 
return a zero-row data frame with the correct column names and types.


library(arrow)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
#> 
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#> 
#>     timestamp

data <- data.frame(
  a = 1:3,
  b = 2.5,
  c = "three",
  stringsAsFactors = FALSE
)
data$d <- blob::blob(as.raw(1:10))

tbl <- arrow::as_arrow_table(data)
rbr <- arrow::as_record_batch_reader(tbl)

tibble::as_tibble(head(rbr, 0))
#> # A tibble: 0 × 4
#> # … with 4 variables: a <int>, b <dbl>, c <chr>, d <blob>
rbr$read_table()
#> Table
#> 3 rows x 4 columns
#> $a <int32>
#> $b <double>
#> $c <string>
#> $d <<blob[0]>>
#> 
#> See $metadata for additional Schema metadata

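For reference, the workaround above can be wrapped into a small helper; this is a minimal sketch mirroring the reprex, assuming head(n = 0) on a RecordBatchReader stays cheap (the helper name rbr_ptype is my own, not an arrow API):

```r
library(arrow)

# Hypothetical helper (the name is an assumption, not part of arrow):
# return the zero-row "ptype" data frame for a RecordBatchReader, using
# the head(n = 0) workaround shown in the reprex above.
rbr_ptype <- function(reader) {
  tibble::as_tibble(utils::head(reader, 0))
}

data <- data.frame(a = 1:3, b = 2.5, c = "three", stringsAsFactors = FALSE)
rbr <- arrow::as_record_batch_reader(arrow::as_arrow_table(data))

# Zero rows, but the correct column names and types
ptype <- rbr_ptype(rbr)
```

A native as.data.frame() method for Schema objects could produce the same result directly, without constructing a reader first.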


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
