[ https://issues.apache.org/jira/browse/ARROW-16575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17566324#comment-17566324 ]
Neal Richardson commented on ARROW-16575:
-----------------------------------------

This matches my expectations. write_dataset also won't write files for partitions that don't exist. If you want a file/dataset with 0 rows and just the schema, you can use the single-file writer, write_feather:

{code}
> write_feather(cars[cars$speed > 1000, ], "test.arrow")
> read_feather("test.arrow", as_data_frame=FALSE)
Table
0 rows x 2 columns
$speed <double>
$dist <double>

See $metadata for additional Schema metadata
{code}

> [R] arrow::write_dataset() does nothing with 0 row dataframes in R
> ------------------------------------------------------------------
>
>                 Key: ARROW-16575
>                 URL: https://issues.apache.org/jira/browse/ARROW-16575
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>         Environment: Mac OS 12.3, R 4.1
>            Reporter: Adam Black
>            Priority: Minor
>
> In R, a data frame can have 0 rows. It still has column names and types.
>
> Expected behavior of arrow::write_dataset:
> I would expect it to be possible to have a FileSystemDataset with zero rows that still contains metadata about the column names and types. That is, arrow::write_dataset would create the FileSystemDataset metadata when given a data frame with zero rows.
>
> Actual behavior:
> arrow::write_dataset() does nothing when passed a data frame with zero rows.
>
> Reproducible example using the current arrow package on CRAN:
> {code}
> arrow::write_dataset(cars, here::here("cars"))
> arrow::open_dataset(here::here("cars"))
> #> FileSystemDataset with 1 Parquet file
> #> speed: double
> #> dist: double
> #>
> #> See $metadata for additional Schema metadata
> file.exists(here::here("cars"))
> #> [1] TRUE
> df <- cars[cars$speed > 1000, ]
> nrow(df)
> #> [1] 0
> arrow::write_dataset(df, here::here("df"), format = "feather")
> arrow::open_dataset(here::here("df"))
> #> Error: IOError: Cannot list directory '/private/var/folders/xx/01v98b6546ldnm1rg1_bvk000000gn/T/RtmpGkX0gK/reprex-17c305ed29ad5-nerdy-ram/df'. Detail: [errno 2] No such file or directory
> file.exists(here::here("df"))
> #> [1] FALSE
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)