jonkeane commented on a change in pull request #9748:
URL: https://github.com/apache/arrow/pull/9748#discussion_r697740009

##########
File path: r/R/dataset-write.R
##########
@@ -54,6 +54,43 @@
 #' - `null_fallback`: character to be used in place of missing values (`NA` or
 #' `NULL`) when using Hive-style partitioning. See [hive_partition()].
 #' @return The input `dataset`, invisibly
+#' @examplesIf arrow_with_dataset() & arrow_with_parquet() & requireNamespace("dplyr", quietly = TRUE)
+#' # You can write datasets partitioned by the values in a column (here: "cyl").
+#' # This creates a structure of the form cyl=X/part-Z.parquet.
+#' one_level_tree <- tempfile()
+#' write_dataset(mtcars, one_level_tree, partitioning = "cyl")
+#' list.files(one_level_tree, recursive = TRUE)
+#'
+#' # You can also partition by the values in multiple columns
+#' # (here: "cyl" and "gear").
+#' # This creates a structure of the form cyl=X/gear=Y/part-Z.parquet.
+#' two_levels_tree <- tempfile()
+#' write_dataset(mtcars, two_levels_tree, partitioning = c("cyl", "gear"))
+#' list.files(two_levels_tree, recursive = TRUE)
+#'
+#' # In the two previous examples we would have:
+#' # X = \{4,6,8\}, the number of cylinders.
+#' # Y = \{3,4,5\}, the number of forward gears.
+#' # Z = \{0,1,2\}, the number of saved parts, starting from 0.
+#'
+#' # You can obtain the same result as as the previous examples using arrow with
+#' # a dplyr pipeline. This will be the same as two_levels_tree above, but the
+#' # output directory will be different.
+#' two_levels_tree_2 <- tempfile()
+#' mtcars %>%
+#' group_by(cyl, gear) %>%
+#' write_dataset(two_levels_tree_2)
+#' list.files(two_levels_tree_2, recursive = TRUE)
+#'
+#' # And you can also turn off the Hive-style directory naming where the column
+#' # name is included with the values by using `hive_style = FALSE`.
+#'
+#' # Write a structure X/Y/part-Z.parquet.
+#' two_levels_tree_no_hive <- tempfile()
+#' mtcars %>%
+#' group_by(cyl, gear) %>%
+#' write_dataset(two_levels_tree_no_hive, hive_style = FALSE)

Review comment:
```suggestion
#' mtcars %>%
#' group_by(cyl, gear) %>%
#' write_dataset(two_levels_tree_no_hive, hive_style = FALSE)
```