westonpace commented on a change in pull request #9561:
URL: https://github.com/apache/arrow/pull/9561#discussion_r582351003



##########
File path: r/R/dataset-partition.R
##########
@@ -25,12 +25,17 @@
 #' `DirectoryPartitioning` describes how to interpret raw path segments, in
 #' order. For example, `schema(year = int16(), month = int8())` would define
 #' partitions for file paths like "2019/01/file.parquet",
-#' "2019/02/file.parquet", etc.
+#' "2019/02/file.parquet", etc. In this scheme `NULL` values will be skipped.
+#' In the previous example, if the month was `NULL`, the files would be placed

Review comment:
       It's a bit indirect, but this applies to reading too.  In `open_dataset` 
there is this comment...
   
   ```
   #'   * a `HivePartitioning` or `HivePartitioningFactory`, as returned
   #'    by [hive_partition()] which parses explicit or autodetected fields from
   #'    Hive-style path segments
   ```
   
   ...and then in `hive_partition()` there is this comment 
https://github.com/apache/arrow/pull/9561/files/f042c66b1f53f6185e5a33949e9f5e148e39501a#diff-7f4a981c0f18320c1258082cf98249d3e6bc6d6f62ac561b731e724a6997c118R80
   
   I think this is OK because, unlike Python, there is no option in R to specify 
a "flavor string" like "hive" or "directory", and the default is directory 
partitioning.  So a user working with Hive-partitioned data in the first place 
already has to use `hive_partition`.  Also, I think it will be pretty rare for 
a user to have to specify `null_fallback`, and those who do probably already 
have some idea of what value they are looking for (since they will have 
custom-configured whatever generated their data).
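
To make the distinction concrete, here is a minimal sketch of how the two schemes could map path segments to field values on read. This is plain Python for illustration only, not Arrow's implementation: the function names are hypothetical, values are left as strings rather than cast to the schema types, and the `__HIVE_DEFAULT_PARTITION__` default is my assumption about the usual Hive fallback string.

```python
from pathlib import PurePosixPath

# Assumed default fallback string (hypothetical constant for this sketch).
HIVE_NULL_FALLBACK = "__HIVE_DEFAULT_PARTITION__"


def parse_directory_path(path, field_names):
    """Directory scheme: segments are matched to fields by position.

    A path with fewer segments than fields leaves the trailing
    fields as None, which is how a skipped NULL would look on read.
    """
    segments = PurePosixPath(path).parent.parts
    values = dict.fromkeys(field_names)  # every field starts as None
    for name, segment in zip(field_names, segments):
        values[name] = segment
    return values


def parse_hive_path(path, null_fallback=HIVE_NULL_FALLBACK):
    """Hive scheme: each segment is `key=value`.

    A value equal to `null_fallback` is read back as None.
    """
    values = {}
    for segment in PurePosixPath(path).parent.parts:
        key, sep, value = segment.partition("=")
        if not sep:
            continue  # not a key=value segment, ignore it
        values[key] = None if value == null_fallback else value
    return values


directory_row = parse_directory_path("2019/file.parquet", ["year", "month"])
hive_row = parse_hive_path("year=2019/month=__HIVE_DEFAULT_PARTITION__/file.parquet")
```

In both cases `month` comes back as `None`; the difference is that the Hive scheme needs an explicit placeholder in the path, which is why `null_fallback` only matters to users who already chose Hive partitioning.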




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

