This is an automated email from the ASF dual-hosted git repository.
thisisnic pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/master by this push:
new d880d7517a MINOR: [R] remove duplication about hive-style file paths
d880d7517a is described below
commit d880d7517a33f2ac8ff259cad711bc210fd570c5
Author: François Michonneau <[email protected]>
AuthorDate: Tue Aug 16 11:24:32 2022 +0100
MINOR: [R] remove duplication about hive-style file paths
Reading the vignette about the datasets, it seems that the part about
having self-describing file paths is repeated.
This PR removes the second mention and adds the link to the
Hive project where it is first mentioned.
Another small detail is that the months in the dataset (at least in the S3
bucket) use a single digit (e.g., `1` for January) while in the section removed
by this PR they are listed with 2 digits (`01` for January).
Closes #13844 from fmichonneau/rm-hive-duplication
Authored-by: François Michonneau <[email protected]>
Signed-off-by: Nic Crane <[email protected]>
---
r/vignettes/dataset.Rmd | 16 +---------------
1 file changed, 1 insertion(+), 15 deletions(-)
diff --git a/r/vignettes/dataset.Rmd b/r/vignettes/dataset.Rmd
index 1a969f979c..0890d36ff4 100644
--- a/r/vignettes/dataset.Rmd
+++ b/r/vignettes/dataset.Rmd
@@ -126,7 +126,7 @@ For more information on the usage of these parameters, see `?read_delim_arrow()`
`open_dataset()` was able to automatically infer column values for `year` and `month`
--which are not present in the data files--based on the directory structure. The
-Hive-style partitioning structure is self-describing, with file paths like
+[Hive](https://hive.apache.org/)-style partitioning structure is self-describing, with file paths like
```
year=2009/month=1/data.parquet
@@ -185,20 +185,6 @@ month: int32
")
```
-The other form of partitioning currently supported is [Hive](https://hive.apache.org/)-style,
-in which the partition variable names are included in the path segments.
-If you had saved your files in paths like:
-
-```
-year=2009/month=01/data.parquet
-year=2009/month=02/data.parquet
-...
-```
-
-you would not have had to provide the names in `partitioning`;
-you could have just called `ds <- open_dataset("nyc-taxi")` and the partitions
-would have been detected automatically.
-
## Querying the dataset
Up to this point, you haven't loaded any data. You've walked directories to find