[arrow] branch master updated: MINOR: [R] remove duplication about hive-style file paths

thisisnic Tue, 16 Aug 2022 03:25:37 -0700

This is an automated email from the ASF dual-hosted git repository.

thisisnic pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git



The following commit(s) were added to refs/heads/master by this push:
     new d880d7517a MINOR: [R] remove duplication about hive-style file paths
d880d7517a is described below

commit d880d7517a33f2ac8ff259cad711bc210fd570c5
Author: François Michonneau <[email protected]>
AuthorDate: Tue Aug 16 11:24:32 2022 +0100

    MINOR: [R] remove duplication about hive-style file paths
    
    Reading the vignette about the datasets, it seems that the part about having self-describing file paths is repeated.
    
    This PR removes the second time this is mentioned and adds the link to the Hive project when it's first mentioned.
    
    Another small detail is that the months in the dataset (at least in the S3 bucket) use a single digit (e.g., `1` for January) while in the section removed by this PR they are listed with 2 digits (`01` for January).
    
    Closes #13844 from fmichonneau/rm-hive-duplication
    
    Authored-by: François Michonneau <[email protected]>
    Signed-off-by: Nic Crane <[email protected]>
---
 r/vignettes/dataset.Rmd | 16 +---------------
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/r/vignettes/dataset.Rmd b/r/vignettes/dataset.Rmd
index 1a969f979c..0890d36ff4 100644
--- a/r/vignettes/dataset.Rmd
+++ b/r/vignettes/dataset.Rmd
@@ -126,7 +126,7 @@ For more information on the usage of these parameters, see `?read_delim_arrow()`
 
 `open_dataset()` was able to automatically infer column values for `year` and `month`
 --which are not present in the data files--based on the directory structure. The 
-Hive-style partitioning structure is self-describing, with file paths like
+[Hive](https://hive.apache.org/)-style partitioning structure is self-describing, with file paths like
 
 ```
 year=2009/month=1/data.parquet
@@ -185,20 +185,6 @@ month: int32
 ")
 ```
 
-The other form of partitioning currently supported is [Hive](https://hive.apache.org/)-style,
-in which the partition variable names are included in the path segments.
-If you had saved your files in paths like:
-
-```
-year=2009/month=01/data.parquet
-year=2009/month=02/data.parquet
-...
-```
-
-you would not have had to provide the names in `partitioning`;
-you could have just called `ds <- open_dataset("nyc-taxi")` and the partitions
-would have been detected automatically.
-
 ## Querying the dataset
 
 Up to this point, you haven't loaded any data. You've walked directories to find
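
Note on the self-describing paths discussed above: the sketch below illustrates how `name=value` directory segments such as `year=2009/month=1` can be turned into column values. It is only an illustration in Python, not Arrow's implementation; the helper name `parse_hive_path` and the rule that all-digit values become integers are assumptions made for this example. It also shows why the `1` versus `01` detail in the commit message matters for the path strings but not for the value parsed from them under that rule.

```python
from pathlib import PurePosixPath


def parse_hive_path(path):
    """Return {name: value} for each `name=value` directory segment of path.

    Hypothetical helper for illustration only; not part of Arrow.
    """
    partitions = {}
    # The last component is the file name, so only parent directories are scanned.
    for segment in PurePosixPath(path).parent.parts:
        name, sep, value = segment.partition("=")
        if sep and name:
            # Assumption for this sketch: all-digit values are read as integers.
            partitions[name] = int(value) if value.isdigit() else value
    return partitions


single = parse_hive_path("year=2009/month=1/data.parquet")
padded = parse_hive_path("year=2009/month=01/data.parquet")

print(single)            # {'year': 2009, 'month': 1}
print(single == padded)  # True
```

A path without `name=value` segments (for example `nyc-taxi/2009/01/data.parquet`) yields an empty dictionary here, which corresponds to the case where the partition names have to be supplied by the caller instead.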
