[jira] [Commented] (ARROW-15317) [R] Expose API to create Dataset from Fragments

Dewey Dunnington (Jira) Mon, 31 Jan 2022 06:26:35 -0800


    [ 
https://issues.apache.org/jira/browse/ARROW-15317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17484714#comment-17484714
 ]


Dewey Dunnington commented on ARROW-15317:
------------------------------------------

If I'm reading this correctly, this sounds useful for building an abstraction 
around arbitrary file formats (I'm thinking of geospatial formats such as 
shapefiles) in addition to the ones you listed above!

Here is where this is tested in Python: 
https://github.com/apache/arrow/blob/ad073b7c0fec80ce88aaf1e7d6a78104711952f2/python/pyarrow/tests/test_dataset.py#L788-L804

> [R] Expose API to create Dataset from Fragments
> -----------------------------------------------
>
>                 Key: ARROW-15317
>                 URL: https://issues.apache.org/jira/browse/ARROW-15317
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>    Affects Versions: 6.0.1
>            Reporter: Will Jones
>            Priority: Minor
>
> Third-party packages may define dataset factories for table formats like 
> Delta Lake and Apache Iceberg. These formats store metadata like schema, file 
> lists, and file-level statistics on the side, and can construct a dataset 
> without needing a discovery process. Python exposes enough API to do this 
> successfully for [a Delta Lake dataset reader 
> here|https://github.com/delta-io/delta-rs/blob/6a8195d6e3cbdcb0c58a14a3ffccc472dd094de0/python/deltalake/table.py#L267-L280].
> I propose adding the following to the R API:
>  * Expose {{Fragment}} as an R6 object
>  * Add the {{MakeFragment}} method to various file format objects. It's key 
> that {{partition_expression}} is included as an argument. ([See Python 
> equivalent 
> here|https://github.com/apache/arrow/blob/ab86daf3f7c8a67bee6a175a749575fd40417d27/python/pyarrow/_dataset_parquet.pyx#L209-L210])
>  * Add a dataset constructor that takes a list of {{Fragments}}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
