[ 
https://issues.apache.org/jira/browse/ARROW-15271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479636#comment-17479636
 ] 

Dewey Dunnington commented on ARROW-15271:
------------------------------------------

Just collecting a few related code comments here:

- https://github.com/apache/arrow/blob/03219e21b42f17294fba3b3d2b22a9117fe0f080/r/R/dataset-scan.R#L89
- https://github.com/apache/arrow/blob/03219e21b42f17294fba3b3d2b22a9117fe0f080/r/R/query-engine.R#L23-L26
- https://github.com/apache/arrow/blob/03219e21b42f17294fba3b3d2b22a9117fe0f080/r/R/dataset-scan.R#L184

Related is the ability to write files directly from a query plan using the 
{{WriteNode}} that was added in ARROW-13542. For example, there is an open 
ticket for using the {{WriteNode}} to write datasets (ARROW-14266). Writing 
files is useful but perhaps orthogonal to the ability to iterate over a 
{{RecordBatchReader}}, which is exemplified by the revamped {{map_batches()}} 
and the accompanying vignette addition.

> [R] Refactor do_exec_plan to return a RecordBatchReader
> -------------------------------------------------------
>
>                 Key: ARROW-15271
>                 URL: https://issues.apache.org/jira/browse/ARROW-15271
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>    Affects Versions: 6.0.1
>            Reporter: Will Jones
>            Priority: Major
>
> Right now 
> [{{do_exec_plan}}|https://github.com/apache/arrow/blob/master/r/R/query-engine.R#L18]
>  returns an Arrow table because {{head}}, {{tail}}, and {{arrange}} do. If 
> ARROW-14289 is completed and similar work is done for {{arrange}}, we may be 
> able to alter {{do_exec_plan}} to return a {{RecordBatchReader}} (RBR) instead.
> The {{map_batches()}} implementation (ARROW-14029) could benefit from this 
> refactor. And it might make ARROW-15040 more useful.



