[ 
https://issues.apache.org/jira/browse/ARROW-11782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291706#comment-17291706
 ] 

Neal Richardson commented on ARROW-11782:
-----------------------------------------

I would love to delete ScanTask from the R bindings. It is exposed there only 
to support a (hacky, experimental) attempt to do computations on the stream of 
record batches, so that we can compute things we otherwise couldn't because 
the whole Table doesn't fit in memory. Scanner::ToBatches doesn't work in that 
case because everything would be materialized.

What I _really_ want is to pass a function/lambda to something like ToTable or 
ToBatches and have that function applied to every record batch in the stream. 
I don't want to manage consuming the ScanTasks/RecordBatchIterators myself; 
I'd prefer to have the C++ library handle that. (In my current hacky use of 
ScanTasks, it's actually prohibitively slow because it has to consume the 
iterators single-threaded.)
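
To make the shape of that request concrete, here is a minimal sketch in plain 
Python. It is not Arrow's API: map_batches, fake_batches and max_in_flight are 
hypothetical names, and each "batch" is just a list of ints standing in for a 
record batch. The idea is only that the caller supplies a function while the 
library side owns iteration and threading, keeping a bounded number of batches 
in flight so the whole stream is never materialized.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def map_batches(batches, fn, max_in_flight=4):
    """Apply fn to every batch in a stream, yielding results in order.

    Hypothetical helper: at most max_in_flight batches are held at once,
    so the full stream is never collected in memory.
    """
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        pending = deque()
        for batch in batches:
            pending.append(pool.submit(fn, batch))
            # Once the window is full, wait for the oldest result
            # before pulling another batch from the stream.
            if len(pending) >= max_in_flight:
                yield pending.popleft().result()
        # Drain whatever is still running after the stream ends.
        while pending:
            yield pending.popleft().result()


def fake_batches():
    # Stand-in for a stream of record batches: lists of ints 0..9.
    for start in range(0, 10, 3):
        yield list(range(start, min(start + 3, 10)))


# The caller only supplies the per-batch function (here, sum) and
# combines the small per-batch results.
total = sum(map_batches(fake_batches(), sum))
```

The caller never touches the iterators or the thread pool; it only sees the 
per-batch results, which it can reduce however it likes.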

> [GLib][Dataset] Remove bindings for internal classes
> ----------------------------------------------------
>
>                 Key: ARROW-11782
>                 URL: https://issues.apache.org/jira/browse/ARROW-11782
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: GLib
>    Affects Versions: 3.0.0
>            Reporter: Ben Kietzman
>            Priority: Major
>             Fix For: 4.0.0
>
>
> GLib and Ruby include bindings for internal classes such as ScanOptions, 
> ScanContext, InMemoryScanTask, ScanTask, ... These are probably unnecessary 
> and should be removed to present a simpler interface less prone to breakage 
> under refactoring of the wrapped classes 
> https://github.com/apache/arrow/pull/9532/checks?check_run_id=1974229719#step:8:2071



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
