[ 
https://issues.apache.org/jira/browse/ARROW-10853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248666#comment-17248666
 ] 

Uwe Korn commented on ARROW-10853:
----------------------------------

On the Python side, we tend to work with single tables with as little chunking 
as possible, since this gives the best performance there. Iterating over a 
dataset is quite uncommon; rather, you typically load the data (or the largest 
subset that fits into RAM) into memory and then work on that for quite a while. 
Thus an approach that iterates over small chunks is quite undesirable and would 
also be very slow on the Python side.

Interface-wise, it would be good to have the {{(connection, query)}} and 
{{(connection, query, config)}} interfaces that return a populated 
{{VectorSchemaRoot}}. This requires the least amount of Java-side API 
knowledge and serves the typical use case.
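
To make the two shapes concrete, here is a minimal sketch that uses plain Java 
stand-ins only. {{LoadSketch}}, {{Config}}, {{runQuery}} and {{load}} are 
hypothetical names, not Arrow or JDBC classes, and a {{List<Integer>}} stands in 
for the populated {{VectorSchemaRoot}}. The {{NO_LIMIT}} sentinel is likewise an 
assumption made for illustration. The idea is that the two-argument call returns 
every row in one result, while the three-argument call lets the caller choose 
the batch size.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical stand-ins only: nothing here is an Arrow or JDBC class.
final class LoadSketch {

    /** Stand-in for a config object that carries the target batch size. */
    static final class Config {
        static final int NO_LIMIT = -1; // assumed sentinel for "all rows in one result"
        final int targetBatchSize;

        Config(int targetBatchSize) {
            this.targetBatchSize = targetBatchSize;
        }
    }

    /** Stand-in for (connection, query): a source that yields rowCount rows. */
    static Iterator<Integer> runQuery(int rowCount) {
        return IntStream.range(0, rowCount).iterator();
    }

    /** Two-argument shape: no config, so every row lands in one result. */
    static List<Integer> load(Iterator<Integer> rows) {
        return load(rows, new Config(Config.NO_LIMIT));
    }

    /** Three-argument shape: the caller decides how many rows one result may hold. */
    static List<Integer> load(Iterator<Integer> rows, Config config) {
        List<Integer> result = new ArrayList<>();
        while (rows.hasNext()
                && (config.targetBatchSize == Config.NO_LIMIT
                    || result.size() < config.targetBatchSize)) {
            result.add(rows.next());
        }
        return result;
    }

    public static void main(String[] args) {
        // Without a config, all 2500 rows are returned at once.
        System.out.println(load(runQuery(2500)).size());
        // With a batch size of 1024, the result stops at 1024 rows.
        System.out.println(load(runQuery(2500), new Config(1024)).size());
    }
}
```

With a fixed batch size of 1024 and no way to override it, the second call 
shows the truncated result that a caller from Python would otherwise have to 
work around.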

> [Java] Undeprecate sqlToArrow helpers
> -------------------------------------
>
>                 Key: ARROW-10853
>                 URL: https://issues.apache.org/jira/browse/ARROW-10853
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>    Affects Versions: 2.0.0
>            Reporter: Uwe Korn
>            Assignee: Uwe Korn
>            Priority: Major
>             Fix For: 3.0.0
>
>
> These helper functions are really useful when called from Python as they deal 
> with a lot of "internals" of Java that we don't want to handle from the 
> Python side. We would rather keep using these functions.
> Note that some of them are broken due to recent refactoring and only return 
> 1024 rows (the default iterator size) without the ability to change that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
