[ https://issues.apache.org/jira/browse/ARROW-10853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248666#comment-17248666 ]
Uwe Korn commented on ARROW-10853:
----------------------------------
On the Python side, we tend to work with single tables with as little chunking
as possible, as this gives the best performance there. Iterating over a dataset
is quite uncommon; rather, you typically load the data (or the largest subset
that fits into RAM) into memory and then work on it for quite a while. Thus, an
approach that iterates over small chunks is undesirable and would also be very
slow on the Python side.
Interface-wise, it would be good to have the {{(connection, query)}} and
{{(connection, query, config)}} interfaces that return a populated
{{VectorSchemaRoot}}. This requires the least knowledge of the Java-side API
and serves the typical use case.
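The interface shape described above can be illustrated with a rough Python analog. This is only a sketch under stated assumptions: {{sql_to_table}} and its {{config}} dict are hypothetical names (not the Arrow Java API), the standard-library {{sqlite3}} module stands in for a JDBC connection, and a dict of column lists stands in for a populated {{VectorSchemaRoot}}. The point is that the caller passes a connection and a query (optionally a config) and gets back one fully populated columnar result, with no iterator to manage.

```python
import sqlite3
from typing import Any, Dict, List, Optional


def sql_to_table(connection: sqlite3.Connection, query: str,
                 config: Optional[Dict[str, Any]] = None) -> Dict[str, List[Any]]:
    """Hypothetical helper: run `query` and return ALL rows as one columnar table.

    `config` is a hypothetical options dict standing in for a Java-side config
    object; in this sketch it only supports "parameters" for the query.
    """
    config = config or {}
    cursor = connection.execute(query, config.get("parameters", ()))
    # Column names are the first element of each cursor.description entry.
    names = [column[0] for column in cursor.description]
    columns: Dict[str, List[Any]] = {name: [] for name in names}
    # Materialize every row; no batch size caps the result.
    for row in cursor.fetchall():
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)",
                     [(i, f"row{i}") for i in range(3000)])
    table = sql_to_table(conn, "SELECT id, name FROM t ORDER BY id")
    print(len(table["id"]))  # 3000
```

Returning the whole result in one call is what keeps the Python side free of Java-side iteration details; the trade-off is that the full result must fit in memory, which matches the typical use case described above.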
> [Java] Undeprecate sqlToArrow helpers
> -------------------------------------
>
> Key: ARROW-10853
> URL: https://issues.apache.org/jira/browse/ARROW-10853
> Project: Apache Arrow
> Issue Type: Bug
> Components: Java
> Affects Versions: 2.0.0
> Reporter: Uwe Korn
> Assignee: Uwe Korn
> Priority: Major
> Fix For: 3.0.0
>
>
> These helper functions are really useful when called from Python, as they deal
> with a lot of Java "internals" that we don't want to handle from the
> Python side. We would rather keep using these functions.
> Note that some of them are broken due to recent refactoring and only return
> 1024 rows (the default iterator size) without the ability to change that.
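The truncation described in the issue can be reproduced in miniature. In this hypothetical Python sketch, {{batches}} stands in for a batch iterator whose default size is 1024 (it is not the Arrow Java iterator), {{first_batch_only}} models a helper that returns just the first batch, and {{all_rows}} models one that drains the iterator into a single result.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Mirrors the default iterator size mentioned in the issue (assumption for illustration).
DEFAULT_BATCH_SIZE = 1024


def batches(rows: Iterable[T], batch_size: int = DEFAULT_BATCH_SIZE) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `batch_size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def first_batch_only(rows: Iterable[T]) -> List[T]:
    """Buggy pattern: returns only the first batch, silently truncating the result."""
    return next(batches(rows), [])


def all_rows(rows: Iterable[T], batch_size: int = DEFAULT_BATCH_SIZE) -> List[T]:
    """Fixed pattern: drain every batch into one result, whatever the batch size."""
    result: List[T] = []
    for batch in batches(rows, batch_size):
        result.extend(batch)
    return result


if __name__ == "__main__":
    data = range(5000)
    print(len(first_batch_only(data)))  # 1024
    print(len(all_rows(data)))          # 5000
```

In the draining version the batch size only affects how the rows are fetched internally, not how many rows the caller receives.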