[ https://issues.apache.org/jira/browse/ARROW-10853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248666#comment-17248666 ]
Uwe Korn commented on ARROW-10853:
----------------------------------

On the Python side, we tend to work with single tables with the least amount of chunking. This gives the best performance there. Iterating over a dataset is quite uncommon; rather, you typically load the data (or the largest subset that fits into RAM) into memory and then work on it for quite a while. Thus an approach that iterates over small chunks is quite undesirable and would also be very slow on the Python side.

Interface-wise, it would be good to have the {{(connection, query)}} and {{(connection, query, config)}} interfaces that return a populated {{VectorSchemaRoot}}. This requires the least amount of Java-side API knowledge and serves the typical use case.

> [Java] Undeprecate sqlToArrow helpers
> -------------------------------------
>
>                 Key: ARROW-10853
>                 URL: https://issues.apache.org/jira/browse/ARROW-10853
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>    Affects Versions: 2.0.0
>            Reporter: Uwe Korn
>            Assignee: Uwe Korn
>            Priority: Major
>             Fix For: 3.0.0
>
>
> These helper functions are really useful when called from Python as they deal
> with a lot of "internals" of Java that we don't want to handle from the
> Python side. We would rather keep using these functions.
> Note that some of them are broken due to recent refactoring and only return
> 1024 rows (the default iterator size) without the ability to change that.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
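
The truncation described in the issue (a helper that returns only the first 1024 rows) can be illustrated with a small stand-in. This is a minimal sketch, not the Arrow Java code: it uses Python's standard `sqlite3` module in place of the JDBC adapter, and the names `iter_batches`, `load_first_batch_only`, and `load_all` are hypothetical helpers invented for this illustration. Only the batch size of 1024 is taken from the issue text.

```python
import sqlite3

# Mirrors the "default iterator size" mentioned in the issue description.
DEFAULT_BATCH_SIZE = 1024


def iter_batches(connection, query, batch_size=DEFAULT_BATCH_SIZE):
    """Yield the result set as lists of at most batch_size rows."""
    cursor = connection.execute(query)
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


def load_first_batch_only(connection, query):
    """Hypothetical broken helper: stops after the first batch."""
    return next(iter_batches(connection, query), [])


def load_all(connection, query):
    """Hypothetical fixed helper: drains every batch into one result."""
    rows = []
    for batch in iter_batches(connection, query):
        rows.extend(batch)
    return rows


# Build an in-memory table with more rows than one batch holds.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE t (x INTEGER)")
connection.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(3000)])

truncated = load_first_batch_only(connection, "SELECT x FROM t")
complete = load_all(connection, "SELECT x FROM t")
print(len(truncated), len(complete))  # prints "1024 3000"
```

A `(connection, query)` style helper that returns one fully populated result corresponds to `load_all` here: the caller does not need to know that batching happens underneath.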