[jira] [Commented] (ARROW-10853) [Java] Undeprecate sqlToArrow helpers

2020-12-13 Thread Liya Fan (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248739#comment-17248739
 ] 

Liya Fan commented on ARROW-10853:
--

[~uwe] Thanks a lot for your feedback.
By setting JdbcToArrowConfig#targetBatchSize to NO_LIMIT_BATCH_SIZE (-1), we 
get everything in a single batch, so there is no need to iterate over the 
dataset. However, the caller must make sure there is enough memory to avoid 
an OOM error.
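
To illustrate the batch-size semantics described above, here is a minimal, 
self-contained sketch. It does not use Arrow: the class BatchSizeSketch and 
the method toBatches are hypothetical names, and only the two constant values 
(-1 for no limit, 1024 as the default) are taken from this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of target-batch-size semantics; NOT Arrow's implementation. */
final class BatchSizeSketch {
    // Values quoted in this thread: -1 means "no limit", 1024 is the default.
    static final int NO_LIMIT_BATCH_SIZE = -1;
    static final int DEFAULT_TARGET_BATCH_SIZE = 1024;

    /** Splits rows into consecutive batches of at most targetBatchSize rows. */
    static <T> List<List<T>> toBatches(List<T> rows, int targetBatchSize) {
        if (targetBatchSize != NO_LIMIT_BATCH_SIZE && targetBatchSize <= 0) {
            throw new IllegalArgumentException(
                "batch size must be positive or NO_LIMIT_BATCH_SIZE");
        }
        List<List<T>> batches = new ArrayList<>();
        if (targetBatchSize == NO_LIMIT_BATCH_SIZE) {
            // No limit: everything lands in one batch, so memory must hold it all.
            batches.add(new ArrayList<>(rows));
            return batches;
        }
        for (int start = 0; start < rows.size(); start += targetBatchSize) {
            int end = Math.min(start + targetBatchSize, rows.size());
            batches.add(new ArrayList<>(rows.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            rows.add(i);
        }
        // 2500 rows at the default size: 1024 + 1024 + 452, i.e. 3 batches.
        System.out.println(toBatches(rows, DEFAULT_TARGET_BATCH_SIZE).size());
        // No limit: a single batch holding all 2500 rows.
        System.out.println(toBatches(rows, NO_LIMIT_BATCH_SIZE).size());
    }
}
```

A caller that reads only the first batch sees at most 1024 rows with the 
default size, which matches the symptom described in the issue.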

For the interface issue, currently the process works in two steps:

1. {{(connection, query) -> result set}}, and 
2. {{(result set, config) -> VectorSchemaRoot}}. 

These two steps are kept separate primarily because they are independent 
processes. In particular, the functionality of step 2 is provided by Arrow, 
whereas step 1 has little to do with Arrow.
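
As a rough sketch of how a {{(connection, query, config)}} convenience helper 
relates to these two steps, the code below composes them generically. All 
names (TwoStepSketch, Helper, compose) are hypothetical, and plain strings 
stand in for the JDBC connection, the result set, and the VectorSchemaRoot; 
none of this is Arrow's actual API.

```java
import java.util.function.BiFunction;

/** Toy composition of the two steps; the types are stand-ins, not Arrow classes. */
final class TwoStepSketch {
    /** Shape of a (connection, query, config) -> root convenience helper. */
    interface Helper<C, Q, K, V> {
        V apply(C connection, Q query, K config);
    }

    /** Chains step 1 (run the query) into step 2 (convert the result set). */
    static <C, Q, R, K, V> Helper<C, Q, K, V> compose(
            BiFunction<C, Q, R> runQuery,
            BiFunction<R, K, V> convert) {
        return (connection, query, config) ->
            convert.apply(runQuery.apply(connection, query), config);
    }

    public static void main(String[] args) {
        // Step 1 stand-in: (connection, query) -> result set.
        BiFunction<String, String, String> runQuery =
            (conn, sql) -> "rows of [" + sql + "] via " + conn;
        // Step 2 stand-in: (result set, config) -> root; config is a batch size here.
        BiFunction<String, Integer, String> convert =
            (rs, batchSize) -> "root(" + rs + ", batchSize=" + batchSize + ")";

        Helper<String, String, Integer, String> helper = compose(runQuery, convert);
        System.out.println(helper.apply("db", "SELECT 1", -1));
    }
}
```

Keeping the two steps as separate functions preserves their independence, 
while the composed helper hides both from a caller who only wants the result.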

> [Java] Undeprecate sqlToArrow helpers
> -
>
> Key: ARROW-10853
> URL: https://issues.apache.org/jira/browse/ARROW-10853
> Project: Apache Arrow
> Issue Type: Bug
> Components: Java
> Affects Versions: 2.0.0
> Reporter: Uwe Korn
> Assignee: Uwe Korn
> Priority: Major
> Fix For: 3.0.0
>
>
> These helper functions are really useful when called from Python as they deal 
> with a lot of "internals" of Java that we don't want to handle from the 
> Python side. We would rather keep using these functions.
> Note that some of them are broken due to recent refactoring and only return 
> 1024 rows (the default iterator size) without the ability to change that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-10853) [Java] Undeprecate sqlToArrow helpers

2020-12-13 Thread Uwe Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248666#comment-17248666
 ] 

Uwe Korn commented on ARROW-10853:
--

On the Python side, we tend to work with single tables with the least amount of 
chunking, which gives the best performance there. Iterating over a dataset is 
quite uncommon; rather, you typically load the data (or the largest subset that 
fits into RAM) into memory and then work on it for quite a while. Thus an 
approach that iterates over small chunks is quite undesirable and would also be 
very slow on the Python side.

Interface-wise, it would be good to have the {{(connection, query)}} and 
{{(connection, query, config)}} interfaces that return a populated 
{{VectorSchemaRoot}}. This requires the least amount of Java-side API 
knowledge and serves the typical use case.



[jira] [Commented] (ARROW-10853) [Java] Undeprecate sqlToArrow helpers

2020-12-08 Thread Micah Kornfield (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246285#comment-17246285
 ] 

Micah Kornfield commented on ARROW-10853:
-

The reason for deprecation was the preference for small fixed-size batches, 
which aligns with the Java allocator. I guess I'm OK with undeprecating them, 
but I would like to understand why it is hard to use the iterator.



[jira] [Commented] (ARROW-10853) [Java] Undeprecate sqlToArrow helpers

2020-12-08 Thread Uwe Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246152#comment-17246152
 ] 

Uwe Korn commented on ARROW-10853:
--

[~tianchen92] [~emkornfield] Any objections to this? You both worked on 
[https://github.com/apache/arrow/pull/5075] that marked them as deprecated.
