[ 
https://issues.apache.org/jira/browse/ARROW-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532140#comment-17532140
 ] 

Antoine Pitrou commented on ARROW-15754:
----------------------------------------

Hi [~ljw1001]

This wouldn't change the fact that the ORC reader interface operates on a 
record batch at a time.

The issue here is that there is ad hoc code to transfer the record batches read 
by the C++ ORC reader into Java.
You can see this code here:
https://github.com/apache/arrow/blob/master/cpp/src/jni/orc/jni_wrapper.cpp#L255-L285

I say it only handles primitive types because this code doesn't take child 
arrays into account ({{dataArray->children}} is never visited), so nested types 
won't work; the same goes for dictionary types.
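To illustrate why ignoring {{children}} matters, here is a minimal C sketch. It is not the actual jni_wrapper.cpp code. The struct layout follows the ArrowArray definition in the C data interface specification, and the two helpers, {{count_buffers_flat}} and {{count_buffers}}, are hypothetical. A struct array with two int32 children holds 1 + 2 + 2 = 5 buffers. A transfer that only looks at the top-level array sees just 1 of them. A consumer of the C data interface has to walk both the children and the dictionary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout as defined by the Arrow C data interface specification. */
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

/* Hypothetical helper mimicking a top-level-only transfer: it only sees the
   parent's own buffers, so child arrays and dictionaries are silently lost. */
int64_t count_buffers_flat(const struct ArrowArray* array) {
  assert(array != NULL);
  return array->n_buffers;
}

/* Hypothetical helper that walks the whole array tree: every child array and
   the dictionary (if any) carry their own buffers that must be transferred. */
int64_t count_buffers(const struct ArrowArray* array) {
  assert(array != NULL);
  int64_t total = array->n_buffers;
  for (int64_t i = 0; i < array->n_children; ++i) {
    total += count_buffers(array->children[i]);
  }
  if (array->dictionary != NULL) {
    total += count_buffers(array->dictionary);
  }
  return total;
}
```

The same applies to dictionary-encoded data. The indices array (2 buffers) points to a separate dictionary array, for example utf8 with 3 buffers, and the flat view never reaches that dictionary.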

Similar ad hoc code existed on the JNI datasets side; it was removed in 
ARROW-7272 in favour of calling the C data interface. See in particular these 
changes:
https://github.com/apache/arrow/pull/10883/files#diff-ae5c8db6104f5fc42b724e2e3272d093d7c8db128114fb401f1ea7dc3c6c5cb5L474

In addition to relying on a shared building block (the C data interface) and 
removing code duplication, this actually added support for complex types, 
though apparently no tests were added for that.

> [Java] ORC JNI bridge should use the C data interface
> -----------------------------------------------------
>
>                 Key: ARROW-15754
>                 URL: https://issues.apache.org/jira/browse/ARROW-15754
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Java
>            Reporter: Antoine Pitrou
>            Assignee: Larry White
>            Priority: Major
>
> Right now the ORC JNI bridge uses some custom buffer passing which only seems 
> to handle primitive arrays correctly (child array buffers and dictionaries 
> are not considered):
> https://github.com/apache/arrow/blob/master/cpp/src/jni/orc/jni_wrapper.cpp#L263-L265
> Instead, it should use the C data interface, which is now implemented in Java.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
