[ 
https://issues.apache.org/jira/browse/ARROW-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531959#comment-17531959
 ] 

Larry White commented on ARROW-15754:
-------------------------------------

Hi @apitrou, I have a couple of follow-up questions on this ticket. 
 
I would like to better understand how you see the C Data interface based 
version of this adapter working. From what I've read, that interface is 
primarily designed as a way to simplify access to Arrow memory. Looking at the 
ORC adapter, it seems to work like an ArrowReader operating on a file: it 
calls native code that reads an ORC file from disk into memory and, as it 
proceeds, hands off each stripe (as a RecordBatch) to the Java code.  
 
Is it your idea that the adapter should separate loading from memory access, 
so that it first loads the data completely (into a SimpleTable, perhaps), and 
then access to the loaded data goes through the C Data interface? If so, would 
the API be simplified to something like a function that loads the file and 
returns to the caller something like a map of ArrowSchema to ArrowArray?
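
To make the two API shapes in this question concrete, here is a rough sketch in 
plain Java. Every name in it (OrcApiShapes, Batch, streamStripes, loadAll) is a 
hypothetical placeholder, not an Arrow or ORC API. Batch only stands in for 
whatever would cross the JNI boundary for one stripe, for example an 
ArrowSchema/ArrowArray pair.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class OrcApiShapes {
    /** Hypothetical stand-in for one stripe's data handed to Java. */
    static final class Batch {
        final int stripeIndex;
        final int rowCount;

        Batch(int stripeIndex, int rowCount) {
            this.stripeIndex = stripeIndex;
            this.rowCount = rowCount;
        }
    }

    /** Shape 1 (streaming): one batch per stripe, produced on demand. */
    static Iterator<Batch> streamStripes(final int[] stripeRowCounts) {
        return new Iterator<Batch>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < stripeRowCounts.length;
            }

            @Override
            public Batch next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                Batch batch = new Batch(index, stripeRowCounts[index]);
                index++;
                return batch;
            }
        };
    }

    /** Shape 2 (eager): read every stripe first, then return all batches. */
    static List<Batch> loadAll(int[] stripeRowCounts) {
        List<Batch> all = new ArrayList<>();
        Iterator<Batch> it = streamStripes(stripeRowCounts);
        while (it.hasNext()) {
            all.add(it.next());
        }
        return all;
    }

    public static void main(String[] args) {
        int[] stripes = {1000, 1000, 250}; // made-up stripe sizes

        long rows = 0;
        Iterator<Batch> it = streamStripes(stripes);
        while (it.hasNext()) {
            rows += it.next().rowCount; // only one batch is live at a time
        }

        List<Batch> all = loadAll(stripes); // all batches are live at once
        System.out.println(rows + " rows streamed, " + all.size() + " batches loaded eagerly");
    }
}
```

In the streaming shape only one stripe's worth of data needs to be held at a 
time, while the eager shape keeps every stripe in memory until the caller is 
done with it.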
 
The second question is about your comment that complex types are not supported.  
Is it the C++ implementation here that lacks the ability to read complex types? 
From what I can see, the current JNI interface is built around the 
VectorSchemaRoot and ArrowRecordBatch, which I presume support complex types. 
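
For reference, this is how I read "child array buffers are not considered" in 
the ticket description, as a minimal plain-Java model. The Node class and both 
counting methods are hypothetical and are not taken from jni_wrapper.cpp. The 
buffer counts follow the Arrow columnar format: an int32 array has a validity 
and a data buffer, and a list array has a validity and an offsets buffer plus 
one child array.

```java
import java.util.Arrays;
import java.util.List;

class NestedBufferCount {
    /** Simplified model of an Arrow array: its own buffers plus child arrays. */
    static final class Node {
        final String type;
        final int ownBuffers;
        final List<Node> children;

        Node(String type, int ownBuffers, Node... children) {
            this.type = type;
            this.ownBuffers = ownBuffers;
            this.children = Arrays.asList(children);
        }
    }

    /** Counts only the top-level buffers, ignoring any child arrays. */
    static int topLevelBuffers(Node node) {
        return node.ownBuffers;
    }

    /** Counts buffers recursively, the way a nested layout requires. */
    static int allBuffers(Node node) {
        int total = node.ownBuffers;
        for (Node child : node.children) {
            total += allBuffers(child);
        }
        return total;
    }

    public static void main(String[] args) {
        Node int32 = new Node("int32", 2);                              // validity + data
        Node list = new Node("list<int32>", 2, new Node("int32", 2));  // validity + offsets, one child

        // For a primitive column both counts agree.
        System.out.println(int32.type + ": " + topLevelBuffers(int32) + " vs " + allBuffers(int32));
        // For a nested column the top-level count misses the child's buffers.
        System.out.println(list.type + ": " + topLevelBuffers(list) + " vs " + allBuffers(list));
    }
}
```

If the bridge only passes the top-level buffers, a primitive column round-trips 
fine but a list column would lose its values, which would explain the 
limitation even if the Java-side classes can represent complex types.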
 
Thanks. 

> [Java] ORC JNI bridge should use the C data interface
> -----------------------------------------------------
>
>                 Key: ARROW-15754
>                 URL: https://issues.apache.org/jira/browse/ARROW-15754
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Java
>            Reporter: Antoine Pitrou
>            Assignee: Larry White
>            Priority: Major
>
> Right now the ORC JNI bridge uses some custom buffer passing which only seems 
> to handle primitive arrays correctly (child array buffers and dictionaries 
> are not considered):
> https://github.com/apache/arrow/blob/master/cpp/src/jni/orc/jni_wrapper.cpp#L263-L265
> Instead, it should use the C data interface, which is now implemented in Java.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
