[ 
https://issues.apache.org/jira/browse/ARROW-7808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17039672#comment-17039672
 ] 

Hongze Zhang commented on ARROW-7808:
-------------------------------------

I am not entirely sure, but based on the mailing list discussion I think 
mapping one or two methods via JNI is not the final solution, just something 
we can get started with. For the Parquet format, users may need access to 
different Datasets layers, such as DataFragments for Parquet files and 
ScanTasks for RowGroups. They may even need to decide whether the C++-level 
post-scan filter should be enabled or disabled, whether a partition filter 
should be applied, and so on. One or two methods cannot cover all of this.

Maintaining a JNI-based Datasets API may not be a heavy workload, because 
the Java side just mirrors some basic Datasets concepts like DataSource and 
DataFragment, and should stay away from re-implementing low-level logic such 
as scanning, projecting, and filtering. Still, everything in C++ could be made 
available in Java, which is important to many users.
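
To illustrate the mirroring idea, here is a minimal sketch. It is not the 
actual Arrow API: NativeBridge, FakeBridge, listFragments and listScanTasks 
are hypothetical names made up for this example, and the "native" side is 
faked in plain Java so the snippet runs without any C++ library. The point is 
only that the Java classes hold a handle and delegate, without re-implementing 
any scan logic.

```java
import java.util.ArrayList;
import java.util.List;

public class DatasetMirrorSketch {

    /**
     * Hypothetical JNI boundary. In a real binding these would be `native`
     * methods implemented in C++; the names are assumptions for illustration.
     */
    public interface NativeBridge {
        long[] listFragments(long dataSourceId);
        long[] listScanTasks(long fragmentId);
    }

    /** Java mirror of a C++ ScanTask: only a handle, no logic. */
    public static final class ScanTask {
        private final long id;
        public ScanTask(long id) { this.id = id; }
        public long id() { return id; }
    }

    /** Java mirror of a C++ DataFragment: delegates to the bridge. */
    public static final class DataFragment {
        private final NativeBridge bridge;
        private final long id;
        public DataFragment(NativeBridge bridge, long id) {
            this.bridge = bridge;
            this.id = id;
        }
        public long id() { return id; }
        public List<ScanTask> scanTasks() {
            List<ScanTask> tasks = new ArrayList<>();
            for (long taskId : bridge.listScanTasks(id)) {
                tasks.add(new ScanTask(taskId));
            }
            return tasks;
        }
    }

    /** Java mirror of a C++ DataSource: delegates to the bridge. */
    public static final class DataSource {
        private final NativeBridge bridge;
        private final long id;
        public DataSource(NativeBridge bridge, long id) {
            this.bridge = bridge;
            this.id = id;
        }
        public List<DataFragment> fragments() {
            List<DataFragment> result = new ArrayList<>();
            for (long fragmentId : bridge.listFragments(id)) {
                result.add(new DataFragment(bridge, fragmentId));
            }
            return result;
        }
    }

    /** Stand-in for the C++ side: 2 fragments per source, 3 tasks per fragment. */
    public static final class FakeBridge implements NativeBridge {
        @Override
        public long[] listFragments(long dataSourceId) {
            return new long[] {dataSourceId * 10 + 1, dataSourceId * 10 + 2};
        }
        @Override
        public long[] listScanTasks(long fragmentId) {
            return new long[] {
                fragmentId * 10 + 1, fragmentId * 10 + 2, fragmentId * 10 + 3};
        }
    }

    public static void main(String[] args) {
        DataSource source = new DataSource(new FakeBridge(), 1L);
        for (DataFragment fragment : source.fragments()) {
            for (ScanTask task : fragment.scanTasks()) {
                System.out.println(
                    "fragment " + fragment.id() + " -> scan task " + task.id());
            }
        }
    }
}
```

With this shape, exposing another C++ layer or option to Java would mostly 
mean adding one more method to the bridge and one more thin wrapper class.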

> [Java][Dataset] Implement Datasets Java API 
> --------------------------------------------
>
>                 Key: ARROW-7808
>                 URL: https://issues.apache.org/jira/browse/ARROW-7808
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++ - Dataset, Java
>            Reporter: Hongze Zhang
>            Priority: Major
>              Labels: dataset
>
> Port the following C++ Datasets APIs to Java: 
> * DataSource 
> * DataSourceDiscovery 
> * DataFragment 
> * Dataset
> * Scanner 
> * ScanTask 
> * ScanOptions 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
