[ https://issues.apache.org/jira/browse/ARROW-7808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17039672#comment-17039672 ]
Hongze Zhang commented on ARROW-7808:
-------------------------------------

I am not entirely sure, but based on the mailing-list discussion I think mapping one or two methods via JNI is not a final solution, only something we can get started with. For the Parquet format in particular, users may need access to different Datasets layers: DataFragments for Parquet files and ScanTasks for row groups. One may even need to decide whether the C++-level post-scan filter should be enabled or disabled, whether the partition filter should be applied, and so on. One or two methods cannot cover all of this.

Maintaining a JNI-based Datasets API should not be a heavy workload either, because on the Java side things are just mirrored to basic Datasets concepts such as DataSource and DataFragment, and the Java code should stay away from re-implementing low-level logic like scanning, projecting and filtering. In return, everything in C++ could be made available in Java, which is important to many users.

> [Java][Dataset] Implement Datasets Java API
> --------------------------------------------
>
>                 Key: ARROW-7808
>                 URL: https://issues.apache.org/jira/browse/ARROW-7808
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++ - Dataset, Java
>            Reporter: Hongze Zhang
>            Priority: Major
>              Labels: dataset
>
> Porting the following C++ Datasets APIs to Java:
> * DataSource
> * DataSourceDiscovery
> * DataFragment
> * Dataset
> * Scanner
> * ScanTask
> * ScanOptions
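The comment argues for a thin JNI mirror: Java objects correspond to the C++ Datasets concepts and delegate all scanning work to C++. The following is a minimal sketch of that layering (DataSource to DataFragment to ScanTask) under stated assumptions. The names `NativeBridge`, `FakeBridge` and `countScanTasks` are hypothetical and are not the real Arrow Java API, and the JNI boundary is replaced by a plain interface with a fake in-memory implementation so the sketch runs without native code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these class or method names are the real Arrow Java API.
public class DatasetMirrorSketch {

    // Stand-in for the JNI boundary. In a real binding these would be `native`
    // methods implemented in C++; here they are an interface so the sketch runs.
    interface NativeBridge {
        long[] fragments(long dataSourceHandle);                       // e.g. one per Parquet file
        long[] scanTasks(long fragmentHandle, boolean postScanFilter); // e.g. one per row group
    }

    // Java-side objects hold only an opaque handle and forward calls;
    // no scanning, projecting or filtering logic lives here.
    static final class DataSource {
        private final NativeBridge bridge;
        private final long handle;

        DataSource(NativeBridge bridge, long handle) {
            this.bridge = bridge;
            this.handle = handle;
        }

        List<DataFragment> fragments() {
            List<DataFragment> result = new ArrayList<>();
            for (long h : bridge.fragments(handle)) {
                result.add(new DataFragment(bridge, h));
            }
            return result;
        }
    }

    static final class DataFragment {
        private final NativeBridge bridge;
        private final long handle;

        DataFragment(NativeBridge bridge, long handle) {
            this.bridge = bridge;
            this.handle = handle;
        }

        // The post-scan filter switch is passed straight through to the native side.
        List<ScanTask> scan(boolean postScanFilter) {
            List<ScanTask> result = new ArrayList<>();
            for (long h : bridge.scanTasks(handle, postScanFilter)) {
                result.add(new ScanTask(h));
            }
            return result;
        }
    }

    static final class ScanTask {
        final long handle;
        ScanTask(long handle) { this.handle = handle; }
    }

    // Fake "C++" side: `files` Parquet files with `rowGroups` row groups each.
    // Handles are derived deterministically; the filter flag is ignored by the fake.
    static final class FakeBridge implements NativeBridge {
        private final int files;
        private final int rowGroups;

        FakeBridge(int files, int rowGroups) {
            this.files = files;
            this.rowGroups = rowGroups;
        }

        @Override
        public long[] fragments(long dataSourceHandle) {
            long[] handles = new long[files];
            for (int i = 0; i < files; i++) handles[i] = dataSourceHandle * 100 + i;
            return handles;
        }

        @Override
        public long[] scanTasks(long fragmentHandle, boolean postScanFilter) {
            long[] handles = new long[rowGroups];
            for (int i = 0; i < rowGroups; i++) handles[i] = fragmentHandle * 100 + i;
            return handles;
        }
    }

    // Counts the scan tasks reachable from a data source by walking the mirrored layers.
    static int countScanTasks(DataSource source, boolean postScanFilter) {
        int count = 0;
        for (DataFragment fragment : source.fragments()) {
            count += fragment.scan(postScanFilter).size();
        }
        return count;
    }

    public static void main(String[] args) {
        DataSource source = new DataSource(new FakeBridge(2, 3), 1L);
        // Prints "2 fragments, 6 scan tasks"
        System.out.println(source.fragments().size() + " fragments, "
                + countScanTasks(source, true) + " scan tasks");
    }
}
```

In a real binding the bridge methods would presumably be `native` methods backed by the C++ Datasets library, and the handles would need explicit release. The design point the sketch tries to show is that exposing another C++ option, such as the partition filter, would mostly mean passing one more argument through, rather than porting scanning logic to Java.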