[ https://issues.apache.org/jira/browse/BEAM-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902224#comment-16902224 ]
Ryan Skraba commented on BEAM-4379:
-----------------------------------
Alright, I was mistaken about one thing -- the current ParquetIO *already*
includes the hadoop-client and hadoop-common jars in its dependencies, but it
only uses the part of the Parquet API that does not expose org.apache.hadoop
classes.
I suppose PARQUET-1126 is necessary to remove the hadoop-client dependency
(which would be a desirable outcome, but is not possible today, with or
without splittability).
It should be possible to implement splittability by using the current Parquet
API, and I'll take a look.
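
For illustration only, here is a rough sketch of one way a read could be split,
assuming the split unit is the Parquet row group and that the per-row-group
byte sizes have already been read from the file footer. The class
RowGroupSplitter, its Range type and the desiredBytes parameter are all
hypothetical names made up for this sketch; it uses no Beam or Parquet classes
and is not the implementation proposed in this issue.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Beam or Parquet): groups row groups into splits. */
public class RowGroupSplitter {

  /** Half-open range [from, to) of row-group indices assigned to one split. */
  public static final class Range {
    public final int from;
    public final int to;

    Range(int from, int to) {
      this.from = from;
      this.to = to;
    }
  }

  /**
   * Greedily packs consecutive row groups into ranges, closing a range as soon
   * as its accumulated size reaches desiredBytes. Row groups are never cut in
   * half, so a single large row group forms (or closes) a range on its own.
   *
   * @param rowGroupBytes assumed to hold the size of each row group, in file order
   * @param desiredBytes assumed target size of one split, must be positive
   */
  public static List<Range> split(List<Long> rowGroupBytes, long desiredBytes) {
    if (desiredBytes <= 0) {
      throw new IllegalArgumentException("desiredBytes must be positive");
    }
    List<Range> ranges = new ArrayList<>();
    int start = 0;
    long accumulated = 0;
    for (int i = 0; i < rowGroupBytes.size(); i++) {
      accumulated += rowGroupBytes.get(i);
      if (accumulated >= desiredBytes) {
        ranges.add(new Range(start, i + 1));
        start = i + 1;
        accumulated = 0;
      }
    }
    // Any trailing row groups that did not reach the target form a last range.
    if (start < rowGroupBytes.size()) {
      ranges.add(new Range(start, rowGroupBytes.size()));
    }
    return ranges;
  }
}
```

Each resulting range could then be handed to a separate reader that only
decodes the row groups in that range; how that reader is wired into ParquetIO
is left open here.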
> Make ParquetIO Read splittable
> ------------------------------
>
> Key: BEAM-4379
> URL: https://issues.apache.org/jira/browse/BEAM-4379
> Project: Beam
> Issue Type: Improvement
> Components: io-ideas, io-java-parquet
> Reporter: Lukasz Gajowy
> Priority: Major
>
> As the title says: ParquetIO Read is currently not splittable, which is
> suboptimal for runners that support splitting.
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)