[ https://issues.apache.org/jira/browse/BEAM-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902224#comment-16902224 ]
Ryan Skraba commented on BEAM-4379:
-----------------------------------

Alright, I was mistaken about one thing -- the current ParquetIO *already* includes the hadoop-client and hadoop-common jars in its dependencies, but it only uses the part of the Parquet API that does not expose org.apache.hadoop classes.

I suppose PARQUET-1126 is necessary to remove the hadoop-client dependency (which would be a desirable outcome, but is not currently possible, with or without splittability).

It should be possible to implement splittability using the current Parquet API, and I'll take a look.

> Make ParquetIO Read splittable
> ------------------------------
>
>                 Key: BEAM-4379
>                 URL: https://issues.apache.org/jira/browse/BEAM-4379
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-ideas, io-java-parquet
>            Reporter: Lukasz Gajowy
>            Priority: Major
>
> As the title states, ParquetIO Read is currently not splittable, which is not optimal for
> runners that support splitting.