[ https://issues.apache.org/jira/browse/BEAM-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902108#comment-16902108 ]
Ryan Skraba commented on BEAM-4379:
-----------------------------------

I'm looking at what Spark has done for splittable Parquet files -- it looks like there's a lot of reusable strategy that might be pushed up to Parquet, especially with their own ReadSupport for catalyst data types (the equivalent of BEAM-4812, reading and writing Rows directly from Parquet).

I'm still ramping up on the necessary changes to Parquet, but I won't be offended if my conclusion is proven wrong or someone with more expertise takes the JIRA, of course!

> Make ParquetIO Read splittable
> ------------------------------
>
>                 Key: BEAM-4379
>                 URL: https://issues.apache.org/jira/browse/BEAM-4379
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-ideas, io-java-parquet
>            Reporter: Lukasz Gajowy
>            Priority: Major
>
> As the title stands - currently it is not splittable which is not optimal for
> runners that support splitting.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)