[ 
https://issues.apache.org/jira/browse/BEAM-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029712#comment-17029712
 ] 

Ryan Skraba commented on BEAM-4379:
-----------------------------------

Hello!  No progress to report -- feel free to take this if you like.  To sum 
up, it doesn't look possible to handle splits without using the Parquet API 
that relies on Hadoop artifacts.  Currently, ParquetIO avoids importing 
org.apache.hadoop classes (although they are still required at runtime).

The solution is to fix Parquet, but my first attempt caused a bunch of 
backwards-incompatible changes, and I didn't continue because of time 
constraints.  The alternative is to bring back the Beam ParquetIO dependencies 
on org.apache.hadoop for now.  I'll unassign myself!

> Make ParquetIO Read splittable
> ------------------------------
>
>                 Key: BEAM-4379
>                 URL: https://issues.apache.org/jira/browse/BEAM-4379
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-ideas, io-java-parquet
>            Reporter: Lukasz Gajowy
>            Assignee: Ryan Skraba
>            Priority: Major
>
> As the title says: ParquetIO Read is currently not splittable, which is not 
> optimal for runners that support splitting.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
