[ https://issues.apache.org/jira/browse/BEAM-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432927#comment-15432927 ]
Kenneth Knowles commented on BEAM-577:
--------------------------------------

When this is resolved, let's update the answer at http://stackoverflow.com/questions/39085869/opening-a-gzip-file-in-python-apache-beam

> Update filebasedsource to support compressed files
> --------------------------------------------------
>
>                 Key: BEAM-577
>                 URL: https://issues.apache.org/jira/browse/BEAM-577
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py
>            Reporter: Chamikara Jayalath
>            Assignee: Chamikara Jayalath
>
> FileBasedSource framework [1] should be updated to properly read compressed files.
> One possible way to do this might be to update FileBasedSource.open_file() [2] to return a CompressedFile [3].
> Similar to Java implementation, we may not be able to support dynamic work rebalancing for compressed files.
>
> [1] https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/filebasedsource.py
> [2] https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/filebasedsource.py#L125
> [3] https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/fileio.py#L300

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
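
The idea in the issue description, having the file-opening step hand back a decompressing wrapper instead of the raw file, can be sketched roughly as below. This is only an illustration built on the Python standard library `gzip` module, not the Beam API: `open_maybe_compressed`, `read_lines`, and the `compression` parameter are hypothetical names, and the real `FileBasedSource.open_file()` and `CompressedFile` may look quite different. The sketch reads each file sequentially from the start, which is in line with the note that dynamic work rebalancing may not be supportable for compressed files.

```python
import gzip


def open_maybe_compressed(path, compression="auto"):
    """Hypothetical helper (not part of Beam): open `path` for binary reading.

    With compression="auto" the choice is guessed from the file extension;
    "gzip" and "none" force the behaviour explicitly.
    """
    if compression == "auto":
        compression = "gzip" if path.endswith(".gz") else "none"
    if compression == "gzip":
        # gzip.open returns a file-like object that decompresses on read.
        return gzip.open(path, "rb")
    if compression == "none":
        return open(path, "rb")
    raise ValueError("unsupported compression: %r" % (compression,))


def read_lines(path, compression="auto"):
    """Yield newline-stripped records, always reading from the start of the file.

    A gzip stream is not split into independent ranges here, so the whole
    file is treated as a single unit of work.
    """
    with open_maybe_compressed(path, compression) as f:
        for line in f:
            yield line.rstrip(b"\n")
```

A caller would iterate `read_lines("input.txt.gz")` exactly as it would for an uncompressed file; the file name here is just a placeholder.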