[ https://issues.apache.org/jira/browse/BEAM-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16112011#comment-16112011 ]
Chamikara Jayalath commented on BEAM-2708:
------------------------------------------

A similar issue exists in the Python SDK as well, but it manifests as an AssertionError instead of data loss. Working on a fix.

> Decompressing bzip2 files with multiple "streams" only reads the first stream
> -----------------------------------------------------------------------------
>
>                 Key: BEAM-2708
>                 URL: https://issues.apache.org/jira/browse/BEAM-2708
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-extensions, sdk-py
>            Reporter: Pablo Estrada
>            Assignee: Chamikara Jayalath
>             Fix For: 2.1.0, 2.2.0
>
>
> I'm not sure which components to file this against. A user has observed that
> pbzip2 files are not being properly decompressed:
> https://stackoverflow.com/questions/45439117/google-dataflow-only-partly-uncompressing-files-compressed-with-pbzip2

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
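
For anyone trying to reproduce the symptom outside Beam, here is a minimal sketch using only Python's standard bz2 module. It is not the Beam code or the eventual fix. It assumes the input is simply several complete bzip2 streams concatenated back to back, which is how pbzip2 output is generally understood to be laid out. The helper names decompress_first_stream and decompress_all_streams are made up for this illustration.

```python
import bz2


def decompress_first_stream(data: bytes) -> bytes:
    """Decode only the first bzip2 stream (the behavior reported in this issue)."""
    decompressor = bz2.BZ2Decompressor()
    # A single BZ2Decompressor stops at the end of the first stream;
    # whatever follows is left untouched in decompressor.unused_data.
    return decompressor.decompress(data)


def decompress_all_streams(data: bytes) -> bytes:
    """Decode every concatenated bzip2 stream in `data` (hypothetical helper)."""
    chunks = []
    while data:
        decompressor = bz2.BZ2Decompressor()
        chunks.append(decompressor.decompress(data))
        # Bytes after the end of the current stream start the next stream.
        data = decompressor.unused_data
    return b"".join(chunks)


# Two independently compressed streams, concatenated (assumed pbzip2-like layout).
multi_stream = bz2.compress(b"first block\n") + bz2.compress(b"second block\n")

first_only = decompress_first_stream(multi_stream)
everything = decompress_all_streams(multi_stream)
```

With this input, first_only holds just the first block while everything holds both, which matches the "only reads the first stream" symptom in the title. The loop creates a fresh decompressor per stream because a BZ2Decompressor cannot be reused after it reaches end of stream.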