[ https://issues.apache.org/jira/browse/BEAM-6952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chamikara Jayalath resolved BEAM-6952.
--------------------------------------
    Resolution: Fixed

> concatenated compressed files bug with python sdk
> -------------------------------------------------
>
>                 Key: BEAM-6952
>                 URL: https://issues.apache.org/jira/browse/BEAM-6952
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>    Affects Versions: 2.11.0
>            Reporter: Daniel Lescohier
>            Priority: Major
>             Fix For: 2.14.0
>
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> The Python apache_beam.io.filesystem module has a bug in its handling of
> concatenated compressed files.
> The PR I will create has two commits:
> # a new unit test that shows the problem
> # a fix for the problem.
> The unit test is added to the apache_beam.io.filesystem_test module. It was
> added there because the existing test,
> apache_beam.io.textio_test.test_read_gzip_concat, does not encounter the
> problem in the Beam 2.11 and earlier code base: its test data is smaller
> than read_size, so it goes through a code path that avoids the bug. The new
> test therefore sets read_size smaller and makes the test data bigger, in
> order to encounter the problem. It would be difficult to test this in the
> textio_test module, because the default read_size is 16MiB, so you would
> need very large test data, and the ReadFromText interface does not allow
> you to modify the read_size.
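
For readers unfamiliar with the failure mode described above, here is a minimal sketch of reading concatenated gzip members in fixed-size chunks using only the Python standard library. It is not the Beam code or the fix from the PR. The function name read_concatenated_gzip and its read_size parameter are hypothetical and chosen only to mirror the terminology in the issue. The sketch assumes the input consists solely of complete gzip members. The point it illustrates is that a gzip member can end in the middle of a chunk, so the bytes left over after that point must be handed to a fresh decompressor rather than dropped.

```python
import gzip
import zlib


def read_concatenated_gzip(raw, read_size):
    """Decompress one or more concatenated gzip members, read_size bytes at a time.

    Hypothetical helper for illustration only; not part of apache_beam.
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    out = []
    pos = 0
    pending = b""  # bytes left over from a chunk after a member ended
    while pending or pos < len(raw):
        if not pending:
            pending = raw[pos:pos + read_size]
            pos += read_size
        out.append(decompressor.decompress(pending))
        pending = b""
        if decompressor.eof:
            # The current member is finished. Anything after its end in this
            # chunk belongs to the next member, so feed it to a new decompressor.
            pending = decompressor.unused_data
            decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    return b"".join(out)


if __name__ == "__main__":
    first = b"alpha line\n" * 1000
    second = b"beta line\n" * 1000
    data = gzip.compress(first) + gzip.compress(second)
    # A small read_size forces the member boundary to fall inside a chunk.
    assert read_concatenated_gzip(data, read_size=16) == first + second
```

As the issue notes, a test only exercises this boundary handling when the compressed data is larger than read_size, which is why the sketch uses a deliberately small read_size.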