[ https://issues.apache.org/jira/browse/BEAM-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pavlo Zhukov updated BEAM-7411:
-------------------------------
    Description:
To reduce the size of uploaded files we decided to gzip them before upload. Unfortunately, we noticed that the uploaded files' metadata does not include content-encoding 'gzip'. I rechecked the code and noticed that there is no way to pass gzip encoding to
{code:python}
apache_beam.io.gcp.gcsio.GcsIO.open(){code}
Also, I noticed that apache_beam.io.gcp.gcsio.GcsUploader doesn't support uploading gzipped files.
To resolve this problem we need to allow passing a gzip_encoded option, which can be passed to apitools.base.py.transfer in
{code:python}
GcsUploader.__init__()
{code}
Is there any possibility that you apply the required changes soon?

*What steps to reproduce the problem?*
1. Prepare a gzip-encoded file, for example a PDF
2. Upload it to GCS using
{code:python}
from apache_beam.io.gcp.gcsio import GcsIO

def upload_gzipped_pdf(gzipped_pdf, path):
    # gzipped_pdf holds the already-compressed bytes; path is a gs:// URL.
    with GcsIO().open(path, 'w') as f:
        f.write(gzipped_pdf)
{code}
3. Try to download the uploaded file via a browser

*What is the expected result?*
I see the file content properly

*What happens instead?*
I get a broken document

  was:
To reduce the size of uploaded files we decided to gzip them before upload. Unfortunately, we noticed that the uploaded files' metadata does not include content-encoding 'gzip'. I rechecked the code and noticed that there is no way to pass gzip encoding to
{code:python}
apache_beam.io.gcp.gcsio.GcsIO.open(){code}
Also, I noticed that apache_beam.io.gcp.gcsio.GcsUploader doesn't support uploading gzipped files.
To resolve this problem we need to allow passing a gzip_encoded option, which can be passed to apitools.base.py.transfer in
{code:python}
GcsUploader.__init__()
{code}
Is there any possibility that you apply the required changes soon?

*What steps to reproduce the problem?*
1. Prepare a gzip-encoded file, for example a PDF
2. Upload it to GCS using
{code}
from apache_beam.io.gcp.gcsio import GcsIO

with GcsIO().open(gzipped_pdf_gcs_path, 'w') as f:
    f.write(gzipped_pdf)
{code}
3. Try to download the uploaded file via a browser

*What is the expected result?*
I see the file content properly

*What happens instead?*
I get a broken document

> Allow upload gzipped files via apache_beam.io.gcp.gcsio.GcsIO with proper content-encoding
> ------------------------------------------------------------------------------------------
>
>                 Key: BEAM-7411
>                 URL: https://issues.apache.org/jira/browse/BEAM-7411
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-python-gcp
>            Reporter: Pavlo Zhukov
>            Priority: Major
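The symptom described above can be illustrated without GCS at all. The following is a minimal sketch, assuming a client that decompresses the body only when the object's metadata declares content-encoding 'gzip'. The payload and the helper body_seen_by_client are hypothetical and are not part of Beam or apitools.

```python
import gzip

# Hypothetical stand-in for a PDF payload; real PDF files begin with b"%PDF".
pdf_bytes = b"%PDF-1.4\n" + b"example content " * 100

# Step 1 of the reproduction: gzip the payload before upload.
gzipped_pdf = gzip.compress(pdf_bytes)


def body_seen_by_client(stored_bytes, content_encoding=None):
    """Hypothetical model of a client: decompress only when the
    object's metadata declares content-encoding 'gzip'."""
    if content_encoding == "gzip":
        return gzip.decompress(stored_bytes)
    return stored_bytes


# Without the metadata the client receives raw gzip data (magic bytes 1f 8b),
# which a PDF viewer cannot parse.
without_header = body_seen_by_client(gzipped_pdf)

# With the metadata the client recovers the original document.
with_header = body_seen_by_client(gzipped_pdf, content_encoding="gzip")
```

Under this assumption, the uploaded bytes are intact in both cases; only the missing metadata makes the downloaded document look broken.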