Hi friends, I recently ran into an issue with the Beam Python SDK (2.43.0). I was using ReadFromTextWithFilename on a Google Cloud Storage (GCS) bucket that contains roughly 95k gzip-compressed CSV files. One of the files had been truncated in transit, so the job ran for a couple of hours and then failed with an exception like "zlib.error: Error -3 while decompressing data: incorrect header check", raised from within the apache_beam.io.filesystem module. The exception didn't include the name of the truncated file, and from looking through the library, I couldn't find any mechanism to handle the exception or to get back enough context to remediate the situation.
Is there an example of how to handle this situation? Ideally, instead of crashing the job, the library would return a PCollection of the filenames that hit errors while being read (or something similar) so they could be processed further.
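To make the ask concrete, here is a rough sketch of the behavior I have in mind, written against plain Python's gzip module rather than Beam itself. The function name read_gzip_lines and the opener parameter are my own placeholders, not Beam APIs, and it reads each file fully into memory, which is only reasonable for small files.

```python
import gzip
import zlib

# Errors a damaged gzip stream can raise: gzip.BadGzipFile is an OSError
# subclass, a truncated stream raises EOFError, and corrupt deflate data
# raises zlib.error.
GZIP_READ_ERRORS = (OSError, EOFError, zlib.error)


def read_gzip_lines(path, opener=lambda p: open(p, "rb")):
    """Return (path, lines, error) instead of raising on a bad gzip file.

    `opener` is a hypothetical hook: any callable that takes a path and
    returns a binary file object usable as a context manager.
    """
    try:
        with opener(path) as raw, gzip.GzipFile(fileobj=raw) as gz:
            lines = gz.read().decode("utf-8").splitlines()
    except GZIP_READ_ERRORS as exc:
        # Keep the filename next to the error so the bad file can be found.
        return path, None, f"{type(exc).__name__}: {exc}"
    return path, lines, None
```

In a pipeline, I imagine the same try/except could live inside a DoFn applied after fileio.MatchFiles and fileio.ReadMatches, with the failures emitted through beam.pvalue.TaggedOutput so they end up in their own PCollection. I have not tested that against 2.43.0, so please treat it as an assumption rather than a working recipe.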