carlpayne opened a new issue, #25794: URL: https://github.com/apache/beam/issues/25794
### What happened? We recently upgraded some of our Beam pipelines from `2.34.0` to `2.45.0` (latest at the time of writing). After doing so, we noticed that some of our batch jobs that read compressed (gzip) Avro files are now failing. I've created a repository [here](https://github.com/carlpayne/beam-demo) with a working example of how to reproduce, but will include the steps below. First, checkout the [sample repository](https://github.com/carlpayne/beam-demo) ## Apache Beam 2.34.0 The demo works as expected using this version of Apache Beam. To demonstrate: 1. First ensure `def beam_version = "2.34.0"` is set in `build.gradle` (it should be the default on a fresh git clone) 2. ```./gradlew run``` Notice that the code outputs the expected "tweet" contents by reading the zipped Avro files: ``` TWEET: User: migunoTweet: Rock: Nerf paper, scissors is fine. TWEET: User: migunoTweet: Rock: Nerf paper, scissors is fine. TWEET: User: BlizzardCSTweet: Works as intended. Terran is IMBA. TWEET: User: BlizzardCSTweet: Works as intended. Terran is IMBA. ``` ## Apache Beam 2.45.0 With later versions of Apache Beam, we see unexpected failures with the same code. To demonstrate: 1. First comment out `def beam_version = "2.34.0"` in `build.gradle` 2. Un-comment `def beam_version = "2.45.0"` 3. ```./gradlew run``` Notice that the code fails with the following stack trace: ``` > Task :run FAILED SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.ClassCastException: class java.nio.channels.Channels$ReadableByteChannelImpl cannot be cast to class java.nio.channels.SeekableByteChannel (java.nio.channels.Channels$ReadableByteChannelImpl and java.nio.channels.SeekableByteChannel are in module java.base of loader 'bootstrap') at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:374) at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:342) at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:218) at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:67) at org.apache.beam.sdk.Pipeline.run(Pipeline.java:323) at org.apache.beam.sdk.Pipeline.run(Pipeline.java:309) at beam.demo.Demo.main(Demo.java:52) Caused by: java.lang.ClassCastException: class java.nio.channels.Channels$ReadableByteChannelImpl cannot be cast to class java.nio.channels.SeekableByteChannel (java.nio.channels.Channels$ReadableByteChannelImpl and java.nio.channels.SeekableByteChannel are in module java.base of loader 'bootstrap') at org.apache.beam.sdk.io.AvroSource$AvroReader.startReading(AvroSource.java:743) at org.apache.beam.sdk.io.CompressedSource$CompressedReader.startReading(CompressedSource.java:449) at org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.startImpl(FileBasedSource.java:479) at org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.start(OffsetBasedSource.java:252) at org.apache.beam.sdk.io.ReadAllViaFileBasedSource$ReadFileRangesFn.process(ReadAllViaFileBasedSource.java:140) ``` ## Compressed vs Un-compressed Avro Files One important point worth noting is that this issue only seems to occur for compressed files. If we read in the un-compressed versions then the code works as expected. To demonstrate: 1. Update `Demo.java` so that `LinkedList<String> files` reads in `"twitter1.avro"` and `"twitter1.avro"`, instead of the compressed versions 2. ```./gradlew run``` ### Issue Priority Priority: 3 (minor) ### Issue Components - [ ] Component: Python SDK - [X] Component: Java SDK - [ ] Component: Go SDK - [ ] Component: Typescript SDK - [ ] Component: IO connector - [ ] Component: Beam examples - [ ] Component: Beam playground - [ ] Component: Beam katas - [ ] Component: Website - [ ] Component: Spark Runner - [ ] Component: Flink Runner - [ ] Component: Samza Runner - [ ] Component: Twister2 Runner - [ ] Component: Hazelcast Jet Runner - [ ] Component: Google Cloud Dataflow Runner -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
