bzablocki commented on issue #32596: URL: https://github.com/apache/beam/issues/32596#issuecomment-2407685063
> With redeliveries, I also wondered why they are not filtered out as duplicates (the `requiresDedup` property of the IO transforms in Beam). I specifically checked that the message id is the same for the original message and for its redelivery, but Beam/Dataflow still sends it to our pipeline twice, so I'm a bit confused about this deduplication logic.

I assume you set `SolaceIO.Read#withDeduplicateRecords()` to `true`? This adds a Reshuffle step based on an id. In this case, the id is defined here:
https://github.com/apache/beam/blob/e7ec432db7bf4d7c0b8c77a1dc5f54acab903462/sdks/java/io/solace/src/main/java/org/apache/beam/sdk/io/solace/SolaceIO.java#L560-L569

FYI, the added Reshuffle step is in the Deduplicate transform:
https://github.com/apache/beam/blob/e7ec432db7bf4d7c0b8c77a1dc5f54acab903462/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java#L2174-L2175

Could you check whether the id you are looking at is the same one that is used for deduplication?
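The following self-contained Java sketch shows why the choice of id matters. It is not Beam's or Dataflow's implementation. It simply deduplicates by an id and keeps only the first record seen per id. The `Msg` class and its `inspectedId`/`dedupId` fields are hypothetical. They stand in for an id a user might check and the id a source hands to the runner for deduplication. Suppose a redelivery keeps the same `inspectedId` but carries a different `dedupId`. Deduplicating on `dedupId` then lets both copies through.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class DedupSketch {
  // Hypothetical message: two identifiers plus a payload.
  static final class Msg {
    final String inspectedId; // the id a user might look at when debugging
    final String dedupId;     // the id assumed to be used for deduplication
    final String payload;

    Msg(String inspectedId, String dedupId, String payload) {
      this.inspectedId = inspectedId;
      this.dedupId = dedupId;
      this.payload = payload;
    }
  }

  // Keeps only the first message seen for each key returned by idFn.
  // Every id stays in memory forever, which a real streaming runner would not do.
  static List<Msg> dedupBy(List<Msg> input, Function<Msg, String> idFn) {
    Set<String> seen = new HashSet<>();
    List<Msg> out = new ArrayList<>();
    for (Msg m : input) {
      if (seen.add(idFn.apply(m))) { // add() returns false if the id was already seen
        out.add(m);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // The redelivered copy keeps inspectedId but carries a different dedupId.
    List<Msg> stream = List.of(
        new Msg("msg-1", "id-A", "hello"),
        new Msg("msg-1", "id-B", "hello"));
    System.out.println(dedupBy(stream, m -> m.inspectedId).size()); // prints 1
    System.out.println(dedupBy(stream, m -> m.dedupId).size());     // prints 2
  }
}
```

If the id you compared is not the one the source uses for deduplication, the second case in the sketch is what you would observe.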
