aprochko opened a new issue, #37163: URL: https://github.com/apache/beam/issues/37163
### What would you like to happen? Hi dear team, I would like to ask if the current implementation of PubSubIO is sufficient for cases where message ordering is important. There is a note: _Integration with Dataflow: Don't enable message ordering for subscriptions when configuring Dataflow with Pub/Sub. Dataflow has its own mechanism for total message ordering, ensuring chronological order across all messages as part of windowing operations. This method of ordering differs from Pub/Sub's ordering key-based approach. Using ordering keys with Dataflow can potentially reduce pipeline performance._ https://docs.cloud.google.com/pubsub/docs/ordering#:~:text=Using ordering keys with Dataflow,with the same ordering key. Additionally, there's a notification when starting the pipeline indicating that enabling ordering in the subscription could cause performance issues. With https://github.com/apache/beam/pull/31608, there was a significant improvement that allows writing with an ordering key, but reading still lacks this functionality. I've tested a few approaches, and it appears that whether ordering is enabled or disabled in the subscription, there's a chance that messages will be passed to the next step in the wrong order after PubsubIO.readMessagesWithAttributesAndMessageIdAndOrderingKey(), even if the next step is just sorting back by ordering key. This is a big disadvantage for us compared to KafkaIO, which outputs KV<key, message> instead of just <message>. Is there a chance to fix this and to make the output of PubSubIO optionally KV (key-value) depending on the use case, to truly maintain the correct order, even with a potential performance downgrade? Thank you ### Issue Priority Priority: 2 (default / most feature requests should be filed as P2) ### Issue Components - [ ] Component: Python SDK - [x] Component: Java SDK - [ ] Component: Go SDK - [ ] Component: Typescript SDK - [ ] Component: IO connector - [ ] Component: Beam YAML - [ ] Component: Beam examples - [ ] Component: Beam playground - [ ] Component: Beam katas - [ ] Component: Website - [ ] Component: Infrastructure - [ ] Component: Spark Runner - [ ] Component: Flink Runner - [ ] Component: Samza Runner - [ ] Component: Twister2 Runner - [ ] Component: Hazelcast Jet Runner - [ ] Component: Google Cloud Dataflow Runner -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
