aprochko opened a new issue, #37163:
URL: https://github.com/apache/beam/issues/37163

   ### What would you like to happen?
   
   Hi dear team,
   
   I would like to ask if the current implementation of PubSubIO is sufficient 
for cases where message ordering is important.
   
   There is a note:
   _Integration with Dataflow: Don't enable message ordering for subscriptions 
when configuring Dataflow with Pub/Sub. Dataflow has its own mechanism for 
total message ordering, ensuring chronological order across all messages as 
part of windowing operations. This method of ordering differs from Pub/Sub's 
ordering key-based approach. Using ordering keys with Dataflow can potentially 
reduce pipeline performance._
   
   https://docs.cloud.google.com/pubsub/docs/ordering#:~:text=Using ordering 
keys with Dataflow,with the same ordering key.
   
   Additionally, there's a notification when starting the pipeline indicating 
that enabling ordering in the subscription could cause performance issues.
   
   With https://github.com/apache/beam/pull/31608, there was a significant 
improvement that allows writing with an ordering key, but reading still lacks 
this functionality.
   
   I've tested a few approaches, and it appears that whether ordering is 
enabled or disabled in the subscription, there's a chance that messages will be 
passed to the next step in the wrong order after 
PubsubIO.readMessagesWithAttributesAndMessageIdAndOrderingKey(), even if the 
next step is just sorting back by ordering key.
   
   This is a big disadvantage for us compared to KafkaIO, which outputs KV<key, 
message> instead of just <message>.
   
   Is there a chance to fix this and to make the output of PubSubIO optionally 
KV (key-value) depending on the use case, to truly maintain the correct order, 
even with a potential performance downgrade?
   
   Thank you
   
   ### Issue Priority
   
   Priority: 2 (default / most feature requests should be filed as P2)
   
   ### Issue Components
   
   - [ ] Component: Python SDK
   - [x] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam YAML
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Infrastructure
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to