GitHub user rsltrifork commented on the pull request:

https://github.com/apache/storm/pull/651#issuecomment-124075611

Event Hubs (and Kafka) fit well into event-sourcing architectures as the event ingest point, with Storm processing the events and passing them on to downstream stateful consumers.

Advanced event stream processing, such as replaying parts of a stream, requires that downstream consumers can synchronise different "stream runs" with their stateful view, which can itself be seen as an aggregation of all previous events. To set up the right context for re-processing the stream deterministically, they need to sync their view with the incoming old data, and for that they need to know each event's sequenceNumber and partition.

For example, suppose a bolt calculates total_order_amount for a stream of orders and emits order tuples carrying the total_order_amount calculated over all previous orders. Replaying an order event should not change total_order_amount; that is, orders with a higher sequenceNumber than the order being processed should not be included in total_order_amount.

This synchronisation can be achieved if the bolt has access to the partition and sequenceNumber from Event Hubs.
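To make the total_order_amount example concrete, here is a minimal sketch of the idea in plain Python rather than as an actual Storm bolt. The class name `ReplayAwareOrderTotaler` and its `process` method are hypothetical and are not part of Storm or the Event Hubs spout. The sketch also assumes that totals are tracked per partition and that the total includes the order being processed.

```python
import bisect
from collections import defaultdict


class ReplayAwareOrderTotaler:
    """Hypothetical stand-in for the stateful bolt; not a Storm or Event Hubs API."""

    def __init__(self):
        # Per partition: sorted sequence numbers, and the amount seen for each.
        self._seqs = defaultdict(list)
        self._amounts = defaultdict(dict)

    def process(self, partition, sequence_number, amount):
        seqs = self._seqs[partition]
        amounts = self._amounts[partition]
        if sequence_number not in amounts:
            # First delivery of this event: record the order in the view.
            bisect.insort(seqs, sequence_number)
            amounts[sequence_number] = amount
        # Sum only orders at or before this sequence number, so a replayed
        # event ignores orders with a higher sequenceNumber.
        cutoff = bisect.bisect_right(seqs, sequence_number)
        return sum(amounts[s] for s in seqs[:cutoff])


totaler = ReplayAwareOrderTotaler()
# First run: three orders arrive on partition "0".
first_run = [totaler.process("0", seq, amt) for seq, amt in [(1, 10), (2, 20), (3, 5)]]
# Replay of the order with sequenceNumber 2, after order 3 has been seen.
replayed = totaler.process("0", 2, 20)
```

Without the partition and sequenceNumber on the tuple, the bolt could not tell a replayed order from a new one and would have no cutoff to apply, which is the point of exposing both fields.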