Github user rsltrifork commented on the pull request:

    https://github.com/apache/storm/pull/651#issuecomment-124075611
  
    Event Hub (and Kafka) fit well into event-sourcing architectures as the 
event ingest point for later Storm processing, which feeds downstream stateful 
consumers.
    
    Advanced event stream processing, such as replaying parts of a stream, 
requires that the downstream consumers can synchronise different "stream runs" 
to their stateful view, which itself can be seen as an aggregation of all 
previous events. To set up the right context for re-processing the stream in a 
deterministic way, they need to sync their view with the incoming old data. To 
be able to do this, they need knowledge of the event sequenceNumber and 
partition.
    
    For example, if you have a bolt that calculates total_order_amount for a 
stream of orders, and emits order tuples with the total_order_amount calculated 
for all previous orders, replaying an order event should not change 
total_order_amount. I.e. orders with a higher sequenceNumber than the order 
being processed should not be included in total_order_amount.
    
    This synchronisation can be achieved if the bolt has access to the partition 
and sequenceNumber from eventHub.
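
    A minimal sketch of the total_order_amount example, in plain Python rather 
than as a real Storm bolt. The class name, method name and in-memory dict are 
hypothetical, and it assumes the total includes the order being processed and 
that sequenceNumber increases within a partition:

```python
from collections import defaultdict


class TotalOrderAmount:
    """Hypothetical stand-in for the bolt's state; not a Storm or Event Hub API."""

    def __init__(self):
        # partition -> {sequenceNumber -> order amount}
        self._amounts = defaultdict(dict)

    def process(self, partition, sequence_number, amount):
        """Record an order and return total_order_amount as of that order."""
        seen = self._amounts[partition]
        # A replayed event overwrites its own entry, so it is never counted twice.
        seen[sequence_number] = amount
        # Orders with a higher sequenceNumber are left out of the total.
        return sum(a for seq, a in seen.items() if seq <= sequence_number)


state = TotalOrderAmount()
first = state.process("0", 1, 1000)     # amounts in cents
second = state.process("0", 2, 500)
third = state.process("0", 3, 700)
replayed = state.process("0", 2, 500)   # replay of the second order
```

    Here the replayed order yields the same total as when it was first 
processed, because the later order (sequenceNumber 3) is excluded. Without the 
partition and sequenceNumber on the tuple, the bolt could not tell which 
stored orders to exclude.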


