[ https://issues.apache.org/jira/browse/STORM-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15008826#comment-15008826 ]
ASF GitHub Bot commented on STORM-1028:
---------------------------------------

Github user rsltrifork commented on the pull request:

    https://github.com/apache/storm/pull/651#issuecomment-157402566

    Mooso is right, and his suggestion is a cleaner implementation. I suggest we use his implementation of IEventDataScheme and EventDataScheme. Mooso, can you make a PR for your copy of these?


> Eventhub spout meta data
> ------------------------
>
>                 Key: STORM-1028
>                 URL: https://issues.apache.org/jira/browse/STORM-1028
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-eventhubs, storm-kafka
>            Reporter: Mads Mætzke Tandrup
>
> Event Hub (and Kafka) play well in event-sourcing architectures as the event ingest point for later Storm processing that feeds downstream stateful consumers.
> Advanced event stream processing, such as replaying parts of a stream, requires that the downstream consumers can synchronise different "stream runs" with their stateful view, which itself can be seen as an aggregation of all previous events. To set up the right context for re-processing the stream deterministically, they need to sync their view with the incoming old data, and to do that they need the event's sequenceNumber and partition.
> For example, if you have a bolt that calculates total_order_amount for a stream of orders, and emits order tuples with the total_order_amount calculated over all previous orders, replaying an order event should not change total_order_amount. That is, orders with a higher sequenceNumber than the order being processed should not be included in total_order_amount.
> This synchronisation can be achieved if the bolt has access to the partition and sequenceNumber from Event Hub.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
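To illustrate the requirement described above, here is a minimal sketch (not taken from pull request #651 and not part of storm-eventhubs) of the kind of downstream bolt the issue has in mind: it keeps a running total_order_amount and uses the partition and sequenceNumber metadata on each tuple to skip events it has already applied, so replaying part of the stream does not change the aggregate. The field names ("partition", "sequenceNumber", "amount") and the org.apache.storm package prefix are assumptions made for the example.

{code:java}
package example;

import java.util.HashMap;
import java.util.Map;

import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

// Hypothetical downstream bolt: field names and packages are assumptions,
// not the implementation from PR 651.
public class TotalOrderAmountBolt extends BaseBasicBolt {

    // Highest sequenceNumber already folded into the aggregate, per Event Hub partition.
    private final Map<String, Long> lastSeenSeq = new HashMap<>();
    private double totalOrderAmount = 0.0;

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String partition = tuple.getStringByField("partition");
        long seq = tuple.getLongByField("sequenceNumber");
        double amount = tuple.getDoubleByField("amount");

        Long lastSeen = lastSeenSeq.get(partition);
        if (lastSeen == null || seq > lastSeen) {
            // New event: apply it to the aggregate and remember its position.
            totalOrderAmount += amount;
            lastSeenSeq.put(partition, seq);
        }
        // Replayed events (seq <= lastSeen) are not re-applied, so the emitted
        // total is deterministic across different "stream runs".
        collector.emit(new Values(partition, seq, totalOrderAmount));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("partition", "sequenceNumber", "total_order_amount"));
    }
}
{code}

This only works if the spout's scheme actually places the partition and sequenceNumber on the emitted tuple, which is exactly what the proposed IEventDataScheme/EventDataScheme change is meant to provide.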