[
https://issues.apache.org/jira/browse/STORM-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730583#comment-14730583
]
Mads Mætzke Tandrup commented on STORM-1028:
--------------------------------------------
Created pull request https://github.com/apache/storm/pull/651
> Eventhub spout meta data
> ------------------------
>
> Key: STORM-1028
> URL: https://issues.apache.org/jira/browse/STORM-1028
> Project: Apache Storm
> Issue Type: Bug
> Reporter: Mads Mætzke Tandrup
>
> Event hub (and Kafka) fit well into event-sourcing architectures as the event
> ingest point for later Storm processing that feeds downstream stateful consumers.
> Advanced event stream processing, such as replaying parts of a stream,
> requires that the downstream consumers can synchronise different "stream
> runs" to their stateful view, which itself can be seen as an aggregation of
> all previous events. To set up the right context for re-processing the stream
> in a deterministic way, they need to sync their view with the incoming old
> data. To be able to do this, they need knowledge of the event sequenceNumber
> and partition.
> For example, if you have a bolt that calculates total_order_amount for a
> stream of orders, and emits order tuples with the total_order_amount
> calculated for all previous orders, replaying an order event should not
> change total_order_amount. I.e. orders with a higher sequenceNumber than the
> order being processed should not be included in total_order_amount.
> This synchronisation can be achieved if the bolt has access to the partition
> and sequenceNumber from Event hub.
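
The replay rule described in the issue can be illustrated with a minimal sketch. It is plain Python rather than Storm or Event Hub API code, and it rests on several assumptions: the class name OrderTotalView and its process method are hypothetical, sequence numbers are taken to be ordered only within a partition, the total is kept per partition, and the order being processed is itself counted in the total.

```python
from collections import defaultdict


class OrderTotalView:
    """Hypothetical stateful view; not part of the Storm or Event Hub API."""

    def __init__(self):
        # partition id -> {sequenceNumber: order amount}
        self._amounts = defaultdict(dict)

    def process(self, partition, sequence_number, amount):
        """Record the order and return total_order_amount as of this event."""
        seen = self._amounts[partition]
        # Recording by sequenceNumber is idempotent, so a replayed event
        # is not counted twice.
        seen[sequence_number] = amount
        # Only orders at or below this sequenceNumber contribute, so orders
        # that arrived later in an earlier run are excluded on replay.
        return sum(a for s, a in seen.items() if s <= sequence_number)


view = OrderTotalView()
orders = [(1, 10), (2, 5), (3, 20)]  # (sequenceNumber, amount), made-up data
first_run = [view.process("0", seq, amt) for seq, amt in orders]
replayed = view.process("0", 2, 5)  # replay of the second order
```

In this sketch the replayed order yields the same total it produced in the first run, because the order with sequenceNumber 3 is ignored. Without the partition and sequenceNumber in the tuple, the view would have no way to tell which stored orders come after the replayed one.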
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)