[
https://issues.apache.org/jira/browse/STORM-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791006#comment-14791006
]
ASF GitHub Bot commented on STORM-1028:
---------------------------------------
Github user mooso commented on the pull request:
https://github.com/apache/storm/pull/651#issuecomment-140865013
(Not a project maintainer, just someone who ran into the same problem)
I support the idea and encountered the same need myself, but I think the
way it's implemented in this pull request isn't correct, since it doesn't work
for people who implement their own IEventDataScheme class. The way I did it was
to change the signature of deserialize() in that interface to add a MessageId
parameter, so that implementors of that interface can choose to add the partition
ID (and sequence number) properly if they wish. Would that work?
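
A minimal sketch of what such a signature change might look like. The types below are simplified stand-ins written for illustration, not the actual storm-eventhubs classes: the fields of MessageId, the byte[] payload parameter, and the PartitionAwareScheme class are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for the spout's message identifier (assumed shape).
final class MessageId {
    final String partitionId;
    final long sequenceNumber;

    MessageId(String partitionId, long sequenceNumber) {
        this.partitionId = partitionId;
        this.sequenceNumber = sequenceNumber;
    }
}

// Sketch of the proposed interface: deserialize() also receives the MessageId,
// so each implementation decides whether to expose it in the emitted tuple.
interface IEventDataScheme {
    List<Object> deserialize(byte[] payload, MessageId messageId);

    List<String> getOutputFields();
}

// Hypothetical custom scheme that opts in to emitting partition ID and
// sequence number alongside the message body.
class PartitionAwareScheme implements IEventDataScheme {
    @Override
    public List<Object> deserialize(byte[] payload, MessageId messageId) {
        String body = new String(payload, StandardCharsets.UTF_8);
        return Arrays.<Object>asList(body, messageId.partitionId, messageId.sequenceNumber);
    }

    @Override
    public List<String> getOutputFields() {
        return Arrays.asList("message", "partitionId", "sequenceNumber");
    }
}
```

A scheme that does not need the metadata can simply ignore the extra parameter, which is why this approach also covers custom IEventDataScheme implementations.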
> Eventhub spout meta data
> ------------------------
>
> Key: STORM-1028
> URL: https://issues.apache.org/jira/browse/STORM-1028
> Project: Apache Storm
> Issue Type: Bug
> Reporter: Mads Mætzke Tandrup
>
> Event hub (and Kafka) fit well into event source architectures as the event
> ingest point for later Storm processing that feeds downstream stateful consumers.
> Advanced event stream processing, such as replaying parts of a stream,
> requires that the downstream consumers can synchronise different "stream
> runs" to their stateful view, which itself can be seen as an aggregation of
> all previous events. To set up the right context for re-processing the stream
> in a deterministic way, they need to sync their view with the incoming old
> data. To be able to do this, they need knowledge of the event sequenceNumber
> and partition.
> For example, if you have a bolt that calculates total_order_amount for a
> stream of orders, and emits order tuples with the total_order_amount
> calculated for all previous orders, replaying an order event should not
> change total_order_amount. I.e. orders with a higher sequenceNumber than the
> order being processed should not be included in total_order_amount.
> This synchronisation can be achieved if the bolt has access to the partition
> and sequenceNumber from eventHub.
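
The total_order_amount example from the description could be sketched roughly as follows. This is an illustration only: the ReplaySafeOrderTotals helper and its method are hypothetical, amounts are kept in cents, totals are tracked per partition, and the total is assumed to include the order being processed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper a bolt could delegate to; not part of Storm.
class ReplaySafeOrderTotals {
    // partitionId -> (sequenceNumber -> order amount in cents)
    private final Map<String, TreeMap<Long, Long>> amountsByPartition = new HashMap<>();

    // Records the order and returns the total of all orders in the same
    // partition whose sequenceNumber is <= the given one. Orders with a
    // higher sequenceNumber are ignored, so a replay yields the same total.
    long totalUpTo(String partitionId, long sequenceNumber, long amountCents) {
        TreeMap<Long, Long> amounts =
            amountsByPartition.computeIfAbsent(partitionId, k -> new TreeMap<>());
        // Writing the same key again on replay leaves the state unchanged.
        amounts.put(sequenceNumber, amountCents);
        long total = 0;
        for (long amount : amounts.headMap(sequenceNumber, true).values()) {
            total += amount;
        }
        return total;
    }
}
```

With orders of 100, 250 and 50 cents at sequence numbers 1, 2 and 3 in one partition, replaying the second order returns 350 again, not 400, because the third order is excluded from its view.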
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)