[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281905#comment-15281905 ]

Robert Joseph Evans commented on STORM-1757:
--------------------------------------------

I would love to have something like that happen, but I am not sure if the two 
are even compatible at a fundamental level.  I think the Source/Spout APIs are 
a good example of this.

https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/trident/spout/ITridentSpout.java

Provides two distinct things: a BatchCoordinator and an Emitter.
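
To make the split concrete, here is a hypothetical, simplified paraphrase of 
that contract (names and signatures are shortened for illustration and are not 
the real ones from the linked file):

    import java.util.function.Consumer;

    // Hypothetical, simplified paraphrase of the ITridentSpout split, not the
    // real signatures.  The coordinator decides what a batch *is*; the emitter
    // (re)plays exactly that batch when asked.
    interface BatchCoordinator<X> {
        // Compute the metadata that defines batch `txid`, given where the
        // previous batch ended.  This is where the start-to-end boundary of a
        // batch gets fixed.
        X initializeTransaction(long txid, X prevBatchMeta);

        // Called once batch `txid` has been fully committed downstream.
        void success(long txid);
    }

    interface Emitter<X> {
        // Emit exactly the batch described by `meta`; a replay of the same
        // txid must produce the same batch.
        void emitBatch(long txid, X meta, Consumer<Object> collector);

        void success(long txid);
    }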

https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/UnboundedSource.java

Really just provides a single thing, an UnboundedReader.
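
Again as a rough, hypothetical paraphrase (the actual API is in the linked 
file):

    // Hypothetical, simplified paraphrase of the shape of Beam's
    // UnboundedReader, not the real API.  There is no notion of a batch: you
    // just advance through an endless stream and can ask, at any moment, for
    // a checkpoint describing where you currently are.
    interface UnboundedReader<T, C> {
        boolean start();        // position at the first available record, if any
        boolean advance();      // try to move to the next record
        T getCurrent();         // the record at the current position
        C getCheckpointMark();  // a resumable "where am I" marker, taken at any time
    }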

Both have a bit of metadata that is intended to act as a checkpoint of where 
they currently are in processing (X in Trident and a CheckpointMark in Beam), 
but what they represent is fundamentally different.  In Trident, X represents 
a complete batch of data for processing: a start point through to an end 
point.  In Beam, a CheckpointMark is just a place in the processing, a spot a 
checkpoint can be restored to.  There is no end to it and no batch.  I can 
turn it into a batch if I stop reading after some amount of time, or after a 
given amount of data, but then I am just at a new point.
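
For example, about the only way a runner could fabricate Trident-style batches 
from such a reader is to cut it off, something like this sketch (hypothetical, 
built on the simplified reader above):

    import java.util.ArrayList;
    import java.util.List;

    class BatchFabricator {
        // Hypothetical: fabricate a "batch" by reading until a record or time
        // limit is hit.  The checkpoint we end on marks where we happen to be,
        // not the boundary of a well-defined batch, so a replay from an older
        // checkpoint is not guaranteed to see the same records.
        static <T, C> List<T> readOneBatch(UnboundedReader<T, C> reader,
                                           int maxRecords, long maxMillis) {
            List<T> batch = new ArrayList<>();
            long deadline = System.currentTimeMillis() + maxMillis;
            boolean more = reader.start();
            while (more && batch.size() < maxRecords
                    && System.currentTimeMillis() < deadline) {
                batch.add(reader.getCurrent());
                more = reader.advance();
            }
            // Just "a new point"; a runner would have to persist it alongside
            // whatever it decides the batch was.
            C resumePoint = reader.getCheckpointMark();
            return batch;
        }
    }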

State/Sinks are another example.  In Trident, a State 

https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/trident/state/State.java

is a specific thing that can have data written to it and, in some cases, 
queried from it.  But ultimately it really just comes down to two APIs, 
beginCommit and commit.  Transactions to state are guaranteed to be committed 
in order.
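
From memory the interface is roughly just the two calls below; the 
LastTxidState class is my own hypothetical illustration (not code from Storm) 
of why the ordered, id'd commits matter:

    // Trident's State contract, roughly (see the link above for the exact
    // form): everything written between beginCommit(txid) and commit(txid)
    // belongs to transaction txid, and transactions are applied in txid order.
    interface State {
        void beginCommit(Long txid);
        void commit(Long txid);
    }

    // Hypothetical illustration: because commits carry an id and arrive in
    // order, a State can make replayed batches harmless by remembering the
    // last txid it committed and skipping a repeat of it.
    class LastTxidState implements State {
        private Long lastCommitted = null;  // in practice loaded from durable storage
        private boolean skip = false;

        public void beginCommit(Long txid) {
            skip = (txid != null && txid.equals(lastCommitted));  // replayed batch
        }

        public void write(String key, String value) {
            if (!skip) {
                // apply the update to the backing store
            }
        }

        public void commit(Long txid) {
            lastCommitted = txid;  // in practice persisted atomically with the data
        }
    }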

Beam has no such concept.  The closest thing logically is a Sink, but a Sink 
is just for batch; in streaming you just use an idempotent ParDo.  But in 
reality every ParDo offers lifecycle operations similar to a State's.

https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java

A DoFn runs inside a ParDo and has APIs for startBundle and finishBundle.  A 
bundle is an arbitrary batch of data being processed; in streaming it seems 
intended to be used around checkpointing, but there is no id like there is 
with State, so there is actually no guarantee of processing order.  In fact it 
is quite the opposite: they want to be sure that ParDos can even run in 
parallel on different machines.
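
To tie that back to the State discussion, here is a hypothetical sketch (not 
the real Beam DoFn API, which is linked above) of what a ParDo used as a 
streaming "sink" ends up looking like: with no ordered transaction id, the 
only safety net is making the write itself idempotent, and bundles from 
different workers may flush concurrently and in any order.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    // Hypothetical stand-in for the DoFn lifecycle, not the real Beam API.
    abstract class SketchDoFn<T> {
        void startBundle() {}
        abstract void processElement(T element);
        void finishBundle() {}
    }

    // A ParDo used as a streaming "sink": buffer per bundle, flush on
    // finishBundle.  There is no txid to deduplicate on, so correctness relies
    // on the write being idempotent (keyed upserts): replaying a bundle
    // rewrites the same keys to the same values, and parallel bundles on
    // different machines do not interfere with each other.
    class IdempotentWriteFn extends SketchDoFn<Map.Entry<String, String>> {
        private final Map<String, String> store;  // stands in for an external store
        private final List<Map.Entry<String, String>> buffer = new ArrayList<>();

        IdempotentWriteFn(Map<String, String> store) { this.store = store; }

        @Override void startBundle() { buffer.clear(); }

        @Override void processElement(Map.Entry<String, String> kv) { buffer.add(kv); }

        @Override void finishBundle() {
            for (Map.Entry<String, String> kv : buffer) {
                store.put(kv.getKey(), kv.getValue());  // upsert by key
            }
        }
    }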




> Apache Beam Runner for Storm
> ----------------------------
>
>                 Key: STORM-1757
>                 URL: https://issues.apache.org/jira/browse/STORM-1757
>             Project: Apache Storm
>          Issue Type: Brainstorming
>            Reporter: P. Taylor Goetz
>            Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
