[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284636#comment-15284636
 ] 

Robert Joseph Evans commented on STORM-1757:
--------------------------------------------

I have gone through the BEAM API and I think I understand it fairly well now.  
I have a plan on how we could implement this, but honestly it is a different 
enough model from trident, that I don't think the two will be able to reuse 
code, like I stated before.  Additionally I have not dug into our windowing 
code to know how it might work with BEAM, so any feedback on how if it can be 
reused would be appreciated.

Short term I would propose that we go ahead with a BEAM implementation that 
runs on storm, not trident.  Longer term I believe we should move towards a 
three tiered API approach.  This is what most compiler frameworks use, and I 
think makes a lot of since in this use case too.  

The lowest level is like assembly language.  For storm this would be bolts, 
spouts, groupings, and possibly some other things around state, check-pointing 
and coordination.  This is what the worker and most of the scheduler sees and 
executes.

There would also be an intermediate representation that describes the logical 
operations being performed.  These would be fairly simple logical operations 
link map, groupbykey, windowing, etc. An optimizer would take this as input and 
transform into an optimized solution.  Eventually this would be a cost based 
optimizer that would also take metrics from the running code + hints to 
understand where data is actually flowing, how skewed is the data, etc to 
improve the plan over time.  A "compiler" would then translate the optimized 
intermediate representation into the assembled topology.  The scheduler could 
then place the code physically on the cluster it is running on.

High level languages like SQL, BEAM, Trident, etc. would translate their 
operations into the intermediate representation and submit it to nimbus for 
execution.  This would let us keep the regular storm API, but also be able to 
maintaqin a few "standard" high level languages and support multiple domain 
specific languages.

BEAM design coming shortly.



> Apache Beam Runner for Storm
> ----------------------------
>
>                 Key: STORM-1757
>                 URL: https://issues.apache.org/jira/browse/STORM-1757
>             Project: Apache Storm
>          Issue Type: Brainstorming
>            Reporter: P. Taylor Goetz
>            Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to