[
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284636#comment-15284636
]
Robert Joseph Evans commented on STORM-1757:
--------------------------------------------
I have gone through the BEAM API and I think I understand it fairly well now.
I have a plan on how we could implement this, but honestly it is a different
enough model from trident, that I don't think the two will be able to reuse
code, like I stated before. Additionally I have not dug into our windowing
code to know how it might work with BEAM, so any feedback on how if it can be
reused would be appreciated.
Short term I would propose that we go ahead with a BEAM implementation that
runs on storm, not trident. Longer term I believe we should move towards a
three tiered API approach. This is what most compiler frameworks use, and I
think makes a lot of since in this use case too.
The lowest level is like assembly language. For storm this would be bolts,
spouts, groupings, and possibly some other things around state, check-pointing
and coordination. This is what the worker and most of the scheduler sees and
executes.
There would also be an intermediate representation that describes the logical
operations being performed. These would be fairly simple logical operations
link map, groupbykey, windowing, etc. An optimizer would take this as input and
transform into an optimized solution. Eventually this would be a cost based
optimizer that would also take metrics from the running code + hints to
understand where data is actually flowing, how skewed is the data, etc to
improve the plan over time. A "compiler" would then translate the optimized
intermediate representation into the assembled topology. The scheduler could
then place the code physically on the cluster it is running on.
High level languages like SQL, BEAM, Trident, etc. would translate their
operations into the intermediate representation and submit it to nimbus for
execution. This would let us keep the regular storm API, but also be able to
maintaqin a few "standard" high level languages and support multiple domain
specific languages.
BEAM design coming shortly.
> Apache Beam Runner for Storm
> ----------------------------
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
> Issue Type: Brainstorming
> Reporter: P. Taylor Goetz
> Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1]
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to
> map naturally to the Beam API. If not, it may be indicative of shortcomings
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)