[ https://issues.apache.org/jira/browse/BEAM-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15279595#comment-15279595 ]
ASF GitHub Bot commented on BEAM-79: ------------------------------------ GitHub user manuzhang opened a pull request: https://github.com/apache/incubator-beam/pull/323 [BEAM-79] add Gearpump runner Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[BEAM-<Jira issue #>] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- This PR adds Gearpump runner to Beam meeting the goals of phase 1 in the [design document](https://docs.google.com/document/d/1nw64QUWVfT8L7FUprPGLEeNjSBpDMkn1otfLt2rHM5g/edit). The Gearpump runner supports the following functionalities, * Transformations: ParDo, GroupByKey, Flatten * Windows: using Beam's window logic and code * side outputs * serialization/deserialization: using Gearpump's Kryo serializer * sources: Beam's UnboundedSource * message delivery guarantee: at-most-once * tests: integration test for various translators Here's a snapshot of running the following Beam example on Gearpump cluster ```java PCollection<KV<String, Long>> wordCounts = p.apply(Read.from(new UnboundedTextSource()).named("WordStream")) .apply(ParDo.of(new ExtractWordsFn())) .apply(Window.<String>into(FixedWindows.of(Duration.standardSeconds(10)))) .apply(Count.<String>perElement()); wordCounts.apply(ParDo.of(new FormatAsStringFn())); ``` ![snip20160511_4](https://cloud.githubusercontent.com/assets/1191767/15171197/fd6ffba0-177e-11e6-99a1-30c7c2597244.png) Note that the Gearpump runner is still in early stage and lacking capabilities like trigger, side inputs, aggregator. However, I'd like to have the community to get a feel of what Gearpump is like, whether Beam and Gearpump go well, and gather ideas for improvements. You can merge this pull request into a Git repository by running: $ git pull https://github.com/manuzhang/incubator-beam gearpump_runner Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/323.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #323 ---- commit 73e5978f599bcf32ed8c2f1d54b6bd3bd8350092 Author: manuzhang <owenzhang1...@gmail.com> Date: 2016-03-15T08:15:16Z [BEAM-79] add Gearpump runner ---- > Gearpump runner > --------------- > > Key: BEAM-79 > URL: https://issues.apache.org/jira/browse/BEAM-79 > Project: Beam > Issue Type: New Feature > Components: runner-ideas > Reporter: Tyler Akidau > Assignee: Manu Zhang > > Intel is submitting Gearpump (http://www.gearpump.io) to ASF > (https://wiki.apache.org/incubator/GearpumpProposal). Appears to be a mix of > low-level primitives a la MillWheel, with some higher level primitives like > non-merging windowing mixed in. Seems like it would make a nice Beam runner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)