[
https://issues.apache.org/jira/browse/SAMZA-184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Chen updated SAMZA-184:
-----------------------------
Description:
There has been some interest in supporting languages other than Java (or
JVM-based languages). We have already opened up SAMZA-18, which proposes
supporting a C implementation of SamzaContainer.
A second solution to this problem is to have a StreamTask implementation that
starts a child process in another language and acts as a bridge between the
child process and the Java-based Samza APIs. This is how both Storm [1] and
Hadoop work.
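To make the bridge idea concrete, here is a minimal sketch of the parent side,
written in Python purely for illustration (the real parent would be a Java
StreamTask). It assumes stdin/stdout as the transport and newline-delimited
JSON as the serialization; SubprocessBridge, CHILD_SOURCE, and the "op"/"ack"
fields are hypothetical names, not part of Samza or of any agreed protocol.

```python
import json
import subprocess
import sys

# Hypothetical child program: echoes each JSON message back with an "ack" field.
# In practice this would be a user task written in any non-JVM language.
CHILD_SOURCE = r"""
import json, sys
for line in sys.stdin:
    msg = json.loads(line)
    msg["ack"] = True
    sys.stdout.write(json.dumps(msg) + "\n")
    sys.stdout.flush()
"""


class SubprocessBridge:
    """Parent side of the bridge: owns one child process and talks to it over pipes."""

    def __init__(self, argv):
        self.child = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered text pipes
        )

    def send(self, message):
        """Write one message, then block until the child answers with one line."""
        self.child.stdin.write(json.dumps(message) + "\n")
        self.child.stdin.flush()
        return json.loads(self.child.stdout.readline())

    def close(self):
        """Closing stdin signals end of input; then wait for the child to exit."""
        self.child.stdin.close()
        code = self.child.wait()
        self.child.stdout.close()
        return code


bridge = SubprocessBridge([sys.executable, "-c", CHILD_SOURCE])
reply = bridge.send({"op": "process", "key": "k1", "value": "hello"})
exit_code = bridge.close()
```

This sketch blocks on every message, which is the simplest possible behavior;
a real bridge would also have to handle child crashes, timeouts, and stderr.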
A lot of design decisions need to be fleshed out to support this, but most
people on the mailing list were very supportive of this approach. [2]
Things that need to be decided:
1. Should we start one subprocess per SamzaContainer, or one subprocess per
StreamTask?
2. How should the parent interact with the subprocess at both the transport
level (stdin/stdout, Unix sockets, TCP, HTTP, Thrift, etc.) and the
serialization level (protobuf, JSON, etc.)?
3. What should the protocol look like? We should ideally support all of the
operations in StreamTask, InitableTask, WindowableTask, ClosableTask, etc.
4. Should the child process receive the messages in batches, or one at a time?
It'd be good to get a draft proposal up on the Wiki, so we can all discuss this
and converge on an implementation.
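As a starting point for that discussion, the sketch below shows one possible
shape for the child side of such a protocol. Everything about the wire format
is an assumption: one JSON object per line, an "op" field named after the
init/process/window/close methods of the task interfaces listed in question 3,
and a "messages" list so that a single message is simply a batch of one
(question 4). CountingTask and serve are hypothetical names.

```python
import io
import json


class CountingTask:
    """Hypothetical user task in the child language: counts messages per key."""

    def __init__(self):
        self.config = {}
        self.counts = {}

    def init(self, config):
        self.config = config

    def process(self, key, value):
        self.counts[key] = self.counts.get(key, 0) + 1

    def window(self):
        # Emit the counts accumulated so far and start a fresh window.
        snapshot, self.counts = self.counts, {}
        return snapshot

    def close(self):
        pass


def serve(task, infile, outfile):
    """Read one JSON request per line, dispatch on "op", write one reply per request."""
    for line in infile:
        line = line.strip()
        if not line:
            continue
        request = json.loads(line)
        op = request.get("op")
        result = None
        if op == "init":
            task.init(request.get("config", {}))
        elif op == "process":
            # A batch is a list of messages; one-at-a-time delivery is a list of one.
            for message in request["messages"]:
                task.process(message["key"], message["value"])
        elif op == "window":
            result = task.window()
        elif op == "close":
            task.close()
        else:
            reply = {"op": op, "ok": False, "error": "unknown op"}
            outfile.write(json.dumps(reply) + "\n")
            continue
        outfile.write(json.dumps({"op": op, "ok": True, "result": result}) + "\n")
        if op == "close":
            break


# Drive the loop with in-memory streams standing in for stdin/stdout.
requests = [
    {"op": "init", "config": {"task.name": "demo"}},
    {"op": "process", "messages": [{"key": "a", "value": 1}, {"key": "b", "value": 2}]},
    {"op": "process", "messages": [{"key": "a", "value": 3}]},
    {"op": "window"},
    {"op": "close"},
]
infile = io.StringIO("".join(json.dumps(r) + "\n" for r in requests))
outfile = io.StringIO()
serve(CountingTask(), infile, outfile)
replies = [json.loads(line) for line in outfile.getvalue().splitlines()]
```

Replying to every request keeps the parent and child in lockstep, which is
easy to reason about but may cost throughput; batching is one way to offset
that cost.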
[1] http://storm.incubator.apache.org/documentation/Multilang-protocol.html
[2] http://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201403.mbox/%3CCAB%2B2NVXX2Fq_61WfvH%2BAfW8ZW7vQbVfTN-JPGU%2Bd7AdZ73oPDQ%40mail.gmail.com%3E
was:
There has been some interest in supporting languages other than Java (or
JVM-based languages). We have already opened up SAMZA-18, which proposes
supporting a C implementation of SamzaContainer.
A second solution to this problem is to have a StreamTask implementation that
starts a child process in another language and acts as a bridge between the
child process and the Java-based Samza APIs. This is how both Storm [1] and
Hadoop work.
A lot of design decisions need to be fleshed out to support this, but most
people on the mailing list were very supportive of this approach. [2]
Things that need to be decided:
1. Should we start one subprocess per SamzaContainer, or one subprocess per
StreamTask?
2. How should the parent interact with the subprocess at both the transport
level (stdin/stdout, Unix sockets, TCP, HTTP, Thrift, etc.) and the
serialization level (protobuf, JSON, etc.)?
3. What should the protocol look like? We should ideally support all of the
operations in StreamTask, InitableTask, WindowableTask, ClosableTask, etc.
4. Should the child process receive the messages in batches, or one at a time?
It'd be good to get a draft proposal up on the Wiki, so we can all discuss this
and converge on an implementation.
[1] https://github.com/nathanmarz/storm/wiki/Multilang-protocol
[2]
http://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201403.mbox/%3CCAB%2B2NVXX2Fq_61WfvH%2BAfW8ZW7vQbVfTN-JPGU%2Bd7AdZ73oPDQ%40mail.gmail.com%3E
> Add thin multi-language support for SamzaContainer
> --------------------------------------------------
>
> Key: SAMZA-184
> URL: https://issues.apache.org/jira/browse/SAMZA-184
> Project: Samza
> Issue Type: Bug
> Components: container
> Affects Versions: 0.6.0
> Reporter: Chris Riccomini
> Assignee: David Chen
> Attachments: Test.java