[ 
https://issues.apache.org/jira/browse/SAMZA-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130461#comment-14130461
 ] 

David Chen commented on SAMZA-184:
----------------------------------

Agreed. Using stdin/stdout would be the simplest approach.

bq. Ideally, we'd support all operations in all existing interfaces.

I have been thinking about this. To support {{InitableTask}}, 
{{WindowableTask}}, and {{ClosableTask}}, we check whether the 
{{StreamTask}} implements those interfaces. For the multi-lang protocol, one 
possibility would be to have the task specify which interfaces it implements in 
the initial handshake. Based on what Samza receives from the handshake, it 
would then build a {{StreamTask}} object (let's call it {{PipedStreamTask}}) 
that implements all of the interfaces but is told which ones the task actually 
uses. That way, if the task uses only the {{WindowableTask}} interface, 
{{PipedStreamTask}} would forward the {{window}} call to it but would no-op 
the other calls, such as {{close}}.
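
To make the idea concrete, here is a rough, self-contained sketch of what such 
a {{PipedStreamTask}} could look like. Everything in it is an assumption made 
for illustration: the four interfaces are simplified stand-ins (the real Samza 
interfaces take additional arguments), and {{Capability}}, {{ChildChannel}}, 
{{parseHandshake}}, and the comma-separated handshake format are hypothetical 
names, not part of Samza or of any agreed protocol.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

public class PipedStreamTaskSketch {
    // Simplified stand-ins for Samza's task interfaces (assumption: the real
    // interfaces take more arguments than shown here).
    interface StreamTask { void process(String message); }
    interface InitableTask { void init(); }
    interface WindowableTask { void window(); }
    interface ClosableTask { void close(); }

    // Hypothetical: the optional interfaces a child task can declare.
    enum Capability { INIT, WINDOW, CLOSE }

    // Hypothetical channel to the child process, e.g. a writer on its stdin.
    interface ChildChannel { void send(String command); }

    // Implements every interface, but forwards only the declared calls.
    static final class PipedStreamTask
            implements StreamTask, InitableTask, WindowableTask, ClosableTask {
        private final ChildChannel child;
        private final EnumSet<Capability> declared;

        PipedStreamTask(ChildChannel child, EnumSet<Capability> declared) {
            this.child = child;
            this.declared = EnumSet.copyOf(declared);
        }

        // process is always forwarded: every task is a StreamTask.
        @Override public void process(String message) {
            child.send("process " + message);
        }

        @Override public void init() {
            if (declared.contains(Capability.INIT)) child.send("init");
        }

        @Override public void window() {
            if (declared.contains(Capability.WINDOW)) child.send("window");
        }

        @Override public void close() {
            if (declared.contains(Capability.CLOSE)) child.send("close");
        }
    }

    // Hypothetical handshake format: a comma-separated list of interface
    // names, e.g. "WindowableTask,ClosableTask". Unknown names are ignored.
    static EnumSet<Capability> parseHandshake(String line) {
        EnumSet<Capability> result = EnumSet.noneOf(Capability.class);
        for (String name : line.split(",")) {
            switch (name.trim()) {
                case "InitableTask": result.add(Capability.INIT); break;
                case "WindowableTask": result.add(Capability.WINDOW); break;
                case "ClosableTask": result.add(Capability.CLOSE); break;
                default: break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Record what would be written to the child instead of spawning one.
        List<String> sent = new ArrayList<>();
        PipedStreamTask task =
                new PipedStreamTask(sent::add, parseHandshake("WindowableTask"));
        task.init();            // not declared: no-op
        task.process("hello");  // always forwarded
        task.window();          // declared: forwarded
        task.close();           // not declared: no-op
        System.out.println(sent); // prints [process hello, window]
    }
}
```

With this shape, the container could keep using its existing interface checks 
unchanged, since {{PipedStreamTask}} always implements everything; only the 
handshake decides what actually reaches the child process.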

> Add thin multi-language support for SamzaContainer
> --------------------------------------------------
>
>                 Key: SAMZA-184
>                 URL: https://issues.apache.org/jira/browse/SAMZA-184
>             Project: Samza
>          Issue Type: Bug
>          Components: container
>    Affects Versions: 0.6.0
>            Reporter: Chris Riccomini
>            Assignee: David Chen
>              Labels: project
>         Attachments: Test.java
>
>
> There has been some interest in supporting languages other than Java (or 
> JVM-based languages). We have already opened up SAMZA-18, which proposes 
> supporting a C implementation of SamzaContainer.
>
> A second solution to this problem is to have a StreamTask implementation that 
> starts a child process in another language, and acts as a bridge between the 
> child process and the Java-based Samza APIs. This is the way that both Storm 
> [1] and Hadoop work.
>
> A lot of design decisions need to be fleshed out to support this, but most 
> people on the mailing list were very supportive of this approach. [2]
>
> Things that need to be decided:
>
> 1. Should we start one subprocess per SamzaContainer, or one subprocess per 
> StreamTask?
> 2. How should the parent interact with the subprocess at both the transport 
> (stdin/stdout, Unix sockets, TCP, HTTP, Thrift, etc.) and serialization level 
> (protobuf, JSON, etc.)?
> 3. What should the protocol look like? We should ideally support all of the 
> operations in StreamTask, InitableTask, WindowableTask, ClosableTask, etc.
> 4. Should the child process receive the messages in batches, or one at a time?
>
> It'd be good to get a draft proposal up on the Wiki, so we can all discuss 
> this and converge on an implementation.
>
> [1] http://storm.incubator.apache.org/documentation/Multilang-protocol.html
> [2] 
> http://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201403.mbox/%3CCAB%2B2NVXX2Fq_61WfvH%2BAfW8ZW7vQbVfTN-JPGU%2Bd7AdZ73oPDQ%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
