Hi Chris,

Thanks for opening the ticket. I'll definitely create a wiki page about the
proposed design and protocol. Currently I'm still getting my feet wet with
Samza and working to understand its interfaces and capabilities. Once I
feel like I have a basic grasp of how all the pieces should fit together,
I'll put up a wiki page and send a message to this thread.
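
In the meantime, here is a rough sketch of how I picture the child-process
side of the exchange quoted below, assuming newline-delimited JSON over
stdin/stdout. Only the "kv_get" and "kv_get_response" shapes come from this
thread; the "process" and "done" commands, the "key" field on the input
message, and the function names are placeholders I made up for illustration,
not anything Samza defines today.

```python
import io
import json


def kv_get(store, key, inp, out):
    """Write a kv_get request, then block until the parent replies."""
    out.write(json.dumps({"cmd": "kv_get", "store": store, "key": key}) + "\n")
    out.flush()
    reply = json.loads(inp.readline())
    return reply.get("value")


def run_task(inp, out):
    """Handle one message at a time until the parent closes the pipe."""
    while True:
        line = inp.readline()
        if not line:
            break  # EOF: parent closed our stdin
        msg = json.loads(line)
        value = kv_get("my-store", msg["key"], inp, out)
        # Hypothetical "done" line: written for every input, even when the
        # task has nothing to emit, so the parent knows it can send the next.
        out.write(json.dumps({"cmd": "done", "value": value}) + "\n")
        out.flush()


# Dry run with in-memory streams standing in for the parent's pipes.
fake_in = io.StringIO(
    '{"cmd": "process", "key": "foo"}\n'
    '{"cmd": "kv_get_response", "store": "my-store", "key": "foo", "value": "bar"}\n'
)
fake_out = io.StringIO()
run_task(fake_in, fake_out)
print(fake_out.getvalue(), end="")
```

A real child process would call run_task(sys.stdin, sys.stdout) instead of
the in-memory streams. Flushing after every line matters here, since the
parent would otherwise wait on output that is still sitting in a buffer.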

Thanks,
Dave


On Thu, Mar 13, 2014 at 1:19 PM, Chris Riccomini <[email protected]> wrote:

> Hey Guys,
>
> Sorry I've been silent on this. I think we should pursue this approach as
> a nice, easy, low-friction way to support multiple languages. I've opened up
> a ticket:
>
>   https://issues.apache.org/jira/browse/SAMZA-184
>
> Dave, if you're still up for it, it'd be really awesome to get a Wiki
> proposal from you. Since you have an actual use case, it'll be useful to
> have you provide direction. I'm happy to advise and answer any questions
> you have. I'm going to post my initial thoughts as a follow-up comment on
> SAMZA-184.
>
> Cheers,
> Chris
>
> On 3/12/14 9:41 AM, "Dave Revell" <[email protected]> wrote:
>
> >Thanks a lot for the detailed reply. Responses inline:
> >
> >> Agree -- it's easiest if there's only one message at a time being sent to
> >> the child process. Though we should benchmark that to make sure that
> >> performance is still good.
> >>
> >
> >Sounds good to me. The latency of the system calls + marshaling might turn
> >out to be significant.
> >
> >
> >> With one message at a time, a task always has to write something to
> >> stdout for every message it consumes, even if it doesn't want to emit an
> >> output for a particular input message -- otherwise Samza wouldn't know
> >> when to send the next message to the task.
> >>
> >
> >That's true, I didn't think of that. It shouldn't be hard though.
> >
> >
> >> Another thing to think about: do we want one child process per task, or
> >> one per container? One per task is a simpler processing model (matches
> >> the Java API), but one per container perhaps makes more sense from a
> >> resource allocation point of view.
> >>
> >
> >I think one per task. If the external process is stateful, that state
> >should be limited to a single partition. But I don't understand Samza well
> >enough to say whether this is the overriding concern. Also, like you say,
> >the programming model is much nicer.
> >
> >
> >> Yeah, I think allowing access to the KV store via stdin/stdout protocol
> >> makes the most sense. For example, to make a "get" request to the store,
> >> the task could write to stdout:
> >>
> >> {"cmd": "kv_get", "store": "my-store", "key": "foo"}
> >>
> >> to which Samza would respond by sending to stdin:
> >>
> >> {"cmd": "kv_get_response", "store": "my-store", "key": "foo", "value": "bar"}
> >>
> >
> >Sounds good to me.
> >
> >Thanks again,
> >Dave
>
>
