Hi Chris,

Thanks for opening the ticket. I'll definitely create a wiki page about the proposed design and protocol. Currently I'm still getting my feet wet with Samza and working to understand its interfaces and capabilities. Once I feel like I have a basic grasp of how all the pieces should fit together I'll put up a wiki page and send a message to this thread.

Thanks,
Dave

On Thu, Mar 13, 2014 at 1:19 PM, Chris Riccomini <[email protected]> wrote:

> Hey Guys,
>
> Sorry I've been silent on this. I think we should pursue this approach as
> a nice easy low friction way to support multiple languages. I've opened up
> a ticket:
>
> https://issues.apache.org/jira/browse/SAMZA-184
>
> Dave, if you're still up for it, it'd be really awesome to get a Wiki
> proposal from you. Since you have an actual use case, it'll be useful to
> have you provide direction. I'm happy to advise and answer any questions
> you have. I'm going to post my initial thoughts as a follow up comment on
> SAMZA-184.
>
> Cheers,
> Chris
>
> On 3/12/14 9:41 AM, "Dave Revell" <[email protected]> wrote:
>
> >Thanks a lot for the detailed reply. Responses inline:
> >
> >> Agree -- it's easiest if there's only one message at a time being sent to
> >> the child process. Though we should benchmark that to make sure that
> >> performance is still good.
> >>
> >
> >Sounds good to me. The latency of the system calls + marshaling might turn
> >out to be significant.
> >
> >
> >> With one message at a time, a task always has to write something to stdout
> >> for every message it consumes, even if it doesn't want to emit an output
> >> for a particular input message -- otherwise Samza wouldn't know when to
> >> send the next message to the task.
> >>
> >
> >That's true, I didn't think of that. It shouldn't be hard though.
> >
> >
> >> Another thing to think about: do we want one child process per task, or
> >> one per container? One per task is a simpler processing model (matches the
> >> Java API), but one per container perhaps makes more sense from a resource
> >> allocation point of view.
> >>
> >
> >I think one per task. If the external process is stateful, that state
> >should be limited to a single partition. But I don't understand Samza well
> >enough to say whether this is the overriding concern. Also, like you say,
> >the programming model is much nicer.
> >
> >
> >> Yeah, I think allowing access to the KV store via stdin/stdout protocol
> >> makes the most sense. For example, to make a "get" request to the store,
> >> the task could write to stdout:
> >>
> >> {"cmd": "kv_get", "store": "my-store", "key": "foo"}
> >>
> >> to which Samza would respond by sending to stdin:
> >>
> >> {"cmd": "kv_get_response", "store": "my-store", "key": "foo", "value": "bar"}
> >>
> >
> >Sounds good to me.
> >
> >Thanks again,
> >Dave
> >
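
To make the protocol discussed above a bit more concrete, here is a rough sketch of what the child-process side might look like in Python. It is only an illustration under assumptions: the thread specifies the "kv_get" and "kv_get_response" message shapes, but the one-JSON-object-per-line framing, the "done" reply with its "outputs" field, and the names run_task, kv_get and handle are all hypothetical and not part of Samza or of any agreed design. The sketch replies once for every input message, even with nothing to emit, so the parent knows when to send the next one.

```python
import json


def kv_get(stdin, stdout, store, key):
    """Ask the parent for a value: write a kv_get request, then read the reply."""
    request = {"cmd": "kv_get", "store": store, "key": key}
    stdout.write(json.dumps(request) + "\n")
    stdout.flush()
    reply = json.loads(stdin.readline())
    if reply.get("cmd") != "kv_get_response":
        raise ValueError("expected kv_get_response, got %r" % reply.get("cmd"))
    return reply.get("value")


def run_task(stdin, stdout, handle):
    """Process one message at a time until stdin is closed.

    `handle(msg, get)` is a user-supplied function (hypothetical API) that
    returns a list of outputs; `get(store, key)` performs a KV lookup.
    """
    def get(store, key):
        return kv_get(stdin, stdout, store, key)

    while True:
        line = stdin.readline()
        if not line:  # EOF: the parent closed the pipe
            break
        if not line.strip():
            continue
        msg = json.loads(line)
        outputs = handle(msg, get)
        # Always reply, even with an empty list, so the parent can tell this
        # message is finished. The "done" command name is an assumption.
        stdout.write(json.dumps({"cmd": "done", "outputs": outputs}) + "\n")
        stdout.flush()
```

A real task would call run_task(sys.stdin, sys.stdout, my_handler). Because each input costs at least one write and one read on the pipe, this shape is also a convenient place to measure the system call and marshaling latency mentioned above.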
