Hi Dave,

On 11 Mar 2014, at 16:54, Dave Revell <[email protected]> wrote:
> Samza currently has the nice property of having one message in flight at a
> time per partition. I think we'd want to preserve that and avoid adding any
> pipelining by having multiple messages in flight.

Agree -- it's easiest if there's only one message at a time being sent to the 
child process. Though we should benchmark that to make sure that performance is 
still good.

With one message at a time, a task always has to write something to stdout for 
every message it consumes, even if it doesn't want to emit an output for a 
particular input message -- otherwise Samza wouldn't know when to send the next 
message to the task.

Another thing to think about: do we want one child process per task, or one per 
container? One per task is a simpler processing model (matches the Java API), 
but one per container perhaps makes more sense from a resource allocation point 
of view.

> In reply to Jay, I take your point about Hadoop Streaming. Adding
> lifecycle/control messages to the protocol seems beneficial. We could also
> allow the external process to access the KV store in the JVM via the
> stdin/stdout protocol.

Yeah, I think allowing access to the KV store via stdin/stdout protocol makes 
the most sense. For example, to make a "get" request to the store, the task 
could write to stdout:

{"cmd": "kv_get", "store": "my-store", "key": "foo"}

to which Samza would respond by sending to stdin:

{"cmd": "kv_get_response", "store": "my-store", "key": "foo", "value": "bar"}

...or something of that sort.

> We'd also want to use an encoding scheme that
> doesn't have problems with separators (Hadoop Streaming has trouble with
> data that contains internal tabs and newlines). I think I'd prefer
> protobufs over JSON for performance reasons, but it could be configurable.

Protobuf would be an additional dependency, but I'd +1 it.

> So it seems like this is doable. There are some design questions but maybe
> a proof of concept would be a good next step. We'll be in touch if/when
> that happens. If anyone else wants to try it, we'd welcome that also :)

Would be awesome if you want to go ahead. Just say if you need any help.

Best,
Martin

Reply via email to