Hi Dave,
On 11 Mar 2014, at 16:54, Dave Revell wrote:
> Samza currently has the nice property of having one message in flight at a
> time per partition. I think we'd want to preserve that and avoid adding any
> pipelining by having multiple messages in flight.
Agreed -- it's easiest if only one message at a time is sent to the child
process, though we should benchmark that to make sure performance is still
good.
With one message at a time, a task always has to write something to stdout for
every message it consumes, even if it doesn't want to emit an output for a
particular input message -- otherwise Samza wouldn't know when to send the next
message to the task.
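To make that concrete, here is a rough sketch of what the child-process side
might look like in Python. Everything about the wire format is an assumption
for illustration only: one JSON object per line, and the "ack" and "emit"
command names are made up, not anything Samza defines.

```python
import json
import sys


def run_task(process, stdin=sys.stdin, stdout=sys.stdout):
    """Read one JSON message per line and answer each with exactly one line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        output = process(message)
        if output is None:
            # Nothing to emit: still acknowledge, so the parent knows it can
            # send the next message. ("ack" is a hypothetical command name.)
            reply = {"cmd": "ack"}
        else:
            # "emit" is likewise a hypothetical command name.
            reply = {"cmd": "emit", "message": output}
        stdout.write(json.dumps(reply) + "\n")
        stdout.flush()  # flush per message, otherwise the parent may block


def only_even(message):
    """Hypothetical task logic: pass through even values, drop the rest."""
    return message if message["value"] % 2 == 0 else None


if __name__ == "__main__":
    run_task(only_even)
```

The point is just that the loop writes exactly one line per input line, whether
or not the task has anything to emit.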
Another thing to think about: do we want one child process per task, or one per
container? One per task is a simpler processing model (matches the Java API),
but one per container perhaps makes more sense from a resource allocation point
of view.
> In reply to Jay, I take your point about Hadoop Streaming. Adding
> lifecycle/control messages to the protocol seems beneficial. We could also
> allow the external process to access the KV store in the JVM via the
> stdin/stdout protocol.
Yeah, I think allowing access to the KV store via stdin/stdout protocol makes
the most sense. For example, to make a "get" request to the store, the task
could write to stdout:
    {"cmd": "kv_get", "store": "my-store", "key": "foo"}
to which Samza would respond by sending to stdin:
    {"cmd": "kv_get_response", "store": "my-store", "key": "foo", "value": "bar"}
...or something of that sort.
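On the task side, a helper for that round trip could be as small as the
following Python sketch. It assumes newline-delimited JSON and that the reply
to a "get" is the very next line on stdin, which only holds if nothing else is
in flight; the kv_get function itself is hypothetical, not an existing API.

```python
import json
import sys


def kv_get(store, key, stdin=sys.stdin, stdout=sys.stdout):
    """Send a kv_get request on stdout and block for the reply on stdin."""
    request = {"cmd": "kv_get", "store": store, "key": key}
    stdout.write(json.dumps(request) + "\n")
    stdout.flush()

    line = stdin.readline()
    if not line:
        raise EOFError("parent closed stdin before replying")
    response = json.loads(line)
    # Assumes the next line is the reply to this request.
    if response.get("cmd") != "kv_get_response":
        raise ValueError("unexpected reply: %r" % (response,))
    return response.get("value")
```

With the example messages above, kv_get("my-store", "foo") would return "bar".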
> We'd also want to use an encoding scheme that
> doesn't have problems with separators (Hadoop Streaming has trouble with
> data that contains internal tabs and newlines). I think I'd prefer
> protobufs over JSON for performance reasons, but it could be configurable.
Protobuf would be an additional dependency, but I'd +1 it.
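Whichever encoding we pick, one way to sidestep the separator problem is to
length-prefix each message instead of delimiting it. A minimal Python sketch,
assuming a 4-byte big-endian length header (my choice for illustration, not
something agreed in this thread); the payload bytes could be protobuf or JSON:

```python
import struct


def write_frame(stream, payload):
    """Write a 4-byte big-endian length followed by the raw payload bytes."""
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)


def read_exact(stream, n):
    """Read exactly n bytes, looping because pipes may return short reads."""
    chunks = []
    while n > 0:
        chunk = stream.read(n)
        if not chunk:
            raise EOFError("stream closed mid-frame")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def read_frame(stream):
    """Return the next payload, or None on a clean end of stream."""
    header = stream.read(4)
    if not header:
        return None
    if len(header) < 4:
        header += read_exact(stream, 4 - len(header))
    (length,) = struct.unpack(">I", header)
    return read_exact(stream, length)
```

Since the reader never scans for a delimiter, tabs and newlines inside the
payload are harmless.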
> So it seems like this is doable. There are some design questions but maybe
> a proof of concept would be a good next step. We'll be in touch if/when
> that happens. If anyone else wants to try it, we'd welcome that also :)
Would be awesome if you want to go ahead. Just say if you need any help.
Best,
Martin