FWIW, I actually think the Hadoop streaming approach has some benefits. It
is less efficient than writing and embedding a C library, but also much, much
easier to implement and with less duplicated logic. I think we should be
open to both of these. The streaming approach is so easy that it seems to me
there is not a huge downside to having it available.

I think the mistake that Hadoop streaming might have made was
over-simplifying the interaction with the client process. You probably need
a richer protocol than just the data (though I haven't thought this
through).

-Jay


On Mon, Mar 10, 2014 at 12:26 PM, Jakob Homan <[email protected]> wrote:

> Hey Dave-
>    Thanks for taking a look at Samza.  No one in the community is currently
> working on this, to our knowledge.  SAMZA-18 (
> https://issues.apache.org/jira/browse/SAMZA-18) has the beginnings of a
> discussion about creating a single C library to help provide multilanguage
> support in Samza (which I believe would be accessible to Go as well).
> There's currently no JIRA for Hadoop-style streaming, but one could
> certainly be created and it would be something we'd be interested in.
> Thanks,
> Jakob
>
>
>
> On Mon, Mar 10, 2014 at 12:19 PM, Dave Revell <[email protected]> wrote:
>
> > Hi all,
> >
> > We're considering using Samza for our high-throughput stream processing
> > workload, but we don't want to rewrite all of our existing Go code. We're
> > considering writing something analogous to Hadoop Streaming, where the
> > Samza consumer would start an external process and communicate with it by
> > passing protobufs via stdin/stdout. We like Samza's fault tolerance, state
> > management, and load balancing features and don't want to rewrite them.
> >
> > This possibility is mentioned in the documentation (
> > http://samza.incubator.apache.org/learn/documentation/0.7.0/comparisons/storm.html,
> > search for "stdin") as something that might exist some day. My questions
> > are:
> >
> > 1. Is anyone working on this, or planning to? I couldn't find any related
> > JIRAs.
> > 2. Any advice for implementing this? Are there any challenges that might
> > not be obvious?
> > 3. Should we try to merge this upstream?
> >
> > Thanks a bunch,
> > Dave
> >
>
