FWIW, I actually think the Hadoop streaming approach has some benefits. It is less efficient than writing and embedding a C library, but it is also much, much easier to implement and involves less duplicated logic. I think we should be open to both approaches. The streaming approach is so easy that I don't see a huge downside to having it available.
I think the mistake that Hadoop streaming might have made was over-simplifying the interaction with the client process. You probably need a richer protocol than just the data (though I haven't thought this through).

-Jay

On Mon, Mar 10, 2014 at 12:26 PM, Jakob Homan <[email protected]> wrote:

> Hey Dave-
> Thanks for taking a look at Samza. No one in the community is currently
> working on this at the moment, to our knowledge. SAMZA-18 (
> https://issues.apache.org/jira/browse/SAMZA-18) has the beginnings of a
> discussion about creating a single C library to help provide multilanguage
> support in Samza (which I believe would be accessible to Go as well).
> There's currently no JIRA for Hadoop-style streaming, but one could
> certainly be created and it would be something we'd be interested in.
> Thanks,
> Jakob
>
> On Mon, Mar 10, 2014 at 12:19 PM, Dave Revell <[email protected]> wrote:
>
> > Hi all,
> >
> > We're considering using Samza for our high-throughput stream processing
> > workload, but we don't want to rewrite all of our existing Go code. We're
> > considering writing something analogous to Hadoop Streaming, where the
> > Samza consumer would start an external process and communicate with it by
> > passing protobufs via stdin/stdout. We like Samza's fault tolerance, state
> > management, and load balancing features and don't want to rewrite them.
> >
> > This possibility is mentioned in the documentation (
> > http://samza.incubator.apache.org/learn/documentation/0.7.0/comparisons/storm.html ,
> > search for "stdin") as something that might exist some day. My questions
> > are:
> >
> > 1. Is anyone working on this, or planning to? I couldn't find any related
> > JIRAs.
> > 2. Any advice for implementing this? Are there any challenges that might
> > not be obvious?
> > 3. Should we try to merge this upstream?
> >
> > Thanks a bunch,
> > Dave
