Hi all,

We're evaluating Samza for our high-throughput stream processing workload, but we don't want to rewrite all of our existing Go code. We're considering writing something analogous to Hadoop Streaming, where the Samza consumer would start an external process and communicate with it by passing protobufs via stdin/stdout. We like Samza's fault tolerance, state management, and load balancing features and would rather not reimplement them ourselves.
This possibility is mentioned in the documentation (http://samza.incubator.apache.org/learn/documentation/0.7.0/comparisons/storm.html, search for "stdin") as something that might exist some day.

My questions are:

1. Is anyone working on this, or planning to? I couldn't find any related JIRAs.
2. Any advice for implementing this? Are there any challenges that might not be obvious?
3. Should we try to merge this upstream?

Thanks a bunch,
Dave
