Hello Kafka people! Great to see Kafka Streams coming along. The design validates (and in many ways supersedes) my own findings from working with various stream processing systems and frameworks, which eventually led me to use just a small custom library built directly around Kafka.
Yesterday I set out to translate Hello Samza (the Wikipedia feed example) into a Kafka Streams application. Because this workflow starts by polling Wikipedia IRC and publishing to a topic from which the stream processors pick up, it would be nice to have this first part done by Kafka Connect, but:

1. IRC channels are not seekable, and the Kafka Connect architecture states that all sources must be seekable. Is Connect still suitable here? (I guess yes, as FileStreamSourceTask can read from stdin, which is similar.)

2. I would like to have a ConnectEmbedded (as opposed to ConnectStandalone or ConnectDistributed), which would be similar to ConnectDistributed, just without the REST server. Say I have the WikipediaFeedConnector and I want to launch it programmatically from all the instances alongside Kafka Streams, reusing the Connect distributed coordination so that only one instance actually reads the IRC data but another instance picks up the work if that one dies. Does this sound like a bad idea for some design reason? The only problem I see is a technical one: the coordination process uses the REST server for some actions.

Cheers,
Michal
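To make the failover semantics in point 2 concrete, here is a toy, in-process Java sketch. It does not use Kafka Connect at all: `SourceTaskAssigner`, `join`, `die` and `owner` are hypothetical names, and the rule "keep the current owner while it is alive, otherwise hand the task to the first live instance in join order" is an assumption for illustration, not how Connect's distributed group protocol actually assigns tasks.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical toy model: every application instance embeds a worker, but only
 * one instance runs the single (IRC) source task at a time. If that instance
 * dies, the task moves to a surviving instance. Not Kafka Connect's real protocol.
 */
public class SourceTaskAssigner {
    // Instance id -> alive flag; LinkedHashMap preserves join order.
    private final Map<String, Boolean> instances = new LinkedHashMap<>();
    // Instance currently running the single source task (null if none alive).
    private String owner;

    public void join(String instanceId) {
        instances.put(instanceId, true);
        rebalance();
    }

    public void die(String instanceId) {
        instances.put(instanceId, false);
        rebalance();
    }

    public String owner() {
        return owner;
    }

    // Assumed rule: keep the current owner while it is alive; otherwise pick
    // the first live instance in join order, or nobody if all are down.
    private void rebalance() {
        if (owner != null && instances.getOrDefault(owner, false)) {
            return;
        }
        owner = null;
        for (Map.Entry<String, Boolean> e : instances.entrySet()) {
            if (e.getValue()) {
                owner = e.getKey();
                break;
            }
        }
    }

    public static void main(String[] args) {
        SourceTaskAssigner assigner = new SourceTaskAssigner();
        assigner.join("app-1");
        assigner.join("app-2");
        System.out.println(assigner.owner()); // app-1 reads the IRC feed
        assigner.die("app-1");
        System.out.println(assigner.owner()); // app-2 takes over
    }
}
```

Since the IRC feed cannot be replayed (point 1), anything posted to the channel between the owner dying and the standby taking over would presumably be missed in a setup like this.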