Hello Kafka people!

Great to see Kafka Streams coming along. The design validates (and in many
ways supersedes) my own findings from working with various stream processing
systems/frameworks and eventually ending up using just a small custom
library built directly around Kafka.

Yesterday I set out to translate Hello Samza (the Wikipedia feed example)
into a Kafka Streams application. Because this workflow starts by polling
the Wikipedia IRC channels and publishing to a topic from which the stream
processors pick the data up, it would be nice to have this first part done
by Kafka Connect, but:

1. IRC channels are not seekable, and the Kafka Connect architecture claims
that all sources must be seekable. Is Connect still suitable here? (I guess
yes, as FileStreamSourceTask can read from stdin, which is similar.)

2. I would like to have a ConnectEmbedded (as opposed to ConnectStandalone
or ConnectDistributed), which would be similar to ConnectDistributed, just
without the REST server. Say I have the WikipediaFeedConnector and I want
to launch it programmatically from all the instances alongside Kafka
Streams, reusing the Connect distributed coordination so that only one
instance actually reads the IRC data but another instance picks up the work
if that one dies. Does this sound like a bad idea for some design reason?
The only problem I see is a rather technical one: the coordination process
uses the REST server for some actions.
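
To make point 1 a bit more concrete, here is a rough sketch of what a
non-seekable source could look like. It deliberately does not use the real
Kafka Connect classes: Record and Task are hypothetical stand-ins that only
mirror the idea of each source record carrying a partition map and an offset
map, and the "channel" and "seq" keys are made up for illustration. The
assumption is that the offset is purely informational, because after a
restart there is nothing to seek back to and the task simply resumes from
whatever the channel emits next.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-ins, NOT the Kafka Connect API.
class NonSeekableSourceSketch {

    // A record tagged with where it came from and a position marker.
    static final class Record {
        final Map<String, String> partition;
        final Map<String, Long> offset;
        final String value;

        Record(Map<String, String> partition, Map<String, Long> offset, String value) {
            this.partition = partition;
            this.offset = offset;
            this.value = value;
        }
    }

    // A task fed by a push-style source (e.g. an IRC listener thread).
    static final class Task {
        private final String channel;
        private final BlockingQueue<String> buffer = new LinkedBlockingQueue<>();
        private long counter = 0; // informational only, cannot be used to seek

        Task(String channel) {
            this.channel = channel;
        }

        // Called by the (hypothetical) IRC listener for every incoming line.
        void onMessage(String line) {
            buffer.add(line);
        }

        // Drains whatever has arrived since the last call; never re-reads.
        List<Record> poll() {
            List<String> lines = new ArrayList<>();
            buffer.drainTo(lines);
            List<Record> out = new ArrayList<>();
            for (String line : lines) {
                out.add(new Record(
                        Collections.singletonMap("channel", channel),
                        Collections.singletonMap("seq", Long.valueOf(counter++)),
                        line));
            }
            return out;
        }
    }

    public static void main(String[] args) {
        Task task = new Task("#en.wikipedia");
        task.onMessage("first edit");
        task.onMessage("second edit");
        for (Record r : task.poll()) {
            System.out.println(r.partition + " " + r.offset + " " + r.value);
        }
    }
}
```

Anything that arrives while no task is running is lost in this sketch, which
is exactly the trade-off I am asking about.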

Cheers,
Michal
