Replies from the dev list weren't getting to my original sender account. I could create a pull request but it has some patchy code in ConnectEmbedded and dependencies which shouldn't be there (connect-runtime) as it was mostly intended for imagining how the integrated connect and streams topology would look - but if you are happy for :stream:examples to have dependencies that can be removed after connect-api has a proper embedded support than I can send a pull request your way.
On 25 March 2016 at 21:38, Guozhang Wang <wangg...@gmail.com> wrote: > I am thinking maybe we can even consider pulling the project as a whole > into examples instead of adding the connector and streams implementation > separately into Kafka Connect and Kafka Streams if Michal is interested to > filing a PR: currently the examples folder only contains consumer / > producer demos which is packaged as kafka.examples. > > Guozhang > > > On Fri, Mar 25, 2016 at 9:28 AM, Neha Narkhede <n...@confluent.io> wrote: > > > Michal -- This is really cool. Mind submitting a pull request? > > > > Also, would you like your IRC connector to be featured on the Kafka > > Connector Hub <http://connectors.confluent.io>? > > > > On Fri, Mar 25, 2016 at 9:08 AM, Michal Hariš <michal.har...@gmail.com> > > wrote: > > > > > So I had a go and hacked it up here: ConnectEmbedded.java > > > < > > > > > > https://github.com/amient/affinity-stack/blob/master/dev/connectors/connect-runtime/src/main/java/io/amient/kafka/connect/ConnectEmbedded.java > > > > > > > > > > > > > And this is how the wikipedia demo looks with it: hello-kafka-streams > > > < > > > > > > https://github.com/amient/affinity-stack/blob/master/dev/hello-kafka-streams/src/main/java/io/amient/kafka/streams/wikipedia/WikipediaStreamAppMain.java > > > > > > > > > > > > > As a side-effect there is a generic IRC connector too: > kafka-connect-irc > > > < > > > > > > https://github.com/amient/affinity-stack/tree/master/dev/connectors/kafka-connect-irc/src/main/java/io/amient/kafka/connect/irc > > > > > > > > > > It's kind of neat to have topology encapsulating connect and streams > in a > > > single instance that can just be scaled together symmetrically. > > > > > > Overall this was one of the most fun hack I had in a long time and the > > > result compared to the Samza equivalent looks clean and lightweight. It > > > also allows for zero-downtime with appropriate combination of > deployment > > > strategy and replication, which is something that was quite tricky with > > > Samza and YARN host affinity. > > > > > > One thing though I can't get my head around is why in Kafka Connect > there > > > has to be a custom internal schema format for the in-memory runtime > > > instead of just using Avro as the internal - the systems that talk in > > Avro > > > would have a performance gain and non-Avro guys would have converters > the > > > same way they have them now. > > > > > > > > > On Thu, Mar 24, 2016 at 11:46 AM, Michal Hariš < > michal.har...@gmail.com> > > > wrote: > > > > > > > Hello Kafka people! > > > > > > > > Great to see Kafka Streams coming along, the design validates (and in > > > many > > > > way supersedes) my own findings from working with various stream > > > processing > > > > systems/frameworks and eventually ending-up using just a small custom > > > > library built directly around Kafka. > > > > > > > > I have set out yesterday to translate Hello Samza (the wikipedia feed > > > > example) into Kafka Streams application. Now because this workflow > > starts > > > > by polling wikipedia IRC and publishes to a topic from which the > stream > > > > processors pick-up it would be nice to have this first part done by > > Kafka > > > > Connect but: > > > > > > > > 1. IRC channels are not seekable and Kafka Connect architecture > claims > > > > that all sources must be seekable - is this still suitable ? (I guess > > yes > > > > as FileStreamSourceTask can read from stdin which is similar) > > > > > > > > 2. I would like to have ConnectEmbedded (as opposed to > > ConnectStandalone > > > > or ConnectDistributed) which is similar to ConnectDistributed, just > > > without > > > > the rest server - i.e. say I have the WikipediaFeedConnector and I > want > > > to > > > > launch it programatically from all the instances along-side the Kafka > > > > Streams - but reusing the connect distributed coordination so that > only > > > one > > > > instance actually reads the IRC data but another instance picks up > work > > > if > > > > that one dies - does it sound like a bad idea for some design reason > ? > > - > > > > the only problem I see is rather technical that the coordination > > process > > > > uses the rest server for some actions. > > > > > > > > Cheers, > > > > Michal > > > > > > > > > > > > > > > -- > > Thanks, > > Neha > > > > > > -- > -- Guozhang > -- Michal Haris Technical Architect direct line: +44 (0) 207 749 0229 www.visualdna.com | t: +44 (0) 207 734 7033 31 Old Nichol Street London E2 7HR