Headers are used in producer records: how are a higher-version broker and lower-version clients compatible?

2020-08-20 Thread Yuguang Zhao
Hi all, I would like to ask some questions.


Our company environment:
Broker version: 1.0.1
producer use kafka-client version:  1.0.1
consumer use kafka-client version:  0.10.2.2


The producer sends records that carry headers, but the consumer cannot
consume them and logs a WARN from
`org.apache.kafka.clients.consumer.internals.Fetcher`: log.warn("Unknown error
fetching data for topic-partition {}", tp);


The stack trace is:
parseCompletedFetch:844, Fetcher (org.apache.kafka.clients.consumer.internals)
fetchedRecords:482, Fetcher (org.apache.kafka.clients.consumer.internals)
pollOnce:1038, KafkaConsumer (org.apache.kafka.clients.consumer)
poll:996, KafkaConsumer (org.apache.kafka.clients.consumer)
run:201, CaseController$ConsumerThread 
(test.org.apache.skywalking.apm.testcase.kafka.controller)



Because we have a very large number of projects (in the tens of thousands), we
cannot upgrade the consumer kafka-client version. My questions are:
1. What configuration can the 1.0.1 broker use to remain compatible with a lower
consumer kafka-client version such as 0.10.2.2 when pulling messages, for example
by stripping headers, or something better than that? (See the sketch below.)
2. Where can I see the kafka-client versions of all clients subscribed to a topic?
3. Is there a better way to handle this situation?
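
For question one, a minimal sketch of the kind of configuration meant by
"removing headers", as I understand it: pinning the topic's message format to an
older version makes the broker store and serve batches in the pre-0.11 format,
which has no header support, so headers are dropped and pre-0.11 consumers can
fetch without down-conversion. The topic name and ZooKeeper address below are
placeholders; verify the impact in a test environment first.

  bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
    --entity-type topics --entity-name my-topic \
    --add-config message.format.version=0.10.2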


Thank You



Fwd: MirrorMaker2 for active-active setup

2020-08-20 Thread Anup Shirolkar
Hi Kafka users,

I am trying to set up an active-active Kafka setup in docker with 2 MM2
instances.
The setup logically looks like the diagram below.

[image: Screen Shot 2020-08-20 at 2.39.13 pm.png]
Each MM2 is just copying data from the other Kafka cluster and so is
unidirectional.
Strangely, when I start the setup the MM2 instance which is brought up
first works fine but the one later does not copy topics.
Also, the offsets-sync topic is not created by the MM2 instance brought up
later.

If I use a single MM2 instance for a bidirectional copy it works fine.
There are no errors in the logs of MM2 instances.

Please let me know what could be wrong here.

- Anup


MirrorMaker2 for active-active setup

2020-08-20 Thread Anup Shirolkar
Hi Kafka users,

I am trying to set up an active-active Kafka setup in docker with 2 MM2
instances.
The setup logically looks like the diagram below.

[image: Screen Shot 2020-08-20 at 2.39.13 pm.png]
Each MM2 is just copying data from the other kafka cluster and so is
unidirectional.
Strangely, when I start the setup the MM2 instance which is brought up first
works fine but the one later does not copy topics.
Also the offsets-sync topic is not created by the MM2 instance brought up
later.

If I use a single MM2 instance for bidirectional copy it works fine.
There are no errors in the logs of MM2 instances.

Please let me know what could be wrong here.

- Anup


How to update configurations in MM 2.0

2020-08-20 Thread nitin agarwal
Hi All,

If we run MM 2.0 in a dedicated MirrorMaker cluster, how do we update
properties such as the number of tasks, or add a topic to the whitelist or
blacklist?

What I have observed is that if we update the properties in
connect-mirror-maker.properties and restart the MM 2.0 nodes then it
doesn't pick up the new configuration. It picks the one which was already
stored in the internal config topic.

Thank you,
Nitin


Re: Mirror Maker 2.0 Queries

2020-08-20 Thread Ananya Sen
Thanks a lot Ryanne. That was really very helpful.

On Thu, Aug 20, 2020, 11:49 PM Ryanne Dolan  wrote:

> > Can we configure tasks.max for each of these connectors separately?
>
> I don't believe that's currently possible. If you need fine-grained control
> over each Connector like that, you might consider running MM2's Connectors
> manually on a bunch of Connect clusters. This requires more effort to set
> up, but enables you to control the configuration of each Connector using
> the Connect REST API.
>
> Ryanne
>
> On Thu, Aug 20, 2020 at 12:30 PM Ananya Sen 
> wrote:
>
> > Thanks, Ryanne. That answers my questions. I was actually missing this
> > "tasks.max" property. Thanks for pointing that out.
> >
> > Furthermore, as per the KIP of Mirror Maker 2.0, there are 3 types of
> > connectors in a Mirror Maker Cluster:
> >
> >1. KafkaSourceConnector - focus on replicating topic partitions
> >2. KafkaCheckpointConnector - focus on replicating consumer groups
> >3. KafkaHeartbeatConnector - focus on checking cluster availability
> >
> > *Can we configure tasks.max for each of these connectors separately? That
> > is, Can I have 3 tasks for KafkaSourceConnector, 5
> > for KafkaCheckpointConnector, and 1 for KafkaHeartbeatConnector?*
> >
> >
> >
> > Regards
> > Ananya Sen
> >
> > On Thu, Aug 20, 2020 at 6:39 PM Ryanne Dolan 
> > wrote:
> >
> > > Ananya, see responses below.
> > >
> > > > Can this number of workers be configured?
> > >
> > > The number of workers is not exactly configurable, but you can control
> it
> > > by spinning up drivers and using the '--clusters' flag. A driver
> instance
> > > without '--clusters' will run one worker for each A->B replication
> flow.
> > So
> > > e.g. if you've got two clusters being replicated bidirectionally,
> you'll
> > > have an A->B worker and a B->A worker on each MM2 driver.
> > >
> > > You can use the '--clusters' flag to limit what clusters are targeted
> > for a
> > > given driver, which is useful in many ways, including to limit the
> number
> > > of workers for a given worker. So e.g. if you've got 10 clusters all
> > being
> > > replicated in a full mesh you can run a driver with '--clusters A' and
> it
> > > will have only 9 workers, one for each of the other clusters.
> > >
> > > Also note that there is a configuration property 'tasks.max' that
> > controls
> > > the number of tasks available to workers. Each A->B flow is replicated
> > by a
> > > Herd of Workers (in Connect terminology), and Herds work on Tasks. By
> > > default, 'tasks.max' is one, which means there will only be one task
> for
> > > each Herd, regardless of how many drivers and workers you spin up. You
> > > definitely want to change this property. You can tweak this for each
> A->B
> > > replication flow independently to strike the right balance. If
> > 'tasks.max'
> > > is the same or more than the total number of topic-partitions being
> > > replicated, it will mean each topic-partition is replicated in a
> > dedicated
> > > task, which is probably not an efficient use of resource overhead.
> > >
> > > > Does every topic partition given a new task?
> > >
> > > No, topic-partitions are spread out across tasks. Each topic's
> partitions
> > > are divided round-robin among available tasks. However, keep in mind
> that
> > > if 'tasks.max' is too high, you could end up with one topic-partition
> in
> > > each task.
> > >
> > > > Does every consumer group - topic pair given a new task for
> replicating
> > > offset?
> > >
> > > No, consumer-groups are also spread out across tasks. As with
> > > topic-partitions, 'tasks.max' applies.
> > >
> > > > How can I scale up the mirror maker instance so that I can have very
> > > little lag?
> > >
> > > Tweak 'tasks.max' and spin up more driver instances.
> > >
> > > Ryanne
> > >
> > > On Sat, Aug 8, 2020 at 1:43 AM Ananya Sen 
> > wrote:
> > >
> > > > Thank you Ryanne for the quick response.
> > > > I further want to clarify a few points.
> > > >
> > > > The mirror maker 2.0 is based on the Kafka Connect framework. In
> Kafka
> > > > connect we have multiple workers and each worker has some assigned
> > task.
> > > To
> > > > map this to Mirror Maker 2.0, A mirror Maker will driver have some
> > > workers.
> > > >
> > > > 1) Can this number of workers be configured?
> > > > 2) What is the default value of this worker configuration?
> > > > 3) Does every topic partition given a new task?
> > > > 4) Does every consumer group - topic pair given a new task for
> > > replicating
> > > > offset?
> > > >
> > > > Also, consider a case where I have 1000 topics in a Kafka cluster and
> > > each
> > > > topic has a high amount of data + new data is being written at high
> > > > throughput. Now I want to set up a mirror maker 2.0 on this cluster
> to
> > > > replicate all the old data (which is retained in the topic) as well
> as
> > > the
> > > > new incoming data in a backup cluster. How can I scale up the mirror
> > > maker
> > > > instance s

Re: Mirror Maker 2.0 Queries

2020-08-20 Thread Ryanne Dolan
> Can we configure tasks.max for each of these connectors separately?

I don't believe that's currently possible. If you need fine-grained control
over each Connector like that, you might consider running MM2's Connectors
manually on a bunch of Connect clusters. This requires more effort to set
up, but enables you to control the configuration of each Connector using
the Connect REST API.

Ryanne

On Thu, Aug 20, 2020 at 12:30 PM Ananya Sen  wrote:

> Thanks, Ryanne. That answers my questions. I was actually missing this
> "tasks.max" property. Thanks for pointing that out.
>
> Furthermore, as per the KIP of Mirror Maker 2.0, there are 3 types of
> connectors in a Mirror Maker Cluster:
>
>1. KafkaSourceConnector - focus on replicating topic partitions
>2. KafkaCheckpointConnector - focus on replicating consumer groups
>3. KafkaHeartbeatConnector - focus on checking cluster availability
>
> *Can we configure tasks.max for each of these connectors separately? That
> is, Can I have 3 tasks for KafkaSourceConnector, 5
> for KafkaCheckpointConnector, and 1 for KafkaHeartbeatConnector?*
>
>
>
> Regards
> Ananya Sen
>
> On Thu, Aug 20, 2020 at 6:39 PM Ryanne Dolan 
> wrote:
>
> > Ananya, see responses below.
> >
> > > Can this number of workers be configured?
> >
> > The number of workers is not exactly configurable, but you can control it
> > by spinning up drivers and using the '--clusters' flag. A driver instance
> > without '--clusters' will run one worker for each A->B replication flow.
> So
> > e.g. if you've got two clusters being replicated bidirectionally, you'll
> > have an A->B worker and a B->A worker on each MM2 driver.
> >
> > You can use the '--clusters' flag to limit what clusters are targeted
> for a
> > given driver, which is useful in many ways, including to limit the number
> > of workers for a given worker. So e.g. if you've got 10 clusters all
> being
> > replicated in a full mesh you can run a driver with '--clusters A' and it
> > will have only 9 workers, one for each of the other clusters.
> >
> > Also note that there is a configuration property 'tasks.max' that
> controls
> > the number of tasks available to workers. Each A->B flow is replicated
> by a
> > Herd of Workers (in Connect terminology), and Herds work on Tasks. By
> > default, 'tasks.max' is one, which means there will only be one task for
> > each Herd, regardless of how many drivers and workers you spin up. You
> > definitely want to change this property. You can tweak this for each A->B
> > replication flow independently to strike the right balance. If
> 'tasks.max'
> > is the same or more than the total number of topic-partitions being
> > replicated, it will mean each topic-partition is replicated in a
> dedicated
> > task, which is probably not an efficient use of resource overhead.
> >
> > > Does every topic partition given a new task?
> >
> > No, topic-partitions are spread out across tasks. Each topic's partitions
> > are divided round-robin among available tasks. However, keep in mind that
> > if 'tasks.max' is too high, you could end up with one topic-partition in
> > each task.
> >
> > > Does every consumer group - topic pair given a new task for replicating
> > offset?
> >
> > No, consumer-groups are also spread out across tasks. As with
> > topic-partitions, 'tasks.max' applies.
> >
> > > How can I scale up the mirror maker instance so that I can have very
> > little lag?
> >
> > Tweak 'tasks.max' and spin up more driver instances.
> >
> > Ryanne
> >
> > On Sat, Aug 8, 2020 at 1:43 AM Ananya Sen 
> wrote:
> >
> > > Thank you Ryanne for the quick response.
> > > I further want to clarify a few points.
> > >
> > > The mirror maker 2.0 is based on the Kafka Connect framework. In Kafka
> > > connect we have multiple workers and each worker has some assigned
> task.
> > To
> > > map this to Mirror Maker 2.0, A mirror Maker will driver have some
> > workers.
> > >
> > > 1) Can this number of workers be configured?
> > > 2) What is the default value of this worker configuration?
> > > 3) Does every topic partition given a new task?
> > > 4) Does every consumer group - topic pair given a new task for
> > replicating
> > > offset?
> > >
> > > Also, consider a case where I have 1000 topics in a Kafka cluster and
> > each
> > > topic has a high amount of data + new data is being written at high
> > > throughput. Now I want to set up a mirror maker 2.0 on this cluster to
> > > replicate all the old data (which is retained in the topic) as well as
> > the
> > > new incoming data in a backup cluster. How can I scale up the mirror
> > maker
> > > instance so that I can have very little lag?
> > >
> > > On 2020/07/11 06:37:56, Ananya Sen  wrote:
> > > > Hi
> > > >
> > > > I was exploring the Mirror maker 2.0. I read through this
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
> > > > documentation
> > > > and I have  a few questions.
> > > >
> > > >1. For running mi

Re: Mirror Maker 2.0 Queries

2020-08-20 Thread Ananya Sen
Thanks, Ryanne. That answers my questions. I was actually missing this
"tasks.max" property. Thanks for pointing that out.

Furthermore, as per the KIP of Mirror Maker 2.0, there are 3 types of
connectors in a Mirror Maker Cluster:

   1. KafkaSourceConnector - focus on replicating topic partitions
   2. KafkaCheckpointConnector - focus on replicating consumer groups
   3. KafkaHeartbeatConnector - focus on checking cluster availability

*Can we configure tasks.max for each of these connectors separately? That
is, Can I have 3 tasks for KafkaSourceConnector, 5
for KafkaCheckpointConnector, and 1 for KafkaHeartbeatConnector?*



Regards
Ananya Sen

On Thu, Aug 20, 2020 at 6:39 PM Ryanne Dolan  wrote:

> Ananya, see responses below.
>
> > Can this number of workers be configured?
>
> The number of workers is not exactly configurable, but you can control it
> by spinning up drivers and using the '--clusters' flag. A driver instance
> without '--clusters' will run one worker for each A->B replication flow. So
> e.g. if you've got two clusters being replicated bidirectionally, you'll
> have an A->B worker and a B->A worker on each MM2 driver.
>
> You can use the '--clusters' flag to limit what clusters are targeted for a
> given driver, which is useful in many ways, including to limit the number
> of workers for a given worker. So e.g. if you've got 10 clusters all being
> replicated in a full mesh you can run a driver with '--clusters A' and it
> will have only 9 workers, one for each of the other clusters.
>
> Also note that there is a configuration property 'tasks.max' that controls
> the number of tasks available to workers. Each A->B flow is replicated by a
> Herd of Workers (in Connect terminology), and Herds work on Tasks. By
> default, 'tasks.max' is one, which means there will only be one task for
> each Herd, regardless of how many drivers and workers you spin up. You
> definitely want to change this property. You can tweak this for each A->B
> replication flow independently to strike the right balance. If 'tasks.max'
> is the same or more than the total number of topic-partitions being
> replicated, it will mean each topic-partition is replicated in a dedicated
> task, which is probably not an efficient use of resource overhead.
>
> > Does every topic partition given a new task?
>
> No, topic-partitions are spread out across tasks. Each topic's partitions
> are divided round-robin among available tasks. However, keep in mind that
> if 'tasks.max' is too high, you could end up with one topic-partition in
> each task.
>
> > Does every consumer group - topic pair given a new task for replicating
> offset?
>
> No, consumer-groups are also spread out across tasks. As with
> topic-partitions, 'tasks.max' applies.
>
> > How can I scale up the mirror maker instance so that I can have very
> little lag?
>
> Tweak 'tasks.max' and spin up more driver instances.
>
> Ryanne
>
> On Sat, Aug 8, 2020 at 1:43 AM Ananya Sen  wrote:
>
> > Thank you Ryanne for the quick response.
> > I further want to clarify a few points.
> >
> > The mirror maker 2.0 is based on the Kafka Connect framework. In Kafka
> > connect we have multiple workers and each worker has some assigned task.
> To
> > map this to Mirror Maker 2.0, A mirror Maker will driver have some
> workers.
> >
> > 1) Can this number of workers be configured?
> > 2) What is the default value of this worker configuration?
> > 3) Does every topic partition given a new task?
> > 4) Does every consumer group - topic pair given a new task for
> replicating
> > offset?
> >
> > Also, consider a case where I have 1000 topics in a Kafka cluster and
> each
> > topic has a high amount of data + new data is being written at high
> > throughput. Now I want to set up a mirror maker 2.0 on this cluster to
> > replicate all the old data (which is retained in the topic) as well as
> the
> > new incoming data in a backup cluster. How can I scale up the mirror
> maker
> > instance so that I can have very little lag?
> >
> > On 2020/07/11 06:37:56, Ananya Sen  wrote:
> > > Hi
> > >
> > > I was exploring the Mirror maker 2.0. I read through this
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
> > > documentation
> > > and I have  a few questions.
> > >
> > >1. For running mirror maker as a dedicated mirror maker cluster, the
> > >documentation specifies a config file and a starter script. Is this
> > mirror
> > >maker process distributed ?
> > >2. I could not find any port configuration for the above mirror
> maker
> > >process, So can we configure mirror maker itself to run as a cluster
> > i.e
> > >running the process instance across multiple server to avoid
> downtime
> > due
> > >to server crash.
> > >3. If we could somehow run the mirror maker as a distributed process
> > >then does that mean that topic and consumer offset replication will
> be
> > >shared among those mirror maker processes?
> > >4

Re: Kafka Streams Key-value store question

2020-08-20 Thread Matthias J. Sax
`auto.offset.reset` does not apply for global-store-topics.

At startup, the app will always "seek-to-beginning" for a
global-store-topic, bootstrap the global store, and afterwards start the
actual processing.

However, no offsets are committed for global-store-topics. Maybe this is
the reason why you think no data was read?


-Matthias

On 8/20/20 5:30 AM, Liam Clarke-Hutchinson wrote:
> Hi Pirow,
> 
> You can configure the auto offset reset for your stream source's consumer
> to "earliest" if you want to consume all available data if no committed
> offset exists. This will populate the state store on first run.
> 
> Cheers,
> 
> Liam Clarke-Hutchinson
> 
> 
> On Thu, 20 Aug. 2020, 11:58 pm Pirow Engelbrecht, <
> pirow.engelbre...@etion.co.za> wrote:
> 
>> Hi Bill,
>>
>>
>>
>> Yes, that seems to be exactly what I need. I’ve instantiated this global
>> store with:
>>
>> topology.addGlobalStore(storeBuilder, "KVstoreSource", Serdes.String().
>> deserializer(), Serdes.String().deserializer(), this.config_kvStoreTopic,
>> "KVprocessor", KeyValueProcessor::new);
>>
>>
>>
>>
>>
>> I’ve added a KeyValueProcessor that puts incoming Kafka key-value pairs
>> into the store. The problem is that if the application starts for the first
>> time, it does not process any key-value pairs already in the Kafka topic.
>> Is there a way around this?
>>
>>
>>
>> Thanks
>>
>>
>>
>> *Pirow Engelbrecht*
>> System Engineer
>>
>> *E.* pirow.engelbre...@etion.co.za
>> *T.* +27 12 678 9740 (ext. 9879)
>> *M.* +27 63 148 3376
>>
>> 76 Regency Drive | Irene | Centurion | 0157
>> 
>> *www.etion.co.za* 
>>
>> 
>>
>> Facebook
>>  |
>> YouTube  |
>> LinkedIn  | Twitter
>>  | Instagram
>> 
>>
>>
>>
>> *From:* Bill Bejeck 
>> *Sent:* Wednesday, 19 August 2020 3:53 PM
>> *To:* users@kafka.apache.org
>> *Subject:* Re: Kafka Streams Key-value store question
>>
>>
>>
>> Hi Pirow,
>>
>> If I'm understanding your requirements correctly, I think using a global
>> store
>> <
>> https://kafka.apache.org/26/javadoc/org/apache/kafka/streams/StreamsBuilder.html#addGlobalStore-org.apache.kafka.streams.state.StoreBuilder-java.lang.String-org.apache.kafka.streams.kstream.Consumed-org.apache.kafka.streams.processor.ProcessorSupplier-
>>>
>> will
>> work for you.
>>
>> HTH,
>> Bill
>>
>> On Wed, Aug 19, 2020 at 8:53 AM Pirow Engelbrecht <
>> pirow.engelbre...@etion.co.za> wrote:
>>
>>> Hello,
>>>
>>>
>>>
>>> We’re building a JSON decorator using Kafka Streams’ processing API.
>>>
>>>
>>>
>>> The process is briefly that a piece of JSON should be consumed from an
>>> input topic (keys are null, value is the JSON). The JSON contains a field
>>> (e.g. “thisField”) with a value (e.g. “someLink”) . This value (and a
>>> timestamp) is used to look-up another piece JSON from a key-value topic
>>> (keys are all the different values of “thisField”, values are JSON). This
>>> key-value topic is created by another service in Kafka. This additional
>>> piece of JSON then gets appended to the input JSON and the result gets
>>> written to an output topic (keys are null, value is now the original
>> JSON +
>>> lookup JSON).
>>>
>>>
>>>
>>> To do the query against a key-value store, ideally I want Kafka Streams
>> to
>>> directly create and update a window key-value store in memory (or disk)
>>> from my key-value topic in Kafka, but I am unable to find a way to
>> specify
>>> this through the StoreBuilder interface. Does anybody know how to do
>> this?
>>>
>>> Here is my current Storebuilder code snippet:
>>>
> > > StoreBuilder<WindowStore<String, String>> storeBuilder = Stores.
>>> windowStoreBuilder(
>>>
>>> Stores.persistentWindowStore("loopkupStore",
>>> Duration.ofDays(14600), Duration.ofDays(14600), false),
>>>
>>> Serdes.String(),
>>>
>>> Serdes.String());
>>>
>>> storeBuilder.build();
>>>
>>>
>>>
>>>
>>>
>>> Currently my workaround is to have a sink for the key-value store and
>> then
>>> create/update this key-value store using a node in the processing
>> topology,
>>> but this has issues when restarting the service, i.e. when the service is
>>> restarted, the key-value store topic needs to be consumed from the start
>> to
>>> rebuild the store in memory, but the sink would have written commit
>> offsets
>>> which prevents the topic to be consumed from the start. I also cannot use
>>> streams.cleanUp() as this will reset all the sinks in my topology (y
>> other
>>> sink ingests records from the input topic).
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> *Pirow Engelbrecht*
>>> System Engineer
>>>
>>> *E.* pirow.engelbre...@etion.co.za
>>> *T.* +27 12 678 9740 (ext. 9879)
>>> *M.* +27 63 148 3376
>>>
>>> 76 Regency Drive | Irene | Centurion | 0157
>>> 

Re: MirrorMaker 2.0 - Translating offsets for remote topics and consumer groups

2020-08-20 Thread Josh C
Thanks again Ryanne, I didn't realize that MM2 would handle that.

However, I'm unable to mirror the remote topic back to the source cluster
by adding it to the topic whitelist. I've also tried to update the topic
blacklist and remove ".*\.replica" (since the blacklists take precedence
over the whitelists), but that doesn't seem to be doing much either? Is
there something else I should be aware of in the mm2.properties file?

Appreciate all your help!

Josh

On Wed, Aug 19, 2020 at 12:55 PM Ryanne Dolan  wrote:

> Josh, if you have two clusters with bidirectional replication, you only get
> two copies of each record. MM2 won't replicate the data "upstream", cuz it
> knows it's already there. In particular, MM2 knows not to create topics
> like B.A.topic1 on cluster A, as this would be an unnecessary cycle.
>
> >  is there a reason for MM2 not emitting checkpoint data for the source
> topic AND the remote topic
>
> No, not really! I think it would be surprising if one-directional flows
> insisted on writing checkpoints both ways -- but it's also surprising that
> you need to explicitly allow a remote topic to be checkpointed. I'd support
> changing this, fwiw.
>
> Ryanne
>
> On Wed, Aug 19, 2020 at 2:30 PM Josh C  wrote:
>
> > Sorry, correction -- I am realizing now it would be 3 copies of the same
> > topic data as A.topic1 has different data than B.topic1. However, that
> > would still be 3 copies as opposed to just 2 with something like topic1
> and
> > A.topic1.
> >
> > As well, if I were to explicitly replicate the remote topic back to the
> > source cluster by adding it to the topic whitelist, would I also need to
> > update the topic blacklist and remove ".*\.replica" (since the blacklists
> > take precedence over the whitelists)?
> >
> > Josh
> >
> > On Wed, Aug 19, 2020 at 11:46 AM Josh C  wrote:
> >
> > > Thanks for the clarification Ryanne. In the context of active/active
> > > clusters, does this mean there would be 6 copies of the same topic
> data?
> > >
> > > A topics:
> > > - topic1
> > > - B.topic1
> > > - B.A.topic1
> > >
> > > B topics:
> > > - topic1
> > > - A.topic1
> > > - A.B.topic1
> > >
> > > Out of curiosity, is there a reason for MM2 not emitting checkpoint
> data
> > > for the source topic AND the remote topic as a pair as opposed to
> having
> > to
> > > explicitly replicate the remote topic back to the source cluster just
> to
> > > have the checkpoints emitted upstream?
> > >
> > > Josh
> > >
> > > On Wed, Aug 19, 2020 at 6:16 AM Ryanne Dolan 
> > > wrote:
> > >
> > >> Josh, yes it's possible to migrate the consumer group back to the
> source
> > >> topic, but you need to explicitly replicate the remote topic back to
> the
> > >> source cluster -- otherwise no checkpoints will flow "upstream":
> > >>
> > >> A->B.topics=test1
> > >> B->A.topics=A.test1
> > >>
> > >> After the first checkpoint is emitted upstream,
> > >> RemoteClusterUtils.translateOffsets() will translate B's A.test1
> offsets
> > >> into A's test1 offsets for you.
> > >>
> > >> Ryanne
> > >>
> > >> On Tue, Aug 18, 2020 at 5:56 PM Josh C 
> wrote:
> > >>
> > >> > Hi there,
> > >> >
> > >> > I'm currently exploring MM2 and having some trouble with the
> > >> > RemoteClusterUtils.translateOffsets() method. I have been successful
> > in
> > >> > migrating a consumer group from the source cluster to the target
> > >> cluster,
> > >> > but was wondering how I could migrate this consumer group back to
> the
> > >> > original source topic?
> > >> >
> > >> > It is my understanding that there isn't any checkpoint data being
> > >> > emitted for this consumer group since it is consuming from a
> mirrored
> > >> topic
> > >> > in the target cluster. I'm currently getting an empty map since
> there
> > >> isn't
> > >> > any checkpoint data for 'target.checkpoints.internal' in the source
> > >> > cluster. So, I was wondering how would I get these new translated
> > >> offsets
> > >> > to migrate the consumer group back to the source cluster?
> > >> >
> > >> > Please let me know if my question was unclear or if you require
> > further
> > >> > clarification! Appreciate the help.
> > >> >
> > >> > Thanks,
> > >> > Josh
> > >> >
> > >>
> > >
> >
>
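
For reference, a minimal sketch of the RemoteClusterUtils.translateOffsets() call
discussed above, based on my reading of KIP-382 rather than a verified recipe. It
assumes cluster aliases A (original source) and B (target), that B->A.topics=A.test1
is already flowing checkpoints back so that B.checkpoints.internal exists on A, and
that the group name and bootstrap servers are placeholders. The consumer group must
be inactive when the offsets are committed.

  import java.time.Duration;
  import java.util.HashMap;
  import java.util.Map;
  import org.apache.kafka.clients.consumer.KafkaConsumer;
  import org.apache.kafka.clients.consumer.OffsetAndMetadata;
  import org.apache.kafka.common.TopicPartition;
  import org.apache.kafka.common.serialization.ByteArrayDeserializer;
  import org.apache.kafka.connect.mirror.RemoteClusterUtils;

  public class MigrateGroupBackSketch {
      public static void main(String[] args) throws Exception {
          // Client properties pointing at cluster A, where B.checkpoints.internal lives.
          Map<String, Object> props = new HashMap<>();
          props.put("bootstrap.servers", "clusterA:9092");   // placeholder

          // Translate the group's offsets on B's A.test1 into offsets on A's test1.
          Map<TopicPartition, OffsetAndMetadata> translated =
              RemoteClusterUtils.translateOffsets(props, "B", "my-group", Duration.ofSeconds(30));

          // Commit the translated offsets on cluster A for the (inactive) group.
          Map<String, Object> consumerProps = new HashMap<>(props);
          consumerProps.put("group.id", "my-group");
          consumerProps.put("key.deserializer", ByteArrayDeserializer.class.getName());
          consumerProps.put("value.deserializer", ByteArrayDeserializer.class.getName());
          try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(consumerProps)) {
              consumer.commitSync(translated);
          }
      }
  }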


Re: Kafka streams sink outputs weird records

2020-08-20 Thread Bruno Cadonna

Hi Pirow,

it's hard to say without seeing the code that is executed in the
processors.


Could you please post a minimal example that reproduces the issue?

Best,
Bruno

On 20.08.20 14:53, Pirow Engelbrecht wrote:

Hello,

I’ve got Kafka Streams up and running with the following topology:

Sub-topology: 0

     Source: TopicInput (topics: [inputTopic])

   --> InputProcessor

     Processor: InputProcessor (stores: [KvStore])

   --> TopicOutput

   <-- TopicInput

     Source: KvInput (topics: [kvStoreTopic])

   --> KvProcessor

     Processor: KvProcessor (stores: [KvStore])

   --> none

   <-- KvInput

     Sink: TopicOutput (topic: outputTopic)

   <-- InputProcessor

There is only one context().forward(key,value) call from the 
InputProcessor to the TopicOutput sink. For some reason I get one 
additional Kafka record for each record the TopicOutput sink puts into 
the output topic. They look like this:


ConsumerRecord(topic='outputTopic', partition=0, offset=1, 
timestamp=1597926832492, timestamp_type=0, key=b'\x00\x00\x00\x01', 
value=b'\x00\x00\x00\x00\x00\x00', headers=[], checksum=None, 
serialized_key_size=4, serialized_value_size=6, serialized_header_size=-1)


With a binary key and value (key and value seems to always be the same). 
Any ideas?


Thanks

*Pirow Engelbrecht*
System Engineer

*E.*pirow.engelbre...@etion.co.za 


*T.* +27 12 678 9740 (ext. 9879)
*M.*+27 63 148 3376

76 Regency Drive | Irene | Centurion | 0157 


*www.etion.co.za *



Facebook 
 | 
YouTube  | 
LinkedIn  | Twitter 
 | Instagram 





Re: Mirror Maker 2.0 Queries

2020-08-20 Thread Ryanne Dolan
Ananya, see responses below.

> Can this number of workers be configured?

The number of workers is not exactly configurable, but you can control it
by spinning up drivers and using the '--clusters' flag. A driver instance
without '--clusters' will run one worker for each A->B replication flow. So
e.g. if you've got two clusters being replicated bidirectionally, you'll
have an A->B worker and a B->A worker on each MM2 driver.

You can use the '--clusters' flag to limit what clusters are targeted for a
given driver, which is useful in many ways, including to limit the number
of workers for a given worker. So e.g. if you've got 10 clusters all being
replicated in a full mesh you can run a driver with '--clusters A' and it
will have only 9 workers, one for each of the other clusters.
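
For illustration, this is roughly what that looks like on the command line; the
properties file name and the cluster alias are placeholders:

  # driver that only hosts workers targeting cluster A
  bin/connect-mirror-maker.sh mm2.properties --clusters A

Running several such drivers, each pinned to a different target via '--clusters',
is how the worker count is controlled in practice.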

Also note that there is a configuration property 'tasks.max' that controls
the number of tasks available to workers. Each A->B flow is replicated by a
Herd of Workers (in Connect terminology), and Herds work on Tasks. By
default, 'tasks.max' is one, which means there will only be one task for
each Herd, regardless of how many drivers and workers you spin up. You
definitely want to change this property. You can tweak this for each A->B
replication flow independently to strike the right balance. If 'tasks.max'
is the same or more than the total number of topic-partitions being
replicated, it will mean each topic-partition is replicated in a dedicated
task, which is probably not an efficient use of resource overhead.
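
For reference, a sketch of how this could look in the MM2 properties file, assuming
cluster aliases A and B and the per-flow override syntax from KIP-382 (aliases,
addresses, and task counts are placeholders, not recommendations):

  clusters = A, B
  A.bootstrap.servers = a-broker:9092
  B.bootstrap.servers = b-broker:9092

  A->B.enabled = true
  A->B.topics = .*
  # default is 1; raise per flow for more parallel tasks
  A->B.tasks.max = 8

  B->A.enabled = true
  B->A.topics = .*
  B->A.tasks.max = 4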

> Does every topic partition given a new task?

No, topic-partitions are spread out across tasks. Each topic's partitions
are divided round-robin among available tasks. However, keep in mind that
if 'tasks.max' is too high, you could end up with one topic-partition in
each task.

> Does every consumer group - topic pair given a new task for replicating
offset?

No, consumer-groups are also spread out across tasks. As with
topic-partitions, 'tasks.max' applies.

> How can I scale up the mirror maker instance so that I can have very
little lag?

Tweak 'tasks.max' and spin up more driver instances.

Ryanne

On Sat, Aug 8, 2020 at 1:43 AM Ananya Sen  wrote:

> Thank you Ryanne for the quick response.
> I further want to clarify a few points.
>
> The mirror maker 2.0 is based on the Kafka Connect framework. In Kafka
> connect we have multiple workers and each worker has some assigned task. To
> map this to Mirror Maker 2.0, A mirror Maker will driver have some workers.
>
> 1) Can this number of workers be configured?
> 2) What is the default value of this worker configuration?
> 3) Does every topic partition given a new task?
> 4) Does every consumer group - topic pair given a new task for replicating
> offset?
>
> Also, consider a case where I have 1000 topics in a Kafka cluster and each
> topic has a high amount of data + new data is being written at high
> throughput. Now I want to set up a mirror maker 2.0 on this cluster to
> replicate all the old data (which is retained in the topic) as well as the
> new incoming data in a backup cluster. How can I scale up the mirror maker
> instance so that I can have very little lag?
>
> On 2020/07/11 06:37:56, Ananya Sen  wrote:
> > Hi
> >
> > I was exploring the Mirror maker 2.0. I read through this
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
> > documentation
> > and I have  a few questions.
> >
> >1. For running mirror maker as a dedicated mirror maker cluster, the
> >documentation specifies a config file and a starter script. Is this
> mirror
> >maker process distributed ?
> >2. I could not find any port configuration for the above mirror maker
> >process, So can we configure mirror maker itself to run as a cluster
> i.e
> >running the process instance across multiple server to avoid downtime
> due
> >to server crash.
> >3. If we could somehow run the mirror maker as a distributed process
> >then does that mean that topic and consumer offset replication will be
> >shared among those mirror maker processes?
> >4. What is the default port of this mirror maker process and how can
> we
> >override it?
> >
> > Looking forward to your reply.
> >
> >
> > Thanks & Regards
> > Ananya Sen
> >
>


Kafka streams sink outputs weird records

2020-08-20 Thread Pirow Engelbrecht
Hello,

I've got Kafka Streams up and running with the following topology:
Sub-topology: 0
Source: TopicInput (topics: [inputTopic])
  --> InputProcessor
Processor: InputProcessor (stores: [KvStore])
  --> TopicOutput
  <-- TopicInput
Source: KvInput (topics: [kvStoreTopic])
  --> KvProcessor
Processor: KvProcessor (stores: [KvStore])
  --> none
  <-- KvInput
Sink: TopicOutput (topic: outputTopic)
  <-- InputProcessor

There is only one context().forward(key,value) call from the InputProcessor to 
the TopicOutput sink. For some reason I get one additional Kafka record for 
each record the TopicOutput sinks puts into the output topic. They look like 
this:
ConsumerRecord(topic='outputTopic', partition=0, offset=1, 
timestamp=1597926832492, timestamp_type=0, key=b'\x00\x00\x00\x01', 
value=b'\x00\x00\x00\x00\x00\x00', headers=[], checksum=None, 
serialized_key_size=4, serialized_value_size=6, serialized_header_size=-1)

With a binary key and value (key and value seems to always be the same). Any 
ideas?

Thanks

Pirow Engelbrecht
System Engineer

E. 
pirow.engelbre...@etion.co.za
T. +27 12 678 9740 (ext. 9879)
M. +27 63 148 3376

76 Regency Drive | Irene | Centurion | 0157
www.etion.co.za


[cid:image001.jpg@01D67701.73EE0660]


Facebook | 
YouTube | 
LinkedIn | 
Twitter | 
Instagram




Re: Kafka Streams Key-value store question

2020-08-20 Thread Liam Clarke-Hutchinson
Hi Pirow,

You can configure the auto offset reset for your stream source's consumer
to "earliest" if you want to consume all available data if no committed
offset exists. This will populate the state store on first run.

Cheers,

Liam Clarke-Hutchinson


On Thu, 20 Aug. 2020, 11:58 pm Pirow Engelbrecht, <
pirow.engelbre...@etion.co.za> wrote:

> Hi Bill,
>
>
>
> Yes, that seems to be exactly what I need. I’ve instantiated this global
> store with:
>
> topology.addGlobalStore(storeBuilder, "KVstoreSource", Serdes.String().
> deserializer(), Serdes.String().deserializer(), this.config_kvStoreTopic,
> "KVprocessor", KeyValueProcessor::new);
>
>
>
>
>
> I’ve added a KeyValueProcessor that puts incoming Kafka key-value pairs
> into the store. The problem is that if the application starts for the first
> time, it does not process any key-value pairs already in the Kafka topic.
> Is there a way around this?
>
>
>
> Thanks
>
>
>
> *Pirow Engelbrecht*
> System Engineer
>
> *E.* pirow.engelbre...@etion.co.za
> *T.* +27 12 678 9740 (ext. 9879)
> *M.* +27 63 148 3376
>
> 76 Regency Drive | Irene | Centurion | 0157
> 
> *www.etion.co.za* 
>
> 
>
> Facebook
>  |
> YouTube  |
> LinkedIn  | Twitter
>  | Instagram
> 
>
>
>
> *From:* Bill Bejeck 
> *Sent:* Wednesday, 19 August 2020 3:53 PM
> *To:* users@kafka.apache.org
> *Subject:* Re: Kafka Streams Key-value store question
>
>
>
> Hi Pirow,
>
> If I'm understanding your requirements correctly, I think using a global
> store
> <
> https://kafka.apache.org/26/javadoc/org/apache/kafka/streams/StreamsBuilder.html#addGlobalStore-org.apache.kafka.streams.state.StoreBuilder-java.lang.String-org.apache.kafka.streams.kstream.Consumed-org.apache.kafka.streams.processor.ProcessorSupplier-
> >
> will
> work for you.
>
> HTH,
> Bill
>
> On Wed, Aug 19, 2020 at 8:53 AM Pirow Engelbrecht <
> pirow.engelbre...@etion.co.za> wrote:
>
> > Hello,
> >
> >
> >
> > We’re building a JSON decorator using Kafka Streams’ processing API.
> >
> >
> >
> > The process is briefly that a piece of JSON should be consumed from an
> > input topic (keys are null, value is the JSON). The JSON contains a field
> > (e.g. “thisField”) with a value (e.g. “someLink”) . This value (and a
> > timestamp) is used to look-up another piece JSON from a key-value topic
> > (keys are all the different values of “thisField”, values are JSON). This
> > key-value topic is created by another service in Kafka. This additional
> > piece of JSON then gets appended to the input JSON and the result gets
> > written to an output topic (keys are null, value is now the original
> JSON +
> > lookup JSON).
> >
> >
> >
> > To do the query against a key-value store, ideally I want Kafka Streams
> to
> > directly create and update a window key-value store in memory (or disk)
> > from my key-value topic in Kafka, but I am unable to find a way to
> specify
> > this through the StoreBuilder interface. Does anybody know how to do
> this?
> >
> > Here is my current Storebuilder code snippet:
> >
> > StoreBuilder<WindowStore<String, String>> storeBuilder = Stores.
> > windowStoreBuilder(
> >
> > Stores.persistentWindowStore("loopkupStore",
> > Duration.ofDays(14600), Duration.ofDays(14600), false),
> >
> > Serdes.String(),
> >
> > Serdes.String());
> >
> > storeBuilder.build();
> >
> >
> >
> >
> >
> > Currently my workaround is to have a sink for the key-value store and
> then
> > create/update this key-value store using a node in the processing
> topology,
> > but this has issues when restarting the service, i.e. when the service is
> > restarted, the key-value store topic needs to be consumed from the start
> to
> > rebuild the store in memory, but the sink would have written commit
> offsets
> > which prevents the topic to be consumed from the start. I also cannot use
> > streams.cleanUp() as this will reset all the sinks in my topology (y
> other
> > sink ingests records from the input topic).
> >
> >
> >
> > Thanks
> >
> >
> >
> > *Pirow Engelbrecht*
> > System Engineer
> >
> > *E.* pirow.engelbre...@etion.co.za
> > *T.* +27 12 678 9740 (ext. 9879)
> > *M.* +27 63 148 3376
> >
> > 76 Regency Drive | Irene | Centurion | 0157
> > 
> > *www.etion.co.za *
> >
> > 
> >
> > Facebook
> >  |
> > YouTube  |
> > LinkedIn  | Twitter
> >  | Instagram
> > 
> >
> >
> >
>


Re: Kafka Streams Key-value store question

2020-08-20 Thread Nicolas Carlot
You need to set the auto offset reset to earliest, it uses latest as
default.

StreamsConfig.consumerPrefix(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG),
"earliest"

On Thu, 20 Aug 2020 at 13:58, Pirow Engelbrecht <
pirow.engelbre...@etion.co.za> wrote:

> Hi Bill,
>
>
>
> Yes, that seems to be exactly what I need. I’ve instantiated this global
> store with:
>
> topology.addGlobalStore(storeBuilder, "KVstoreSource", Serdes.String().
> deserializer(), Serdes.String().deserializer(), this.config_kvStoreTopic,
> "KVprocessor", KeyValueProcessor::new);
>
>
>
>
>
> I’ve added a KeyValueProcessor that puts incoming Kafka key-value pairs
> into the store. The problem is that if the application starts for the first
> time, it does not process any key-value pairs already in the Kafka topic.
> Is there a way around this?
>
>
>
> Thanks
>
>
>
> *Pirow Engelbrecht*
> System Engineer
>
> *E.* pirow.engelbre...@etion.co.za
> *T.* +27 12 678 9740 (ext. 9879)
> *M.* +27 63 148 3376
>
> 76 Regency Drive | Irene | Centurion | 0157
> 
> *www.etion.co.za* 
>
> 
>
> Facebook
>  |
> YouTube  |
> LinkedIn  | Twitter
>  | Instagram
> 
>
>
>
> *From:* Bill Bejeck 
> *Sent:* Wednesday, 19 August 2020 3:53 PM
> *To:* users@kafka.apache.org
> *Subject:* Re: Kafka Streams Key-value store question
>
>
>
> Hi Pirow,
>
> If I'm understanding your requirements correctly, I think using a global
> store
> <
> https://kafka.apache.org/26/javadoc/org/apache/kafka/streams/StreamsBuilder.html#addGlobalStore-org.apache.kafka.streams.state.StoreBuilder-java.lang.String-org.apache.kafka.streams.kstream.Consumed-org.apache.kafka.streams.processor.ProcessorSupplier-
> >
> will
> work for you.
>
> HTH,
> Bill
>
> On Wed, Aug 19, 2020 at 8:53 AM Pirow Engelbrecht <
> pirow.engelbre...@etion.co.za> wrote:
>
> > Hello,
> >
> >
> >
> > We’re building a JSON decorator using Kafka Streams’ processing API.
> >
> >
> >
> > The process is briefly that a piece of JSON should be consumed from an
> > input topic (keys are null, value is the JSON). The JSON contains a field
> > (e.g. “thisField”) with a value (e.g. “someLink”) . This value (and a
> > timestamp) is used to look-up another piece JSON from a key-value topic
> > (keys are all the different values of “thisField”, values are JSON). This
> > key-value topic is created by another service in Kafka. This additional
> > piece of JSON then gets appended to the input JSON and the result gets
> > written to an output topic (keys are null, value is now the original
> JSON +
> > lookup JSON).
> >
> >
> >
> > To do the query against a key-value store, ideally I want Kafka Streams
> to
> > directly create and update a window key-value store in memory (or disk)
> > from my key-value topic in Kafka, but I am unable to find a way to
> specify
> > this through the StoreBuilder interface. Does anybody know how to do
> this?
> >
> > Here is my current Storebuilder code snippet:
> >
> > StoreBuilder<WindowStore<String, String>> storeBuilder = Stores.
> > windowStoreBuilder(
> >
> > Stores.persistentWindowStore("loopkupStore",
> > Duration.ofDays(14600), Duration.ofDays(14600), false),
> >
> > Serdes.String(),
> >
> > Serdes.String());
> >
> > storeBuilder.build();
> >
> >
> >
> >
> >
> > Currently my workaround is to have a sink for the key-value store and
> then
> > create/update this key-value store using a node in the processing
> topology,
> > but this has issues when restarting the service, i.e. when the service is
> > restarted, the key-value store topic needs to be consumed from the start
> to
> > rebuild the store in memory, but the sink would have written commit
> offsets
> > which prevents the topic to be consumed from the start. I also cannot use
> > streams.cleanUp() as this will reset all the sinks in my topology (y
> other
> > sink ingests records from the input topic).
> >
> >
> >
> > Thanks
> >
> >
> >
> > *Pirow Engelbrecht*
> > System Engineer
> >
> > *E.* pirow.engelbre...@etion.co.za
> > *T.* +27 12 678 9740 (ext. 9879)
> > *M.* +27 63 148 3376
> >
> > 76 Regency Drive | Irene | Centurion | 0157
> > 
> > *www.etion.co.za *
> >
> > 
> >
> > Facebook
> >  |
> > YouTube  |
> > LinkedIn  | Twitter
> >  | Instagram
> > 
> >
> >
> >
>


-- 
*Nicolas Carlot*

Lead dev
|  | nicolas.car...@chronopost.fr


*Please note that as of 20 May, the Chronopost head office is moving. The
new 

RE: Kafka Streams Key-value store question

2020-08-20 Thread Pirow Engelbrecht
Hi Bill,

Yes, that seems to be exactly what I need. I’ve instantiated this global store 
with:
topology.addGlobalStore(storeBuilder, "KVstoreSource", 
Serdes.String().deserializer(), Serdes.String().deserializer(), 
this.config_kvStoreTopic, "KVprocessor", KeyValueProcessor::new);


I’ve added a KeyValueProcessor that puts incoming Kafka key-value pairs into 
the store. The problem is that if the application starts for the first time, it 
does not process any key-value pairs already in the Kafka topic. Is there a way 
around this?

Thanks

Pirow Engelbrecht
System Engineer

E. 
pirow.engelbre...@etion.co.za
T. +27 12 678 9740 (ext. 9879)
M. +27 63 148 3376

76 Regency Drive | Irene | Centurion | 0157
www.etion.co.za


[cid:image001.jpg@01D67642.9B5709A0]


Facebook | 
YouTube | 
LinkedIn | 
Twitter | 
Instagram


From: Bill Bejeck 
Sent: Wednesday, 19 August 2020 3:53 PM
To: users@kafka.apache.org
Subject: Re: Kafka Streams Key-value store question

Hi Pirow,

If I'm understanding your requirements correctly, I think using a global
store
>
will
work for you.

HTH,
Bill

On Wed, Aug 19, 2020 at 8:53 AM Pirow Engelbrecht <
pirow.engelbre...@etion.co.za> wrote:

> Hello,
>
>
>
> We’re building a JSON decorator using Kafka Streams’ processing API.
>
>
>
> The process is briefly that a piece of JSON should be consumed from an
> input topic (keys are null, value is the JSON). The JSON contains a field
> (e.g. “thisField”) with a value (e.g. “someLink”) . This value (and a
> timestamp) is used to look-up another piece JSON from a key-value topic
> (keys are all the different values of “thisField”, values are JSON). This
> key-value topic is created by another service in Kafka. This additional
> piece of JSON then gets appended to the input JSON and the result gets
> written to an output topic (keys are null, value is now the original JSON +
> lookup JSON).
>
>
>
> To do the query against a key-value store, ideally I want Kafka Streams to
> directly create and update a window key-value store in memory (or disk)
> from my key-value topic in Kafka, but I am unable to find a way to specify
> this through the StoreBuilder interface. Does anybody know how to do this?
>
> Here is my current Storebuilder code snippet:
>
> StoreBuilder<WindowStore<String, String>> storeBuilder = Stores.
> windowStoreBuilder(
>
> Stores.persistentWindowStore("loopkupStore",
> Duration.ofDays(14600), Duration.ofDays(14600), false),
>
> Serdes.String(),
>
> Serdes.String());
>
> storeBuilder.build();
>
>
>
>
>
> Currently my workaround is to have a sink for the key-value store and then
> create/update this key-value store using a node in the processing topology,
> but this has issues when restarting the service, i.e. when the service is
> restarted, the key-value store topic needs to be consumed from the start to
> rebuild the store in memory, but the sink would have written commit offsets
> which prevents the topic to be consumed from the start. I also cannot use
> streams.cleanUp() as this will reset all the sinks in my topology (y other
> sink ingests records from the input topic).
>
>
>
> Thanks
>
>
>
> *Pirow Engelbrecht*
> System Engineer
>
> *E.* pirow.engelbre...@etion.co.za
> *T.* +27 12 678 9740 (ext. 9879)
> *M.* +27 63 148 3376
>
> 76 Regency Drive | Irene | Centurion | 0157
> >
> *www.etion.co.za 
> >*
>
> >
>
> Facebook
> >
>  |
> YouTube 
> >
>  |
> LinkedIn 
> >
>  | Twitter
> > | 
> Instagram
> 

Re: [VOTE] KIP-657: Add Customized Kafka Streams Logo

2020-08-20 Thread Ben Stopford
Just adding my 2c.

Whether "cute" is a good route to take for logos is pretty subjective, but
I do think the approach can work. However, a logo being simple seems. This
was echoed earlier in Robins 'can it be shrunk' comment. Visually there's a
lot going on in both of those images. I think simplifying, and being more
heavily based on the Kafka logo would help. It's a cool logo. Michael's is
that. Otherwise, maybe something like this.

[image: image.png]
[image: image.png]



On Thu, 20 Aug 2020 at 10:20, Antony Stubbs  wrote:

> (just my honest opinion)
>
> I strongly oppose the suggested logos. I completely agree with Michael's
> analysis.
>
> The design appears to me to be quite random (regardless of the association
> of streams with otters) and clashes terribly with the embedded Kafka logo
> making it appear quite unprofessional. It looks like KS is trying to jump
> on the cute animal band wagon against the natural resistance of
> its existing style (Kafka). It also looks far too similar to the Firefox
> logo.
>
> As for the process, I think for there to be meaningful
> community deliberation about a logo, there needs to be far more ideas put
> forward, rather than just the two takes on the one concept.
>
> As for any suggestion on what it should be, I'm afraid I won't be of much
> help.
>
> On Thu, Aug 20, 2020 at 7:59 AM Michael Noll  wrote:
>
> > For what it's worth, here is an example sketch that I came up with. Point
> > is to show an alternative direction for the KStreams logo.
> >
> > https://ibb.co/bmZxDCg
> >
> > Thinking process:
> >
> >- It shows much more clearly (I hope) that KStreams is an official
> part
> >of Kafka.
> >- The Kafka logo is still front and center, and KStreams orbits around
> >it like electrons around the Kafka core/nucleus. That’s important
> > because
> >we want users to adopt all of Kafka, not just bits and pieces.
> >- It uses and builds upon the same ‘simple is beautiful’ style of the
> >original Kafka logo. That also has the nice side-effect that it
> alludes
> > to
> >Kafka’s and KStreams’ architectural simplicity.
> >- It picks up the good idea in the original logo candidates to convey
> >the movement and flow of stream processing.
> >- Execution-wise, and like the main Kafka logo, this logo candidate
> >works well in smaller size, too, because of its simple and clear
> lines.
> >(Logo types like the otter ones tend to become undecipherable at
> smaller
> >sizes.)
> >- It uses the same color scheme of the revamped AK website for brand
> >consistency.
> >
> > I am sure we can come up with even better logo candidates.  But the
> > suggestion above is, in my book, certainly a better option than the
> otters.
> >
> > -Michael
> >
> >
> >
> > On Wed, Aug 19, 2020 at 11:09 PM Boyang Chen  >
> > wrote:
> >
> > > Hey Ben,
> > >
> > > that otter was supposed to be a river-otter to connect to "streams".
> And
> > of
> > > course, it's cute :)
> > >
> > > On Wed, Aug 19, 2020 at 12:41 PM Philip Schmitt <
> > > philip.schm...@outlook.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I’m with Robin and Michael here.
> > > >
> > > > What this decision needs is a good design brief.
> > > > This article seems decent:
> > > >
> > >
> >
> https://yourcreativejunkie.com/logo-design-brief-the-ultimate-guide-for-designers/
> > > >
> > > > Robin is right about the usage requirements.
> > > > It goes a bit beyond resolution. How does the logo work when it’s on
> a
> > > > sticker on someone’s laptop? Might there be some cases, where you
> want
> > to
> > > > print it in black and white?
> > > > And how would it look if you put the Kafka, ksqlDB, and Streams
> > stickers
> > > > on a laptop?
> > > >
> > > > Of the two, I prefer the first option.
> > > > The brown on black is a bit subdued – it might not work well on a
> > t-shirt
> > > > or a laptop sticker. Maybe that could be improved by using a bolder
> > > color,
> > > > but once it gets smaller or lower-resolution, it may not work any
> > longer.
> > > >
> > > >
> > > > Regards,
> > > > Philip
> > > >
> > > >
> > > > P.S.:
> > > > Another article about what makes a good logo:
> > > > https://vanschneider.com/what-makes-a-good-logo
> > > >
> > > > P.P.S.:
> > > >
> > > > If I were to pick a logo for Streams, I’d choose something that fits
> > well
> > > > with Kafka and ksqlDB.
> > > >
> > > > ksqlDB has the rocket.
> > > > I can’t remember (or find) the reasoning behind the Kafka logo (aside
> > > from
> > > > representing a K). Was there something about planets orbiting the
> sun?
> > Or
> > > > was it the atom?
> > > >
> > > > So I might stick with a space/sience metaphor.
> > > > Could Streams be a comet? UFO? Star? Eclipse? ...
> > > > Maybe a satellite logo for Connect.
> > > >
> > > > Space inspiration: https://thenounproject.com/term/space/
> > > >
> > > >
> > > >
> > > >
> > > > 
> > > > From: Robin Moffatt 
> > > > Sent: Wednes

Re: [VOTE] KIP-657: Add Customized Kafka Streams Logo

2020-08-20 Thread Antony Stubbs
(just my honest opinion)

I strongly oppose the suggested logos. I completely agree with Michael's
analysis.

The design appears to me to be quite random (regardless of the association
of streams with otters) and clashes terribly with the embedded Kafka logo
making it appear quite unprofessional. It looks like KS is trying to jump
on the cute animal band wagon against the natural resistance of
its existing style (Kafka). It also looks far too similar to the Firefox
logo.

As for the process, I think for there to be meaningful
community deliberation about a logo, there needs to be far more ideas put
forward, rather than just the two takes on the one concept.

As for any suggestion on what it should be, I'm afraid I won't be of much
help.

On Thu, Aug 20, 2020 at 7:59 AM Michael Noll  wrote:

> For what it's worth, here is an example sketch that I came up with. Point
> is to show an alternative direction for the KStreams logo.
>
> https://ibb.co/bmZxDCg
>
> Thinking process:
>
>- It shows much more clearly (I hope) that KStreams is an official part
>of Kafka.
>- The Kafka logo is still front and center, and KStreams orbits around
>it like electrons around the Kafka core/nucleus. That’s important
> because
>we want users to adopt all of Kafka, not just bits and pieces.
>- It uses and builds upon the same ‘simple is beautiful’ style of the
>original Kafka logo. That also has the nice side-effect that it alludes
> to
>Kafka’s and KStreams’ architectural simplicity.
>- It picks up the good idea in the original logo candidates to convey
>the movement and flow of stream processing.
>- Execution-wise, and like the main Kafka logo, this logo candidate
>works well in smaller size, too, because of its simple and clear lines.
>(Logo types like the otter ones tend to become undecipherable at smaller
>sizes.)
>- It uses the same color scheme of the revamped AK website for brand
>consistency.
>
> I am sure we can come up with even better logo candidates.  But the
> suggestion above is, in my book, certainly a better option than the otters.
>
> -Michael
>
>
>
> On Wed, Aug 19, 2020 at 11:09 PM Boyang Chen 
> wrote:
>
> > Hey Ben,
> >
> > that otter was supposed to be a river-otter to connect to "streams". And
> of
> > course, it's cute :)
> >
> > On Wed, Aug 19, 2020 at 12:41 PM Philip Schmitt <
> > philip.schm...@outlook.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I’m with Robin and Michael here.
> > >
> > > What this decision needs is a good design brief.
> > > This article seems decent:
> > >
> >
> https://yourcreativejunkie.com/logo-design-brief-the-ultimate-guide-for-designers/
> > >
> > > Robin is right about the usage requirements.
> > > It goes a bit beyond resolution. How does the logo work when it’s on a
> > > sticker on someone’s laptop? Might there be some cases, where you want
> to
> > > print it in black and white?
> > > And how would it look if you put the Kafka, ksqlDB, and Streams
> stickers
> > > on a laptop?
> > >
> > > Of the two, I prefer the first option.
> > > The brown on black is a bit subdued – it might not work well on a
> t-shirt
> > > or a laptop sticker. Maybe that could be improved by using a bolder
> > color,
> > > but once it gets smaller or lower-resolution, it may not work any
> longer.
> > >
> > >
> > > Regards,
> > > Philip
> > >
> > >
> > > P.S.:
> > > Another article about what makes a good logo:
> > > https://vanschneider.com/what-makes-a-good-logo
> > >
> > > P.P.S.:
> > >
> > > If I were to pick a logo for Streams, I’d choose something that fits
> well
> > > with Kafka and ksqlDB.
> > >
> > > ksqlDB has the rocket.
> > > I can’t remember (or find) the reasoning behind the Kafka logo (aside
> > from
> > > representing a K). Was there something about planets orbiting the sun?
> Or
> > > was it the atom?
> > >
> > > So I might stick with a space/sience metaphor.
> > > Could Streams be a comet? UFO? Star? Eclipse? ...
> > > Maybe a satellite logo for Connect.
> > >
> > > Space inspiration: https://thenounproject.com/term/space/
> > >
> > >
> > >
> > >
> > > 
> > > From: Robin Moffatt 
> > > Sent: Wednesday, August 19, 2020 6:24 PM
> > > To: users@kafka.apache.org 
> > > Cc: d...@kafka.apache.org 
> > > Subject: Re: [VOTE] KIP-657: Add Customized Kafka Streams Logo
> > >
> > > I echo what Michael says here.
> > >
> > > Another consideration is that logos are often shrunk (when used on
> > slides)
> > > and need to work at lower resolution (think: printing swag, stitching
> > > socks, etc) and so whatever logo we come up with needs to not be too
> > fiddly
> > > in the level of detail - something that I think both the current
> proposed
> > > options will fall foul of IMHO.
> > >
> > >
> > > On Wed, 19 Aug 2020 at 15:33, Michael Noll 
> wrote:
> > >
> > > > Hi all!
> > > >
> > > > Great to see we are in the process of creating a cool logo for Kafka
> > > > Streams.  First, I apologize for sharin