Re: [VOTE] KIP-657: Add Customized Kafka Streams Logo

2020-08-19 Thread Michael Noll
For what it's worth, here is an example sketch that I came up with. Point
is to show an alternative direction for the KStreams logo.

https://ibb.co/bmZxDCg

Thinking process:

   - It shows much more clearly (I hope) that KStreams is an official part
   of Kafka.
   - The Kafka logo is still front and center, and KStreams orbits around
   it like electrons around the Kafka core/nucleus. That’s important because
   we want users to adopt all of Kafka, not just bits and pieces.
   - It uses and builds upon the same ‘simple is beautiful’ style of the
   original Kafka logo. That also has the nice side-effect that it alludes to
   Kafka’s and KStreams’ architectural simplicity.
   - It picks up the good idea in the original logo candidates to convey
   the movement and flow of stream processing.
   - Execution-wise, and like the main Kafka logo, this logo candidate
   works well in smaller size, too, because of its simple and clear lines.
   (Logo types like the otter ones tend to become undecipherable at smaller
   sizes.)
   - It uses the same color scheme as the revamped AK website for brand
   consistency.

I am sure we can come up with even better logo candidates.  But the
suggestion above is, in my book, certainly a better option than the otters.

-Michael



On Wed, Aug 19, 2020 at 11:09 PM Boyang Chen 
wrote:

> Hey Ben,
>
> that otter was supposed to be a river-otter to connect to "streams". And of
> course, it's cute :)
>
> On Wed, Aug 19, 2020 at 12:41 PM Philip Schmitt <
> philip.schm...@outlook.com>
> wrote:
>
> > Hi,
> >
> > I’m with Robin and Michael here.
> >
> > What this decision needs is a good design brief.
> > This article seems decent:
> >
> https://yourcreativejunkie.com/logo-design-brief-the-ultimate-guide-for-designers/
> >
> > Robin is right about the usage requirements.
> > It goes a bit beyond resolution. How does the logo work when it’s on a
> > sticker on someone’s laptop? Might there be some cases, where you want to
> > print it in black and white?
> > And how would it look if you put the Kafka, ksqlDB, and Streams stickers
> > on a laptop?
> >
> > Of the two, I prefer the first option.
> > The brown on black is a bit subdued – it might not work well on a t-shirt
> > or a laptop sticker. Maybe that could be improved by using a bolder
> color,
> > but once it gets smaller or lower-resolution, it may not work any longer.
> >
> >
> > Regards,
> > Philip
> >
> >
> > P.S.:
> > Another article about what makes a good logo:
> > https://vanschneider.com/what-makes-a-good-logo
> >
> > P.P.S.:
> >
> > If I were to pick a logo for Streams, I’d choose something that fits well
> > with Kafka and ksqlDB.
> >
> > ksqlDB has the rocket.
> > I can’t remember (or find) the reasoning behind the Kafka logo (aside
> from
> > representing a K). Was there something about planets orbiting the sun? Or
> > was it the atom?
> >
> > So I might stick with a space/science metaphor.
> > Could Streams be a comet? UFO? Star? Eclipse? ...
> > Maybe a satellite logo for Connect.
> >
> > Space inspiration: https://thenounproject.com/term/space/
> >
> >
> >
> >
> > 
> > From: Robin Moffatt 
> > Sent: Wednesday, August 19, 2020 6:24 PM
> > To: users@kafka.apache.org 
> > Cc: d...@kafka.apache.org 
> > Subject: Re: [VOTE] KIP-657: Add Customized Kafka Streams Logo
> >
> > I echo what Michael says here.
> >
> > Another consideration is that logos are often shrunk (when used on
> slides)
> > and need to work at lower resolution (think: printing swag, stitching
> > socks, etc) and so whatever logo we come up with needs to not be too
> fiddly
> > in the level of detail - something that I think both the current proposed
> > options will fall foul of IMHO.
> >
> >
> > On Wed, 19 Aug 2020 at 15:33, Michael Noll  wrote:
> >
> > > Hi all!
> > >
> > > Great to see we are in the process of creating a cool logo for Kafka
> > > Streams.  First, I apologize for sharing feedback so late -- I just
> > learned
> > > about it today. :-)
> > >
> > > Here's my *personal, subjective* opinion on the currently two logo
> > > candidates for Kafka Streams.
> > >
> > > TL;DR: Sorry, but I really don't like either of the proposed "otter"
> > logos.
> > > Let me try to explain why.
> > >
> > >- The choice to use an animal, regardless of which specific animal,
> > >seems random and doesn't fit Kafka. (What's the purpose? To show
> that
> > >KStreams is 'cute'?) In comparison, the O’Reilly books always have
> an
> > >animal cover, that’s their style, and it is very recognizable.
> Kafka
> > >however has its own, different style.  The Kafka logo has clear,
> > simple
> > >lines to achieve an abstract and ‘techy’ look, which also alludes
> > > nicely to
> > >its architectural simplicity. Its logo is also a smart play on the
> > >Kafka-identifying letter “K” and alluding to it being a distributed
> > > system
> > >(the circles and links that make the K).
> > >- The proposed logos

Re: GDPR compliance

2020-08-19 Thread Nemeth Sandor
Hey Christian,

my understanding is that you have an upstream system publishing data via a
Kafka topic to a downstream system, and your goal is to delete the PII data
both from Kafka and the downstream system via a message published through
the same topic. Is my understanding correct? Does the coordinator expect
some reply message from the downstream system (e.g.
"AnonymizationSuccessfulEvent")?
Do you maybe want to prevent downstream systems from accessing PII in
in-flight messages too if a delete request happens in the meantime?
Do you have a log-compacted or a non-compacted topic?

Everything below is about data retention within Kafka topics; the
downstream system is not in scope:

For retention in non-compacted topics, you can expect that only messages
**published** in the last retention.ms are in the topic; everything
before that is deleted - so you could do something like set retention.ms to
10 seconds, and have the coordinator simply assume that after 10 seconds
the data has been deleted (to be honest, I'm unaware of a method for checking
whether a given message was deleted or not - other than re-reading the
topic from the beginning). Naturally this solution would carry the
requirement that the downstream system processes the messages within the
same amount of time so that no messages are lost. This is something that
definitely requires fine-tuning.
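
For illustration, a minimal Java AdminClient sketch for setting such a retention
value on a topic; the topic name and bootstrap address are placeholders:

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class SetTopicRetention {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // placeholder
            try (AdminClient admin = AdminClient.create(props)) {
                ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "pii-topic"); // placeholder
                AlterConfigOp op = new AlterConfigOp(
                        new ConfigEntry("retention.ms", "10000"), // e.g. the 10 seconds discussed above
                        AlterConfigOp.OpType.SET);
                admin.incrementalAlterConfigs(Map.of(topic, List.of(op))).all().get();
            }
        }
    }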

For retention in compacted topics: Kafka will not automatically compact
messages - to trigger it, you need to publish a tombstone record. So even
with low activity there must be a new message for the deletion to occur
(triggering compaction). My understanding of the documentation is that,
using a short segment.ms configuration (something like 1 second), you
should be able to assume that the compaction has occurred, so only the
tombstone record remains in the topic. In this case the coordinator can
also assume that, segment.ms after publishing the tombstone record, the data
is gone from Kafka.
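
And a minimal sketch of publishing such a tombstone record from Java; topic name,
key and broker address are again placeholders:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class PublishTombstone {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // placeholder
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // A null value is the tombstone; the key identifies the record to be compacted away.
                producer.send(new ProducerRecord<>("pii-topic", "customer-4711", null)); // placeholders
                producer.flush();
            }
        }
    }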

Kind regards,
Sandor


On Wed, 19 Aug 2020 at 19:49, Apolloni, Christian <
christian.apoll...@baloise.ch> wrote:

> Hi Sandor, thanks again for your reply.
>
> > If you have a non-log-compacted topic, after `retention.ms` the message
> > (along with the PII) gets deleted from the Kafka message store without any
> > further action, which should satisfy GDPR requirements:
> > - you are handling PII in Kafka for a limited amount of time
> > - you are processing the data for the given purpose it was given
> > - the data will automatically be deleted without any further steps
> > If you have a downstream system, you should also be able to publish a
> > message through Kafka so that the downstream system executes its delete
> > processes - if required. We implemented a similar process where we
> > published an AnonymizeOrder event, which instructed downstream systems to
> > anonymize the order data in their own data store.
>
> Our problem is, the data could have been published shortly before the
> system receives a delete order from the "coordinator". This is because the
> data might have been mutated and the update needs to be propagated to
> consumer systems. If we go with a retention-period of days we would only be
> able to proceed with subsequent systems in the coordinated chain with too
> much of a delay. Going with an even shorter retention would be problematic.
>
> > If you have a log-compacted topic:
> > - yes, I have the same understanding as you have on the active segment.
> > - You can set the segment.ms property to force the
> > compaction to occur within an expected timeframe.
> >
> > In general what I understand is true in both cases that Kafka gives you
> > good enough guarantees to either remove the old message after retention.ms
> > milliseconds or execute the topic compaction after segment.ms time that it
> > is unnecessary to try to figure out more specifically in what exact moment
> > the data is deleted. Setting these configurations should give you enough
> > guarantee that the data removal will occur - if not, that imo should be
> > considered a bug and reported back to the project.
>
> We investigated the max.compaction.lag.ms parameter which was introduced
> in KIP-354 and from our understanding the intent is exactly what we'd like
> to accomplish, but unless we missed something we have noticed new segments
> are rolled only if new messages are appended. If the topic has very low
> activity it can be that no new message is appended and the segment is left
> active indefinitely. This means the cleaning for that segment might also
> remain stalled indefinitely. We are unsure whether our understanding is
> correct and whether it's a bug or not.
>
> In general, I think part of the issue is that the system receives the
> delete order at the time that it has to be performed: we don't deal with
> the processing of the required waiting periods, that's what happens in the
>

Re: Mirror Maker 2.0 Queries

2020-08-19 Thread Ananya Sen
Any help here would be greatly appreciated.

On Sat, Aug 8, 2020, 12:13 PM Ananya Sen  wrote:

> Thank you Ryanne for the quick response.
> I further want to clarify a few points.
>
> MirrorMaker 2.0 is based on the Kafka Connect framework. In Kafka
> Connect we have multiple workers and each worker has some assigned tasks. To
> map this to MirrorMaker 2.0, a MirrorMaker driver will have some workers.
>
> 1) Can this number of workers be configured?
> 2) What is the default value of this worker configuration?
> 3) Is every topic partition given a new task?
> 4) Is every consumer group - topic pair given a new task for replicating
> offsets?
>
> Also, consider a case where I have 1000 topics in a Kafka cluster and each
> topic has a high amount of data + new data is being written at high
> throughput. Now I want to set up MirrorMaker 2.0 on this cluster to
> replicate all the old data (which is retained in the topics) as well as the
> new incoming data to a backup cluster. How can I scale up the MirrorMaker
> instances so that I have very little lag?
>
> On 2020/07/11 06:37:56, Ananya Sen  wrote:
> > Hi
> >
> > I was exploring MirrorMaker 2.0. I read through this
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
> > documentation
> > and I have a few questions.
> >
> > 1. For running MirrorMaker as a dedicated MirrorMaker cluster, the
> > documentation specifies a config file and a starter script. Is this
> > MirrorMaker process distributed?
> > 2. I could not find any port configuration for the above MirrorMaker
> > process, so can we configure MirrorMaker itself to run as a cluster, i.e.
> > run the process instances across multiple servers to avoid downtime due
> > to a server crash?
> > 3. If we could somehow run MirrorMaker as a distributed process,
> > does that mean that topic and consumer offset replication will be
> > shared among those MirrorMaker processes?
> > 4. What is the default port of this MirrorMaker process and how can we
> > override it?
> >
> > Looking forward to your reply.
> >
> >
> > Thanks & Regards
> > Ananya Sen
> >
>


Re: [VOTE] KIP-657: Add Customized Kafka Streams Logo

2020-08-19 Thread Boyang Chen
Hey Ben,

that otter was supposed to be a river-otter to connect to "streams". And of
course, it's cute :)

On Wed, Aug 19, 2020 at 12:41 PM Philip Schmitt 
wrote:

> Hi,
>
> I’m with Robin and Michael here.
>
> What this decision needs is a good design brief.
> This article seems decent:
> https://yourcreativejunkie.com/logo-design-brief-the-ultimate-guide-for-designers/
>
> Robin is right about the usage requirements.
> It goes a bit beyond resolution. How does the logo work when it’s on a
> sticker on someone’s laptop? Might there be some cases, where you want to
> print it in black and white?
> And how would it look if you put the Kafka, ksqlDB, and Streams stickers
> on a laptop?
>
> Of the two, I prefer the first option.
> The brown on black is a bit subdued – it might not work well on a t-shirt
> or a laptop sticker. Maybe that could be improved by using a bolder color,
> but once it gets smaller or lower-resolution, it may not work any longer.
>
>
> Regards,
> Philip
>
>
> P.S.:
> Another article about what makes a good logo:
> https://vanschneider.com/what-makes-a-good-logo
>
> P.P.S.:
>
> If I were to pick a logo for Streams, I’d choose something that fits well
> with Kafka and ksqlDB.
>
> ksqlDB has the rocket.
> I can’t remember (or find) the reasoning behind the Kafka logo (aside from
> representing a K). Was there something about planets orbiting the sun? Or
> was it the atom?
>
> So I might stick with a space/science metaphor.
> Could Streams be a comet? UFO? Star? Eclipse? ...
> Maybe a satellite logo for Connect.
>
> Space inspiration: https://thenounproject.com/term/space/
>
>
>
>
> 
> From: Robin Moffatt 
> Sent: Wednesday, August 19, 2020 6:24 PM
> To: users@kafka.apache.org 
> Cc: d...@kafka.apache.org 
> Subject: Re: [VOTE] KIP-657: Add Customized Kafka Streams Logo
>
> I echo what Michael says here.
>
> Another consideration is that logos are often shrunk (when used on slides)
> and need to work at lower resolution (think: printing swag, stitching
> socks, etc) and so whatever logo we come up with needs to not be too fiddly
> in the level of detail - something that I think both the current proposed
> options will fall foul of IMHO.
>
>
> On Wed, 19 Aug 2020 at 15:33, Michael Noll  wrote:
>
> > Hi all!
> >
> > Great to see we are in the process of creating a cool logo for Kafka
> > Streams.  First, I apologize for sharing feedback so late -- I just
> learned
> > about it today. :-)
> >
> > Here's my *personal, subjective* opinion on the currently two logo
> > candidates for Kafka Streams.
> >
> > TL;DR: Sorry, but I really don't like either of the proposed "otter"
> logos.
> > Let me try to explain why.
> >
> >- The choice to use an animal, regardless of which specific animal,
> >seems random and doesn't fit Kafka. (What's the purpose? To show that
> >KStreams is 'cute'?) In comparison, the O’Reilly books always have an
> >animal cover, that’s their style, and it is very recognizable.  Kafka
> >however has its own, different style.  The Kafka logo has clear,
> simple
> >lines to achieve an abstract and ‘techy’ look, which also alludes
> > nicely to
> >its architectural simplicity. Its logo is also a smart play on the
> >Kafka-identifying letter “K” and alluding to it being a distributed
> > system
> >(the circles and links that make the K).
> >- The proposed logos, however, make it appear as if KStreams is a
> >third-party technology that was bolted onto Kafka. They certainly, for
> > me,
> >do not convey the message "Kafka Streams is an official part of Apache
> >Kafka".
> >- I, too, don't like the way the main Kafka logo is obscured (a
> concern
> >already voiced in this thread). Also, the Kafka 'logo' embedded in the
> >proposed KStreams logos is not the original one.
> >- None of the proposed KStreams logos visually match the Kafka logo.
> >They have a totally different style, font, line art, and color scheme.
> >- Execution-wise, the main Kafka logo looks great at all sizes.  The
> >style of the otter logos, in comparison, becomes undecipherable at
> > smaller
> >sizes.
> >
> > What I would suggest is to first agree on what the KStreams logo is
> > supposed to convey to the reader.  Here's my personal take:
> >
> > Objective 1: First and foremost, the KStreams logo should make it clear
> and
> > obvious that KStreams is an official and integral part of Apache Kafka.
> > This applies to both what is depicted and how it is depicted (like font,
> > line art, colors).
> > Objective 2: The logo should allude to the role of KStreams in the Kafka
> > project, which is the processing part.  That is, "doing something useful
> to
> > the data in Kafka".
> >
> > The "circling arrow" aspect of the current otter logos does allude to
> > "continuous processing", which is going in the direction of (2), but the
> > logos do not meet (1) in my opinion.
> >
> > -Michael
> >
> >
> >
> 

Re: GDPR compliance

2020-08-19 Thread Christopher Smith
Yup. The crypto-shredding approach tends to be the most practical.
Basically, do payload encryption of your PII with a unique per-user key.
Throw away the per-user key, and the data is "deleted" from a CCPA
perspective.

The alternative is to give the relevant topic tight retention SLAs,
which often proves to be counterproductive.
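
For what it's worth, a very rough sketch of the per-user-key idea in Java; the
in-memory key map and the plain "AES" transformation are simplifications, and a
real setup would use an external key-management system plus an authenticated
mode such as GCM with proper IV handling:

    import java.util.Base64;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;

    // Crypto-shredding sketch: one key per user, payloads encrypted before producing.
    // The key store here is an in-memory map purely for illustration.
    public class CryptoShreddingSketch {
        private final Map<String, SecretKey> keysByUser = new ConcurrentHashMap<>();

        public String encryptForUser(String userId, byte[] payload) throws Exception {
            SecretKey key = keysByUser.computeIfAbsent(userId, id -> newKey());
            Cipher cipher = Cipher.getInstance("AES"); // simplification; prefer AES/GCM in practice
            cipher.init(Cipher.ENCRYPT_MODE, key);
            return Base64.getEncoder().encodeToString(cipher.doFinal(payload));
        }

        // "Deleting" the user's data: once the key is gone, records already sitting in
        // Kafka can no longer be decrypted, regardless of topic retention.
        public void forgetUser(String userId) {
            keysByUser.remove(userId);
        }

        private static SecretKey newKey() {
            try {
                KeyGenerator gen = KeyGenerator.getInstance("AES");
                gen.init(256);
                return gen.generateKey();
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
        }

        public static void main(String[] args) throws Exception {
            CryptoShreddingSketch sketch = new CryptoShreddingSketch();
            String cipherText = sketch.encryptForUser("customer-4711", "some PII payload".getBytes());
            System.out.println("stored in Kafka: " + cipherText);
            sketch.forgetUser("customer-4711"); // the "deletion": the payload is now unreadable
        }
    }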

--Chris

On Wed, Aug 19, 2020 at 11:31 AM Patrick Plaatje  wrote:

> Hi all,
>
> there has been an interesting talk about this during a previous Kafka
> Summit. It talks about using crypto-shredding to 'forget' user information.
> I'm not sure if there are any slides, but it basically suggests that you'd
> encrypt user data on Kafka, and when you get an information removal request,
> the only thing you have to do is to delete the encryption key for that
> user.
>
> Here's the announcement of the talk:
>
> https://kafka-summit.org/sessions/handling-gdpr-apache-kafka-comply-without-freaking/
> ,
> but not sure where slides or a recording can be found unfortunately.
>
> Hope it helps.
>
> BR,
> Patrick
>
> On Wed, 19 Aug 2020 at 18:16, Nemeth Sandor 
> wrote:
>
> > Hi Christian,
> >
> > depending on how your Kafka topics are configured, you have 2 different
> > options:
> >
> > a) if you have a non-log-compacted then you can set the message retention
> > on the topic to the desired value. In that case the message will be
> deleted
> > by Kafka after the retention period expires. (the config value is `
> > retention.ms` I think)
> >
> > b) if you use Kafka as a log store with topics having infinite retention,
> > then one common solution is to send a so-called tombstone record (a
> record
> > with the same key containing only GDPR compatible data with the sensitive
> > information removed), and let Kafka take care of the removal using log
> > compaction.
> >
> > Kind regards,
> > Sandor
> >
> >
> > On Wed, 19 Aug 2020 at 16:53, Apolloni, Christian <
> > christian.apoll...@baloise.ch> wrote:
> >
> > > Hello,
> > >
> > > I have some questions about implementing GDPR compliance in Kafka.
> > >
> > > In our situation we have the requirement of removing personal data from
> > in
> > > coordination with multiple systems. The idea is having a central
> > > "coordinator system" which triggers the deletion process for the
> > individual
> > > systems in a specific, controlled sequence which takes into account the
> > > various system inter-dependencies and data flows. This means e.g.
> system
> > > nr. 2 will receive the delete order only after system nr. 1 has
> reported
> > > that it's done with the deletion on its side (and so forth).
> > >
> > > One of the systems in question publishes data in Kafka topics for
> > > consumption in other systems and part of the deletion process is to
> > remove
> > > the relevant personal data from these Kafka topics too. This has to
> > happen
> > > in a relatively short time after the deletion order is received, to
> > prevent
> > > a long delay before the systems further down the chain can start their
> > own
> > > deletion. Furthermore, we need to know when the operation is completed:
> > > only at that point we can give the "go" to the other systems.
> > >
> > > We are unsure how to satisfy those requirements in Kafka. If anyone has
> > > ideas or suggestions we would be very interested in your opinion. We
> are
> > > also interested in general about experiences in implementing GDPR
> > > compliance in Kafka, especially when dealing with multiple,
> > interconnected
> > > systems.
> > >
> > > Kind regards,
> > >
> > > --
> > > Christian Apolloni
> > >
> > > Disclaimer: The contents of this email and any attachment thereto are
> > > intended exclusively for the attention of the addressee(s). The email
> and
> > > any such attachment(s) may contain information that is confidential and
> > > protected on the strength of professional, official or business secrecy
> > > laws and regulations or contractual obligations. Should you have
> received
> > > this email by mistake, you may neither make use of nor divulge the
> > contents
> > > of the email or of any attachment thereto. In such a case, please
> inform
> > > the email's sender and delete the message and all attachments without
> > delay
> > > from your systems.
> > > You can find our e-mail disclaimer statement in other languages under
> > > http://www.baloise.ch/email_disclaimer
> > >
> >
>
>
> --
> Patrick Plaatje
>


-- 
Chris


Re: MirrorMaker 2.0 - Translating offsets for remote topics and consumer groups

2020-08-19 Thread Ryanne Dolan
Josh, if you have two clusters with bidirectional replication, you only get
two copies of each record. MM2 won't replicate the data "upstream", cuz it
knows it's already there. In particular, MM2 knows not to create topics
like B.A.topic1 on cluster A, as this would be an unnecessary cycle.

>  is there a reason for MM2 not emitting checkpoint data for the source
topic AND the remote topic

No, not really! I think it would be surprising if one-directional flows
insisted on writing checkpoints both ways -- but it's also surprising that
you need to explicitly allow a remote topic to be checkpointed. I'd support
changing this, fwiw.
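
For reference, once a checkpoint does exist the translation itself is a single
call to RemoteClusterUtils. A rough sketch -- the cluster aliases, group id and
bootstrap address below are placeholders, and the returned offsets would still
need to be applied via alterConsumerGroupOffsets() or consumer seek():

    import java.time.Duration;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.connect.mirror.RemoteClusterUtils;

    public class TranslateGroupOffsets {
        public static void main(String[] args) throws Exception {
            // Client config pointing at the cluster that holds the checkpoints topic
            // (here: cluster A, once B -> A checkpoints exist). Values are placeholders.
            Map<String, Object> clusterAConfig = Map.of("bootstrap.servers", "clusterA:9092");

            Map<TopicPartition, OffsetAndMetadata> translated =
                    RemoteClusterUtils.translateOffsets(
                            clusterAConfig,
                            "B",           // alias of the remote cluster the group was consuming on
                            "my-group",    // consumer group to translate (placeholder)
                            Duration.ofMinutes(1));

            // These offsets can then be fed to AdminClient.alterConsumerGroupOffsets(...)
            // or to a consumer's seek() calls to reposition the group on this cluster.
            translated.forEach((tp, offset) -> System.out.println(tp + " -> " + offset.offset()));
        }
    }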

Ryanne

On Wed, Aug 19, 2020 at 2:30 PM Josh C  wrote:

> Sorry, correction -- I am realizing now it would be 3 copies of the same
> topic data as A.topic1 has different data than B.topic1. However, that
> would still be 3 copies as opposed to just 2 with something like topic1 and
> A.topic1.
>
> As well, if I were to explicitly replicate the remote topic back to the
> source cluster by adding it to the topic whitelist, would I also need to
> update the topic blacklist and remove ".*\.replica" (since the blacklists
> take precedence over the whitelists)?
>
> Josh
>
> On Wed, Aug 19, 2020 at 11:46 AM Josh C  wrote:
>
> > Thanks for the clarification Ryanne. In the context of active/active
> > clusters, does this mean there would be 6 copies of the same topic data?
> >
> > A topics:
> > - topic1
> > - B.topic1
> > - B.A.topic1
> >
> > B topics:
> > - topic1
> > - A.topic1
> > - A.B.topic1
> >
> > Out of curiosity, is there a reason for MM2 not emitting checkpoint data
> > for the source topic AND the remote topic as a pair as opposed to having
> to
> > explicitly replicate the remote topic back to the source cluster just to
> > have the checkpoints emitted upstream?
> >
> > Josh
> >
> > On Wed, Aug 19, 2020 at 6:16 AM Ryanne Dolan 
> > wrote:
> >
> >> Josh, yes it's possible to migrate the consumer group back to the source
> >> topic, but you need to explicitly replicate the remote topic back to the
> >> source cluster -- otherwise no checkpoints will flow "upstream":
> >>
> >> A->B.topics=test1
> >> B->A.topics=A.test1
> >>
> >> After the first checkpoint is emitted upstream,
> >> RemoteClusterUtils.translateOffsets() will translate B's A.test1 offsets
> >> into A's test1 offsets for you.
> >>
> >> Ryanne
> >>
> >> On Tue, Aug 18, 2020 at 5:56 PM Josh C  wrote:
> >>
> >> > Hi there,
> >> >
> >> > I'm currently exploring MM2 and having some trouble with the
> >> > RemoteClusterUtils.translateOffsets() method. I have been successful
> in
> >> > migrating a consumer group from the source cluster to the target
> >> cluster,
> >> > but was wondering how I could migrate this consumer group back to the
> >> > original source topic?
> >> >
> >> > It is my understanding that there isn't any checkpoint data being
> >> > emitted for this consumer group since it is consuming from a mirrored
> >> topic
> >> > in the target cluster. I'm currently getting an empty map since there
> >> isn't
> >> > any checkpoint data for 'target.checkpoints.internal' in the source
> >> > cluster. So, I was wondering how would I get these new translated
> >> offsets
> >> > to migrate the consumer group back to the source cluster?
> >> >
> >> > Please let me know if my question was unclear or if you require
> further
> >> > clarification! Appreciate the help.
> >> >
> >> > Thanks,
> >> > Josh
> >> >
> >>
> >
>


Re: [VOTE] KIP-657: Add Customized Kafka Streams Logo

2020-08-19 Thread Philip Schmitt
Hi,

I’m with Robin and Michael here.

What this decision needs is a good design brief.
This article seems decent: 
https://yourcreativejunkie.com/logo-design-brief-the-ultimate-guide-for-designers/

Robin is right about the usage requirements.
It goes a bit beyond resolution. How does the logo work when it’s on a sticker 
on someone’s laptop? Might there be some cases, where you want to print it in 
black and white?
And how would it look if you put the Kafka, ksqlDB, and Streams stickers on a 
laptop?

Of the two, I prefer the first option.
The brown on black is a bit subdued – it might not work well on a t-shirt or a 
laptop sticker. Maybe that could be improved by using a bolder color, but once 
it gets smaller or lower-resolution, it may not work any longer.


Regards,
Philip


P.S.:
Another article about what makes a good logo: 
https://vanschneider.com/what-makes-a-good-logo

P.P.S.:

If I were to pick a logo for Streams, I’d choose something that fits well with 
Kafka and ksqlDB.

ksqlDB has the rocket.
I can’t remember (or find) the reasoning behind the Kafka logo (aside from 
representing a K). Was there something about planets orbiting the sun? Or was 
it the atom?

So I might stick with a space/science metaphor.
Could Streams be a comet? UFO? Star? Eclipse? ...
Maybe a satellite logo for Connect.

Space inspiration: https://thenounproject.com/term/space/





From: Robin Moffatt 
Sent: Wednesday, August 19, 2020 6:24 PM
To: users@kafka.apache.org 
Cc: d...@kafka.apache.org 
Subject: Re: [VOTE] KIP-657: Add Customized Kafka Streams Logo

I echo what Michael says here.

Another consideration is that logos are often shrunk (when used on slides)
and need to work at lower resolution (think: printing swag, stitching
socks, etc) and so whatever logo we come up with needs to not be too fiddly
in the level of detail - something that I think both the current proposed
options will fall foul of IMHO.


On Wed, 19 Aug 2020 at 15:33, Michael Noll  wrote:

> Hi all!
>
> Great to see we are in the process of creating a cool logo for Kafka
> Streams.  First, I apologize for sharing feedback so late -- I just learned
> about it today. :-)
>
> Here's my *personal, subjective* opinion on the currently two logo
> candidates for Kafka Streams.
>
> TL;DR: Sorry, but I really don't like either of the proposed "otter" logos.
> Let me try to explain why.
>
>- The choice to use an animal, regardless of which specific animal,
>seems random and doesn't fit Kafka. (What's the purpose? To show that
>KStreams is 'cute'?) In comparison, the O’Reilly books always have an
>animal cover, that’s their style, and it is very recognizable.  Kafka
>however has its own, different style.  The Kafka logo has clear, simple
>lines to achieve an abstract and ‘techy’ look, which also alludes
> nicely to
>its architectural simplicity. Its logo is also a smart play on the
>Kafka-identifying letter “K” and alluding to it being a distributed
> system
>(the circles and links that make the K).
>- The proposed logos, however, make it appear as if KStreams is a
>third-party technology that was bolted onto Kafka. They certainly, for
> me,
>do not convey the message "Kafka Streams is an official part of Apache
>Kafka".
>- I, too, don't like the way the main Kafka logo is obscured (a concern
>already voiced in this thread). Also, the Kafka 'logo' embedded in the
>proposed KStreams logos is not the original one.
>- None of the proposed KStreams logos visually match the Kafka logo.
>They have a totally different style, font, line art, and color scheme.
>- Execution-wise, the main Kafka logo looks great at all sizes.  The
>style of the otter logos, in comparison, becomes undecipherable at
> smaller
>sizes.
>
> What I would suggest is to first agree on what the KStreams logo is
> supposed to convey to the reader.  Here's my personal take:
>
> Objective 1: First and foremost, the KStreams logo should make it clear and
> obvious that KStreams is an official and integral part of Apache Kafka.
> This applies to both what is depicted and how it is depicted (like font,
> line art, colors).
> Objective 2: The logo should allude to the role of KStreams in the Kafka
> project, which is the processing part.  That is, "doing something useful to
> the data in Kafka".
>
> The "circling arrow" aspect of the current otter logos does allude to
> "continuous processing", which is going in the direction of (2), but the
> logos do not meet (1) in my opinion.
>
> -Michael
>
>
>
>
> On Tue, Aug 18, 2020 at 10:34 PM Matthias J. Sax  wrote:
>
> > Adding the user mailing list -- I think we should accepts votes on both
> > lists for this special case, as it's not a technical decision.
> >
> > @Boyang: as mentioned by Bruno, can we maybe add black/white options for
> > both proposals, too?
> >
> > I also agree that Design B is not ideal with regard to the Kafka logo.
> 

Re: MirrorMaker 2.0 - Translating offsets for remote topics and consumer groups

2020-08-19 Thread Josh C
Sorry, correction -- I am realizing now it would be 3 copies of the same
topic data as A.topic1 has different data than B.topic1. However, that
would still be 3 copies as opposed to just 2 with something like topic1 and
A.topic1.

As well, if I were to explicitly replicate the remote topic back to the
source cluster by adding it to the topic whitelist, would I also need to
update the topic blacklist and remove ".*\.replica" (since the blacklists
take precedence over the whitelists)?

Josh

On Wed, Aug 19, 2020 at 11:46 AM Josh C  wrote:

> Thanks for the clarification Ryanne. In the context of active/active
> clusters, does this mean there would be 6 copies of the same topic data?
>
> A topics:
> - topic1
> - B.topic1
> - B.A.topic1
>
> B topics:
> - topic1
> - A.topic1
> - A.B.topic1
>
> Out of curiosity, is there a reason for MM2 not emitting checkpoint data
> for the source topic AND the remote topic as a pair as opposed to having to
> explicitly replicate the remote topic back to the source cluster just to
> have the checkpoints emitted upstream?
>
> Josh
>
> On Wed, Aug 19, 2020 at 6:16 AM Ryanne Dolan 
> wrote:
>
>> Josh, yes it's possible to migrate the consumer group back to the source
>> topic, but you need to explicitly replicate the remote topic back to the
>> source cluster -- otherwise no checkpoints will flow "upstream":
>>
>> A->B.topics=test1
>> B->A.topics=A.test1
>>
>> After the first checkpoint is emitted upstream,
>> RemoteClusterUtils.translateOffsets() will translate B's A.test1 offsets
>> into A's test1 offsets for you.
>>
>> Ryanne
>>
>> On Tue, Aug 18, 2020 at 5:56 PM Josh C  wrote:
>>
>> > Hi there,
>> >
>> > I'm currently exploring MM2 and having some trouble with the
>> > RemoteClusterUtils.translateOffsets() method. I have been successful in
>> > migrating a consumer group from the source cluster to the target
>> cluster,
>> > but was wondering how I could migrate this consumer group back to the
>> > original source topic?
>> >
>> > It is my understanding that there isn't any checkpoint data being
>> > emitted for this consumer group since it is consuming from a mirrored
>> topic
>> > in the target cluster. I'm currently getting an empty map since there
>> isn't
>> > any checkpoint data for 'target.checkpoints.internal' in the source
>> > cluster. So, I was wondering how would I get these new translated
>> offsets
>> > to migrate the consumer group back to the source cluster?
>> >
>> > Please let me know if my question was unclear or if you require further
>> > clarification! Appreciate the help.
>> >
>> > Thanks,
>> > Josh
>> >
>>
>


Re: GDPR compliance

2020-08-19 Thread Apolloni, Christian
> Hi all,
>
> there has been an interesting talk about this during a previous Kafka
> Summit. It talks about using crypto-shredding to 'forget' user information.
> I'm not sure if there are any slides, but it basically suggests that you'd
> encrypt user data on Kafka, and when you get an information removal request,
> the only thing you have to do is to delete the encryption key for that user.
>
> Here's the announcement of the talk:
> https://kafka-summit.org/sessions/handling-gdpr-apache-kafka-comply-without-freaking/,
> but not sure where slides or a recording can be found unfortunately.
>
> Hope it helps.
>
> BR,
> Patrick

Hi Patrick,

Thanks for your reply, we are aware of that talk: the documentation is
available here:

https://www.confluent.io/kafka-summit-lon19/handling-gdpr-apache-kafka-comply-freaking-out/

That's what sparked our interest in such a solution.

Kind regards,

 -- 
 Christian Apolloni
Disclaimer: The contents of this email and any attachment thereto are intended 
exclusively for the attention of the addressee(s). The email and any such 
attachment(s) may contain information that is confidential and protected on the 
strength of professional, official or business secrecy laws and regulations or 
contractual obligations. Should you have received this email by mistake, you 
may neither make use of nor divulge the contents of the email or of any 
attachment thereto. In such a case, please inform the email's sender and delete 
the message and all attachments without delay from your systems.
You can find our e-mail disclaimer statement in other languages under 
http://www.baloise.ch/email_disclaimer


Re: MirrorMaker 2.0 - Translating offsets for remote topics and consumer groups

2020-08-19 Thread Josh C
Thanks for the clarification Ryanne. In the context of active/active
clusters, does this mean there would be 6 copies of the same topic data?

A topics:
- topic1
- B.topic1
- B.A.topic1

B topics:
- topic1
- A.topic1
- A.B.topic1

Out of curiosity, is there a reason for MM2 not emitting checkpoint data
for the source topic AND the remote topic as a pair as opposed to having to
explicitly replicate the remote topic back to the source cluster just to
have the checkpoints emitted upstream?

Josh

On Wed, Aug 19, 2020 at 6:16 AM Ryanne Dolan  wrote:

> Josh, yes it's possible to migrate the consumer group back to the source
> topic, but you need to explicitly replicate the remote topic back to the
> source cluster -- otherwise no checkpoints will flow "upstream":
>
> A->B.topics=test1
> B->A.topics=A.test1
>
> After the first checkpoint is emitted upstream,
> RemoteClusterUtils.translateOffsets() will translate B's A.test1 offsets
> into A's test1 offsets for you.
>
> Ryanne
>
> On Tue, Aug 18, 2020 at 5:56 PM Josh C  wrote:
>
> > Hi there,
> >
> > I'm currently exploring MM2 and having some trouble with the
> > RemoteClusterUtils.translateOffsets() method. I have been successful in
> > migrating a consumer group from the source cluster to the target cluster,
> > but was wondering how I could migrate this consumer group back to the
> > original source topic?
> >
> > It is my understanding that there isn't any checkpoint data being
> > emitted for this consumer group since it is consuming from a mirrored
> topic
> > in the target cluster. I'm currently getting an empty map since there
> isn't
> > any checkpoint data for 'target.checkpoints.internal' in the source
> > cluster. So, I was wondering how would I get these new translated offsets
> > to migrate the consumer group back to the source cluster?
> >
> > Please let me know if my question was unclear or if you require further
> > clarification! Appreciate the help.
> >
> > Thanks,
> > Josh
> >
>


Re: GDPR compliance

2020-08-19 Thread Patrick Plaatje
Hi all,

there has been an interesting talk about this during a previous Kafka
Summit. It talks about using crypto-shredding to 'forget' user information.
I'm not sure if there are any slides, but it basically suggests that you'd
encrypt user data on Kafka, and when you get an information removal request,
the only thing you have to do is to delete the encryption key for that user.

Here's the announcement of the talk:
https://kafka-summit.org/sessions/handling-gdpr-apache-kafka-comply-without-freaking/,
but not sure where slides or a recording can be found unfortunately.

Hope it helps.

BR,
Patrick

On Wed, 19 Aug 2020 at 18:16, Nemeth Sandor 
wrote:

> Hi Christian,
>
> depending on how your Kafka topics are configured, you have 2 different
> options:
>
> a) if you have a non-log-compacted then you can set the message retention
> on the topic to the desired value. In that case the message will be deleted
> by Kafka after the retention period expires. (the config value is `
> retention.ms` I think)
>
> b) if you use Kafka as a log store with topics having infinite retention,
> then one common solution is to send a so-called tombstone record (a record
> with the same key containing only GDPR compatible data with the sensitive
> information removed), and let Kafka take care of the removal using log
> compaction.
>
> Kind regards,
> Sandor
>
>
> On Wed, 19 Aug 2020 at 16:53, Apolloni, Christian <
> christian.apoll...@baloise.ch> wrote:
>
> > Hello,
> >
> > I have some questions about implementing GDPR compliance in Kafka.
> >
> > In our situation we have the requirement of removing personal data from
> in
> > coordination with multiple systems. The idea is having a central
> > "coordinator system" which triggers the deletion process for the
> individual
> > systems in a specific, controlled sequence which takes into account the
> > various system inter-dependencies and data flows. This means e.g. system
> > nr. 2 will receive the delete order only after system nr. 1 has reported
> > that it's done with the deletion on its side (and so forth).
> >
> > One of the systems in question publishes data in Kafka topics for
> > consumption in other systems and part of the deletion process is to
> remove
> > the relevant personal data from these Kafka topics too. This has to
> happen
> > in a relatively short time after the deletion order is received, to
> prevent
> > a long delay before the systems further down the chain can start their
> own
> > deletion. Furthermore, we need to know when the operation is completed:
> > only at that point we can give the "go" to the other systems.
> >
> > We are unsure how to satisfy those requirements in Kafka. If anyone has
> > ideas or suggestions we would be very interested in your opinion. We are
> > also interested in general about experiences in implementing GDPR
> > compliance in Kafka, especially when dealing with multiple,
> interconnected
> > systems.
> >
> > Kind regards,
> >
> > --
> > Christian Apolloni
> >
> > Disclaimer: The contents of this email and any attachment thereto are
> > intended exclusively for the attention of the addressee(s). The email and
> > any such attachment(s) may contain information that is confidential and
> > protected on the strength of professional, official or business secrecy
> > laws and regulations or contractual obligations. Should you have received
> > this email by mistake, you may neither make use of nor divulge the
> contents
> > of the email or of any attachment thereto. In such a case, please inform
> > the email's sender and delete the message and all attachments without
> delay
> > from your systems.
> > You can find our e-mail disclaimer statement in other languages under
> > http://www.baloise.ch/email_disclaimer
> >
>


-- 
Patrick Plaatje


Re: [VOTE] KIP-657: Add Customized Kafka Streams Logo

2020-08-19 Thread Ben Stopford
Purely out of curiosity, why an otter? Is there some reasoning behind it or
is it just because it's cute?

On Wed, 19 Aug 2020 at 17:57, Guozhang Wang  wrote:

> Hi Michael,
>
> Thanks for the feedback, but I'm not totally in agreement with your
> proposed objectives of the logos. More specifically, though I agree Kafka's
> logo is a smartly designed one to convey some key ideas of the project such
> as "distributed" and "pipeline" in a techy manner, I'm not convinced it is
> the only golden-standard principle for any logo designs. And I also feel
> that it is not necessary to enforce Kafka Streams logo inheriting the same
> techy and cold simple line style as with Kafka logos and avoid other
> styles, e.g. animal images. To me the purpose of having a Kafka Streams
> logo is to demonstrate the stream processing part of Kafka, and the symbol
> of an otter to me illustrate "continuous actions in streams" surrounding
> Kafka in a nice way (and yes, it's also cute :). In addition I honestly do
> not share your concern that the current proposal would make people confuse
> Kafka Streams is not an official and integral part of Apache Kafka, but of
> course that's a subjective topic for us all.
>
> I do agree with Robin on the other feedback that it makes the Kafka logo
> too small especially in design A, while it obscures the Kafka logo in
> design B. I think this is addressable under the current design proposal
> though.
>
>
> Guozhang
>
>
> On Wed, Aug 19, 2020 at 9:24 AM Robin Moffatt  wrote:
>
> > I echo what Michael says here.
> >
> > Another consideration is that logos are often shrunk (when used on
> slides)
> > and need to work at lower resolution (think: printing swag, stitching
> > socks, etc) and so whatever logo we come up with needs to not be too
> fiddly
> > in the level of detail - something that I think both the current proposed
> > options will fall foul of IMHO.
> >
> >
> > On Wed, 19 Aug 2020 at 15:33, Michael Noll  wrote:
> >
> > > Hi all!
> > >
> > > Great to see we are in the process of creating a cool logo for Kafka
> > > Streams.  First, I apologize for sharing feedback so late -- I just
> > learned
> > > about it today. :-)
> > >
> > > Here's my *personal, subjective* opinion on the currently two logo
> > > candidates for Kafka Streams.
> > >
> > > TL;DR: Sorry, but I really don't like either of the proposed "otter"
> > logos.
> > > Let me try to explain why.
> > >
> > >- The choice to use an animal, regardless of which specific animal,
> > >seems random and doesn't fit Kafka. (What's the purpose? To show
> that
> > >KStreams is 'cute'?) In comparison, the O’Reilly books always have
> an
> > >animal cover, that’s their style, and it is very recognizable.
> Kafka
> > >however has its own, different style.  The Kafka logo has clear,
> > simple
> > >lines to achieve an abstract and ‘techy’ look, which also alludes
> > > nicely to
> > >its architectural simplicity. Its logo is also a smart play on the
> > >Kafka-identifying letter “K” and alluding to it being a distributed
> > > system
> > >(the circles and links that make the K).
> > >- The proposed logos, however, make it appear as if KStreams is a
> > >third-party technology that was bolted onto Kafka. They certainly,
> for
> > > me,
> > >do not convey the message "Kafka Streams is an official part of
> Apache
> > >Kafka".
> > >- I, too, don't like the way the main Kafka logo is obscured (a
> > concern
> > >already voiced in this thread). Also, the Kafka 'logo' embedded in
> the
> > >proposed KStreams logos is not the original one.
> > >- None of the proposed KStreams logos visually match the Kafka logo.
> > >They have a totally different style, font, line art, and color
> scheme.
> > >- Execution-wise, the main Kafka logo looks great at all sizes.  The
> > >style of the otter logos, in comparison, becomes undecipherable at
> > > smaller
> > >sizes.
> > >
> > > What I would suggest is to first agree on what the KStreams logo is
> > > supposed to convey to the reader.  Here's my personal take:
> > >
> > > Objective 1: First and foremost, the KStreams logo should make it clear
> > and
> > > obvious that KStreams is an official and integral part of Apache Kafka.
> > > This applies to both what is depicted and how it is depicted (like
> font,
> > > line art, colors).
> > > Objective 2: The logo should allude to the role of KStreams in the
> Kafka
> > > project, which is the processing part.  That is, "doing something
> useful
> > to
> > > the data in Kafka".
> > >
> > > The "circling arrow" aspect of the current otter logos does allude to
> > > "continuous processing", which is going in the direction of (2), but
> the
> > > logos do not meet (1) in my opinion.
> > >
> > > -Michael
> > >
> > >
> > >
> > >
> > > On Tue, Aug 18, 2020 at 10:34 PM Matthias J. Sax 
> > wrote:
> > >
> > > > Adding the user mailing list -- I think we should accepts votes on
> both
> > 

Re: Steps & best-practices to upgrade Confluent Kafka 4.1x to 5.3x

2020-08-19 Thread Rijo Roy
Sure Manoj!

Really appreciate your quick response..

On 2020/08/19 17:40:54,  wrote: 
> Great.
> Share your findings with this group once you have done the upgrade from
> Confluent Kafka 4.1x to 5.3x successfully.
>
> I see many people having the same question here.
> 
> On 8/19/20, 10:38 AM, "Rijo Roy"  wrote:
> 
> [External]
> 
> 
> Thanks Manoj!
> 
> Yeah, the plan is to start with non-prod and validate first before going 
> to prod.
> 
> Thanks & Regards,
> Rijo Roy
> 
> On 2020/08/19 17:33:53,  wrote:
> > I advise doing it in non-prod for validation first.
> > You can back up the data log folder if you want, but I haven't seen any
> > issue - still, better to back up the data if it is small.
> >
> > Don’t change the value below to the latest until you have done full
> > validation; once you have changed it to the latest you can't roll back.
> >
> > inter.broker.protocol.version=2.1.x
> >
> > On 8/19/20, 9:52 AM, "Rijo Roy"  wrote:
> >
> > [External]
> >
> >
> > Thanks Manoj! Appreciate your help..
> >
> > I will follow the steps you pointed out..
> >
> > Do you think there is a need to :
> > 1. backup the data before the rolling upgrade
> > 2. some kind of datasync that should be considered here.. I don't 
> think this is required as I am performing an in-place upgrade..
> >
> > Thanks & Regards,
> > Rijo Roy
> >
> > On 2020/08/18 20:45:42,  wrote:
> > > You can follow the steps below:
> > >
> > > 1. Set inter.broker.protocol.version=2.1.x and do a rolling restart of
> > > Kafka.
> > > 2. Rolling-upgrade the Kafka cluster to 2.5.
> > > 3. Rolling-upgrade the ZK cluster.
> > > Validate Kafka.
> > >
> > > 4. Set inter.broker.protocol.version to the new version and do a rolling
> > > restart of Kafka.
> > >
> > >
> > >
> > > On 8/18/20, 12:54 PM, "Rijo Roy"  wrote:
> > >
> > > [External]
> > >
> > >
> > > Hi,
> > >
> > > I am a newbie in Kafka and would greatly appreciate if 
> someone could help with best-practices and steps to upgrade to v5.3x.
> > >
> > > Below is my existing set-up:
> > > OS version:  Ubuntu 16.04.6 LTS
> > > ZooKeeper version : 3.4.10
> > > Kafka version : confluent-kafka-2.11 / 1.1.1-cp2 / v4.1.1
> > >
> > > We need to upgrade our OS version to Ubuntu 18.04 LTS whose 
> minimum requirement is to upgrade Kafka to v5.3x. Could someone please help 
> me with the best-practices & steps for the upgrade..
> > >
> > > Please let me know if you need any more information so that 
> you could help me.
> > >
> > > Appreciate your help!
> > >
> > > Thanks & Regards,
> > > Rijo Roy
> > >
> > >
> > >
> > > This e-mail and any files transmitted with it are for the sole 
> use of the intended recipient(s) and may contain confidential and privileged 
> information. If you are not the intended recipient(s), please reply to the 
> sender and destroy all copies of the original message. Any unauthorized 
> review, use, disclosure, dissemination, forwarding, printing or copying of 
> this email, and/or any action taken in reliance on the contents of this 
> e-mail is strictly prohibited and may be unlawful. Where permitted by 
> applicable law, this e-mail and other e-mail communications sent to and from 
> Cognizant e-mail addresses may be monitored.
> > > This e-mail and any files transmitted with it are for the sole 
> use of the intended recipient(s) and may contain confidential and privileged 
> information. If you are not the intended recipient(s), please reply to the 
> sender and destroy all copies of the original message. Any unauthorized 
> review, use, disclosure, dissemination, forwarding, printing or copying of 
> this email, and/or any action taken in reliance on the contents of this 
> e-mail is strictly prohibited and may be unlawful. Where permitted by 
> applicable law, this e-mail and other e-mail communications sent to and from 
> Cognizant e-mail addresses may be monitored.
> > >
> >
> >
> > This e-mail and any files transmitted with it are for the sole use of 
> the intended recipient(s) and may contain confidential and privileged 
> information. If you are not the intended recipient(s), please reply to the 
> sender and destroy all copies of the original message. Any unauthorized 
> review, use, disclosure, dissemination, forwarding, printing or copying of 
> this email, and/or any action taken in reliance on the contents of this 
> e-mail is strictly prohibited and may be unlawful. Where permitted by 
> applicable law, this e-mail and other e-mail communications sent to and from 
> Cognizant e-mail addresses may be monitored.
> > This e-mail

Re: GDPR compliance

2020-08-19 Thread Apolloni, Christian
As an alternative solution we also investigated encryption: encrypting all
messages with an individual key and removing the key once the "deletion" needs
to be performed.

Has anyone experience with such a solution?

 -- 
 Christian Apolloni



Disclaimer: The contents of this email and any attachment thereto are intended 
exclusively for the attention of the addressee(s). The email and any such 
attachment(s) may contain information that is confidential and protected on the 
strength of professional, official or business secrecy laws and regulations or 
contractual obligations. Should you have received this email by mistake, you 
may neither make use of nor divulge the contents of the email or of any 
attachment thereto. In such a case, please inform the email's sender and delete 
the message and all attachments without delay from your systems.
You can find our e-mail disclaimer statement in other languages under 
http://www.baloise.ch/email_disclaimer


Re: GDPR compliance

2020-08-19 Thread Apolloni, Christian
Hi Sandor, thanks again for your reply.

> If you have a non-log-compacted topic, after `retention.ms` the message
> (along with the PII) gets deleted from the Kafka message store without any
> further action, which should satisfy GDPR requirements:
> - you are handling PII in Kafka for a limited amount of time
> - you are processing the data for the given purpose it was given
> - the data will automatically be deleted without any further steps
> If you have a downstream system, you should also be able to publish a
> message through Kafka so that the downstream system executes its delete
> processes - if required. We implemented a similar process where we
> published an AnonymizeOrder event, which instructed downstream systems to
> anonymize the order data in their own data store.

Our problem is, the data could have been published shortly before the system 
receives a delete order from the "coordinator". This is because the data might 
have been mutated and the update needs to be propagated to consumer systems. If 
we go with a retention-period of days we would only be able to proceed with 
subsequent systems in the coordinated chain with too much of a delay. Going 
with an even shorter retention would be problematic.

> If you have a log-compacted topic:
> - yes, I have the same understanding as you have on the active segment.
> - You can set the segment.ms property to force the
> compaction to occur within an expected timeframe.
>
> In general what I understand is true in both cases that Kafka gives you
> good enough guarantees to either remove the old message after retention.ms
> milliseconds or execute the topic compaction after segment.ms time that it
> is unnecessary to try to figure out more specifically in what exact moment
> the data is deleted. Setting these configurations should give you enough
> guarantee that the data removal will occur - if not, that imo should be
> considered a bug and reported back to the project.

We investigated the max.compaction.lag.ms parameter, which was introduced in
KIP-354, and from our understanding its intent is exactly what we'd like to
accomplish, but unless we missed something we have noticed that new segments are
rolled only if new messages are appended. If the topic has very low activity it
can be that no new message is appended and the segment is left active
indefinitely. This means the cleaning for that segment might also remain
stalled indefinitely. We are unsure whether our understanding is correct and
whether it's a bug or not.
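
For illustration only, one possible workaround for such low-activity topics is
to append a periodic, non-PII marker record so that the active segment receives
an append and can roll; whether that is acceptable depends on the downstream
consumers, and the topic name, marker key and interval in this sketch are
placeholders:

    import java.util.Properties;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    // Workaround sketch: periodically append a harmless marker record so the broker
    // has an append that lets the active segment roll, after which the cleaner can
    // remove tombstoned keys. Consumers would need to ignore the marker key.
    public class CompactionKeepAlive {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // placeholder
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            KafkaProducer<String, String> producer = new KafkaProducer<>(props);

            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleAtFixedRate(
                    () -> producer.send(new ProducerRecord<>("pii-topic", "__keepalive", "")),
                    0, 60, TimeUnit.SECONDS);
        }
    }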

In general, I think part of the issue is that the system receives the delete 
order at the time that it has to be performed: we don't deal with the 
processing of the required waiting periods, that's what happens in the 
"coordinator system". The system with the data to be deleted receives the order 
and has to perform the deletion immediately.

Kind regards,

 -- 
 Christian Apolloni



Disclaimer: The contents of this email and any attachment thereto are intended 
exclusively for the attention of the addressee(s). The email and any such 
attachment(s) may contain information that is confidential and protected on the 
strength of professional, official or business secrecy laws and regulations or 
contractual obligations. Should you have received this email by mistake, you 
may neither make use of nor divulge the contents of the email or of any 
attachment thereto. In such a case, please inform the email's sender and delete 
the message and all attachments without delay from your systems.
You can find our e-mail disclaimer statement in other languages under 
http://www.baloise.ch/email_disclaimer


Re: Steps & best-practices to upgrade Confluent Kafka 4.1x to 5.3x

2020-08-19 Thread Manoj.Agrawal2
Great.
Share your findings with this group once you have done the upgrade from
Confluent Kafka 4.1x to 5.3x successfully.

I see many people having the same question here.
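
One small aid for the validation step: assuming the Java AdminClient is
available, the inter.broker.protocol.version that a broker is actually running
with can be read back roughly like this (bootstrap address and broker id are
placeholders):

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.Config;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class CheckBrokerProtocolVersion {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // placeholder
            try (AdminClient admin = AdminClient.create(props)) {
                // Broker configs are addressed by broker id; "1" is a placeholder.
                ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "1");
                Map<ConfigResource, Config> configs = admin.describeConfigs(List.of(broker)).all().get();
                ConfigEntry ibp = configs.get(broker).get("inter.broker.protocol.version");
                System.out.println(ibp == null ? "not reported" : ibp.value());
            }
        }
    }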

On 8/19/20, 10:38 AM, "Rijo Roy"  wrote:

[External]


Thanks Manoj!

Yeah, the plan is to start with non-prod and validate first before going to 
prod.

Thanks & Regards,
Rijo Roy

On 2020/08/19 17:33:53,  wrote:
> I advise doing it in non-prod for validation first.
> You can back up the data log folder if you want, but I haven't seen any issue -
> still, better to back up the data if it is small.
>
> Don’t change the value below to the latest until you have done full validation;
> once you have changed it to the latest you can't roll back.
>
> inter.broker.protocol.version=2.1.x
>
> On 8/19/20, 9:52 AM, "Rijo Roy"  wrote:
>
> [External]
>
>
> Thanks Manoj! Appreciate your help..
>
> I will follow the steps you pointed out..
>
> Do you think there is a need to :
> 1. backup the data before the rolling upgrade
> 2. some kind of datasync that should be considered here.. I don't 
think this is required as I am performing an in-place upgrade..
>
> Thanks & Regards,
> Rijo Roy
>
> On 2020/08/18 20:45:42,  wrote:
> > You can follow the steps below:
> >
> > 1. Set inter.broker.protocol.version=2.1.x and do a rolling restart of
> > Kafka.
> > 2. Rolling-upgrade the Kafka cluster to 2.5.
> > 3. Rolling-upgrade the ZK cluster.
> > Validate Kafka.
> >
> > 4. Set inter.broker.protocol.version to the new version and do a rolling
> > restart of Kafka.
> >
> >
> >
> > On 8/18/20, 12:54 PM, "Rijo Roy"  wrote:
> >
> > [External]
> >
> >
> > Hi,
> >
> > I am a newbie in Kafka and would greatly appreciate if someone 
could help with best-practices and steps to upgrade to v5.3x.
> >
> > Below is my existing set-up:
> > OS version:  Ubuntu 16.04.6 LTS
> > ZooKeeper version : 3.4.10
> > Kafka version : confluent-kafka-2.11 / 1.1.1-cp2 / v4.1.1
> >
> > We need to upgrade our OS version to Ubuntu 18.04 LTS whose 
minimum requirement is to upgrade Kafka to v5.3x. Could someone please help me 
with the best-practices & steps for the upgrade..
> >
> > Please let me know if you need any more information so that you 
could help me.
> >
> > Appreciate your help!
> >
> > Thanks & Regards,
> > Rijo Roy
> >
> >
> >
> > This e-mail and any files transmitted with it are for the sole use 
of the intended recipient(s) and may contain confidential and privileged 
information. If you are not the intended recipient(s), please reply to the 
sender and destroy all copies of the original message. Any unauthorized review, 
use, disclosure, dissemination, forwarding, printing or copying of this email, 
and/or any action taken in reliance on the contents of this e-mail is strictly 
prohibited and may be unlawful. Where permitted by applicable law, this e-mail 
and other e-mail communications sent to and from Cognizant e-mail addresses may 
be monitored.
> >
>
>
> This e-mail and any files transmitted with it are for the sole use of the 
intended recipient(s) and may contain confidential and privileged information. 
If you are not the intended recipient(s), please reply to the sender and 
destroy all copies of the original message. Any unauthorized review, use, 
disclosure, dissemination, forwarding, printing or copying of this email, 
and/or any action taken in reliance on the contents of this e-mail is strictly 
prohibited and may be unlawful. Where permitted by applicable law, this e-mail 
and other e-mail communications sent to and from Cognizant e-mail addresses may 
be monitored.

Re: Steps & best-practices to upgrade Confluent Kafka 4.1x to 5.3x

2020-08-19 Thread Rijo Roy
Thanks Manoj!

Yeah, the plan is to start with non-prod and validate first before going to 
prod.

Thanks & Regards,
Rijo Roy

On 2020/08/19 17:33:53,  wrote: 
> I advise to do it non-prod for validation .
> You can backup data log folder if you want but I have'nt see any issue . but 
> better to backup data if it small .
> 
> Don’t change below value to latest until you done full validation , once you 
> changed  to latest then you can't rollback .
> 
> inter.broker.protocol.version=2.1.x
> 
> On 8/19/20, 9:52 AM, "Rijo Roy"  wrote:
> 
> [External]
> 
> 
> Thanks Manoj! Appreciate your help..
> 
> I will follow the steps you pointed out..
> 
> Do you think there is a need to :
> 1. backup the data before the rolling upgrade
> 2. some kind of datasync that should be considered here.. I don't think 
> this is required as I am performing an in-place upgrade..
> 
> Thanks & Regards,
> Rijo Roy
> 
> On 2020/08/18 20:45:42,  wrote:
> > You can follow below steps
> >
> > 1. set inter.broker.protocol.version=2.1.x  and rolling restart kafka
> > 2. Rolling upgrade the Kafka cluster to 2.5 -
> > 3. rolling upgrade ZK cluster
> > Validate the kafka .
> >
> > 4. set inter.broker.protocol.version= new version and rolling restart 
> the Kafka
> >
> >
> >
> > On 8/18/20, 12:54 PM, "Rijo Roy"  wrote:
> >
> > [External]
> >
> >
> > Hi,
> >
> > I am a newbie in Kafka and would greatly appreciate if someone 
> could help with best-practices and steps to upgrade to v5.3x.
> >
> > Below is my existing set-up:
> > OS version:  Ubuntu 16.04.6 LTS
> > ZooKeeper version : 3.4.10
> > Kafka version : confluent-kafka-2.11 / 1.1.1-cp2 / v4.1.1
> >
> > We need to upgrade our OS version to Ubuntu 18.04 LTS whose minimum 
> requirement is to upgrade Kafka to v5.3x. Could someone please help me with 
> the best-practices & steps for the upgrade..
> >
> > Please let me know if you need any more information so that you 
> could help me.
> >
> > Appreciate your help!
> >
> > Thanks & Regards,
> > Rijo Roy
> >
> >
> >
> > This e-mail and any files transmitted with it are for the sole use of 
> the intended recipient(s) and may contain confidential and privileged 
> information. If you are not the intended recipient(s), please reply to the 
> sender and destroy all copies of the original message. Any unauthorized 
> review, use, disclosure, dissemination, forwarding, printing or copying of 
> this email, and/or any action taken in reliance on the contents of this 
> e-mail is strictly prohibited and may be unlawful. Where permitted by 
> applicable law, this e-mail and other e-mail communications sent to and from 
> Cognizant e-mail addresses may be monitored.
> >
> 
> 
> This e-mail and any files transmitted with it are for the sole use of the 
> intended recipient(s) and may contain confidential and privileged 
> information. If you are not the intended recipient(s), please reply to the 
> sender and destroy all copies of the original message. Any unauthorized 
> review, use, disclosure, dissemination, forwarding, printing or copying of 
> this email, and/or any action taken in reliance on the contents of this 
> e-mail is strictly prohibited and may be unlawful. Where permitted by 
> applicable law, this e-mail and other e-mail communications sent to and from 
> Cognizant e-mail addresses may be monitored.
> 


Re: Steps & best-practices to upgrade Confluent Kafka 4.1x to 5.3x

2020-08-19 Thread Manoj.Agrawal2
I advise doing it in non-prod first for validation.
You can back up the data log folder if you want, but I haven't seen any issues; 
still, it's better to back up the data if it is small.

Don't change the value below to the latest version until you have done full 
validation; once you change it to the latest you can't roll back.

inter.broker.protocol.version=2.1.x
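
For illustration, a sketch of the two-phase change in server.properties (the 
versions shown are only examples; Confluent Platform 5.3.x ships Apache Kafka 
2.3, so the final value would typically match that):

# Phase 1: keep the protocol pinned to the old version while the binaries are
# upgraded broker by broker with rolling restarts
inter.broker.protocol.version=2.1

# Phase 2: only after the whole cluster runs the new binaries and has been
# validated, bump the value and do another rolling restart; this step cannot
# be rolled back
# inter.broker.protocol.version=2.3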

On 8/19/20, 9:52 AM, "Rijo Roy"  wrote:

[External]


Thanks Manoj! Appreciate your help..

I will follow the steps you pointed out..

Do you think there is a need to :
1. backup the data before the rolling upgrade
2. some kind of datasync that should be considered here.. I don't think 
this is required as I am performing an in-place upgrade..

Thanks & Regards,
Rijo Roy

On 2020/08/18 20:45:42,  wrote:
> You can follow below steps
>
> 1. set inter.broker.protocol.version=2.1.x  and rolling restart kafka
> 2. Rolling upgrade the Kafka cluster to 2.5 -
> 3. rolling upgrade ZK cluster
> Validate the kafka .
>
> 4. set inter.broker.protocol.version= new version and rolling restart the 
Kafka
>
>
>
> On 8/18/20, 12:54 PM, "Rijo Roy"  wrote:
>
> [External]
>
>
> Hi,
>
> I am a newbie in Kafka and would greatly appreciate if someone could 
help with best-practices and steps to upgrade to v5.3x.
>
> Below is my existing set-up:
> OS version:  Ubuntu 16.04.6 LTS
> ZooKeeper version : 3.4.10
> Kafka version : confluent-kafka-2.11 / 1.1.1-cp2 / v4.1.1
>
> We need to upgrade our OS version to Ubuntu 18.04 LTS whose minimum 
requirement is to upgrade Kafka to v5.3x. Could someone please help me with the 
best-practices & steps for the upgrade..
>
> Please let me know if you need any more information so that you could 
help me.
>
> Appreciate your help!
>
> Thanks & Regards,
> Rijo Roy
>
>
>
> This e-mail and any files transmitted with it are for the sole use of the 
intended recipient(s) and may contain confidential and privileged information. 
If you are not the intended recipient(s), please reply to the sender and 
destroy all copies of the original message. Any unauthorized review, use, 
disclosure, dissemination, forwarding, printing or copying of this email, 
and/or any action taken in reliance on the contents of this e-mail is strictly 
prohibited and may be unlawful. Where permitted by applicable law, this e-mail 
and other e-mail communications sent to and from Cognizant e-mail addresses may 
be monitored.
>


This e-mail and any files transmitted with it are for the sole use of the 
intended recipient(s) and may contain confidential and privileged information. 
If you are not the intended recipient(s), please reply to the sender and 
destroy all copies of the original message. Any unauthorized review, use, 
disclosure, dissemination, forwarding, printing or copying of this email, 
and/or any action taken in reliance on the contents of this e-mail is strictly 
prohibited and may be unlawful. Where permitted by applicable law, this e-mail 
and other e-mail communications sent to and from Cognizant e-mail addresses may 
be monitored.


Re: GDPR compliance

2020-08-19 Thread Nemeth Sandor
Hi Christian,

disclaimer: IANAL, so take everything with a grain of salt from the legal
perspective, I'm sharing the experience I have handling PII data with Kafka
in an ecommerce system, so your requirements may differ.

I'm not sure how your system is designed but in general from a data
management perspective you can consider the following:

If you have a non-log-compacted topic, after `retention.ms` the message
(along with the PII) gets deleted from the Kafka message store without any
further action, which should satisfy GDPR requirements:
- you are handling PII in Kafka for a limited amount of time
- you are processing the data for the given purpose it was given
- the data will automatically be deleted without any further steps
If you have a downstream system, you should also be able to publish a
message through Kafka so that the downstream system executes its delete
processes - if required. We implemented a similar process where we
published an AnonymizeOrder event, which instructed downstream systems to
anonymize the order data in their own data store.

If you have a log-compacted topic:
- yes, I have the same understanding as you have on the active segment.
- You can set the segment.ms property to force compaction to occur within an
expected timeframe.

In general, my understanding is that in both cases Kafka gives you good enough
guarantees (removing the old message after retention.ms milliseconds, or
running compaction after segment.ms) that it is unnecessary to pin down the
exact moment the data is deleted. Setting these configurations should give you
enough of a guarantee that the data removal will occur; if not, that imo should
be considered a bug and reported back to the project.
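
For illustration, both settings can be applied per topic with the kafka-configs 
tool (topic names and values here are only examples):

# non-compacted topic: PII disappears once retention expires
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name customer-events \
  --add-config retention.ms=604800000

# compacted topic: force segments to roll regularly so compaction can run
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name customer-profiles \
  --add-config cleanup.policy=compact,segment.ms=86400000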

From the GDPR point of view, if you set these values reasonably low (a couple
of days), that should be acceptable, as you have one month to comply with the
delete request (see
https://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr/individual-rights/right-to-erasure/#:~:text=The%20GDPR%20introduces%20a%20right,to%20respond%20to%20a%20request.
)

I hope this helps!

Kind regards,
Sandor


On Wed, 19 Aug 2020 at 18:43, Apolloni, Christian <
christian.apoll...@baloise.ch> wrote:

> On 2020/08/19 16:15:40, Nemeth Sandor  wrote:
> > Hi Christian,>
>
> Hi, thanks for your reply.
>
> > depending on how your Kafka topics are configured, you have 2 different>
> > options:>
> >
> > a) if you have a non-log-compacted then you can set the message
> retention>
> > on the topic to the desired value. In that case the message will be
> deleted>
> > by Kafka after the retention period expires. (the config value is `>
> > retention.ms` I think)>
>
> That's what we thought too at first as solution, but we likely cannot set
> the retention low enough.
>
> > b) if you use Kafka as a log store with topics having infinite
> retention,>
> > then one common solution is to send a so-called tombstone record (a
> record>
> > with the same key containing only GDPR compatible data with the
> sensitive>
> > information removed), and let Kafka take care of the removal using log>
> > compaction.>
>
> We also thought about this, but as far as we understood there is no real
> guarantee that the compaction completes in a given time for all messages in
> the topic. From what we understood compaction can be delayed by the
> messages still being in the active segment and/or the compaction thread
> pool being too busy.
>
> It's also unclear to us how we can know that the compaction has completed
> for all relevant messages and that we can safely report to our "coordinator
> system" that the next system can start its own deletion process safely.
>
> Kind Regards,
>
>  --
>  Christian Apolloni
>
>
> Disclaimer: The contents of this email and any attachment thereto are
> intended exclusively for the attention of the addressee(s). The email and
> any such attachment(s) may contain information that is confidential and
> protected on the strength of professional, official or business secrecy
> laws and regulations or contractual obligations. Should you have received
> this email by mistake, you may neither make use of nor divulge the contents
> of the email or of any attachment thereto. In such a case, please inform
> the email's sender and delete the message and all attachments without delay
> from your systems.
> You can find our e-mail disclaimer statement in other languages under
> http://www.baloise.ch/email_disclaimer
>


Re: [VOTE] KIP-657: Add Customized Kafka Streams Logo

2020-08-19 Thread Guozhang Wang
Hi Michael,

Thanks for the feedback, but I'm not totally in agreement with your
proposed objectives of the logos. More specifically, though I agree Kafka's
logo is a smartly designed one to convey some key ideas of the project such
as "distributed" and "pipeline" in a techy manner, I'm not convinced it is
the only golden-standard principle for any logo designs. And I also feel
that it is not necessary to force the Kafka Streams logo to inherit the same
techy, cold, simple-line style as the Kafka logo and to avoid other
styles, e.g. animal images. To me the purpose of having a Kafka Streams
logo is to demonstrate the stream processing part of Kafka, and the symbol
of an otter illustrates "continuous actions in streams" surrounding
Kafka in a nice way (and yes, it's also cute :). In addition I honestly do
not share your concern that the current proposal would make people confuse
Kafka Streams is not an official and integral part of Apache Kafka, but of
course that's a subjective topic for us all.

I do agree with Robin on the other feedback that it makes the Kafka logo
too small especially in design A, while it obscures the Kafka logo in
design B. I think this is addressable under the current design proposal
though.


Guozhang


On Wed, Aug 19, 2020 at 9:24 AM Robin Moffatt  wrote:

> I echo what Michael says here.
>
> Another consideration is that logos are often shrunk (when used on slides)
> and need to work at lower resolution (think: printing swag, stitching
> socks, etc) and so whatever logo we come up with needs to not be too fiddly
> in the level of detail - something that I think both the current proposed
> options will fall foul of IMHO.
>
>
> On Wed, 19 Aug 2020 at 15:33, Michael Noll  wrote:
>
> > Hi all!
> >
> > Great to see we are in the process of creating a cool logo for Kafka
> > Streams.  First, I apologize for sharing feedback so late -- I just
> learned
> > about it today. :-)
> >
> > Here's my *personal, subjective* opinion on the currently two logo
> > candidates for Kafka Streams.
> >
> > TL;DR: Sorry, but I really don't like either of the proposed "otter"
> logos.
> > Let me try to explain why.
> >
> >- The choice to use an animal, regardless of which specific animal,
> >seems random and doesn't fit Kafka. (What's the purpose? To show that
> >KStreams is 'cute'?) In comparison, the O’Reilly books always have an
> >animal cover, that’s their style, and it is very recognizable.  Kafka
> >however has its own, different style.  The Kafka logo has clear,
> simple
> >lines to achieve an abstract and ‘techy’ look, which also alludes
> > nicely to
> >its architectural simplicity. Its logo is also a smart play on the
> >Kafka-identifying letter “K” and alluding to it being a distributed
> > system
> >(the circles and links that make the K).
> >- The proposed logos, however, make it appear as if KStreams is a
> >third-party technology that was bolted onto Kafka. They certainly, for
> > me,
> >do not convey the message "Kafka Streams is an official part of Apache
> >Kafka".
> >- I, too, don't like the way the main Kafka logo is obscured (a
> concern
> >already voiced in this thread). Also, the Kafka 'logo' embedded in the
> >proposed KStreams logos is not the original one.
> >- None of the proposed KStreams logos visually match the Kafka logo.
> >They have a totally different style, font, line art, and color scheme.
> >- Execution-wise, the main Kafka logo looks great at all sizes.  The
> >style of the otter logos, in comparison, becomes undecipherable at
> > smaller
> >sizes.
> >
> > What I would suggest is to first agree on what the KStreams logo is
> > supposed to convey to the reader.  Here's my personal take:
> >
> > Objective 1: First and foremost, the KStreams logo should make it clear
> and
> > obvious that KStreams is an official and integral part of Apache Kafka.
> > This applies to both what is depicted and how it is depicted (like font,
> > line art, colors).
> > Objective 2: The logo should allude to the role of KStreams in the Kafka
> > project, which is the processing part.  That is, "doing something useful
> to
> > the data in Kafka".
> >
> > The "circling arrow" aspect of the current otter logos does allude to
> > "continuous processing", which is going in the direction of (2), but the
> > logos do not meet (1) in my opinion.
> >
> > -Michael
> >
> >
> >
> >
> > On Tue, Aug 18, 2020 at 10:34 PM Matthias J. Sax 
> wrote:
> >
> > > Adding the user mailing list -- I think we should accepts votes on both
> > > lists for this special case, as it's not a technical decision.
> > >
> > > @Boyang: as mentioned by Bruno, can we maybe add black/white options
> for
> > > both proposals, too?
> > >
> > > I also agree that Design B is not ideal with regard to the Kafka logo.
> > > Would it be possible to change Design B accordingly?
> > >
> > > I am not a font expert, but the fonts in both design are diff

Re: Steps & best-practices to upgrade Confluent Kafka 4.1x to 5.3x

2020-08-19 Thread Rijo Roy
Thanks Manoj! Appreciate your help..

I will follow the steps you pointed out..

Do you think there is a need to:
1. back up the data before the rolling upgrade
2. consider some kind of data sync here? I don't think this is required as I am 
performing an in-place upgrade.

Thanks & Regards,
Rijo Roy

On 2020/08/18 20:45:42,  wrote: 
> You can follow below steps
> 
> 1. set inter.broker.protocol.version=2.1.x  and rolling restart kafka
> 2. Rolling upgrade the Kafka cluster to 2.5 -
> 3. rolling upgrade ZK cluster
> Validate the kafka .
> 
> 4. set inter.broker.protocol.version= new version and rolling restart the 
> Kafka
> 
> 
> 
> On 8/18/20, 12:54 PM, "Rijo Roy"  wrote:
> 
> [External]
> 
> 
> Hi,
> 
> I am a newbie in Kafka and would greatly appreciate if someone could help 
> with best-practices and steps to upgrade to v5.3x.
> 
> Below is my existing set-up:
> OS version:  Ubuntu 16.04.6 LTS
> ZooKeeper version : 3.4.10
> Kafka version : confluent-kafka-2.11 / 1.1.1-cp2 / v4.1.1
> 
> We need to upgrade our OS version to Ubuntu 18.04 LTS whose minimum 
> requirement is to upgrade Kafka to v5.3x. Could someone please help me with 
> the best-practices & steps for the upgrade..
> 
> Please let me know if you need any more information so that you could 
> help me.
> 
> Appreciate your help!
> 
> Thanks & Regards,
> Rijo Roy
> 
> 
> 
> This e-mail and any files transmitted with it are for the sole use of the 
> intended recipient(s) and may contain confidential and privileged 
> information. If you are not the intended recipient(s), please reply to the 
> sender and destroy all copies of the original message. Any unauthorized 
> review, use, disclosure, dissemination, forwarding, printing or copying of 
> this email, and/or any action taken in reliance on the contents of this 
> e-mail is strictly prohibited and may be unlawful. Where permitted by 
> applicable law, this e-mail and other e-mail communications sent to and from 
> Cognizant e-mail addresses may be monitored.
> 


Re: GDPR compliance

2020-08-19 Thread Apolloni, Christian
On 2020/08/19 16:15:40, Nemeth Sandor  wrote:
> Hi Christian,>

Hi, thanks for your reply.

> depending on how your Kafka topics are configured, you have 2 different>
> options:>
>
> a) if you have a non-log-compacted then you can set the message retention>
> on the topic to the desired value. In that case the message will be deleted>
> by Kafka after the retention period expires. (the config value is `>
> retention.ms` I think)>

That's what we thought too at first as solution, but we likely cannot set the 
retention low enough.

> b) if you use Kafka as a log store with topics having infinite retention,>
> then one common solution is to send a so-called tombstone record (a record>
> with the same key containing only GDPR compatible data with the sensitive>
> information removed), and let Kafka take care of the removal using log>
> compaction.>

We also thought about this, but as far as we understood there is no real 
guarantee that the compaction completes in a given time for all messages in the 
topic. From what we understood compaction can be delayed by the messages still 
being in the active segment and/or the compaction thread pool being too busy.

It's also unclear to us how we can know that the compaction has completed for 
all relevant messages and that we can safely report to our "coordinator system" 
that the next system can start its own deletion process safely.

Kind Regards,

 -- 
 Christian Apolloni


Disclaimer: The contents of this email and any attachment thereto are intended 
exclusively for the attention of the addressee(s). The email and any such 
attachment(s) may contain information that is confidential and protected on the 
strength of professional, official or business secrecy laws and regulations or 
contractual obligations. Should you have received this email by mistake, you 
may neither make use of nor divulge the contents of the email or of any 
attachment thereto. In such a case, please inform the email's sender and delete 
the message and all attachments without delay from your systems.
You can find our e-mail disclaimer statement in other languages under 
http://www.baloise.ch/email_disclaimer


Re: [VOTE] KIP-657: Add Customized Kafka Streams Logo

2020-08-19 Thread Robin Moffatt
I echo what Michael says here.

Another consideration is that logos are often shrunk (when used on slides)
and need to work at lower resolution (think: printing swag, stitching
socks, etc) and so whatever logo we come up with needs to not be too fiddly
in the level of detail - something that I think both the current proposed
options will fall foul of IMHO.


On Wed, 19 Aug 2020 at 15:33, Michael Noll  wrote:

> Hi all!
>
> Great to see we are in the process of creating a cool logo for Kafka
> Streams.  First, I apologize for sharing feedback so late -- I just learned
> about it today. :-)
>
> Here's my *personal, subjective* opinion on the currently two logo
> candidates for Kafka Streams.
>
> TL;DR: Sorry, but I really don't like either of the proposed "otter" logos.
> Let me try to explain why.
>
>- The choice to use an animal, regardless of which specific animal,
>seems random and doesn't fit Kafka. (What's the purpose? To show that
>KStreams is 'cute'?) In comparison, the O’Reilly books always have an
>animal cover, that’s their style, and it is very recognizable.  Kafka
>however has its own, different style.  The Kafka logo has clear, simple
>lines to achieve an abstract and ‘techy’ look, which also alludes
> nicely to
>its architectural simplicity. Its logo is also a smart play on the
>Kafka-identifying letter “K” and alluding to it being a distributed
> system
>(the circles and links that make the K).
>- The proposed logos, however, make it appear as if KStreams is a
>third-party technology that was bolted onto Kafka. They certainly, for
> me,
>do not convey the message "Kafka Streams is an official part of Apache
>Kafka".
>- I, too, don't like the way the main Kafka logo is obscured (a concern
>already voiced in this thread). Also, the Kafka 'logo' embedded in the
>proposed KStreams logos is not the original one.
>- None of the proposed KStreams logos visually match the Kafka logo.
>They have a totally different style, font, line art, and color scheme.
>- Execution-wise, the main Kafka logo looks great at all sizes.  The
>style of the otter logos, in comparison, becomes undecipherable at
> smaller
>sizes.
>
> What I would suggest is to first agree on what the KStreams logo is
> supposed to convey to the reader.  Here's my personal take:
>
> Objective 1: First and foremost, the KStreams logo should make it clear and
> obvious that KStreams is an official and integral part of Apache Kafka.
> This applies to both what is depicted and how it is depicted (like font,
> line art, colors).
> Objective 2: The logo should allude to the role of KStreams in the Kafka
> project, which is the processing part.  That is, "doing something useful to
> the data in Kafka".
>
> The "circling arrow" aspect of the current otter logos does allude to
> "continuous processing", which is going in the direction of (2), but the
> logos do not meet (1) in my opinion.
>
> -Michael
>
>
>
>
> On Tue, Aug 18, 2020 at 10:34 PM Matthias J. Sax  wrote:
>
> > Adding the user mailing list -- I think we should accepts votes on both
> > lists for this special case, as it's not a technical decision.
> >
> > @Boyang: as mentioned by Bruno, can we maybe add black/white options for
> > both proposals, too?
> >
> > I also agree that Design B is not ideal with regard to the Kafka logo.
> > Would it be possible to change Design B accordingly?
> >
> > I am not a font expert, but the fonts in both design are different and I
> > am wondering if there is an official Apache Kafka font that we should
> > reuse to make sure that the logos align -- I would expect that both
> > logos (including "Apache Kafka" and "Kafka Streams" names) will be used
> > next to each other and it would look awkward if the font differs.
> >
> >
> > -Matthias
> >
> > On 8/18/20 11:28 AM, Navinder Brar wrote:
> > > Hi,
> > > Thanks for the KIP, really like the idea. I am +1(non-binding) on A
> > mainly because I felt like you have to tilt your head to realize the
> > otter's head in B.
> > > Regards,Navinder
> > >
> > > On Tuesday, 18 August, 2020, 11:44:20 pm IST, Guozhang Wang <
> > wangg...@gmail.com> wrote:
> > >
> > >  I'm leaning towards design B primarily because it reminds me of the
> > Firefox
> > > logo which I like a lot. But I also share Adam's concern that it should
> > > better not obscure the Kafka logo --- so if we can tweak a bit to fix
> it
> > my
> > > vote goes to B, otherwise A :)
> > >
> > >
> > > Guozhang
> > >
> > > On Tue, Aug 18, 2020 at 9:48 AM Bruno Cadonna 
> > wrote:
> > >
> > >> Thanks for the KIP!
> > >>
> > >> I am +1 (non-binding) for A.
> > >>
> > >> I would also like to hear opinions whether the logo should be
> colorized
> > >> or just black and white.
> > >>
> > >> Best,
> > >> Bruno
> > >>
> > >>
> > >> On 15.08.20 16:05, Adam Bellemare wrote:
> > >>> I prefer Design B, but given that I missed the discussion thread, I
> > think
> > >>> it would be bet

Re: GDPR compliance

2020-08-19 Thread Nemeth Sandor
Hi Christian,

depending on how your Kafka topics are configured, you have 2 different
options:

a) if you have a non-log-compacted then you can set the message retention
on the topic to the desired value. In that case the message will be deleted
by Kafka after the retention period expires. (the config value is `
retention.ms` I think)

b) if you use Kafka as a log store with topics having infinite retention,
then one common solution is to send a so-called tombstone record (a record
with the same key containing only GDPR compatible data with the sensitive
information removed), and let Kafka take care of the removal using log
compaction.
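
A minimal sketch of such a record from a plain Java producer (topic name, key,
and bootstrap address are assumptions; a null value is a true tombstone that
removes the key entirely, while a redacted payload keeps a PII-free latest
version):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ForgetKeySketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Tombstone: after compaction, no record for "customer-42" remains in the topic.
            producer.send(new ProducerRecord<>("customer-profiles", "customer-42", null));

            // Alternative: publish a redacted record under the same key, so compaction
            // eventually drops the older, PII-carrying versions.
            // producer.send(new ProducerRecord<>("customer-profiles", "customer-42",
            //         "{\"customerId\":\"customer-42\"}"));
        }
    }
}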

Kind regards,
Sandor


On Wed, 19 Aug 2020 at 16:53, Apolloni, Christian <
christian.apoll...@baloise.ch> wrote:

> Hello,
>
> I have some questions about implementing GDPR compliance in Kafka.
>
> In our situation we have the requirement of removing personal data from in
> coordination with multiple systems. The idea is having a central
> "coordinator system" which triggers the deletion process for the individual
> systems in a specific, controlled sequence which takes into account the
> various system inter-dependencies and data flows. This means e.g. system
> nr. 2 will receive the delete order only after system nr. 1 has reported
> that it's done with the deletion on its side (and so forth).
>
> One of the systems in question publishes data in Kafka topics for
> consumption in other systems and part of the deletion process is to remove
> the relevant personal data from these Kafka topics too. This has to happen
> in a relatively short time after the deletion order is received, to prevent
> a long delay before the systems further down the chain can start their own
> deletion. Furthermore, we need to know when the operation is completed:
> only at that point we can give the "go" to the other systems.
>
> We are unsure how to satisfy those requirements in Kafka. If anyone has
> ideas or suggestions we would be very interested in your opinion. We are
> also interested in general about experiences in implementing GDPR
> compliance in Kafka, especially when dealing with multiple, interconnected
> systems.
>
> Kind regards,
>
> --
> Christian Apolloni
>
> Disclaimer: The contents of this email and any attachment thereto are
> intended exclusively for the attention of the addressee(s). The email and
> any such attachment(s) may contain information that is confidential and
> protected on the strength of professional, official or business secrecy
> laws and regulations or contractual obligations. Should you have received
> this email by mistake, you may neither make use of nor divulge the contents
> of the email or of any attachment thereto. In such a case, please inform
> the email's sender and delete the message and all attachments without delay
> from your systems.
> You can find our e-mail disclaimer statement in other languages under
> http://www.baloise.ch/email_disclaimer
>


Re: GDPR compliance

2020-08-19 Thread Jörn Franke
Be aware that deleting personal data is already processing! You will already 
need user consent to process it in Kafka, even if it is about deletion.

Simply do not collect it.

> Am 19.08.2020 um 16:53 schrieb Apolloni, Christian 
> :
> 
> Hello,
> 
> I have some questions about implementing GDPR compliance in Kafka.
> 
> In our situation we have the requirement of removing personal data from in 
> coordination with multiple systems. The idea is having a central "coordinator 
> system" which triggers the deletion process for the individual systems in a 
> specific, controlled sequence which takes into account the various system 
> inter-dependencies and data flows. This means e.g. system nr. 2 will receive 
> the delete order only after system nr. 1 has reported that it's done with the 
> deletion on its side (and so forth).
> 
> One of the systems in question publishes data in Kafka topics for consumption 
> in other systems and part of the deletion process is to remove the relevant 
> personal data from these Kafka topics too. This has to happen in a relatively 
> short time after the deletion order is received, to prevent a long delay 
> before the systems further down the chain can start their own deletion. 
> Furthermore, we need to know when the operation is completed: only at that 
> point we can give the "go" to the other systems.
> 
> We are unsure how to satisfy those requirements in Kafka. If anyone has ideas 
> or suggestions we would be very interested in your opinion. We are also 
> interested in general about experiences in implementing GDPR compliance in 
> Kafka, especially when dealing with multiple, interconnected systems.
> 
> Kind regards,
> 
> -- 
> Christian Apolloni
> 
> Disclaimer: The contents of this email and any attachment thereto are 
> intended exclusively for the attention of the addressee(s). The email and any 
> such attachment(s) may contain information that is confidential and protected 
> on the strength of professional, official or business secrecy laws and 
> regulations or contractual obligations. Should you have received this email 
> by mistake, you may neither make use of nor divulge the contents of the email 
> or of any attachment thereto. In such a case, please inform the email's 
> sender and delete the message and all attachments without delay from your 
> systems.
> You can find our e-mail disclaimer statement in other languages under 
> http://www.baloise.ch/email_disclaimer


GDPR compliance

2020-08-19 Thread Apolloni, Christian
Hello,

I have some questions about implementing GDPR compliance in Kafka.

In our situation we have the requirement of removing personal data in 
coordination with multiple systems. The idea is to have a central "coordinator 
system" which triggers the deletion process for the individual systems in a 
specific, controlled sequence which takes into account the various system 
inter-dependencies and data flows. This means e.g. system nr. 2 will receive 
the delete order only after system nr. 1 has reported that it's done with the 
deletion on its side (and so forth).

One of the systems in question publishes data in Kafka topics for consumption 
in other systems and part of the deletion process is to remove the relevant 
personal data from these Kafka topics too. This has to happen in a relatively 
short time after the deletion order is received, to prevent a long delay before 
the systems further down the chain can start their own deletion. Furthermore, 
we need to know when the operation is completed: only at that point we can give 
the "go" to the other systems.

We are unsure how to satisfy those requirements in Kafka. If anyone has ideas 
or suggestions we would be very interested in your opinion. We are also 
interested in general about experiences in implementing GDPR compliance in 
Kafka, especially when dealing with multiple, interconnected systems.

Kind regards,

-- 
Christian Apolloni

Disclaimer: The contents of this email and any attachment thereto are intended 
exclusively for the attention of the addressee(s). The email and any such 
attachment(s) may contain information that is confidential and protected on the 
strength of professional, official or business secrecy laws and regulations or 
contractual obligations. Should you have received this email by mistake, you 
may neither make use of nor divulge the contents of the email or of any 
attachment thereto. In such a case, please inform the email's sender and delete 
the message and all attachments without delay from your systems.
You can find our e-mail disclaimer statement in other languages under 
http://www.baloise.ch/email_disclaimer


Re: [VOTE] KIP-657: Add Customized Kafka Streams Logo

2020-08-19 Thread Michael Noll
Hi all!

Great to see we are in the process of creating a cool logo for Kafka
Streams.  First, I apologize for sharing feedback so late -- I just learned
about it today. :-)

Here's my *personal, subjective* opinion on the currently two logo
candidates for Kafka Streams.

TL;DR: Sorry, but I really don't like either of the proposed "otter" logos.
Let me try to explain why.

   - The choice to use an animal, regardless of which specific animal,
   seems random and doesn't fit Kafka. (What's the purpose? To show that
   KStreams is 'cute'?) In comparison, the O’Reilly books always have an
   animal cover, that’s their style, and it is very recognizable.  Kafka
   however has its own, different style.  The Kafka logo has clear, simple
   lines to achieve an abstract and ‘techy’ look, which also alludes nicely to
   its architectural simplicity. Its logo is also a smart play on the
   Kafka-identifying letter “K” and alluding to it being a distributed system
   (the circles and links that make the K).
   - The proposed logos, however, make it appear as if KStreams is a
   third-party technology that was bolted onto Kafka. They certainly, for me,
   do not convey the message "Kafka Streams is an official part of Apache
   Kafka".
   - I, too, don't like the way the main Kafka logo is obscured (a concern
   already voiced in this thread). Also, the Kafka 'logo' embedded in the
   proposed KStreams logos is not the original one.
   - None of the proposed KStreams logos visually match the Kafka logo.
   They have a totally different style, font, line art, and color scheme.
   - Execution-wise, the main Kafka logo looks great at all sizes.  The
   style of the otter logos, in comparison, becomes undecipherable at smaller
   sizes.

What I would suggest is to first agree on what the KStreams logo is
supposed to convey to the reader.  Here's my personal take:

Objective 1: First and foremost, the KStreams logo should make it clear and
obvious that KStreams is an official and integral part of Apache Kafka.
This applies to both what is depicted and how it is depicted (like font,
line art, colors).
Objective 2: The logo should allude to the role of KStreams in the Kafka
project, which is the processing part.  That is, "doing something useful to
the data in Kafka".

The "circling arrow" aspect of the current otter logos does allude to
"continuous processing", which is going in the direction of (2), but the
logos do not meet (1) in my opinion.

-Michael




On Tue, Aug 18, 2020 at 10:34 PM Matthias J. Sax  wrote:

> Adding the user mailing list -- I think we should accepts votes on both
> lists for this special case, as it's not a technical decision.
>
> @Boyang: as mentioned by Bruno, can we maybe add black/white options for
> both proposals, too?
>
> I also agree that Design B is not ideal with regard to the Kafka logo.
> Would it be possible to change Design B accordingly?
>
> I am not a font expert, but the fonts in both design are different and I
> am wondering if there is an official Apache Kafka font that we should
> reuse to make sure that the logos align -- I would expect that both
> logos (including "Apache Kafka" and "Kafka Streams" names) will be used
> next to each other and it would look awkward if the font differs.
>
>
> -Matthias
>
> On 8/18/20 11:28 AM, Navinder Brar wrote:
> > Hi,
> > Thanks for the KIP, really like the idea. I am +1(non-binding) on A
> mainly because I felt like you have to tilt your head to realize the
> otter's head in B.
> > Regards,Navinder
> >
> > On Tuesday, 18 August, 2020, 11:44:20 pm IST, Guozhang Wang <
> wangg...@gmail.com> wrote:
> >
> >  I'm leaning towards design B primarily because it reminds me of the
> Firefox
> > logo which I like a lot. But I also share Adam's concern that it should
> > better not obscure the Kafka logo --- so if we can tweak a bit to fix it
> my
> > vote goes to B, otherwise A :)
> >
> >
> > Guozhang
> >
> > On Tue, Aug 18, 2020 at 9:48 AM Bruno Cadonna 
> wrote:
> >
> >> Thanks for the KIP!
> >>
> >> I am +1 (non-binding) for A.
> >>
> >> I would also like to hear opinions whether the logo should be colorized
> >> or just black and white.
> >>
> >> Best,
> >> Bruno
> >>
> >>
> >> On 15.08.20 16:05, Adam Bellemare wrote:
> >>> I prefer Design B, but given that I missed the discussion thread, I
> think
> >>> it would be better without the Otter obscuring any part of the Kafka
> >> logo.
> >>>
> >>> On Thu, Aug 13, 2020 at 6:31 PM Boyang Chen <
> reluctanthero...@gmail.com>
> >>> wrote:
> >>>
>  Hello everyone,
> 
>  I would like to start a vote thread for KIP-657:
> 
> 
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-657%3A+Add+Customized+Kafka+Streams+Logo
> 
>  This KIP is aiming to add a new logo for the Kafka Streams library.
> And
> >> we
>  prepared two candidates with a cute otter. You could look up the KIP
> to
>  find those logos.
> 
> 
>  Please post your vote against these two

Re: Kafka Streams Key-value store question

2020-08-19 Thread Bill Bejeck
Hi Pirow,

If I'm understanding your requirements correctly, I think using a global store
will work for you.
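
A minimal sketch of that approach (the store name, topic name, and String
serdes are assumptions; Streams keeps the global store populated from the topic
and restores it on restart, so no manual sink or offset handling is needed):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

public class GlobalLookupStoreSketch {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Global stores use the source topic itself as their changelog, so logging is disabled.
        StoreBuilder<KeyValueStore<String, String>> lookupStore =
                Stores.keyValueStoreBuilder(
                        Stores.persistentKeyValueStore("lookupStore"),
                        Serdes.String(),
                        Serdes.String())
                .withLoggingDisabled();

        // "lookup-topic" stands in for the key-value topic maintained by the other service.
        builder.addGlobalStore(
                lookupStore,
                "lookup-topic",
                Consumed.with(Serdes.String(), Serdes.String()),
                () -> new Processor<String, String>() {
                    private KeyValueStore<String, String> store;

                    @Override
                    @SuppressWarnings("unchecked")
                    public void init(ProcessorContext context) {
                        store = (KeyValueStore<String, String>) context.getStateStore("lookupStore");
                    }

                    @Override
                    public void process(String key, String value) {
                        store.put(key, value); // keep the store in sync with the topic
                    }

                    @Override
                    public void close() { }
                });

        // Downstream processors in the topology can then read the store via
        // context.getStateStore("lookupStore") to do the JSON lookups.
    }
}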

HTH,
Bill

On Wed, Aug 19, 2020 at 8:53 AM Pirow Engelbrecht <
pirow.engelbre...@etion.co.za> wrote:

> Hello,
>
>
>
> We’re building a JSON decorator using Kafka Streams’ processing API.
>
>
>
> The process is briefly that a piece of JSON should be consumed from an
> input topic (keys are null, value is the JSON). The JSON contains a field
> (e.g. “thisField”) with a value (e.g. “someLink”) . This value (and a
> timestamp) is used to look-up another piece JSON from a key-value topic
> (keys are all the different values of “thisField”, values are JSON). This
> key-value topic is created by another service in Kafka. This additional
> piece of JSON then gets appended to the input JSON and the result gets
> written to an output topic (keys are null, value is now the original JSON +
> lookup JSON).
>
>
>
> To do the query against a key-value store, ideally I want Kafka Streams to
> directly create and update a window key-value store in memory (or disk)
> from my key-value topic in Kafka, but I am unable to find a way to specify
> this through the StoreBuilder interface. Does anybody know how to do this?
>
> Here is my current Storebuilder code snippet:
>
> StoreBuilder<WindowStore<String, String>> storeBuilder =
>         Stores.windowStoreBuilder(
>                 Stores.persistentWindowStore("lookupStore",
>                         Duration.ofDays(14600), Duration.ofDays(14600), false),
>                 Serdes.String(),
>                 Serdes.String());
>
> storeBuilder.build();
>
>
>
>
>
> Currently my workaround is to have a sink for the key-value store and then
> create/update this key-value store using a node in the processing topology,
> but this has issues when restarting the service, i.e. when the service is
> restarted, the key-value store topic needs to be consumed from the start to
> rebuild the store in memory, but the sink would have written commit offsets
> which prevents the topic to be consumed from the start. I also cannot use
> streams.cleanUp() as this will reset all the sinks in my topology (y other
> sink ingests records from the input topic).
>
>
>
> Thanks
>
>
>
> *Pirow Engelbrecht*
> System Engineer
>
> *E.* pirow.engelbre...@etion.co.za
> *T.* +27 12 678 9740 (ext. 9879)
> *M.* +27 63 148 3376
>
> 76 Regency Drive | Irene | Centurion | 0157
> 
> *www.etion.co.za *
>
> 
>
> Facebook
>  |
> YouTube  |
> LinkedIn  | Twitter
>  | Instagram
> 
>
>
>


Re: MirrorMaker 2.0 - Translating offsets for remote topics and consumer groups

2020-08-19 Thread Ryanne Dolan
Josh, yes it's possible to migrate the consumer group back to the source
topic, but you need to explicitly replicate the remote topic back to the
source cluster -- otherwise no checkpoints will flow "upstream":

A->B.topics=test1
B->A.topics=A.test1

After the first checkpoint is emitted upstream,
RemoteClusterUtils.translateOffsets() will translate B's A.test1 offsets
into A's test1 offsets for you.
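
For illustration, a minimal sketch of that call (cluster addresses, aliases and
the group name are assumptions):

import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.mirror.RemoteClusterUtils;

public class TranslateOffsetsSketch {
    public static void main(String[] args) throws Exception {
        // Client properties for cluster A, where the B.checkpoints.internal topic
        // lives once B->A replication of A.test1 is enabled.
        Map<String, Object> clusterAProps = new HashMap<>();
        clusterAProps.put("bootstrap.servers", "clusterA:9092");

        // Translate the group's offsets on B's A.test1 into offsets on A's test1.
        Map<TopicPartition, OffsetAndMetadata> translated =
                RemoteClusterUtils.translateOffsets(
                        clusterAProps, "B", "my-consumer-group", Duration.ofMinutes(1));

        // These offsets can then be committed for the group on cluster A before
        // the consumers are pointed back at the original topic.
        translated.forEach((tp, offset) -> System.out.println(tp + " -> " + offset.offset()));
    }
}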

Ryanne

On Tue, Aug 18, 2020 at 5:56 PM Josh C  wrote:

> Hi there,
>
> I'm currently exploring MM2 and having some trouble with the
> RemoteClusterUtils.translateOffsets() method. I have been successful in
> migrating a consumer group from the source cluster to the target cluster,
> but was wondering how I could migrate this consumer group back to the
> original source topic?
>
> It is my understanding that there isn't any checkpoint data being
> emitted for this consumer group since it is consuming from a mirrored topic
> in the target cluster. I'm currently getting an empty map since there isn't
> any checkpoint data for 'target.checkpoints.internal' in the source
> cluster. So, I was wondering how would I get these new translated offsets
> to migrate the consumer group back to the source cluster?
>
> Please let me know if my question was unclear or if you require further
> clarification! Appreciate the help.
>
> Thanks,
> Josh
>


Kafka Streams Key-value store question

2020-08-19 Thread Pirow Engelbrecht
Hello,

We're building a JSON decorator using Kafka Streams' processing API.

The process is briefly that a piece of JSON should be consumed from an input 
topic (keys are null, value is the JSON). The JSON contains a field (e.g. 
"thisField") with a value (e.g. "someLink") . This value (and a timestamp) is 
used to look-up another piece JSON from a key-value topic (keys are all the 
different values of "thisField", values are JSON). This key-value topic is 
created by another service in Kafka. This additional piece of JSON then gets 
appended to the input JSON and the result gets written to an output topic (keys 
are null, value is now the original JSON + lookup JSON).

To do the query against a key-value store, ideally I want Kafka Streams to 
directly create and update a window key-value store in memory (or disk) from my 
key-value topic in Kafka, but I am unable to find a way to specify this through 
the StoreBuilder interface. Does anybody know how to do this?
Here is my current Storebuilder code snippet:
StoreBuilder<WindowStore<String, String>> storeBuilder =
        Stores.windowStoreBuilder(
                Stores.persistentWindowStore("lookupStore",
                        Duration.ofDays(14600), Duration.ofDays(14600), false),
                Serdes.String(),
                Serdes.String());
storeBuilder.build();


Currently my workaround is to have a sink for the key-value store and then 
create/update this key-value store using a node in the processing topology, but 
this has issues when restarting the service, i.e. when the service is 
restarted, the key-value store topic needs to be consumed from the start to 
rebuild the store in memory, but the sink would have written commit offsets 
which prevents the topic from being consumed from the start. I also cannot use 
streams.cleanUp() as this will reset all the sinks in my topology (my other sink 
ingests records from the input topic).

Thanks

Pirow Engelbrecht
System Engineer

E. 
pirow.engelbre...@etion.co.za
T. +27 12 678 9740 (ext. 9879)
M. +27 63 148 3376

76 Regency Drive | Irene | Centurion | 0157
www.etion.co.za




Facebook | 
YouTube | 
LinkedIn | 
Twitter | 
Instagram




Re: Kafka BrokerState Metric Value 3

2020-08-19 Thread Karolis Pocius
Note that even when all partitions are in sync, leader election might not have
happened yet and the broker isn't serving anything. Which might be OK,
depending on your actual use case.

On Wed, Aug 19, 2020 at 11:40 AM Dhirendra Singh 
wrote:

> Thank you Peter !
> I intended to use broker state to determine the health but i was not sure.
> I will use under replicated partition metric instead.
>
> --dsingh
>
> On Wed, Aug 19, 2020 at 1:40 PM Peter Bukowinski  wrote:
>
> > The broker state metric just reports on the state of the broker itself,
> > not whether it is in sync. A replacement broker will quickly reach a
> broker
> > state of 3 on startup even though it has to catch up on many replicas.
> > Don’t rely on it for checking if a cluster/broker is healthy with no
> > under-replicated partitions.
> >
> > For that, you can look at the underreplicated partition count metric.
> >
> > -- Peter (from phone)
> >
> > > On Aug 19, 2020, at 12:52 AM, Dhirendra Singh 
> > wrote:
> > >
> > > So is this metric just gives information that broker process up and
> > running
> > > ? or does it indicate something more of broker state or partitions it
> > hold ?
> > >
> > >
> > >
> > >> On Mon, Aug 17, 2020 at 6:17 PM Karolis Pocius
> > >>  wrote:
> > >>
> > >> I tried using this metric for determining when the broker is back in
> the
> > >> cluster and became the leader for partitions it owned before restart,
> > but
> > >> that's not the case.
> > >>
> > >> In the end I've settled for checking
> > >> kafka.server:name=LeaderCount,type=ReplicaManager which tells me when
> > the
> > >> broker is actually operational and serving data.
> > >>
> > >> On Mon, Aug 17, 2020 at 3:29 PM Dhirendra Singh <
> dhirendr...@gmail.com>
> > >> wrote:
> > >>
> > >>> I have a question regarding Kafka BrokerState Metric value 3.
> According
> > >> to
> > >>> the documentation value 3 means running state.
> > >>> What does this running state mean for the broker? Does it mean data
> of
> > >> all
> > >>> partitions on this broker is in sync ?
> > >>> Is it safe to assume that when broker transition to state 3 after
> > restart
> > >>> it recovered all partitions data from the leader and is in sync with
> > the
> > >>> leaders ?
> > >>>
> > >>> Thanks,
> > >>> dsingh
> > >>>
> > >>
> >
>


Re: Kafka BrokerState Metric Value 3

2020-08-19 Thread Dhirendra Singh
Thank you, Peter!
I intended to use the broker state to determine health, but I was not sure.
I will use the under-replicated partitions metric instead.

--dsingh

On Wed, Aug 19, 2020 at 1:40 PM Peter Bukowinski  wrote:

> The broker state metric just reports on the state of the broker itself,
> not whether it is in sync. A replacement broker will quickly reach a broker
> state of 3 on startup even though it has to catch up on many replicas.
> Don’t rely on it for checking if a cluster/broker is healthy with no
> under-replicated partitions.
>
> For that, you can look at the underreplicated partition count metric.
>
> -- Peter (from phone)
>
> > On Aug 19, 2020, at 12:52 AM, Dhirendra Singh 
> wrote:
> >
> > So is this metric just gives information that broker process up and
> running
> > ? or does it indicate something more of broker state or partitions it
> hold ?
> >
> >
> >
> >> On Mon, Aug 17, 2020 at 6:17 PM Karolis Pocius
> >>  wrote:
> >>
> >> I tried using this metric for determining when the broker is back in the
> >> cluster and became the leader for partitions it owned before restart,
> but
> >> that's not the case.
> >>
> >> In the end I've settled for checking
> >> kafka.server:name=LeaderCount,type=ReplicaManager which tells me when
> the
> >> broker is actually operational and serving data.
> >>
> >> On Mon, Aug 17, 2020 at 3:29 PM Dhirendra Singh 
> >> wrote:
> >>
> >>> I have a question regarding Kafka BrokerState Metric value 3. According
> >> to
> >>> the documentation value 3 means running state.
> >>> What does this running state mean for the broker? Does it mean data of
> >> all
> >>> partitions on this broker is in sync ?
> >>> Is it safe to assume that when broker transition to state 3 after
> restart
> >>> it recovered all partitions data from the leader and is in sync with
> the
> >>> leaders ?
> >>>
> >>> Thanks,
> >>> dsingh
> >>>
> >>
>


MirrorMaker 2 WorkerSourceTask Failed to flush error messages

2020-08-19 Thread Iftach Ben-Yosef
Hello,
I'm seeing large lag sometimes on my MM2 clusters after restarting the
cluster (it runs on k8s).
I have 3 mm2 clusters, each one reads from 1 source and writes to the same
destination.
I am seeing these errors on one of my clusters right now.

WorkerSourceTask{id=MirrorSourceConnector-33} Failed to flush, timed out
while waiting for producer to flush outstanding 1948 messages
WorkerSourceTask{id=MirrorSourceConnector-33} Failed to commit offsets

The same cluster which has these errors is also lagging greatly (lag is
slowly going down, for other clusters they quickly recovered from lag post
update)

I saw some discussion on SO regarding similar issues but it was not
specific for MM2. The suggestions were



   - either increase offset.flush.timeout.ms configuration parameter in
   your Kafka Connect Worker Configs
   - or you can reduce the amount of data being buffered by decreasing
   producer.buffer.memory in your Kafka Connect Worker Configs. This turns
   out to be the best option when you have fairly large messages.

How do I apply these configs in my mm2 config, if that's possible or even
relevant? Has anyone faced similar behaviour?

Thanks,
Iftach

-- 
The above terms reflect a potential business arrangement, are provided 
solely as a basis for further discussion, and are not intended to be and do 
not constitute a legally binding obligation. No legally binding obligations 
will be created, implied, or inferred until an agreement in final form is 
executed in writing by all parties involved.


This email and any 
attachments hereto may be confidential or privileged.  If you received this 
communication by mistake, please don't forward it to anyone else, please 
erase all copies and attachments, and please let me know that it has gone 
to the wrong person. Thanks.


Re: Kafka BrokerState Metric Value 3

2020-08-19 Thread Peter Bukowinski
The broker state metric just reports on the state of the broker itself, not 
whether it is in sync. A replacement broker will quickly reach a broker state 
of 3 on startup even though it has to catch up on many replicas. Don’t rely on 
it for checking if a cluster/broker is healthy with no under-replicated 
partitions.

For that, you can look at the underreplicated partition count metric.
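
For example, the MBean can be polled with the JmxTool class that ships with
Kafka (assuming the broker exposes JMX on port 9999):

bin/kafka-run-class.sh kafka.tools.JmxTool \
  --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi \
  --object-name kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions

# The LeaderCount MBean mentioned elsewhere in the thread is at:
#   kafka.server:type=ReplicaManager,name=LeaderCount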

-- Peter (from phone)

> On Aug 19, 2020, at 12:52 AM, Dhirendra Singh  wrote:
> 
> So is this metric just gives information that broker process up and running
> ? or does it indicate something more of broker state or partitions it hold ?
> 
> 
> 
>> On Mon, Aug 17, 2020 at 6:17 PM Karolis Pocius
>>  wrote:
>> 
>> I tried using this metric for determining when the broker is back in the
>> cluster and became the leader for partitions it owned before restart, but
>> that's not the case.
>> 
>> In the end I've settled for checking
>> kafka.server:name=LeaderCount,type=ReplicaManager which tells me when the
>> broker is actually operational and serving data.
>> 
>> On Mon, Aug 17, 2020 at 3:29 PM Dhirendra Singh 
>> wrote:
>> 
>>> I have a question regarding Kafka BrokerState Metric value 3. According
>> to
>>> the documentation value 3 means running state.
>>> What does this running state mean for the broker? Does it mean data of
>> all
>>> partitions on this broker is in sync ?
>>> Is it safe to assume that when broker transition to state 3 after restart
>>> it recovered all partitions data from the leader and is in sync with the
>>> leaders ?
>>> 
>>> Thanks,
>>> dsingh
>>> 
>> 


Re: Kafka BrokerState Metric Value 3

2020-08-19 Thread Dhirendra Singh
So does this metric just give information that the broker process is up and
running, or does it indicate something more about the broker state or the
partitions it holds?



On Mon, Aug 17, 2020 at 6:17 PM Karolis Pocius
 wrote:

> I tried using this metric for determining when the broker is back in the
> cluster and became the leader for partitions it owned before restart, but
> that's not the case.
>
> In the end I've settled for checking
> kafka.server:name=LeaderCount,type=ReplicaManager which tells me when the
> broker is actually operational and serving data.
>
> On Mon, Aug 17, 2020 at 3:29 PM Dhirendra Singh 
> wrote:
>
> > I have a question regarding Kafka BrokerState Metric value 3. According
> to
> > the documentation value 3 means running state.
> > What does this running state mean for the broker? Does it mean data of
> all
> > partitions on this broker is in sync ?
> > Is it safe to assume that when broker transition to state 3 after restart
> > it recovered all partitions data from the leader and is in sync with the
> > leaders ?
> >
> > Thanks,
> > dsingh
> >
>