Re: [VOTE] KIP-616: Rename implicit Serdes instances in kafka-streams-scala

2020-08-09 Thread William Reynolds
Looks good,
+1 (non binding)


William Reynolds | Technical Operations Engineer




On Mon, 10 Aug 2020 at 13:01, Yuriy Badalyantc  wrote:

> Hi everybody.
>
> Just bumping this thread. This is a pretty minor change, affecting only the
> Scala API, and it has been pending in the voting stage for a while.
>
> -Yuriy
>
> On Fri, Aug 7, 2020 at 8:10 AM Yuriy Badalyantc  wrote:
>
> > Hi everybody.
> >
> > There was a minor change since the voting process started (nullSerde was
> > added). Let's continue the vote.
> >
> > -Yuriy.
> >
> > On Thu, Jul 9, 2020 at 10:00 PM John Roesler wrote:
> >
> >> Thanks Yuriy,
> >>
> >> I'm +1 (binding)
> >>
> >> -John
> >>
> >> On Wed, Jul 8, 2020, at 23:08, Yuriy Badalyantc wrote:
> >> > Hi everybody
> >> >
> >> > I would like to start a vote for KIP-616:
> >> >
> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-616%3A+Rename+implicit+Serdes+instances+in+kafka-streams-scala
> >> >
> >> > This KIP fixes the name clash in org.apache.kafka.streams.scala.Serdes.
> >> >
> >> > -Yuriy
> >> >
> >>
> >
>
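
For context, a minimal sketch of what the rename would look like in application
code, assuming the value-style implicit names (stringSerde, longSerde and the
newly added nullSerde) and the new object location proposed in the KIP; the
topic names and the WordLengths object are purely illustrative:

    import org.apache.kafka.streams.scala.ImplicitConversions._
    import org.apache.kafka.streams.scala.StreamsBuilder
    import org.apache.kafka.streams.scala.kstream.KStream
    import org.apache.kafka.streams.scala.serialization.Serdes._ // new home proposed by the KIP

    object WordLengths extends App {
      val builder = new StreamsBuilder()
      // stringSerde and longSerde are resolved implicitly here without shadowing
      // scala.Predef.String or scala.Long, which is the clash the KIP removes.
      val source: KStream[String, String] = builder.stream[String, String]("input-topic")
      source.mapValues(_.length.toLong).to("lengths-topic")
      builder.build()
    }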


Re: [VOTE] KIP-632: Add DirectoryConfigProvider

2020-08-06 Thread William Reynolds
+1 (non binding) Looks like a good addition!

On 06/08/2020, Tom Bentley  wrote:
> This pretty minor change has 2 binding votes and 1 non-binding vote. It would
> be great if more people could take a look and either vote or give feedback
> about how it should be improved.
>
> Many thanks,
>
> Tom
>
> On Wed, Jul 8, 2020 at 7:07 PM Mickael Maison 
> wrote:
>
>> +1 (binding)
>> Thanks
>>
>> On Wed, Jul 8, 2020 at 11:31 AM Manikumar 
>> wrote:
>> >
>> > +1 (binding)
>> >
>> > Thanks for the KIP.
>> >
>> > On Tue, Jul 7, 2020 at 10:30 PM David Jacot 
>> > wrote:
>> >
>> > > +1 (non-binding). Thanks for the KIP!
>> > >
>> > > On Tue, Jul 7, 2020 at 12:54 PM Tom Bentley wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > I'd like to start a vote on KIP-632, which is about making the config
>> > > > provider mechanism more ergonomic on Kubernetes:
>> > > >
>> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-632%3A+Add+DirectoryConfigProvider
>> > > >
>> > > > Please take a look if you have time.
>> > > >
>> > > > Many thanks,
>> > > >
>> > > > Tom
>> > > >
>> > >
>>
>>
>
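
To make the idea concrete, here is a rough sketch of how a directory of
Kubernetes-mounted secrets could be read through the proposed provider,
assuming the class and package name given in the KIP and the existing
ConfigProvider contract; the mount path and key names are placeholders:

    import org.apache.kafka.common.config.provider.DirectoryConfigProvider // name as proposed in the KIP
    import scala.jdk.CollectionConverters._

    object DirectorySecretsSketch extends App {
      // Each file under the directory becomes one key whose value is the file's
      // contents, matching how Kubernetes mounts Secret entries as one file per key.
      val provider = new DirectoryConfigProvider()
      provider.configure(java.util.Collections.emptyMap[String, AnyRef]())
      val data = provider.get("/mnt/secrets/broker-credentials") // placeholder mount path
      data.data().asScala.keys.foreach(k => println(s"loaded secret entry: $k"))
      provider.close()
    }

In broker or Connect properties the same values would then be referenced
through the existing ${provider:path:key} placeholder convention, e.g.
config.providers=dir,
config.providers.dir.class=org.apache.kafka.common.config.provider.DirectoryConfigProvider
and ssl.keystore.password=${dir:/mnt/secrets/broker-credentials:keystore-password}.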




[jira] [Created] (KAFKA-10107) Producer snapshots LSO used in certain situations which can lead to data loss on compacted topics as LSO breach occurs and early offsets cleaned

2020-06-05 Thread William Reynolds (Jira)
William Reynolds created KAFKA-10107:


 Summary: Producer snapshots LSO used in certain situations which 
can lead to data loss on compacted topics as LSO breach occurs and early 
offsets cleaned
 Key: KAFKA-10107
 URL: https://issues.apache.org/jira/browse/KAFKA-10107
 Project: Kafka
  Issue Type: Bug
  Components: core, log cleaner
Affects Versions: 2.4.1
Reporter: William Reynolds


While upgrading a 1.1.0 cluster to 2.4.1, and also adding an interbroker port
using SSL, we ran into a situation where producer snapshot offsets were set as
the log start offset and the logs then truncated to nothing across 2 relatively
unsafe restarts.

 

Here is the timeline of what we did to trigger this:

1. Broker 40 is shut down, as the first broker to go to 2.4.1 and switch to interbroker port 9094. As it shuts down it writes producer snapshots.
2. Broker 40 starts on 2.4.1, loads the snapshots, then compares the checkpointed offsets to the log start offset and finds them to be invalid (exact reason unknown, but it looks to be related to the producer snapshot load).
3. On broker 40 all topics show an offset reset like this: "[2020-05-18 15:22:21,106] WARN Resetting first dirty offset of topic-name-60 to log start offset 6009368 since the checkpointed offset 5952382 is invalid. (kafka.log.LogCleanerManager$)", which then triggers log cleanup on broker 40 for all these topics; this is where the data is lost.
4. At this point only partitions led by broker 40 have lost data and would fail client lookups for older data, but this can't spread because 40 has interbroker port 9094 while brokers 50 and 60 are still on interbroker port 9092.
5. I stop/start brokers 50 and 60 in quick succession to take them to 2.4.1 and onto the new interbroker port 9094.
6. This leaves broker 40 as the in-sync replica for all but a couple of partitions, which aren't on 40 at all, as shown in the attached image.
7. Brokers 50 and 60 start and take their start offset from the leader (or, where there was no leader, from recovery on the returning broker 50 or 60), so all the replicas also clean their logs, removing data to catch up to broker 40 as the in-sync replica.
8. Then I shut down 40 and 50, leaving broker 60 leading all partitions it holds, and we see the following happen across all of those partitions:

"May 18, 2020 @ 15:48:28.252",hostname-1,30438,apache-kafka:2.4.1,"[2020-05-18 15:48:28,251] INFO [Log partition=topic-name-60, dir=/kafka-topic-data] Loading producer state till offset 0 with message format version 2 (kafka.log.Log)"
"May 18, 2020 @ 15:48:28.252",hostname-1,30438,apache-kafka:2.4.1,"[2020-05-18 15:48:28,252] INFO [Log partition=topic-name-60, dir=/kafka-topic-data] Completed load of log with 1 segments, log start offset 0 and log end offset 0 in 2 ms (kafka.log.Log)"
"May 18, 2020 @ 15:48:45.883",hostname,7805,apache-kafka:2.4.1,"[2020-05-18 15:48:45,883] WARN [ReplicaFetcher replicaId=50, leaderId=60, fetcherId=0] Leader or replica is on protocol version where leader epoch is not considered in the OffsetsForLeaderEpoch response. The leader's offset 0 will be used for truncation in topic-name-60. (kafka.server.ReplicaFetcherThread)"
"May 18, 2020 @ 15:48:45.883",hostname,7805,apache-kafka:2.4.1,"[2020-05-18 15:48:45,883] INFO [Log partition=topic-name-60, dir=/kafka-topic-data] Truncating to offset 0 (kafka.log.Log)"

 

I believe the truncation has always been a problem, but the recent
https://issues.apache.org/jira/browse/KAFKA-6266 fix allowed truncation to
actually happen where it wouldn't have before. The producer snapshot offsets
being set as the log start offset is a mystery to me, so any light you could
shed on why that happened and how to avoid it would be great.
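
For anyone wanting to watch for the same symptom, the log start offset a leader
is serving can be read with the long-standing KafkaConsumer#beginningOffsets
call; the sketch below is purely illustrative, with the bootstrap address,
topic and partition as placeholders taken from the sanitised log excerpts above:

    import java.util.Properties
    import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
    import org.apache.kafka.common.TopicPartition
    import org.apache.kafka.common.serialization.ByteArrayDeserializer
    import scala.jdk.CollectionConverters._

    object LogStartOffsetCheck extends App {
      val props = new Properties()
      props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-40:9092") // placeholder address
      val consumer = new KafkaConsumer(props, new ByteArrayDeserializer, new ByteArrayDeserializer)
      try {
        val tp = new TopicPartition("topic-name", 60) // placeholder partition from the log excerpts
        // beginningOffsets returns the log start offset the leader currently serves;
        // a sudden jump after a restart (or a reset to 0 after truncation) would
        // match the behaviour described above.
        val startOffsets = consumer.beginningOffsets(List(tp).asJava)
        println(s"$tp log start offset = ${startOffsets.get(tp)}")
      } finally consumer.close()
    }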

 

I am sanitising the full logs and will upload them here soon.





[jira] [Created] (KAFKA-10105) Regression in group coordinator dealing with flaky clients joining while leaving

2020-06-04 Thread William Reynolds (Jira)
William Reynolds created KAFKA-10105:


 Summary: Regression in group coordinator dealing with flaky 
clients joining while leaving
 Key: KAFKA-10105
 URL: https://issues.apache.org/jira/browse/KAFKA-10105
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 2.4.1
 Environment: Kafka 1.1.0 on jre 8 on debian 9 in docker
Kafka 2.4.1 on jre 11 on debian 9 in docker
Reporter: William Reynolds


Since upgrading a cluster from 1.1.0 to 2.4.1, the broker no longer deals
correctly with a consumer that sends a join after a leave.

What happens now is that if a consumer sends a leave and then follows up by
trying to send a join again as it is shutting down, the group coordinator adds
the leaving member to the group but never seems to heartbeat that member.

Since the consumer is then gone, when it joins again after starting it is added
as a new member, but the zombie member is still there and is included in the
partition assignment, which means those partitions never get consumed from.
What can also happen is that one of the zombies becomes group leader, so the
rebalance gets stuck forever and the group is entirely blocked.
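
For reference on the client side, the shutdown path involved is the usual
wakeup-then-close sequence, where close() is what sends the LeaveGroup; the
sketch below is a generic illustration with placeholder bootstrap, group and
topic settings, not the affected application's code:

    import java.time.Duration
    import java.util.Properties
    import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
    import org.apache.kafka.common.errors.WakeupException
    import org.apache.kafka.common.serialization.StringDeserializer
    import scala.jdk.CollectionConverters._

    object ShutdownSketch extends App {
      val props = new Properties()
      props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // placeholder
      props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group")           // placeholder
      val consumer = new KafkaConsumer(props, new StringDeserializer, new StringDeserializer)

      // wakeup() is the only thread-safe KafkaConsumer call; the shutdown hook uses it
      // to break the poll loop so that close() below issues the single LeaveGroup.
      sys.addShutdownHook(consumer.wakeup())

      consumer.subscribe(List("input-topic").asJava) // placeholder topic
      try {
        while (true) {
          val records = consumer.poll(Duration.ofMillis(500))
          records.asScala.foreach(r => println(s"${r.key} -> ${r.value}"))
        }
      } catch {
        case _: WakeupException => // expected on shutdown
      } finally {
        consumer.close() // sends the LeaveGroup for this member
      }
    }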

I have not been able to track down where this got introduced between 1.1.0 and
2.4.1, but I will look further into it. Unfortunately the logs are essentially
silent about the zombie members, and I only had INFO level logging on during
the issue. By stopping all the consumers in the group and restarting the broker
coordinating that group, we could get back to a working state.


