Jenkins build is unstable: Kafka » Kafka Branch Builder » trunk #1557

2023-02-03 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-02-03 Thread Igor Soarez
Hi Jun,

Thank you for your comments and questions.

30. Thank you for pointing this out. The isNew flag is not available
in KRaft mode. The broker can consider the metadata records:
If, and only if, the logdir assigned is Uuid.ZERO then the replica can
be considered new.

Being able to determine if a replica "isNew" is important to prevent
the remaining logdirs from filling up logdirs when some of them become
offline by re-creating replicas that already exist in the offline logdirs.
So the broker will refuse to create logs that are not new if there are
any offline logdirs.

If a logdir is removed from configuration, the controller will detect
this change upon broker registration and reset all partitions assigned
to the removed logdirs to Uuid.ZERO. In this case, it is OK for the
broker to assume that the partitions are new because they do not exist in
any _configured_ online or offline logdir, and the intended behavior is
to re-create them in one of the online logdirs anyway.

I have updated the KIP to make it clear broker decisions are based
on the metadata, and not on this flag.


31. I don't think I understand the question.
Why do we need to assign the same UUID?

A logdir may be replaced with a disk by replacing its configured path
with the new disk mount path under the `log.dirs` property.
While the broker was offline, the operator might have copied the contents
of the old logdir to the disk, or not.

If contents were copied over, then so was the logdir's meta.properties,
along with the UUID, in which case no change is necessary. The broker will
load all configured logdir paths, all existing meta.properties, and verify
that the full set of UUIDs is still congruent across all meta.properties
files. Neither broker or controller will know that something has changed,
and neither of them needs to. All partition assignments are still correct.
The mapping of UUID to logdir is determined by the meta.propeties
under that same logdir.

If the contents were not copied then this is assumed to be a new
and empty logdir. It should get a different UUID. When the broker loads
all meta.properties it will verify that one is missing for the new disk and
create it, generating a new UUID. It will also update the full set of UUIDs
listed in any other meta.properties files. On the broker registration
request the controller will notice a new UUID being registered, but also
notice a UUID missing.
Any topic partitions assinged to the now missing logdir UUID will be
updated to relate to UUID.Zero, so that the broker can place them in the
most suitable logdir - which is likely to be the new and empty one.


32. You are correct, the HeartBeat request should convey the failure
and the broker shouldn't need to send a AssignReplicasToDirs request.

The bit preceding that quote is important:
  "If the partition is assigned to an online log directory"
In this case the broker finds that the metadata indicates that a non-new
replica is assigned to an online logdir in the metadata but this replica
cannot actually be found in any online logdir.
So we want to tell the controller that the metadata is wrong, and that
the replica is actually offline.

This is a defensive design option.

In a scenario where for some reason the broker can see that the metadata
is incorrect about the logdir assignment of replica that existed in the
failed logdir, it is better to correct and recover than to allow the
problem to persist.

Ignoring the error could mean that the partition stays offline. If the
controller is only told about the UUID of the failure logdir, it won't
be able to determine that a leadership and ISR update is required for
any replica with an incorrect logdir assignment.

An alternative – when facing this unlikely failure scenario – would be
for the broker to error and exit, which would be more disruptive.


33. Correct. I should've made that clear. Updated.


34. No. It shouldn't be a large request, and it should only happen rarely.
This relates to point 32.
When a logdir fails, that failure is communicated to the controller by
indicating the logdir UUID in the heartbeat request. The controller
can determine that _the partitions assigned to that logdir UUID_
are now offline. But, if there are any partitions that were in that logdir
and do not have that same logdir UUID assigned to it in the cluster metadata
then the broker needs to signal that these are also offline, as the
controller will not be able to determine that without the assignment.

We expect each broker to proactively instruct the controller to keep the
metadata correct about the logdir assignment for each replica, so
situations where the metadata is wrong should be rare, and when they
happen only a small number of replicas should be affected. Hence this
should be both a small and rare request.


35. Hmm, I could not find the string "AlterReplicaDirRequest"
in the source:
  https://github.com/apache/kafka/search?q=AlterReplicaDirRequest

I'm referring to this API key:
  client

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2023-02-03 Thread Igor Soarez
Hi Tom,

Thank you for having another look.


20. That is a good point.

Thinking about your suggestion:
How would this look like in a non-JBOD KRraft cluster upgrade to JBOD mode?

Upgrading to version that includes the JBOD support patch would automatically
update meta.properties to include the new fields. A subsequent downgrade
might be desirable by the operator for whatever reason. The same problem
applies but here there is no future event at which version=2 could
later be set.

Alternatively, we could either:
 - Simply not change the version at all since older broker versions will ignore
   the new fields, and newer versions will add them when missing.
 - Introduce a new, separate file that would live along meta.properties.

WDYT?


Best,

--
Igor



[DISCUSS] KIP-903: Replicas with stale broker epoch should not be allowed to join the ISR

2023-02-03 Thread Calvin Liu
Hi everyone,
I'd like to discuss the fix for the broker reboot data loss KAFKA-14139
.
It changes the Fetch and AlterPartition requests to include the broker
epochs. Then the controller can use these epochs to help reject the stale
AlterPartition request.
Please take a look. Thanks!
https://cwiki.apache.org/confluence/display/KAFKA/KIP-903%3A+Replicas+with+stale+broker+epoch+should+not+be+allowed+to+join+the+ISR


Re: [VOTE] 3.4.0 RC2

2023-02-03 Thread Federico Valeri
+1 (non binding)

- Ran the unit and integration test suites with Java 17 and Scala 2.13
- Ran a series of basic examples and client configurations
- Spot checked the docs and Javadocs

Thanks
Fede

On Fri, Feb 3, 2023 at 5:29 PM Jakub Scholz  wrote:
>
> +1 (non-binding). I run my tests with the staged Scala 2.13 binaries and
> staged Maven artifacts. All seems to work fine.
>
> Thanks & Regards
> Jakub
>
> On Tue, Jan 31, 2023 at 8:01 PM David Arthur  wrote:
>
> > Hey folks, we found a couple of blockers with RC1 and have fixed them in
> > the latest release candidate, RC2.
> >
> > The major features of this release include:
> >
> > * KIP-881: Rack-aware Partition Assignment for Kafka Consumers
> > <
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-881%3A+Rack-aware+Partition+Assignment+for+Kafka+Consumers
> > >
> >
> > * KIP-876: Time based cluster metadata snapshots
> > <
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-876%3A+Time+based+cluster+metadata+snapshots
> > >
> >
> > * KIP-787: MM2 manage Kafka resources with custom Admin implementation.
> > <
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191335620
> > >
> >
> > * KIP-866 ZooKeeper to KRaft Migration
> > <
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration
> > >
> > (Early
> > Access)
> >
> >
> >
> > Release notes for the 3.4.0 release:
> >
> > https://home.apache.org/~davidarthur/kafka-3.4.0-rc2/RELEASE_NOTES.html
> >
> >
> > Please download, test and vote by Friday, February 3, 5pm PT
> >
> >
> > ---
> >
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> >
> > https://kafka.apache.org/KEYS
> >
> >
> > * Release artifacts to be voted upon (source and binary):
> >
> > https://home.apache.org/~davidarthur/kafka-3.4.0-rc2/
> >
> >
> > * Maven artifacts to be voted upon:
> >
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >
> >
> > * Javadoc:
> >
> > https://home.apache.org/~davidarthur/kafka-3.4.0-rc2/javadoc/
> >
> >
> > * Tag to be voted upon (off 3.4 branch) is the 3.4.0 tag:
> >
> > https://github.com/apache/kafka/releases/tag/3.4.0-rc2
> >
> >
> > * Documentation:
> >
> > https://kafka.apache.org/34/documentation.html
> >
> >
> > * Protocol:
> >
> > https://kafka.apache.org/34/protocol.html
> >
> >
> > ---
> >
> >
> > Test results:
> >
> >
> > We haven't had a 100% passing build, but the latest system test run looks
> > pretty good:
> >
> > http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/3.4/2023-01-31--001.system-test-kafka-3.4--1675184554--confluentinc--3.4--ef3f5bd834/report.html
> >
> >
> > Here are the Jenkins test runs for 3.4:
> > https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.4/. We will
> > continue
> > trying to diagnose the flaky test failures as the release continues. I do
> > not expect that any of these test failures are blockers for the release.
> >
> >
> > Thanks!
> >
> > David Arthur
> >


Re: [VOTE] 3.4.0 RC2

2023-02-03 Thread Mickael Maison
+1 (binding)

- verified checksums/signatures
- built from source
- ran quickstart with zookeeper and kraft with 2.13 binaries

Thanks for running this release

Mickael

On Fri, Feb 3, 2023 at 5:31 PM Jakub Scholz  wrote:
>
> +1 (non-binding). I run my tests with the staged Scala 2.13 binaries and
> staged Maven artifacts. All seems to work fine.
>
> Thanks & Regards
> Jakub
>
> On Tue, Jan 31, 2023 at 8:01 PM David Arthur  wrote:
>
> > Hey folks, we found a couple of blockers with RC1 and have fixed them in
> > the latest release candidate, RC2.
> >
> > The major features of this release include:
> >
> > * KIP-881: Rack-aware Partition Assignment for Kafka Consumers
> > <
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-881%3A+Rack-aware+Partition+Assignment+for+Kafka+Consumers
> > >
> >
> > * KIP-876: Time based cluster metadata snapshots
> > <
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-876%3A+Time+based+cluster+metadata+snapshots
> > >
> >
> > * KIP-787: MM2 manage Kafka resources with custom Admin implementation.
> > <
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191335620
> > >
> >
> > * KIP-866 ZooKeeper to KRaft Migration
> > <
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration
> > >
> > (Early
> > Access)
> >
> >
> >
> > Release notes for the 3.4.0 release:
> >
> > https://home.apache.org/~davidarthur/kafka-3.4.0-rc2/RELEASE_NOTES.html
> >
> >
> > Please download, test and vote by Friday, February 3, 5pm PT
> >
> >
> > ---
> >
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> >
> > https://kafka.apache.org/KEYS
> >
> >
> > * Release artifacts to be voted upon (source and binary):
> >
> > https://home.apache.org/~davidarthur/kafka-3.4.0-rc2/
> >
> >
> > * Maven artifacts to be voted upon:
> >
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >
> >
> > * Javadoc:
> >
> > https://home.apache.org/~davidarthur/kafka-3.4.0-rc2/javadoc/
> >
> >
> > * Tag to be voted upon (off 3.4 branch) is the 3.4.0 tag:
> >
> > https://github.com/apache/kafka/releases/tag/3.4.0-rc2
> >
> >
> > * Documentation:
> >
> > https://kafka.apache.org/34/documentation.html
> >
> >
> > * Protocol:
> >
> > https://kafka.apache.org/34/protocol.html
> >
> >
> > ---
> >
> >
> > Test results:
> >
> >
> > We haven't had a 100% passing build, but the latest system test run looks
> > pretty good:
> >
> > http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/3.4/2023-01-31--001.system-test-kafka-3.4--1675184554--confluentinc--3.4--ef3f5bd834/report.html
> >
> >
> > Here are the Jenkins test runs for 3.4:
> > https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.4/. We will
> > continue
> > trying to diagnose the flaky test failures as the release continues. I do
> > not expect that any of these test failures are blockers for the release.
> >
> >
> > Thanks!
> >
> > David Arthur
> >


Re: [VOTE] KIP-890: Transactions Server Side Defense

2023-02-03 Thread Justine Olshan
Thanks everyone! I'm going to close the vote.
The KIP is accepted with five binding votes from Jason, Guozhang, Matthias,
David (and me), and two non-binding votes from Colt and Artem.

Thanks again,
Justine

On Thu, Feb 2, 2023 at 11:41 PM David Jacot 
wrote:

> Thanks for the KIP, Justine. +1 (binding)
>
> On Fri, Feb 3, 2023 at 1:36 AM Matthias J. Sax  wrote:
>
> > Thanks for the KIP!
> >
> > +1 (binding)
> >
> >
> > On 2/2/23 4:18 PM, Artem Livshits wrote:
> > > (non-binding) +1.  Looking forward to the implementation and fixing the
> > > issues that we've got.
> > >
> > > -Artem
> > >
> > > On Mon, Jan 23, 2023 at 2:25 PM Guozhang Wang <
> > guozhang.wang...@gmail.com>
> > > wrote:
> > >
> > >> Thanks Justine, I have no further comments on the KIP. +1.
> > >>
> > >> On Tue, Jan 17, 2023 at 10:34 AM Jason Gustafson
> > >>  wrote:
> > >>>
> > >>> +1. Thanks Justine!
> > >>>
> > >>> -Jason
> > >>>
> > >>> On Tue, Jan 10, 2023 at 3:46 PM Colt McNealy 
> > >> wrote:
> > >>>
> >  (non-binding) +1. Thank you for the KIP, Justine! I've read it; it
> > >> makes
> >  sense to me and I am excited for the implementation.
> > 
> >  Colt McNealy
> >  *Founder, LittleHorse.io*
> > 
> > 
> >  On Tue, Jan 10, 2023 at 10:46 AM Justine Olshan
> >   wrote:
> > 
> > > Hi everyone,
> > >
> > > I would like to start a vote on KIP-890 which aims to prevent some
> > >> of the
> > > common causes of hanging transactions and make other general
> > >> improvements
> > > to transactions in Kafka.
> > >
> > >
> > >
> > 
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-890%3A+Transactions+Server-Side+Defense
> > >
> > > Please take a look if you haven't already and vote!
> > >
> > > Justine
> > >
> > 
> > >>
> > >
> >
>


Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #1556

2023-02-03 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 442290 lines...]
[2023-02-03T16:37:23.090Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 170 > KafkaZkClientTest > testCreateTopLevelPaths() STARTED
[2023-02-03T16:37:23.090Z] 
[2023-02-03T16:37:23.090Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 170 > KafkaZkClientTest > testCreateTopLevelPaths() PASSED
[2023-02-03T16:37:23.090Z] 
[2023-02-03T16:37:23.090Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 170 > KafkaZkClientTest > 
testGetAllTopicsInClusterDoesNotTriggerWatch() STARTED
[2023-02-03T16:37:24.036Z] 
[2023-02-03T16:37:24.036Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 170 > KafkaZkClientTest > 
testGetAllTopicsInClusterDoesNotTriggerWatch() PASSED
[2023-02-03T16:37:24.036Z] 
[2023-02-03T16:37:24.036Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 170 > KafkaZkClientTest > testIsrChangeNotificationGetters() STARTED
[2023-02-03T16:37:24.036Z] 
[2023-02-03T16:37:24.036Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 170 > KafkaZkClientTest > testIsrChangeNotificationGetters() PASSED
[2023-02-03T16:37:24.036Z] 
[2023-02-03T16:37:24.037Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 170 > KafkaZkClientTest > testLogDirEventNotificationsDeletion() 
STARTED
[2023-02-03T16:37:24.037Z] 
[2023-02-03T16:37:24.037Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 170 > KafkaZkClientTest > testLogDirEventNotificationsDeletion() PASSED
[2023-02-03T16:37:24.037Z] 
[2023-02-03T16:37:24.037Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 170 > KafkaZkClientTest > testGetLogConfigs() STARTED
[2023-02-03T16:37:25.079Z] 
[2023-02-03T16:37:25.079Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 170 > KafkaZkClientTest > testGetLogConfigs() PASSED
[2023-02-03T16:37:25.079Z] 
[2023-02-03T16:37:25.079Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 170 > KafkaZkClientTest > testBrokerSequenceIdMethods() STARTED
[2023-02-03T16:37:25.079Z] 
[2023-02-03T16:37:25.079Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 170 > KafkaZkClientTest > testBrokerSequenceIdMethods() PASSED
[2023-02-03T16:37:25.079Z] 
[2023-02-03T16:37:25.079Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 170 > KafkaZkClientTest > testAclMethods() STARTED
[2023-02-03T16:37:25.079Z] 
[2023-02-03T16:37:25.079Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 170 > KafkaZkClientTest > testAclMethods() PASSED
[2023-02-03T16:37:25.079Z] 
[2023-02-03T16:37:25.079Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 170 > KafkaZkClientTest > testCreateSequentialPersistentPath() STARTED
[2023-02-03T16:37:26.319Z] 
[2023-02-03T16:37:26.319Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 170 > KafkaZkClientTest > testCreateSequentialPersistentPath() PASSED
[2023-02-03T16:37:26.319Z] 
[2023-02-03T16:37:26.319Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 170 > KafkaZkClientTest > testConditionalUpdatePath() STARTED
[2023-02-03T16:37:26.319Z] 
[2023-02-03T16:37:26.319Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 170 > KafkaZkClientTest > testConditionalUpdatePath() PASSED
[2023-02-03T16:37:26.319Z] 
[2023-02-03T16:37:26.319Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 170 > KafkaZkClientTest > testGetAllTopicsInClusterTriggersWatch() 
STARTED
[2023-02-03T16:37:26.319Z] 
[2023-02-03T16:37:26.319Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 170 > KafkaZkClientTest > testGetAllTopicsInClusterTriggersWatch() 
PASSED
[2023-02-03T16:37:26.319Z] 
[2023-02-03T16:37:26.319Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 170 > KafkaZkClientTest > testDeleteTopicZNode() STARTED
[2023-02-03T16:37:27.437Z] 
[2023-02-03T16:37:27.437Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 170 > KafkaZkClientTest > testDeleteTopicZNode() PASSED
[2023-02-03T16:37:27.437Z] 
[2023-02-03T16:37:27.437Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 170 > KafkaZkClientTest > testDeletePath() STARTED
[2023-02-03T16:37:27.437Z] 
[2023-02-03T16:37:27.437Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 170 > KafkaZkClientTest > testDeletePath() PASSED
[2023-02-03T16:37:27.437Z] 
[2023-02-03T16:37:27.437Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 170 > KafkaZkClientTest > testGetBrokerMethods() STARTED
[2023-02-03T16:37:27.437Z] 
[2023-02-03T16:37:27.437Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 170 > KafkaZkClientTest > testGetBrokerMethods() PASSED
[2023-02-03T16:37:27.437Z] 
[2023-02-03T16:37:27.437Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 170 > KafkaZkClientTest > testJuteMaxBufffer() START

Re: [VOTE] 3.4.0 RC2

2023-02-03 Thread Jakub Scholz
+1 (non-binding). I run my tests with the staged Scala 2.13 binaries and
staged Maven artifacts. All seems to work fine.

Thanks & Regards
Jakub

On Tue, Jan 31, 2023 at 8:01 PM David Arthur  wrote:

> Hey folks, we found a couple of blockers with RC1 and have fixed them in
> the latest release candidate, RC2.
>
> The major features of this release include:
>
> * KIP-881: Rack-aware Partition Assignment for Kafka Consumers
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-881%3A+Rack-aware+Partition+Assignment+for+Kafka+Consumers
> >
>
> * KIP-876: Time based cluster metadata snapshots
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-876%3A+Time+based+cluster+metadata+snapshots
> >
>
> * KIP-787: MM2 manage Kafka resources with custom Admin implementation.
> <
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191335620
> >
>
> * KIP-866 ZooKeeper to KRaft Migration
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration
> >
> (Early
> Access)
>
>
>
> Release notes for the 3.4.0 release:
>
> https://home.apache.org/~davidarthur/kafka-3.4.0-rc2/RELEASE_NOTES.html
>
>
> Please download, test and vote by Friday, February 3, 5pm PT
>
>
> ---
>
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
>
> https://kafka.apache.org/KEYS
>
>
> * Release artifacts to be voted upon (source and binary):
>
> https://home.apache.org/~davidarthur/kafka-3.4.0-rc2/
>
>
> * Maven artifacts to be voted upon:
>
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
>
> * Javadoc:
>
> https://home.apache.org/~davidarthur/kafka-3.4.0-rc2/javadoc/
>
>
> * Tag to be voted upon (off 3.4 branch) is the 3.4.0 tag:
>
> https://github.com/apache/kafka/releases/tag/3.4.0-rc2
>
>
> * Documentation:
>
> https://kafka.apache.org/34/documentation.html
>
>
> * Protocol:
>
> https://kafka.apache.org/34/protocol.html
>
>
> ---
>
>
> Test results:
>
>
> We haven't had a 100% passing build, but the latest system test run looks
> pretty good:
>
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/3.4/2023-01-31--001.system-test-kafka-3.4--1675184554--confluentinc--3.4--ef3f5bd834/report.html
>
>
> Here are the Jenkins test runs for 3.4:
> https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.4/. We will
> continue
> trying to diagnose the flaky test failures as the release continues. I do
> not expect that any of these test failures are blockers for the release.
>
>
> Thanks!
>
> David Arthur
>


Re: [DISCUSS] KIP-894: Use incrementalAlterConfigs API for syncing topic configurations

2023-02-03 Thread Chris Egerton
Hi Tina,

Thanks for the KIP! I recently ran into this exact issue and it's great to
see a fix being proposed. I have a few small comments but overall this
looks good:

1) The current logic for syncing topic configs in MM2 is basically
fire-and-forget; all we do is log a warning message [1] if an attempt
fails. When "use.incremental.alter.configs" is set to "requested", we'll
need to know whether attempts using the incremental API succeed or not, and
then adjust our behavior accordingly. Will the plan here be to block on the
success/failure of the first request before sending any more, or will we
just switch over to the legacy API as soon as any request fails due to
targeting an incompatible broker, possibly after multiple requests with the
new API have been sent or at least queued up in the admin client?

2) We don't replicate default properties from the source cluster right now
[2].
Won't making ConfigPropertyFilter::shouldReplicateSourceDefault return true
by default change that behavior? If so, what's the motivation for favoring
defaults from the source cluster over defaults for the target cluster,
especially given that users may already be relying on the opposite?

3) Nit: IMO the parts in the "proposed changes" section that detail changes
to internal classes aren't really necessary since they're not relevant to
user-facing behavior and the classes/methods described in them might change
between now and when the PR for the KIP gets opened/reviewed/merged. I
think the only points that need to be in the KIP are the ones beginning
with "Extend ConfigPropertyFilter class", "Add a new configuration setting
to MirrorMaker", and "From Kafka 4.0" (which itself can just describe the
broker APIs that are used by MM2 in general, without referring to the
specific name of the method in MM2 that will call them).

[1] -
https://github.com/apache/kafka/blob/6e2b86597d9cd7c8b2019cffb895522deb63c93a/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/MirrorSourceConnector.java#L429

[2] -
https://github.com/apache/kafka/blob/6e2b86597d9cd7c8b2019cffb895522deb63c93a/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/MirrorSourceConnector.java#L460

Thanks again for the KIP!

Cheers,

Chris

On Wed, Jan 25, 2023 at 10:44 AM Gantigmaa Selenge 
wrote:

> Hi,
>
> If there are no further comments on the KIP, I will start a vote on it.
>
> Regards,
>
>
> On Mon, Jan 16, 2023 at 11:14 AM Gantigmaa Selenge 
> wrote:
>
> > Thanks everyone.
> >
> > I took the suggestions and updated the KIP accordingly. Please let me
> know
> > if there is anything else I could improve on.
> >
> > Regards,
> > Tina
> >
> > On Sun, Jan 15, 2023 at 10:24 PM Ismael Juma  wrote:
> >
> >> Hi Tina,
> >>
> >> See below.
> >>
> >> On Wed, Jan 11, 2023 at 3:03 AM Gantigmaa Selenge 
> >> wrote:
> >>
> >> > I do like this idea, however when it's set to required, I wasn't sure
> >> how
> >> > the mirrormaker should have. It's probably not a great experience if
> >> > mirrormaker starts crashing at some point after it's already running
> >> due to
> >> > an incompatible broker version.
> >> >
> >>
> >> This would only happen if the user explicitly requests the strict
> required
> >> ("non fallback") mode. There are many reasons why one may want this: say
> >> you want to be sure that your system is not susceptible to the
> >> "alterConfigs" problems or you want to write a test that fails if
> >> "alterConfigs' is used.
> >>
> >>
> >> > If the incrementalAlterConfig request fails because the target cluster
> >> is
> >> > running an older version, then we could log a WARN message that says
> >> > something like  "The config to use incrementalAlterConfig API for
> >> syncing
> >> > topic configurations has been set to true however target cluster is
> >> running
> >> > incompatible version therefore using the legacy alterConfig API". This
> >> way
> >> > the Mirrormaker never has to stop working and makes the user aware of
> >> > what's being used.
> >>
> >>
> >> Logging statements are not great as the sole mechanism (although useful
> as
> >> a complementary one) since one cannot easily test against them and
> they're
> >> often missed alongside all the other logging statements.
> >>
> >>
> >> >   In this case, we also would not need 3 separate values
> >> > for the config, instead would use the original true or false values:
> >> > - true - > would use incrementalAlterConfig API, but if it's
> >> unavailable,
> >> > fallback to the legacy API
> >> > - false -> keep using the legacy API
> >> >
> >> > Set the flag to true by default and remove the config in 4.0.
> >> >
> >>
> >> But this means you have different behavior depending on the
> >> supported versions forever. Even though the config values are simpler,
> >> it's
> >> harder to reason about the behavior.
> >>
> >> > One suggestion: I'm not sure how concerned you are about people's
> >> ability
> >> > to migrate, but if you want to make it as smooth as possible, you
> could
> >> add
> >>

Re: [VOTE] 3.4.0 RC2

2023-02-03 Thread Bill Bejeck
I did the following:


   - Validated all checksums and signatures
   - Built from source and ran all the unit tests
   - Ran quickstart steps for ZK, KRaft, and Kafka Streams (with KRaft)
   - Spot checked the docs and Javadocs

Thanks for running the release!

+1(binding)

-Bill


On Fri, Feb 3, 2023 at 5:14 AM David Jacot 
wrote:

> I performed the following validations:
> * Verified all checksums and signatures.
> * Built from source and ran unit tests.
> * Ran the first quickstart steps for both ZK and KRaft.
> * Spotchecked the Javadocs.
>
> I am +1 (binding). Thanks for running the release!
>
> Best,
> David
>
> On Tue, Jan 31, 2023 at 8:00 PM David Arthur 
> wrote:
>
> > Hey folks, we found a couple of blockers with RC1 and have fixed them in
> > the latest release candidate, RC2.
> >
> > The major features of this release include:
> >
> > * KIP-881: Rack-aware Partition Assignment for Kafka Consumers
> > <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-881%3A+Rack-aware+Partition+Assignment+for+Kafka+Consumers
> > >
> >
> > * KIP-876: Time based cluster metadata snapshots
> > <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-876%3A+Time+based+cluster+metadata+snapshots
> > >
> >
> > * KIP-787: MM2 manage Kafka resources with custom Admin implementation.
> > <
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191335620
> > >
> >
> > * KIP-866 ZooKeeper to KRaft Migration
> > <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration
> > >
> > (Early
> > Access)
> >
> >
> >
> > Release notes for the 3.4.0 release:
> >
> > https://home.apache.org/~davidarthur/kafka-3.4.0-rc2/RELEASE_NOTES.html
> >
> >
> > Please download, test and vote by Friday, February 3, 5pm PT
> >
> >
> > ---
> >
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> >
> > https://kafka.apache.org/KEYS
> >
> >
> > * Release artifacts to be voted upon (source and binary):
> >
> > https://home.apache.org/~davidarthur/kafka-3.4.0-rc2/
> >
> >
> > * Maven artifacts to be voted upon:
> >
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >
> >
> > * Javadoc:
> >
> > https://home.apache.org/~davidarthur/kafka-3.4.0-rc2/javadoc/
> >
> >
> > * Tag to be voted upon (off 3.4 branch) is the 3.4.0 tag:
> >
> > https://github.com/apache/kafka/releases/tag/3.4.0-rc2
> >
> >
> > * Documentation:
> >
> > https://kafka.apache.org/34/documentation.html
> >
> >
> > * Protocol:
> >
> > https://kafka.apache.org/34/protocol.html
> >
> >
> > ---
> >
> >
> > Test results:
> >
> >
> > We haven't had a 100% passing build, but the latest system test run looks
> > pretty good:
> >
> >
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/3.4/2023-01-31--001.system-test-kafka-3.4--1675184554--confluentinc--3.4--ef3f5bd834/report.html
> >
> >
> > Here are the Jenkins test runs for 3.4:
> > https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.4/. We will
> > continue
> > trying to diagnose the flaky test failures as the release continues. I do
> > not expect that any of these test failures are blockers for the release.
> >
> >
> > Thanks!
> >
> > David Arthur
> >
>


Re: [VOTE] 3.4.0 RC2

2023-02-03 Thread David Jacot
I performed the following validations:
* Verified all checksums and signatures.
* Built from source and ran unit tests.
* Ran the first quickstart steps for both ZK and KRaft.
* Spotchecked the Javadocs.

I am +1 (binding). Thanks for running the release!

Best,
David

On Tue, Jan 31, 2023 at 8:00 PM David Arthur  wrote:

> Hey folks, we found a couple of blockers with RC1 and have fixed them in
> the latest release candidate, RC2.
>
> The major features of this release include:
>
> * KIP-881: Rack-aware Partition Assignment for Kafka Consumers
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-881%3A+Rack-aware+Partition+Assignment+for+Kafka+Consumers
> >
>
> * KIP-876: Time based cluster metadata snapshots
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-876%3A+Time+based+cluster+metadata+snapshots
> >
>
> * KIP-787: MM2 manage Kafka resources with custom Admin implementation.
> <
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191335620
> >
>
> * KIP-866 ZooKeeper to KRaft Migration
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration
> >
> (Early
> Access)
>
>
>
> Release notes for the 3.4.0 release:
>
> https://home.apache.org/~davidarthur/kafka-3.4.0-rc2/RELEASE_NOTES.html
>
>
> Please download, test and vote by Friday, February 3, 5pm PT
>
>
> ---
>
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
>
> https://kafka.apache.org/KEYS
>
>
> * Release artifacts to be voted upon (source and binary):
>
> https://home.apache.org/~davidarthur/kafka-3.4.0-rc2/
>
>
> * Maven artifacts to be voted upon:
>
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
>
> * Javadoc:
>
> https://home.apache.org/~davidarthur/kafka-3.4.0-rc2/javadoc/
>
>
> * Tag to be voted upon (off 3.4 branch) is the 3.4.0 tag:
>
> https://github.com/apache/kafka/releases/tag/3.4.0-rc2
>
>
> * Documentation:
>
> https://kafka.apache.org/34/documentation.html
>
>
> * Protocol:
>
> https://kafka.apache.org/34/protocol.html
>
>
> ---
>
>
> Test results:
>
>
> We haven't had a 100% passing build, but the latest system test run looks
> pretty good:
>
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/3.4/2023-01-31--001.system-test-kafka-3.4--1675184554--confluentinc--3.4--ef3f5bd834/report.html
>
>
> Here are the Jenkins test runs for 3.4:
> https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.4/. We will
> continue
> trying to diagnose the flaky test failures as the release continues. I do
> not expect that any of these test failures are blockers for the release.
>
>
> Thanks!
>
> David Arthur
>


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #1555

2023-02-03 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-877: Mechanism for plugins and connectors to register metrics

2023-02-03 Thread Yash Mayya
Hi Mickael,

Thanks for the updates.

> the PluginMetrics implementation will append a
> suffix to sensor names to unique identify
> the plugin (based on the class name and tags).

Can we call this out explicitly in the KIP, since it is important to avoid
clashes in sensor naming? Also, should we allow plugins to retrieve sensors
from `PluginMetrics` if we can check / verify that they own the sensor
(based on the suffix)?

Other than the above minor points, this looks good to me now!

Thanks,
Yash

On Fri, Feb 3, 2023 at 2:29 AM Chris Egerton 
wrote:

> Hi Mickael,
>
> This is looking great. I have one small question left but I do not consider
> it a blocker.
>
> What is the intended use case for PluginMetrics::close? To me at least, it
> implies that plugin developers will be responsible for invoking that method
> themselves in order to clean up metrics that they've created, but wouldn't
> we want the runtime (i.e., KafkaProducer class, Connect framework, etc.) to
> handle that automatically when the resource that the plugin applies to is
> closed?
>
> Cheers,
>
> Chris
>
> On Thu, Jan 26, 2023 at 10:22 AM Mickael Maison 
> wrote:
>
> > Hi Yash,
> >
> > 1) To avoid conflicts with other sensors, the PluginMetrics
> > implementation will append a suffix to sensor names to unique identify
> > the plugin (based on the class name and tags). Also I changed the
> > semantics of the sensor() method to only create sensors (originally it
> > was get or create). If a sensor with the same name already exists, the
> > method will throw.
> > 2) Tags will be automatically added to metrics and sensors to unique
> > identify the plugin. For Connect plugins, the connector name, task id
> > and alias can be added if available. The class implementing
> > PluginMetrics will be similar to ConnectMetrics, as in it will provide
> > a simplified API wrapping Metrics. I'm planning to use PluginMetrics
> > for Connect plugin too and should not need to interact with
> > ConnectMetrics.
> > 3) Right, I fixed the last rejected alternative.
> >
> > Thanks,
> > Mickael
> >
> > On Thu, Jan 26, 2023 at 4:04 PM Mickael Maison  >
> > wrote:
> > >
> > > Hi Federico,
> > >
> > > - The metricName() method does not register anything, it just builds a
> > > MetricName instance which is just a container holding a name, group,
> > > description and tags for a metric. Each time it is called, it returns
> > > a new instance. If called with the same arguments, the returned value
> > > will be equal.
> > > - Initially I just copied the API of Metrics. I made some small
> > > changes so the metric and sensor methods are a bit more similar
> > > - Good catch! I fixed the example.
> > >
> > > Thanks,
> > > Mickael
> > >
> > >
> > > On Thu, Jan 26, 2023 at 3:54 PM Mickael Maison <
> mickael.mai...@gmail.com>
> > wrote:
> > > >
> > > > Hi Chris,
> > > >
> > > > 1) I updated the KIP to only mention the interface.
> > > > 2) This was a mistake. I've added ReplicationPolicy to the list of
> > plugins.
> > > >
> > > > Thanks,
> > > > Mickael
> > > >
> > > > On Tue, Jan 24, 2023 at 11:16 AM Yash Mayya 
> > wrote:
> > > > >
> > > > > Hi Mickael,
> > > > >
> > > > > Thanks for the updated KIP, this is looking really good! I had a
> > couple
> > > > > more questions -
> > > > >
> > > > > 1) Sensor names need to be unique across all groups for a `Metrics`
> > > > > instance. How are we planning to avoid naming clashes (both between
> > > > > different plugins as well as with pre-defined sensors)?
> > > > >
> > > > > 2) Connect has a `ConnectMetrics` wrapper around `Metrics` via
> which
> > > > > rebalance / worker / connector / task metrics are recorded. Could
> you
> > > > > please elaborate in the KIP how the plugin metrics for connectors /
> > tasks
> > > > > will inter-operate with this?
> > > > >
> > > > > Another minor point is that the third rejected alternative appears
> > to be an
> > > > > incomplete sentence?
> > > > >
> > > > > Thanks,
> > > > > Yash
> > > > >
> > > > > On Fri, Jan 13, 2023 at 10:56 PM Mickael Maison <
> > mickael.mai...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I've updated the KIP based on the feedback.
> > > > > >
> > > > > > Now instead of receiving a Metrics instance, plugins get access
> to
> > > > > > PluginMetrics that exposes a much smaller API. I've removed the
> > > > > > special handling for connectors and tasks and they must now
> > implement
> > > > > > the Monitorable interface as well to use this feature. Finally
> the
> > > > > > goal is to allow all plugins (apart from metrics reporters) to
> use
> > > > > > this feature. I've listed them all (there are over 30 pluggable
> > APIs)
> > > > > > but I've not added the list in the KIP. The reason is that new
> > plugins
> > > > > > could be added in the future and instead I'll focus on adding
> > support
> > > > > > in all the place that instantiate classes.
> > > > > >
> > > > > > Thanks,
> > > > > > Mickael
> > > > > >
> > > > > > On T