Jenkins build is still unstable: Kafka » Kafka Branch Builder » 3.6 #137

2024-01-16 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: Kafka » Kafka Branch Builder » 3.7 #65

2024-01-16 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2578

2024-01-16 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: Kafka » Kafka Branch Builder » 3.6 #136

2024-01-16 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-890 Server Side Defense

2024-01-16 Thread Jun Rao
Hi, Justine,

Thanks for the explanation. I understand the intention now. In the overflow
case, we set the non-tagged field to the old pid (and the max epoch) in the
prepare marker so that we could correctly write the marker to the data
partition if the broker downgrades. When writing the complete marker, we
know the marker has already been written to the data partition. We set the
non-tagged field to the new pid to avoid InvalidPidMappingException in the
client if the broker downgrades.

The above seems to work. It's just a bit inconsistent for a prepare marker
and a complete marker to use different pids in this special case. If we
downgrade with the complete marker, it seems that we will never be able to
write the complete marker with the old pid. Not sure if it causes any
issue, but it seems a bit weird. Instead of writing the complete marker
with the new pid, could we write two records: a complete marker with the
old pid followed by a TransactionLogValue with the new pid and an empty
state? We could make the two records in the same batch so that they will be
added to the log atomically.
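
For illustration only, a minimal sketch of the atomic-append idea using Kafka's public
MemoryRecords builder: two records placed in a single batch are appended to the log
together or not at all. The byte arrays stand in for the serialized
TransactionLogKey/TransactionLogValue payloads; this is not the transaction
coordinator's actual append path.

{code:java}
import java.nio.ByteBuffer;
import org.apache.kafka.common.record.CompressionType;
import org.apache.kafka.common.record.MemoryRecords;
import org.apache.kafka.common.record.MemoryRecordsBuilder;
import org.apache.kafka.common.record.TimestampType;

public class AtomicBatchSketch {
    // Both records go into one batch, so the log append is all-or-nothing.
    static MemoryRecords buildBatch(byte[] txnLogKey,
                                    byte[] completeMarkerWithOldPid,
                                    byte[] emptyStateWithNewPid) {
        MemoryRecordsBuilder builder = MemoryRecords.builder(
                ByteBuffer.allocate(1024), CompressionType.NONE,
                TimestampType.CREATE_TIME, 0L);
        long now = System.currentTimeMillis();
        builder.append(now, txnLogKey, completeMarkerWithOldPid); // complete marker, old pid
        builder.append(now, txnLogKey, emptyStateWithNewPid);     // TransactionLogValue, new pid, empty state
        return builder.build();
    }
}
{code}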

Thanks,

Jun


On Fri, Jan 12, 2024 at 5:40 PM Justine Olshan 
wrote:

> (1) the prepare marker is written, but the endTxn response is not received
> by the client when the server downgrades
> (2)  the prepare marker is written, the endTxn response is received by the
> client when the server downgrades.
>
> I think I am still a little confused. In both of these cases, the
> transaction log has the old producer ID. We don't write the new producer ID
> in the prepare marker's non-tagged fields.
> If the server downgrades now, it would read the non-tagged fields of the
> records, and the complete marker would also have the old producer ID.
> (If we had used the new producer ID, we would not have transactional
> correctness since the producer id doesn't match the transaction and the
> state would not be correct on the data partition.)
>
> In the overflow case, I'd expect the following to happen on the client side:
> Case 1 -- we retry EndTxn -- it is the same producer ID and epoch - 1; this
> would fence the producer.
> Case 2 -- we don't retry EndTxn and use the new producer ID, which would
> result in InvalidPidMappingException.
>
> Maybe we can have special handling for when a server downgrades. When it
> reconnects we could get an API version request showing KIP-890 part 2 is
> not supported. In that case, we can call initProducerId to abort the
> transaction. (In the overflow case, this correctly gives us a new producer
> ID)
>
> I guess the corresponding case would be where the *complete marker* is
> written but the endTxn is not received by the client and the server
> downgrades? This would result in the transaction coordinator having the new
> ID and not the old one.  If the client retries, it will receive an
> InvalidPidMappingException. The InitProducerId scenario above would help
> here too.
>
> To be clear, my compatibility story is meant to support server-side
> downgrades while keeping transactional correctness. Keeping the client from
> fencing itself is not the priority.
>
> Hope this helps. I can also add text in the KIP about InitProducerId if we
> think that fixes some edge cases.
>
> Justine
>
> On Fri, Jan 12, 2024 at 4:10 PM Jun Rao  wrote:
>
> > Hi, Justine,
> >
> > Thanks for the reply.
> >
> > I agree that we don't need to optimize for fencing during downgrades.
> > Regarding consistency, there are two possible cases: (1) the prepare marker
> > is written, but the endTxn response is not received by the client when the
> > server downgrades; (2) the prepare marker is written, and the endTxn response
> > is received by the client when the server downgrades. In (1), the client
> > will have the old producer Id and in (2), the client will have the new
> > producer Id. If we downgrade right after the prepare marker, we can't be
> > consistent with both (1) and (2) since we can only put one value in the
> > existing producer Id field. It's also not clear which case is more likely.
> > So we could probably be consistent with either case. By putting the new
> > producer Id in the prepare marker, we are consistent with case (2) and it
> > also has the slight benefit that the producer Id field in the prepare and
> > complete markers is consistent in the overflow case.
> >
> > Jun
> >
> > On Fri, Jan 12, 2024 at 3:11 PM Justine Olshan
> > 
> > wrote:
> >
> > > Hi Jun,
> > >
> > > In the case you describe, we would need to have a delayed request, send a
> > > successful EndTxn, and a successful AddPartitionsToTxn, and then have the
> > > delayed EndTxn request go through for a given producer.
> > > I'm trying to figure out if it is possible for the client to transition if
> > > a previous request is delayed somewhere. But yes, in this case I think we
> > > would fence the client.
> > >
> > > Not for the overflow case. In the overflow case, the producer ID and the
> > > epoch are different on the marker and on the 

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2577

2024-01-16 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-16152) Fix PlaintextConsumerTest.testStaticConsumerDetectsNewPartitionCreatedAfterRestart

2024-01-16 Thread Kirk True (Jira)
Kirk True created KAFKA-16152:
-

 Summary: Fix 
PlaintextConsumerTest.testStaticConsumerDetectsNewPartitionCreatedAfterRestart
 Key: KAFKA-16152
 URL: https://issues.apache.org/jira/browse/KAFKA-16152
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, unit tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16151) Fix PlaintextConsumerTest.testPerPartitionLeadMetricsCleanUpWithSubscribe

2024-01-16 Thread Kirk True (Jira)
Kirk True created KAFKA-16151:
-

 Summary: Fix 
PlaintextConsumerTest.testPerPartitionLeadMetricsCleanUpWithSubscribe
 Key: KAFKA-16151
 URL: https://issues.apache.org/jira/browse/KAFKA-16151
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, unit tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16150) Fix PlaintextConsumerTest.testPerPartitionLagMetricsCleanUpWithSubscribe

2024-01-16 Thread Kirk True (Jira)
Kirk True created KAFKA-16150:
-

 Summary: Fix 
PlaintextConsumerTest.testPerPartitionLagMetricsCleanUpWithSubscribe
 Key: KAFKA-16150
 URL: https://issues.apache.org/jira/browse/KAFKA-16150
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, unit tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-1016 Make MM2 heartbeats topic name configurable

2024-01-16 Thread Ryanne Dolan
Makes sense to me, +1.

On Tue, Jan 16, 2024 at 5:04 PM Kondrát Bertalan  wrote:

> Hey Team,
>
> I would like to start a discussion thread about *KIP-1016: Make MM2
> heartbeats topic name configurable*
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-1016+Make+MM2+heartbeats+topic+name+configurable>.
>
> This KIP aims to make the default heartbeat topic name (`heartbeats`) in
> the DefaultReplicationPolicy configurable via a property.
> Since this is my first KIP and the change is small, I implemented it in
> advance so I can include the PR
>  as well.
>
> I appreciate all your feedback and comments.
>
> Special thanks to Viktor Somogyi-Vass  and
> Daniel
> Urban  for the original idea and help.
> Thank you,
> Berci
>
> --
> *Bertalan Kondrat* | Founder, SWE
> servy.hu 
>
>
>
> 
> --
>


[jira] [Created] (KAFKA-16149) Aggressively expire unused client connections

2024-01-16 Thread Kirk True (Jira)
Kirk True created KAFKA-16149:
-

 Summary: Aggressively expire unused client connections
 Key: KAFKA-16149
 URL: https://issues.apache.org/jira/browse/KAFKA-16149
 Project: Kafka
  Issue Type: Improvement
  Components: clients, consumer, producer 
Reporter: Kirk True
Assignee: Kirk True


The goal is to minimize the number of connections from the client to the 
brokers.

On the Java client, there are potentially two types of network connections to 
brokers:
 # Connections for metadata requests
 # Connections for fetch, produce, etc. requests

The idea is to apply a much shorter idle time to client connections that have 
_only_ served metadata (type 1 above) so that they become candidates for 
expiration more quickly.

Alternatively (or additionally), a change to the way metadata requests are 
routed could be made to reduce the number of connections.
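
A rough sketch of the first idea, purely illustrative and not the NetworkClient's real
bookkeeping: remember whether a connection has only ever served metadata and, if so,
expire it on a much shorter idle timeout. The class, field names, and thresholds below
are hypothetical (540,000 ms mirrors the connections.max.idle.ms default).

{code:java}
import java.util.HashMap;
import java.util.Map;

// Illustrative only; not the client's actual connection management.
class ConnectionIdleTracker {
    private static final long METADATA_ONLY_IDLE_MS = 30_000;  // hypothetical, aggressive
    private static final long DEFAULT_IDLE_MS = 540_000;       // connections.max.idle.ms default

    private final Map<String, Long> lastActivity = new HashMap<>();    // nodeId -> last use
    private final Map<String, Boolean> metadataOnly = new HashMap<>(); // nodeId -> only metadata so far?

    void recordRequest(String nodeId, boolean isMetadataRequest, long nowMs) {
        lastActivity.put(nodeId, nowMs);
        // A single non-metadata request permanently promotes the connection.
        metadataOnly.merge(nodeId, isMetadataRequest, (prev, cur) -> prev && cur);
    }

    boolean shouldExpire(String nodeId, long nowMs) {
        long idleMs = nowMs - lastActivity.getOrDefault(nodeId, nowMs);
        long limit = metadataOnly.getOrDefault(nodeId, true) ? METADATA_ONLY_IDLE_MS : DEFAULT_IDLE_MS;
        return idleMs > limit;
    }
}
{code}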

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[DISCUSS] KIP-1016 Make MM2 heartbeats topic name configurable

2024-01-16 Thread Kondrát Bertalan
Hey Team,

I would like to start a discussion thread about *KIP-1016: Make MM2
heartbeats topic name configurable*.

This KIP aims to make the default heartbeat topic name (`heartbeats`) in
the DefaultReplicationPolicy configurable via a property.
Since this is my first KIP and the change is small, I implemented it in
advance so I can include the PR
 as well.
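
For context, a sketch of the workaround this KIP would make unnecessary: today the only
way to rename the heartbeats topic is to subclass DefaultReplicationPolicy (plugged in
via replication.policy.class) and override its heartbeatsTopic() method. The custom
topic name below is hypothetical.

{code:java}
import org.apache.kafka.connect.mirror.DefaultReplicationPolicy;

// Sketch of the boilerplate KIP-1016 aims to replace with a single property:
// a custom policy whose only purpose is to rename the heartbeats topic.
public class CustomHeartbeatsReplicationPolicy extends DefaultReplicationPolicy {
    @Override
    public String heartbeatsTopic() {
        return "mm2-heartbeats"; // hypothetical custom name
    }
}
{code}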

I appreciate all your feedback and comments.

Special thanks to Viktor Somogyi-Vass  and Daniel
Urban  for the original idea and help.
Thank you,
Berci

-- 
*Bertalan Kondrat* | Founder, SWE
servy.hu 




--


[jira] [Resolved] (KAFKA-16083) Exclude throttle time when expiring inflight requests on a connection

2024-01-16 Thread Adithya Chandra (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adithya Chandra resolved KAFKA-16083.
-
Fix Version/s: 3.8.0
 Reviewer: Stanislav Kozlovski
   Resolution: Fixed

> Exclude throttle time when expiring inflight requests on a connection
> -
>
> Key: KAFKA-16083
> URL: https://issues.apache.org/jira/browse/KAFKA-16083
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Reporter: Adithya Chandra
>Priority: Critical
> Fix For: 3.8.0
>
>
> When expiring inflight requests, the network client does not take throttle 
> time into account. If a connection has multiple inflight requests (default of 
> 5) and each request is throttled, then some of the requests can be incorrectly 
> marked as expired. Subsequently the connection is closed and the client 
> establishes a new connection to the broker. This behavior leads to 
> unnecessary connections to the broker, connection storms, and increased 
> latencies.
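
A toy illustration of the fix's intent, not the actual NetworkClient change: subtract
the broker-imposed throttle time from the time a request has been in flight before
comparing it against request.timeout.ms. The method and parameter names are made up.

{code:java}
// Illustrative only; not the real in-flight request bookkeeping.
public class ThrottleAwareExpiry {
    static boolean expiredExcludingThrottle(long timeSinceSendMs,
                                            long throttleTimeMs,
                                            long requestTimeoutMs) {
        // Time spent throttled does not count toward the request timeout.
        long effectiveWaitMs = Math.max(0, timeSinceSendMs - throttleTimeMs);
        return effectiveWaitMs > requestTimeoutMs;
    }
}
{code}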



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16148) Implement GroupMetadataManager#onUnloaded

2024-01-16 Thread Jeff Kim (Jira)
Jeff Kim created KAFKA-16148:


 Summary: Implement GroupMetadataManager#onUnloaded
 Key: KAFKA-16148
 URL: https://issues.apache.org/jira/browse/KAFKA-16148
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jeff Kim


Complete all awaiting futures with NOT_COORDINATOR (for classic groups).

Transition all groups to DEAD.

Cancel all timers related to the unloaded group metadata manager.
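
A hypothetical sketch of those three steps; the types and fields below are placeholders,
not the real members of GroupMetadataManager (only Errors.NOT_COORDINATOR is the actual
Kafka error code).

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Timer;
import java.util.concurrent.CompletableFuture;
import org.apache.kafka.common.protocol.Errors;

// Hypothetical shape only — not the actual implementation.
class UnloadSketch {
    enum GroupState { STABLE, DEAD }

    static class Group {
        GroupState state = GroupState.STABLE;
        void transitionTo(GroupState target) { this.state = target; }
    }

    private final List<CompletableFuture<Void>> awaitingFutures = new ArrayList<>();
    private final List<Group> groups = new ArrayList<>();
    private final Timer timer = new Timer();

    public void onUnloaded() {
        // 1. Fail pending classic-group futures so callers retry on the new coordinator.
        awaitingFutures.forEach(f ->
            f.completeExceptionally(Errors.NOT_COORDINATOR.exception()));
        awaitingFutures.clear();

        // 2. Move every group to DEAD so no further state changes are accepted here.
        groups.forEach(g -> g.transitionTo(GroupState.DEAD));

        // 3. Cancel timers owned by this (now unloaded) manager.
        timer.cancel();
    }
}
{code}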



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2576

2024-01-16 Thread Apache Jenkins Server
See 




[jira] [Resolved] (KAFKA-15834) Subscribing to non-existent topic blocks StreamThread from stopping

2024-01-16 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris resolved KAFKA-15834.
-
Resolution: Fixed

> Subscribing to non-existent topic blocks StreamThread from stopping
> ---
>
> Key: KAFKA-15834
> URL: https://issues.apache.org/jira/browse/KAFKA-15834
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.6.0
>Reporter: Greg Harris
>Priority: Major
> Fix For: 3.8.0
>
>
> In 
> NamedTopologyIntegrationTest#shouldContinueProcessingOtherTopologiesWhenNewTopologyHasMissingInputTopics
>  a topology is created which references an input topic which does not exist. 
> The test as-written passes, but the KafkaStreams#close(Duration) at the end 
> times out, and leaves StreamsThreads running.
> From some cursory investigation it appears that this is happening:
> 1. The consumer calls the StreamsPartitionAssignor, which calls 
> TaskManager#handleRebalanceStart as a side-effect
> 2. handleRebalanceStart sets the rebalanceInProgress flag
> 3. This flag is checked by StreamThread.runLoop, and causes the loop to 
> remain running.
> 4. The consumer never calls StreamsRebalanceListener#onPartitionsAssigned, 
> because the topic does not exist
> 5. Because no partitions are ever assigned, the 
> TaskManager#handleRebalanceComplete never clears the rebalanceInProgress flag
>  
> This log message is printed in a tight loop while the close is ongoing and 
> the consumer is being polled with zero duration:
> {noformat}
> [2023-11-15 11:42:43,661] WARN [Consumer 
> clientId=NamedTopologyIntegrationTestshouldContinueProcessingOtherTopologiesWhenNewTopologyHasMissingInputTopics-942756f8-5213-4c44-bb6b-5f805884e026-StreamThread-1-consumer,
>  
> groupId=NamedTopologyIntegrationTestshouldContinueProcessingOtherTopologiesWhenNewTopologyHasMissingInputTopics]
>  Received unknown topic or partition error in fetch for partition 
> unique_topic_prefix-topology-1-store-repartition-0 
> (org.apache.kafka.clients.consumer.internals.FetchCollector:321)
> {noformat}
> Practically, this means that this test leaks two StreamsThreads and the 
> associated clients and sockets, and delays the completion of the test until 
> the KafkaStreams#close(Duration) call times out.
> Either we should change the rebalanceInProgress flag to avoid getting stuck 
> in this rebalance state, or figure out a way to shut down a StreamsThread 
> that is in an extended rebalance state during shutdown.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16147) Partition is assigned to two members at the same time

2024-01-16 Thread Emanuele Sabellico (Jira)
Emanuele Sabellico created KAFKA-16147:
--

 Summary: Partition is assigned to two members at the same time
 Key: KAFKA-16147
 URL: https://issues.apache.org/jira/browse/KAFKA-16147
 Project: Kafka
  Issue Type: Sub-task
Reporter: Emanuele Sabellico


While running test 0113 of librdkafka, the subtest 
_u_multiple_subscription_changes_ received this error, saying that a 
partition is assigned to two members at the same time.

{code:java}
Error: C_6#consumer-9 is assigned rdkafkatest_rnd550f20623daba04c_0113u_2 [0] 
which is already assigned to consumer C_5#consumer-8 {code}
I've reconstructed this sequence:

C_5 SUBSCRIBES TO T1

{code:java}
%7|1705403451.561|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: 
Heartbeat of member id "RaTCu6RXQH-FiSl95iZzdw", group id 
"rdkafkatest_rnd53b4eb0c2de343_0113u", generation id 6, group instance id 
"(null)", current assignment "", subscribe topics 
"rdkafkatest_rnd5a91902462d61c2e_0113u_1((null))[-1]"{code}

C_5 ASSIGNMENT CHANGES TO T1-P7, T1-P8, T1-P12


{code:java}
[2024-01-16 12:10:51,562] INFO [GroupCoordinator id=1 topic=__consumer_offsets 
partition=7] [GroupId rdkafkatest_rnd53b4eb0c2de343_0113u] Member 
RaTCu6RXQH-FiSl95iZzdw transitioned from CurrentAssignment(memberEpoch=6, 
previousMemberEpoch=0, targetMemberEpoch=6, state=assigning, 
assignedPartitions={}, partitionsPendingRevocation={}, 
partitionsPendingAssignment={IKXGrFR1Rv-Qes7Ummas6A=[3, 12]}) to 
CurrentAssignment(memberEpoch=14, previousMemberEpoch=6, targetMemberEpoch=14, 
state=stable, assignedPartitions={IKXGrFR1Rv-Qes7Ummas6A=[7, 8, 12]}, 
partitionsPendingRevocation={}, partitionsPendingAssignment={}). 
(org.apache.kafka.coordinator.group.GroupMetadataManager){code}

C_5 RECEIVES TARGET ASSIGNMENT

{code:java}
%7|1705403451.565|HEARTBEAT|C_5#consumer-8| [thrd:main]: 
GroupCoordinator/1: Heartbeat response received target assignment 
"(null)(IKXGrFR1Rv+Qes7Ummas6A)[7], (null)(IKXGrFR1Rv+Qes7Ummas6A)[8], 
(null)(IKXGrFR1Rv+Qes7Ummas6A)[12]"{code}


C_5 ACKS TARGET ASSIGNMENT

{code:java}
%7|1705403451.566|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: 
Heartbeat of member id "RaTCu6RXQH-FiSl95iZzdw", group id 
"rdkafkatest_rnd53b4eb0c2de343_0113u", generation id 14, group instance id 
"NULL", current assignment 
"rdkafkatest_rnd5a91902462d61c2e_0113u_1(IKXGrFR1Rv+Qes7Ummas6A)[7], 
rdkafkatest_rnd5a91902462d61c2e_0113u_1(IKXGrFR1Rv+Qes7Ummas6A)[8], 
rdkafkatest_rnd5a91902462d61c2e_0113u_1(IKXGrFR1Rv+Qes7Ummas6A)[12]", subscribe 
topics "rdkafkatest_rnd5a91902462d61c2e_0113u_1((null))[-1]" 
%7|1705403451.567|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: 
Heartbeat response received target assignment 
"(null)(IKXGrFR1Rv+Qes7Ummas6A)[7], (null)(IKXGrFR1Rv+Qes7Ummas6A)[8], 
(null)(IKXGrFR1Rv+Qes7Ummas6A)[12]"{code}


C_5 SUBSCRIBES TO T1,T2: T1 partitions are revoked, 5 T2 partitions are pending 
{code:java}
%7|1705403452.612|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: 
Heartbeat of member id "RaTCu6RXQH-FiSl95iZzdw", group id 
"rdkafkatest_rnd53b4eb0c2de343_0113u", generation id 14, group instance id 
"NULL", current assignment "NULL", subscribe topics 
"rdkafkatest_rnd550f20623daba04c_0113u_2((null))[-1], 
rdkafkatest_rnd5a91902462d61c2e_0113u_1((null))[-1]" [2024-01-16 12:10:52,615] 
INFO [GroupCoordinator id=1 topic=__consumer_offsets partition=7] [GroupId 
rdkafkatest_rnd53b4eb0c2de343_0113u] Member RaTCu6RXQH-FiSl95iZzdw updated its 
subscribed topics to: [rdkafkatest_rnd550f20623daba04c_0113u_2, 
rdkafkatest_rnd5a91902462d61c2e_0113u_1]. 
(org.apache.kafka.coordinator.group.GroupMetadataManager) [2024-01-16 
12:10:52,616] INFO [GroupCoordinator id=1 topic=__consumer_offsets partition=7] 
[GroupId rdkafkatest_rnd53b4eb0c2de343_0113u] Member RaTCu6RXQH-FiSl95iZzdw 
transitioned from CurrentAssignment(memberEpoch=14, previousMemberEpoch=6, 
targetMemberEpoch=14, state=stable, 
assignedPartitions={IKXGrFR1Rv-Qes7Ummas6A=[7, 8, 12]}, 
partitionsPendingRevocation={}, partitionsPendingAssignment={}) to 
CurrentAssignment(memberEpoch=14, previousMemberEpoch=6, targetMemberEpoch=16, 
state=revoking, assignedPartitions={}, 
partitionsPendingRevocation={IKXGrFR1Rv-Qes7Ummas6A=[7, 8, 12]}, 
partitionsPendingAssignment={EnZMikZURKiUoxZf0rozaA=[0, 1, 2, 8, 9]}). 
(org.apache.kafka.coordinator.group.GroupMetadataManager) 
%7|1705403452.618|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: 
Heartbeat response received target assignment ""{code}

C_5 FINISHES REVOCATION

{code:java}
%7|1705403452.618|CGRPJOINSTATE|C_5#consumer-8| [thrd:main]: Group 
"rdkafkatest_rnd53b4eb0c2de343_0113u" changed join state wait-assign-call -> 
steady (state up){code}

C_5 ACKS REVOCATION, RECEIVES T2-P0, T2-P1, T2-P2

{code:java}
%7|1705403452.618|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: 
Heartbeat of member id "RaTCu6RXQH-FiSl95iZzdw", group id 

Re: [VOTE] 3.7.0 RC2

2024-01-16 Thread Proven Provenzano
I have a PR https://github.com/apache/kafka/pull/15197 for
https://issues.apache.org/jira/browse/KAFKA-16131 that is building now.
--Proven

On Mon, Jan 15, 2024 at 5:03 AM Jakub Scholz  wrote:

> > Hi Jakub,
> >
> > Thanks for trying the RC. I think what you found is a blocker bug because it
> > will generate a huge amount of logspam. I guess we didn't find it in junit
> > tests since logspam doesn't fail the automated tests. But certainly it's not
> > suitable for production. Did you file a JIRA yet?
>
> Hi Colin,
>
> I opened https://issues.apache.org/jira/browse/KAFKA-16131.
>
> Thanks & Regards
> Jakub
>
> On Mon, Jan 15, 2024 at 8:57 AM Colin McCabe  wrote:
>
> > Hi Stanislav,
> >
> > Thanks for making the first RC. The fact that it's titled RC2 is messing
> > with my mind a bit. I hope this doesn't make people think that we're
> > farther along than we are, heh.
> >
> > On Sun, Jan 14, 2024, at 13:54, Jakub Scholz wrote:
> > > > Nice catch! It does seem like we should have gated this behind the
> > > > metadata version as KIP-858 implies. Is the cluster configured with
> > > > multiple log dirs? What is the impact of the error messages?
> > >
> > > I did not observe any obvious impact. I was able to send and receive
> > > messages as normally. But to be honest, I have no idea what else
> > > this might impact, so I did not try anything special.
> > >
> > > I think everyone upgrading an existing KRaft cluster will go through this
> > > stage (running Kafka 3.7 with an older metadata version for at least a
> > > while). So even if it is just a logged exception without any other impact I
> > > wonder if it might scare users from upgrading. But I leave it to others to
> > > decide if this is a blocker or not.
> > >
> >
> > Hi Jakub,
> >
> > Thanks for trying the RC. I think what you found is a blocker bug because
> > it will generate a huge amount of logspam. I guess we didn't find it in junit
> > tests since logspam doesn't fail the automated tests. But certainly it's
> > not suitable for production. Did you file a JIRA yet?
> >
> > > On Sun, Jan 14, 2024 at 10:17 PM Stanislav Kozlovski
> > >  wrote:
> > >
> > >> Hey Luke,
> > >>
> > >> This is an interesting problem. Given the fact that the KIP for having a
> > >> 3.8 release passed, I think it weights the scale towards not calling this a
> > >> blocker and expecting it to be solved in 3.7.1.
> > >>
> > >> It is unfortunate that it would not seem safe to migrate to KRaft in 3.7.0
> > >> (given the inability to rollback safely), but if that's true - the same
> > >> case would apply for 3.6.0. So in any case users would be expected to use a
> > >> patch release for this.
> >
> > Hi Luke,
> >
> > Thanks for testing rollback. I think this is a case where the
> > documentation is wrong. The intention was for the steps to basically be:
> >
> > 1. roll all the brokers into zk mode, but with migration enabled
> > 2. take down the kraft quorum
> > 3. rmr /controller, allowing a hybrid broker to take over.
> > 4. roll all the brokers into zk mode without migration enabled (if desired)
> >
> > With these steps, there isn't really unavailability since a ZK controller
> > can be elected quickly after the kraft quorum is gone.
> >
> > >> Further, since we will have a 3.8 release - it is
> > >> likely we will ultimately recommend users upgrade from that version given
> > >> its aim is to have strategic KRaft feature parity with ZK.
> > >> That being said, I am not 100% on this. Let me know whether you think this
> > >> should block the release, Luke. I am also tagging Colin and David to weigh
> > >> in with their opinions, as they worked on the migration logic.
> >
> > The rollback docs are new in 3.7 so the fact that they're wrong is a clear
> > blocker, I think. But easy to fix, I believe. I will create a PR.
> >
> > best,
> > Colin
> >
> > >>
> > >> Hey Kirk and Chris,
> > >>
> > >> Unless I'm missing something - KAFKA-16029 is simply a bad log due to
> > >> improper closing. And the PR description implies this has been present
> > >> since 3.5. While annoying, I don't see a strong reason for this to block
> > >> the release.
> > >>
> > >> Hey Jakub,
> > >>
> > >> Nice catch! It does seem like we should have gated this behind the metadata
> > >> version as KIP-858 implies. Is the cluster configured with multiple log
> > >> dirs? What is the impact of the error messages?
> > >>
> > >> Tagging Igor (the author of the KIP) to weigh in.
> > >>
> > >> Best,
> > >> Stanislav
> > >>
> > >> On Sat, Jan 13, 2024 at 7:22 PM Jakub Scholz  wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > I was trying the RC2 and ran into the following issue ... when I run a
> > >> > 3.7.0-RC2 KRaft cluster with metadata version set to 3.6-IV2 metadata
> > >> > version, I seem to be getting repeated errors like this in the controller
> > >> > logs:
> > >> >
> > >> > 2024-01-13 16:58:01,197 INFO [QuorumController id=0]

[jira] [Reopened] (KAFKA-15538) Client support for java regex based subscription

2024-01-16 Thread Lianet Magrans (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lianet Magrans reopened KAFKA-15538:


> Client support for java regex based subscription
> 
>
> Key: KAFKA-15538
> URL: https://issues.apache.org/jira/browse/KAFKA-15538
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Reporter: Lianet Magrans
>Assignee: Phuc Hong Tran
>Priority: Major
>  Labels: kip-848, kip-848-client-support
> Fix For: 3.8.0
>
>
> When using subscribe with a java regex (Pattern), we need to resolve it on 
> the client side to send the broker a list of topic names to subscribe to.
> Context:
> The new consumer group protocol uses [Google 
> RE2/J|https://github.com/google/re2j] for regular expressions and introduces 
> new methods in the consumer API to subscribe using a `SubscriptionPattern`. The 
> subscribe using a java `Pattern` will still be supported for a while but 
> eventually removed.
>  * When the subscribe with SubscriptionPattern is used, the client should 
> just send the regex to the broker and it will be resolved on the server side.
>  * In the case of the subscribe with Pattern, the regex should be resolved on 
> the client side.
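
A rough sketch of the client-side resolution idea for the plain java Pattern case,
illustrative only (the real consumer resolves the pattern against its metadata rather
than via listTopics()): match the regex locally and hand the broker an explicit topic
list. The bootstrap server, group id, and regex are hypothetical.

{code:java}
import java.util.List;
import java.util.Properties;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Illustrative only: resolve a java Pattern to concrete topic names on the
// client and send the broker an explicit subscription list.
public class PatternResolutionSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");      // hypothetical cluster
        props.put("group.id", "regex-demo");                   // hypothetical group
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        Pattern pattern = Pattern.compile("orders\\..*");       // hypothetical regex
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            List<String> matched = consumer.listTopics().keySet().stream()
                    .filter(t -> pattern.matcher(t).matches())
                    .collect(Collectors.toList());
            consumer.subscribe(matched);                        // explicit names go to the broker
        }
        // With the new protocol's SubscriptionPattern (RE2/J), the regex itself
        // would instead be sent to the broker and resolved server side.
    }
}
{code}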



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15863) Handle push telemetry throttling with quota manager

2024-01-16 Thread Apoorv Mittal (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apoorv Mittal resolved KAFKA-15863.
---
Resolution: Not A Problem

> Handle push telemetry throttling with quota manager
> ---
>
> Key: KAFKA-15863
> URL: https://issues.apache.org/jira/browse/KAFKA-15863
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Apoorv Mittal
>Assignee: Apoorv Mittal
>Priority: Major
>
> Details: https://github.com/apache/kafka/pull/14699#discussion_r1399714279



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16146) Checkpoint log-start-offset after remote log deletion

2024-01-16 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-16146:


 Summary: Checkpoint log-start-offset after remote log deletion
 Key: KAFKA-16146
 URL: https://issues.apache.org/jira/browse/KAFKA-16146
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash


The log-start-offset checkpoint is not getting updated after deleting the 
remote log segments due to the below check:

https://sourcegraph.com/github.com/apache/kafka@b16df3b103d915d33670b8156217fc6c2b473f61/-/blob/core/src/main/scala/kafka/log/LogManager.scala?L851



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16145) Windows Kafka Shutdown

2024-01-16 Thread jeongbobae (Jira)
jeongbobae created KAFKA-16145:
--

 Summary: Windows Kafka Shutdown
 Key: KAFKA-16145
 URL: https://issues.apache.org/jira/browse/KAFKA-16145
 Project: Kafka
  Issue Type: Bug
  Components: connect
Affects Versions: 3.6.0
 Environment: windows, openjdk-21, kafka_2.12-3.6.0
Reporter: jeongbobae
 Fix For: 3.6.0


ERROR Error while deleting segments for test.public.testtable-0 in dir 
C:\tmp\kafka-logs (org.apache.kafka.storage.internals.log.LogDirFailureChannel)
java.nio.file.FileSystemException: 
C:\tmp\kafka-logs\test.public.testtable-0\02043576.timeindex -> 
C:\tmp\kafka-logs\test.public.testtable-0\02043576.timeindex.deleted:
 The process cannot access the file because it is being used by another process
        at 
java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92)
        at 
java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103)
        at java.base/sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:414)
        at 
java.base/sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:291)
        at java.base/java.nio.file.Files.move(Files.java:1430)
        at 
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:982)
        at 
org.apache.kafka.storage.internals.log.AbstractIndex.renameTo(AbstractIndex.java:227)
        at 
org.apache.kafka.storage.internals.log.LazyIndex$IndexValue.renameTo(LazyIndex.java:122)
        at 
org.apache.kafka.storage.internals.log.LazyIndex.renameTo(LazyIndex.java:202)
        at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:495)
        at kafka.log.LocalLog$.$anonfun$deleteSegmentFiles$1(LocalLog.scala:917)
        at 
kafka.log.LocalLog$.$anonfun$deleteSegmentFiles$1$adapted(LocalLog.scala:915)
        at scala.collection.immutable.List.foreach(List.scala:333)
        at kafka.log.LocalLog$.deleteSegmentFiles(LocalLog.scala:915)
        at kafka.log.LocalLog.removeAndDeleteSegments(LocalLog.scala:317)
        at kafka.log.UnifiedLog.$anonfun$deleteSegments$2(UnifiedLog.scala:1469)
        at kafka.log.UnifiedLog.deleteSegments(UnifiedLog.scala:1845)
        at 
kafka.log.UnifiedLog.deleteRetentionMsBreachedSegments(UnifiedLog.scala:1443)
        at kafka.log.UnifiedLog.deleteOldSegments(UnifiedLog.scala:1487)
        at kafka.log.LogManager.$anonfun$cleanupLogs$3(LogManager.scala:1282)
        at 
kafka.log.LogManager.$anonfun$cleanupLogs$3$adapted(LogManager.scala:1279)
        at scala.collection.immutable.List.foreach(List.scala:333)
        at kafka.log.LogManager.cleanupLogs(LogManager.scala:1279)
        at 
kafka.log.LogManager.$anonfun$startupWithConfigOverrides$2(LogManager.scala:562)
        at 
org.apache.kafka.server.util.KafkaScheduler.lambda$schedule$1(KafkaScheduler.java:150)
        at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
        at 
java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:358)
        at 
java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1583)
        Suppressed: java.nio.file.FileSystemException: 
C:\tmp\kafka-logs\test.public.testtable-0\02043576.timeindex -> 
C:\tmp\kafka-logs\test.public.testtable-0\02043576.timeindex.deleted:
 The process cannot access the file because it is being used by another process
                at 
java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92)
                at 
java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103)
                at 
java.base/sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:328)
                at 
java.base/sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:291)
                at java.base/java.nio.file.Files.move(Files.java:1430)
                at 
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:978)
                ... 25 more
 
 
ERROR Shutdown broker because all log dirs in C:\tmp\kafka-logs have failed 
(kafka.log.LogManager)
 



{color:#172b4d}*Run in administrator mode, no processes running*{color}
{color:#172b4d}*What more can I do?*{color}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is still unstable: Kafka » Kafka Branch Builder » 3.7 #64

2024-01-16 Thread Apache Jenkins Server
See 




[jira] [Resolved] (KAFKA-16132) Upgrading from 3.6 to 3.7 in KRaft will have seconds of partitions unavailable

2024-01-16 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16132.
---
Resolution: Duplicate

> Upgrading from 3.6 to 3.7 in KRaft will have seconds of partitions unavailable
> --
>
> Key: KAFKA-16132
> URL: https://issues.apache.org/jira/browse/KAFKA-16132
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
>Reporter: Luke Chen
>Priority: Blocker
>
> When upgrading from 3.6 to 3.7, we noticed that after upgrading the metadata 
> version, all the partitions are reset at one time, which causes a short 
> period of unavailability. This didn't happen before. 
> {code:java}
> [2024-01-15 20:45:19,757] INFO [BrokerMetadataPublisher id=2] Updating 
> metadata.version to 19 at offset OffsetAndEpoch(offset=229, epoch=2). 
> (kafka.server.metadata.BrokerMetadataPublisher)
> [2024-01-15 20:45:29,915] INFO [ReplicaFetcherManager on broker 2] Removed 
> fetcher for partitions Set(t1-29, t1-25, t1-21, t1-17, t1-46, t1-13, t1-42, 
> t1-9, t1-38, t1-5, t1-34, t1-1, t1-30, t1-26, t1-22, t1-18, t1-47, t1-14, 
> t1-43, t1-10, t1-39, t1-6, t1-35, t1-2, t1-31, t1-27, t1-23, t1-19, t1-48, 
> t1-15, t1-44, t1-11, t1-40, t1-7, t1-36, t1-3, t1-32, t1-28, t1-24, t1-20, 
> t1-49, t1-16, t1-45, t1-12, t1-41, t1-8, t1-37, t1-4, t1-33, t1-0) 
> (kafka.server.ReplicaFetcherManager)
> {code}
> Complete log:
> https://gist.github.com/showuon/665aa3ce6afd59097a2662f8260ecc10
> Steps:
> 1. start up a 3.6 kafka cluster in KRaft with 1 broker
> 2. create a topic
> 3. upgrade the binary to 3.7
> 4. use kafka-features.sh to upgrade to 3.7 metadata version
> 5. check the log (and metrics if interested)
> Analysis:
> In 3.7, we have JBOD support in KRaft, so the partitionRegistration added a 
> new directory field, which causes a diff to be found while comparing the delta. 
> We might be able to identify that this added-directory change doesn't need to 
> reset the leader/follower state, and just update the metadata, to avoid causing 
> unavailability. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15734) KRaft support in BaseConsumerTest

2024-01-16 Thread Christo Lolov (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christo Lolov resolved KAFKA-15734.
---
Resolution: Fixed

> KRaft support in BaseConsumerTest
> -
>
> Key: KAFKA-15734
> URL: https://issues.apache.org/jira/browse/KAFKA-15734
> Project: Kafka
>  Issue Type: Task
>  Components: core
>Reporter: Sameer Tejani
>Assignee: Sushant Mahajan
>Priority: Minor
>  Labels: kraft, kraft-test, newbie
>
> The following tests in BaseConsumerTest in 
> core/src/test/scala/integration/kafka/api/BaseConsumerTest.scala need to be 
> updated to support KRaft
> 38 : def testSimpleConsumption(): Unit = {
> 57 : def testClusterResourceListener(): Unit = {
> 78 : def testCoordinatorFailover(): Unit = {
> Scanned 125 lines. Found 0 KRaft tests out of 3 tests



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] 3.7.0 RC2

2024-01-16 Thread Stanislav Kozlovski
Hi Kirk,

Given we are going to have to roll a new RC anyway, and the change is so
simple - might as well get it in!

On Mon, Jan 15, 2024 at 8:26 PM Kirk True  wrote:

> Hi Stanislav,
>
> On Sun, Jan 14, 2024, at 1:17 PM, Stanislav Kozlovski wrote:
> > Hey Kirk and Chris,
> >
> > Unless I'm missing something - KAFKA-16029 is simply a bad log due to
> > improper closing. And the PR description implies this has been present
> > since 3.5. While annoying, I don't see a strong reason for this to block
> > the release.
>
> I would imagine that it would result in concerned users reporting the
> issue.
>
> I took another look, and the code that causes the issue was indeed changed
> in 3.7. It is easily reproducible.
>
> The PR is ready for review: https://github.com/apache/kafka/pull/15186
>
> Thanks,
> Kirk



-- 
Best,
Stanislav