Re: [PR] KAFKA-16234: Log directory failure re-creates partitions in another logdir automatically [kafka]

2024-04-05 Thread via GitHub


showuon merged PR #15335:
URL: https://github.com/apache/kafka/pull/15335


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16234: Log directory failure re-creates partitions in another logdir automatically [kafka]

2024-04-05 Thread via GitHub


showuon commented on PR #15335:
URL: https://github.com/apache/kafka/pull/15335#issuecomment-2040991135

   Failed tests are unrelated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16234: Log directory failure re-creates partitions in another logdir automatically [kafka]

2024-04-05 Thread via GitHub


OmniaGM commented on code in PR #15335:
URL: https://github.com/apache/kafka/pull/15335#discussion_r1553431953


##
core/src/main/scala/kafka/cluster/Partition.scala:
##
@@ -289,10 +300,11 @@ class Partition(val topicPartition: TopicPartition,
 delayedOperations: DelayedOperations,
 metadataCache: MetadataCache,
 logManager: LogManager,
-alterIsrManager: AlterPartitionManager) extends Logging {
+alterIsrManager: AlterPartitionManager,
+@volatile private var _topicId: Option[Uuid] = None // TODO: 
merge topicPartition and _topicId into TopicIdPartition once TopicId persist in 
most of the code

Review Comment:
   there is a jira that will address this already which is 
[KAFKA-16212](https://issues.apache.org/jira/browse/KAFKA-16212) I'll update 
the comment 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16234: Log directory failure re-creates partitions in another logdir automatically [kafka]

2024-04-05 Thread via GitHub


OmniaGM commented on code in PR #15335:
URL: https://github.com/apache/kafka/pull/15335#discussion_r1553431953


##
core/src/main/scala/kafka/cluster/Partition.scala:
##
@@ -289,10 +300,11 @@ class Partition(val topicPartition: TopicPartition,
 delayedOperations: DelayedOperations,
 metadataCache: MetadataCache,
 logManager: LogManager,
-alterIsrManager: AlterPartitionManager) extends Logging {
+alterIsrManager: AlterPartitionManager,
+@volatile private var _topicId: Option[Uuid] = None // TODO: 
merge topicPartition and _topicId into TopicIdPartition once TopicId persist in 
most of the code

Review Comment:
   there is a jira that will be address this already which is 
[KAFKA-16212](https://issues.apache.org/jira/browse/KAFKA-16212) I'll update 
the comment 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16234: Log directory failure re-creates partitions in another logdir automatically [kafka]

2024-04-05 Thread via GitHub


OmniaGM commented on code in PR #15335:
URL: https://github.com/apache/kafka/pull/15335#discussion_r1553428780


##
core/src/main/scala/kafka/cluster/Partition.scala:
##
@@ -104,9 +104,19 @@ class DelayedOperations(topicPartition: TopicPartition,
 object Partition {
   private val metricsGroup = new KafkaMetricsGroup(classOf[Partition])
 
-  def apply(topicPartition: TopicPartition,
+  def apply(topicIdPartition: TopicIdPartition,
 time: Time,
 replicaManager: ReplicaManager): Partition = {
+Partition(

Review Comment:
   The plan is to use this apply in 
[KAFKA-16212](https://issues.apache.org/jira/browse/KAFKA-16212) which will be 
raised soon



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16234: Log directory failure re-creates partitions in another logdir automatically [kafka]

2024-04-05 Thread via GitHub


chia7712 commented on code in PR #15335:
URL: https://github.com/apache/kafka/pull/15335#discussion_r1552954260


##
core/src/main/scala/kafka/cluster/Partition.scala:
##
@@ -104,9 +104,19 @@ class DelayedOperations(topicPartition: TopicPartition,
 object Partition {
   private val metricsGroup = new KafkaMetricsGroup(classOf[Partition])
 
-  def apply(topicPartition: TopicPartition,
+  def apply(topicIdPartition: TopicIdPartition,
 time: Time,
 replicaManager: ReplicaManager): Partition = {
+Partition(

Review Comment:
   not sure whether we need this new `apply`. No callers have 
`TopicIdPartition` and hence they have to create `TopicIdPartition` to use this 
`apply`



##
core/src/main/scala/kafka/cluster/Partition.scala:
##
@@ -289,10 +300,11 @@ class Partition(val topicPartition: TopicPartition,
 delayedOperations: DelayedOperations,
 metadataCache: MetadataCache,
 logManager: LogManager,
-alterIsrManager: AlterPartitionManager) extends Logging {
+alterIsrManager: AlterPartitionManager,
+@volatile private var _topicId: Option[Uuid] = None // TODO: 
merge topicPartition and _topicId into TopicIdPartition once TopicId persist in 
most of the code

Review Comment:
   Can we add jira link to the comment? The reader can trace the updates easily.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16234: Log directory failure re-creates partitions in another logdir automatically [kafka]

2024-04-05 Thread via GitHub


OmniaGM commented on PR #15335:
URL: https://github.com/apache/kafka/pull/15335#issuecomment-2039474182

   > @OmniaGM , there is compilation error in jdk8_scala2.12 job. Could you 
have a look?
   > 
   > ```
   > [2024-04-04T09:19:51.266Z] [Error] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka-pr_PR-15335/core/src/test/scala/unit/kafka/server/ReplicaManagerTest.scala:4952:125:
 recursive value leaderTopicsDelta needs type
   > [2024-04-04T09:19:51.266Z] [Error] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka-pr_PR-15335/core/src/test/scala/unit/kafka/server/ReplicaManagerTest.scala:4953:17:
 value makeLeader is not a member of Any
   > [2024-04-04T09:19:51.266Z] [Error] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka-pr_PR-15335/core/src/test/scala/unit/kafka/server/ReplicaManagerTest.scala:4957:7:
 overloaded method value assertTrue with alternatives:
   > [2024-04-04T09:19:51.266Z]   (x$1: java.util.function.BooleanSupplier)Unit 

   > [2024-04-04T09:19:51.266Z]   (x$1: Boolean)Unit
   > [2024-04-04T09:19:51.266Z]  cannot be applied to (Any)
   > ```
   > 
   > 
https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15335/13/pipeline
   
   Fixed it


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16234: Log directory failure re-creates partitions in another logdir automatically [kafka]

2024-04-04 Thread via GitHub


showuon commented on PR #15335:
URL: https://github.com/apache/kafka/pull/15335#issuecomment-2038496756

   @OmniaGM , there is compilation error in jdk8_scala2.12 job. Could you have 
a look?
   ```
   [2024-04-04T09:19:51.266Z] [Error] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka-pr_PR-15335/core/src/test/scala/unit/kafka/server/ReplicaManagerTest.scala:4952:125:
 recursive value leaderTopicsDelta needs type
   [2024-04-04T09:19:51.266Z] [Error] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka-pr_PR-15335/core/src/test/scala/unit/kafka/server/ReplicaManagerTest.scala:4953:17:
 value makeLeader is not a member of Any
   [2024-04-04T09:19:51.266Z] [Error] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka-pr_PR-15335/core/src/test/scala/unit/kafka/server/ReplicaManagerTest.scala:4957:7:
 overloaded method value assertTrue with alternatives:
   [2024-04-04T09:19:51.266Z]   (x$1: java.util.function.BooleanSupplier)Unit 

   [2024-04-04T09:19:51.266Z]   (x$1: Boolean)Unit
   [2024-04-04T09:19:51.266Z]  cannot be applied to (Any)
   ```
   
https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15335/13/pipeline


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16234: Log directory failure re-creates partitions in another logdir automatically [kafka]

2024-04-04 Thread via GitHub


OmniaGM commented on PR #15335:
URL: https://github.com/apache/kafka/pull/15335#issuecomment-2036695494

   Thanks, and sorry for the delay I was trying to find any test beside the 
system test to test the full scenario in this PR but I think proofing the main 
cause is enough for now


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16234: Log directory failure re-creates partitions in another logdir automatically [kafka]

2024-04-04 Thread via GitHub


OmniaGM commented on PR #15335:
URL: https://github.com/apache/kafka/pull/15335#issuecomment-2036650723

   > This change LGTM! But I think we need to have tests for the scenario you 
described in JIRA, to make sure it won't happen again. Could you help add some 
of them? Maybe add in `ReplicaManagerTest`?
   
   I added a test that ensure that offline partition shouldn't create new 
partition when `ReplicaManager::getOrCreatePartition` is triggered. The system 
test in pr https://github.com/apache/kafka/pull/15409 should also cover the 
full flow for this fix 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16234: Log directory failure re-creates partitions in another logdir automatically [kafka]

2024-04-03 Thread via GitHub


OmniaGM commented on code in PR #15335:
URL: https://github.com/apache/kafka/pull/15335#discussion_r1549552216


##
core/src/main/scala/kafka/server/metadata/BrokerMetadataPublisher.scala:
##
@@ -289,13 +289,17 @@ class BrokerMetadataPublisher(
 try {
   // Start log manager, which will perform (potentially lengthy)
   // recovery-from-unclean-shutdown if required.
-  logManager.startup(metadataCache.getAllTopics())
-
-  // Delete partition directories which we're not supposed to have. We have
-  // to do this before starting ReplicaManager, so that the stray replicas
-  // don't block creation of new ones with different IDs but the same 
names.
-  // See KAFKA-14616 for details.
-  logManager.deleteStrayKRaftReplicas(brokerId, newImage.topics())
+  logManager.startup(
+metadataCache.getAllTopics(),
+isStray = (topicId, partition) => {
+  val tid = topicId.getOrElse {
+throw new RuntimeException(s"Partition $partition does not have a 
topic ID, " +
+  "which is not allowed when running in KRaft mode.")
+  }
+  Option(newImage.topics().getPartition(tid, partition.partition()))
+.exists(_.replicas.contains(brokerId))

Review Comment:
   You are right, the new `isStrayKraftReplica` is finding both cases but I 
guess I took Igor suggestion without giving it a second thought 
https://github.com/apache/kafka/pull/15335#discussion_r1512748010 will fix this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16234: Log directory failure re-creates partitions in another logdir automatically [kafka]

2024-03-29 Thread via GitHub


showuon commented on code in PR #15335:
URL: https://github.com/apache/kafka/pull/15335#discussion_r1544263703


##
core/src/main/scala/kafka/server/metadata/BrokerMetadataPublisher.scala:
##
@@ -289,13 +289,17 @@ class BrokerMetadataPublisher(
 try {
   // Start log manager, which will perform (potentially lengthy)
   // recovery-from-unclean-shutdown if required.
-  logManager.startup(metadataCache.getAllTopics())
-
-  // Delete partition directories which we're not supposed to have. We have
-  // to do this before starting ReplicaManager, so that the stray replicas
-  // don't block creation of new ones with different IDs but the same 
names.
-  // See KAFKA-14616 for details.
-  logManager.deleteStrayKRaftReplicas(brokerId, newImage.topics())
+  logManager.startup(
+metadataCache.getAllTopics(),
+isStray = (topicId, partition) => {
+  val tid = topicId.getOrElse {
+throw new RuntimeException(s"Partition $partition does not have a 
topic ID, " +
+  "which is not allowed when running in KRaft mode.")
+  }
+  Option(newImage.topics().getPartition(tid, partition.partition()))
+.exists(_.replicas.contains(brokerId))

Review Comment:
   In original `findStrayReplicas`, we'll treat 2 cases as stray:
   1. newImage doesn't contain the topic ID
   2. newImage contains the topicID, but doesn't include this replica
   
   Here, we only treat case (2) as stray, not (1). Could I know what's the 
reason?
   Also, why can't we invoke `findStrayReplicas` or `isStrayReplicas` after 
your change? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16234: Log directory failure re-creates partitions in another logdir automatically [kafka]

2024-03-26 Thread via GitHub


OmniaGM commented on PR #15335:
URL: https://github.com/apache/kafka/pull/15335#issuecomment-2020485401

   Hi @cmccabe can you have look into this please? This bug is critical and 
need to be resolved before 3.7.1 release 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16234: Log directory failure re-creates partitions in another logdir automatically [kafka]

2024-03-15 Thread via GitHub


soarez commented on PR #15335:
URL: https://github.com/apache/kafka/pull/15335#issuecomment-1999531297

   @OmniaGM can you resolve the conflicts for this PR? Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16234: Log directory failure re-creates partitions in another logdir automatically [kafka]

2024-03-12 Thread via GitHub


OmniaGM commented on code in PR #15335:
URL: https://github.com/apache/kafka/pull/15335#discussion_r1521445120


##
core/src/main/scala/kafka/log/LogManager.scala:
##
@@ -355,10 +355,11 @@ class LogManager(logDirs: Seq[File],
 } else if (logDir.getName.endsWith(UnifiedLog.StrayDirSuffix)) {
   addStrayLog(topicPartition, log)
   warn(s"Loaded stray log: $logDir")
-} else if (shouldBeStrayKraftLog(log)) {
-  // Mark the partition directories we're not supposed to have as stray. 
We have to do this
-  // during log load because topics may have been recreated with the same 
name while a disk
-  // was offline.
+} else if (isStray(log.topicId, topicPartition)) {
+  // Opposite of Zookeeper mode deleted topic in KRAFT mode can be 
recreated while it's not fully deleted from broker.
+  // As a result of this broker in KRAFT mode with one offline directory 
has no way to detect to-be-deleted replica in an offline directory earlier.
+  // However, broker need to mark the partition directories as stray 
during log load because topics may have been
+  // recreated with the same name while a log directory was offline.

Review Comment:
   updated with the suggestion



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16234: Log directory failure re-creates partitions in another logdir automatically [kafka]

2024-03-09 Thread via GitHub


soarez commented on code in PR #15335:
URL: https://github.com/apache/kafka/pull/15335#discussion_r1518545295


##
core/src/main/scala/kafka/log/LogManager.scala:
##
@@ -355,10 +355,11 @@ class LogManager(logDirs: Seq[File],
 } else if (logDir.getName.endsWith(UnifiedLog.StrayDirSuffix)) {
   addStrayLog(topicPartition, log)
   warn(s"Loaded stray log: $logDir")
-} else if (shouldBeStrayKraftLog(log)) {
-  // Mark the partition directories we're not supposed to have as stray. 
We have to do this
-  // during log load because topics may have been recreated with the same 
name while a disk
-  // was offline.
+} else if (isStray(log.topicId, topicPartition)) {
+  // Opposite of Zookeeper mode deleted topic in KRAFT mode can be 
recreated while it's not fully deleted from broker.
+  // As a result of this broker in KRAFT mode with one offline directory 
has no way to detect to-be-deleted replica in an offline directory earlier.
+  // However, broker need to mark the partition directories as stray 
during log load because topics may have been
+  // recreated with the same name while a log directory was offline.

Review Comment:
   
   
   ```suggestion
 // Unlike Zookeeper mode, which tracks pending topic deletions under a 
ZNode, KRaft is unable to prevent a topic from being recreated before every 
replica has been deleted.
 // A KRaft broker with an offline directory may be unable to detect it 
still holds a to-be-deleted replica, and can create a conflicting topic 
partition for a new incarnation of the topic in one of the remaining online 
directories.
 // So upon a restart in which the offline directory is back online we 
need to clean up the old replica directory.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16234: Log directory failure re-creates partitions in another logdir automatically [kafka]

2024-03-08 Thread via GitHub


OmniaGM commented on PR #15335:
URL: https://github.com/apache/kafka/pull/15335#issuecomment-1985757706

   @soarez, @pprovenzano and @showuon can you please have a look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16234: Log directory failure re-creates partitions in another logdir automatically [kafka]

2024-03-08 Thread via GitHub


OmniaGM commented on code in PR #15335:
URL: https://github.com/apache/kafka/pull/15335#discussion_r1517756842


##
core/src/main/scala/kafka/server/metadata/BrokerMetadataPublisher.scala:
##
@@ -289,13 +289,10 @@ class BrokerMetadataPublisher(
 try {
   // Start log manager, which will perform (potentially lengthy)
   // recovery-from-unclean-shutdown if required.
-  logManager.startup(metadataCache.getAllTopics())
-
-  // Delete partition directories which we're not supposed to have. We have
-  // to do this before starting ReplicaManager, so that the stray replicas
-  // don't block creation of new ones with different IDs but the same 
names.
-  // See KAFKA-14616 for details.
-  logManager.deleteStrayKRaftReplicas(brokerId, newImage.topics())
+  logManager.startup(
+metadataCache.getAllTopics(),
+shouldBeStrayKraftLog = log => 
LogManager.isStrayKraftReplica(brokerId, newImage.topics(), log)

Review Comment:
   I updated this with the suggested comment. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16234: Log directory failure re-creates partitions in another logdir automatically [kafka]

2024-03-08 Thread via GitHub


OmniaGM commented on code in PR #15335:
URL: https://github.com/apache/kafka/pull/15335#discussion_r1517754815


##
core/src/main/scala/kafka/log/LogManager.scala:
##
@@ -354,6 +355,14 @@ class LogManager(logDirs: Seq[File],
 } else if (logDir.getName.endsWith(UnifiedLog.StrayDirSuffix)) {
   addStrayLog(topicPartition, log)
   warn(s"Loaded stray log: $logDir")
+} else if (shouldBeStrayKraftLog(log)) {
+  // Mark the partition directories we're not supposed to have as stray. 
We have to do this
+  // during log load because topics may have been recreated with the same 
name while a disk
+  // was offline.
+  // See KAFKA-16234, KAFKA-16157 and KAFKA-14616 for details.

Review Comment:
   Updated this comment



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16234: Log directory failure re-creates partitions in another logdir automatically [kafka]

2024-03-05 Thread via GitHub


soarez commented on code in PR #15335:
URL: https://github.com/apache/kafka/pull/15335#discussion_r1512748010


##
core/src/main/scala/kafka/server/metadata/BrokerMetadataPublisher.scala:
##
@@ -289,13 +289,10 @@ class BrokerMetadataPublisher(
 try {
   // Start log manager, which will perform (potentially lengthy)
   // recovery-from-unclean-shutdown if required.
-  logManager.startup(metadataCache.getAllTopics())
-
-  // Delete partition directories which we're not supposed to have. We have
-  // to do this before starting ReplicaManager, so that the stray replicas
-  // don't block creation of new ones with different IDs but the same 
names.
-  // See KAFKA-14616 for details.
-  logManager.deleteStrayKRaftReplicas(brokerId, newImage.topics())
+  logManager.startup(
+metadataCache.getAllTopics(),
+shouldBeStrayKraftLog = log => 
LogManager.isStrayKraftReplica(brokerId, newImage.topics(), log)

Review Comment:
   A suggestion, due to the following concerns:
   
   * LogManager shouldn't handle metadata records, these types shouldn't depend 
on each other. The metadata records should be handled here instead.
   * The argument name looks a bit strange, namely the 'should' and 'kraft' 
parts.
   
   ```suggestion
   isStray = (topicId, partition) => 
Option(newImage.topics().getPartition(topicId.getOrElse{
 throw new RuntimeException(s"Partition $partition does not have a 
topic ID, " +
   "which is not allowed when running in KRaft mode.")
   }, partition.partition())).exists(_.replicas.contains(brokerId))
   ```
   
   Perhaps LogManager should declare a type for this argument since it's 
propagated down the call stack several levels?



##
core/src/main/scala/kafka/log/LogManager.scala:
##
@@ -354,6 +355,14 @@ class LogManager(logDirs: Seq[File],
 } else if (logDir.getName.endsWith(UnifiedLog.StrayDirSuffix)) {
   addStrayLog(topicPartition, log)
   warn(s"Loaded stray log: $logDir")
+} else if (shouldBeStrayKraftLog(log)) {
+  // Mark the partition directories we're not supposed to have as stray. 
We have to do this
+  // during log load because topics may have been recreated with the same 
name while a disk
+  // was offline.
+  // See KAFKA-16234, KAFKA-16157 and KAFKA-14616 for details.

Review Comment:
   Perhaps it could help a future reader to clarify that kraft mode (as opposed 
to zk) does not track deleted topics nor prevent them from being re-created 
with the same name before every replica has been deleted, and so there's no way 
for a broker with a to-be-deleted replica in an offline directory to detect 
this earlier.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-16234: Log directory failure re-creates partitions in another logdir automatically [kafka]

2024-02-08 Thread via GitHub


OmniaGM commented on PR #15335:
URL: https://github.com/apache/kafka/pull/15335#issuecomment-1934674083

   @pprovenzano, @cmccabe @showuon Can one of you please have a look into this 
pr? If we can't cherry-pick this for 3.7.0 I strongly believe we should revert 
#15263
   cc: @stanislavkozlovski & @gaurav-narula 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org