[GitHub] [kafka] jsancio commented on pull request #12274: KAFKA-13959: Controller should unfence Broker with busy metadata log

2022-08-12 Thread GitBox


jsancio commented on PR #12274:
URL: https://github.com/apache/kafka/pull/12274#issuecomment-1213269855

   Build failed with test errors. Rerunning:
   ```
   [Build / JDK 8 and Scala 2.12 / 
org.apache.kafka.connect.mirror.integration.IdentityReplicationIntegrationTest.testOffsetSyncsTopicsOnTarget()](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-12274/9/testReport/junit/org.apache.kafka.connect.mirror.integration/IdentityReplicationIntegrationTest/Build___JDK_8_and_Scala_2_12___testOffsetSyncsTopicsOnTarget__/)
   [Build / JDK 8 and Scala 2.12 / 
org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationBaseTest.testReplication()](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-12274/9/testReport/junit/org.apache.kafka.connect.mirror.integration/MirrorConnectorsIntegrationBaseTest/Build___JDK_8_and_Scala_2_12___testReplication__/)
   [Build / JDK 8 and Scala 2.12 / 
kafka.server.KRaftClusterTest.testIncrementalAlterConfigs()](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-12274/9/testReport/junit/kafka.server/KRaftClusterTest/Build___JDK_8_and_Scala_2_12___testIncrementalAlterConfigs__/)
   [Build / JDK 11 and Scala 2.13 / 
org.apache.kafka.controller.QuorumControllerTest.testNoOpRecordWriteAfterTimeout()](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-12274/9/testReport/junit/org.apache.kafka.controller/QuorumControllerTest/Build___JDK_11_and_Scala_2_13___testNoOpRecordWriteAfterTimeout__/)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka] jsancio commented on pull request #12274: KAFKA-13959: Controller should unfence Broker with busy metadata log

2022-08-11 Thread GitBox


jsancio commented on PR #12274:
URL: https://github.com/apache/kafka/pull/12274#issuecomment-1212113291

   @dengziming take a look at the tests failures. All of the failures in 
`ClusterControlManagerTest` seems related to this change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka] jsancio commented on pull request #12274: KAFKA-13959: Controller should unfence Broker with busy metadata log

2022-08-05 Thread GitBox


jsancio commented on PR #12274:
URL: https://github.com/apache/kafka/pull/12274#issuecomment-1206920706

   @dengziming I marked this issue as a blocker to 3.3.0. Let me know if you 
want to work on this issue. I don't mind.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka] jsancio commented on pull request #12274: KAFKA-13959: Controller should unfence Broker with busy metadata log

2022-07-19 Thread GitBox


jsancio commented on PR #12274:
URL: https://github.com/apache/kafka/pull/12274#issuecomment-1189538831

   > @jsancio I think the last question is that there may be more than one 
registration record for each broker after restarting it, so can we rely on the 
broker epoch? I think I need some time to check the logic before make a final 
decision.
   
   Yes. I think you can rely on broker epoch and broker id. Also the active 
controller is guaranteed to have read all of the records on the log before 
handling RPCs like heartbeat.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka] jsancio commented on pull request #12274: KAFKA-13959: Controller should unfence Broker with busy metadata log

2022-07-18 Thread GitBox


jsancio commented on PR #12274:
URL: https://github.com/apache/kafka/pull/12274#issuecomment-1187345352

   > This is a good idea, however, the offset of each registration record is a 
soft state(not persist to metadata log)
   
   @dengziming The offset is persistent along with the record. This offset is 
provided to the inactive controllers in handleCommit. When processing 
handleCommit the inactive controllers can use the `Batch::lastOffset` of the 
batch that contains the registration record. When processing handleSnapshot the 
inactive controllers can use the `SnapshotReader::lastContainedLogOffset` of 
the snapshot if the snapshot contains registration records. What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka] jsancio commented on pull request #12274: KAFKA-13959: Controller should unfence Broker with busy metadata log

2022-07-13 Thread GitBox


jsancio commented on PR #12274:
URL: https://github.com/apache/kafka/pull/12274#issuecomment-1183190708

   > maybe we should allow the lastAppliedOffset to be behind by A records (or 
time) to unfence the broker.
   
   @dengziming Here are my thoughts
   
   ### Improve logic for unfencing brokers
   How about the controller unfence the broker when the broker's high-watermark 
has reached the broker registration record for that broker? The a controller 
first registers for a given incarnation, the controller write a broker 
registration record. The controller can remember this offset as returned by the 
raft client. The controller can unfence the broker when the broker's 
high-watermark is greater than this registration offset.
   
   ### Propagate the HWM to the replicas as quickly as possible.
   I think that the solution above it would allow us to de-prioritize this. 
Here are my observations anyways.
   
   Looking at the KafkaRaftClient implementation we would have to have an index 
for both the fetch offset and the last sent high-watermark for that replica.
   
   Another issue here is that we changed the KafkaRaftManager so that it 
doesn't set the replica id when it is an observer/broker. Since the HWM is not 
part of the Fetch request the leader would have to keep track of this in the 
LeaderState
   ```
 val nodeId = if (config.processRoles.contains(ControllerRole)) {
   OptionalInt.of(config.nodeId)
 } else {
   OptionalInt.empty()
 }
   ```
   
   We would need to find a better solution for 
https://issues.apache.org/jira/browse/KAFKA-13168 or improve the FETCH request 
so that it includes the HWM.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka] jsancio commented on pull request #12274: KAFKA-13959: Controller should unfence Broker with busy metadata log

2022-07-07 Thread GitBox


jsancio commented on PR #12274:
URL: https://github.com/apache/kafka/pull/12274#issuecomment-1177949852

   > I havn't flesh it, the basic idea is shown in this PR, only read up to 
Isolation.COMMITTED when reading from observer and invoke 
fetchPurgatory.maybeComplete(highWatermark.offset, currentTimeMs) on 
onUpdateLeaderHighWatermark.
   
   I see. This would not work for co-located Kafka servers, right? Co-located 
Kafka server are servers that are running both a controller and a broker. In 
that case the replica will read uncommitted data and the leader will not send a 
FETCH response when the HW changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka] jsancio commented on pull request #12274: KAFKA-13959: Controller should unfence Broker with busy metadata log

2022-07-06 Thread GitBox


jsancio commented on PR #12274:
URL: https://github.com/apache/kafka/pull/12274#issuecomment-1176530495

   > In this PR I tried your suggestion and it does solve this problem, 
however, this will make the logic in RaftClient very complex and we need to 
save more states in LeaderState and it's also difficult to test
   
   @dengziming Do you have a diff for this solution? I am interested in this 
solution as it would work in both REMOTE and COLOCATED configuration for KRaft.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka] jsancio commented on pull request #12274: KAFKA-13959: Controller should unfence Broker with busy metadata log

2022-07-06 Thread GitBox


jsancio commented on PR #12274:
URL: https://github.com/apache/kafka/pull/12274#issuecomment-1176512896

   > > here is the code change: 
https://github.com/dengziming/kafka/tree/KAFKA-13959-2
   > 
   > Hey @dengziming, I took at look at the commits in this tree. Is this the 
only commit 
[dengziming@79dc8ec](https://github.com/dengziming/kafka/commit/79dc8ec423cd74fba462e934f89bdec3dcd8528d)?
 Can you maybe share a diff/compare. For example, something like 
[dengziming/kafka@30216ea...KAFKA-13959-2](https://github.com/dengziming/kafka/compare/30216ea1c58761e62f51af40033f24e3ae1c5c2a...KAFKA-13959-2)
   
   Never mind. I understand now. The broker sends the active controller the 
local LEO instead of the last applied offset by the broker listener. I think 
this will unfence the broker at startup even if the broker hasn't applied the 
snapshot or any of the log records, right?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka] jsancio commented on pull request #12274: KAFKA-13959: Controller should unfence Broker with busy metadata log

2022-07-06 Thread GitBox


jsancio commented on PR #12274:
URL: https://github.com/apache/kafka/pull/12274#issuecomment-1176510158

   > here is the code change: 
https://github.com/dengziming/kafka/tree/KAFKA-13959-2
   
   Hey @dengziming, I took at look at the commits in this tree. Is this the 
only commit 
https://github.com/dengziming/kafka/commit/79dc8ec423cd74fba462e934f89bdec3dcd8528d?
 Can you maybe share a diff/compare. For example, something like 
https://github.com/dengziming/kafka/compare/30216ea1c58761e62f51af40033f24e3ae1c5c2a...KAFKA-13959-2
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org