Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]
mjsax commented on PR #14539:
URL: https://github.com/apache/kafka/pull/14539#issuecomment-1773534318

Just looked into the branch builder results for `test_compatible_brokers_eos_v2_enabled` 2.6.3 in more detail. Broker log shows:
```
bash: /opt/kafka-2.6.3/bin/kafka-server-start.sh: No such file or directory
```
And the console output shows:
```
worker4: + get_kafka 2.6.2 2.12
17:45:33 worker4: + version=2.6.2
17:45:33 worker4: + scala_version=2.12
17:45:33 worker4: + kafka_dir=/opt/kafka-2.6.2
17:45:33 worker4: + url=https://s3-us-west-2.amazonaws.com/kafka-packages/kafka_2.12-2.6.2.tgz
17:45:33 worker4: + url_streams_test=https://s3-us-west-2.amazonaws.com/kafka-packages/kafka-streams-2.6.2-test.jar
17:45:33 worker4: + '[' '!' -d /opt/kafka-2.6.2 ']'
17:45:33 worker4: /tmp /opt/jdk/8
17:45:33 worker4: + pushd /tmp
17:45:33 worker4: + curl --retry 5 -O https://s3-us-west-2.amazonaws.com/kafka-packages/kafka_2.12-2.6.2.tgz
```

The `Dockerfile` does use `2.6.3` though -- not sure where `2.6.2` comes from? Could it be that this PR should have been rebased to pick up some `Dockerfile` updates I did recently (https://github.com/apache/kafka/commit/cdf726fd358f9be3438ceefb01073ab40a31a8b4)?

Maybe we should keep observing `trunk` runs and see what they do... Given that it's always 2.6.3, 2.7.2, and 3.3.2 that failed above, and those are exactly the versions the other PR bumped, I see a clear relationship.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
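The diagnosis in this comment boils down to comparing two logs: the version the broker tried to start from versus the version that `get_kafka` actually downloaded. A minimal Python sketch of that comparison, assuming the log formats look exactly like the two excerpts quoted in the comment; `requested_versions` and `missing_installs` are hypothetical helper names, not part of kafkatest or ducktape:

```python
import re


def requested_versions(console_output):
    """Versions passed to `get_kafka`, taken from a `set -x` style trace."""
    return set(re.findall(r"\+ get_kafka (\d+(?:\.\d+)+) ", console_output))


def missing_installs(broker_log):
    """Versions whose start script was not found under /opt/kafka-<version>."""
    pattern = (r"/opt/kafka-(\d+(?:\.\d+)+)/bin/kafka-server-start\.sh: "
               r"No such file or directory")
    return set(re.findall(pattern, broker_log))


# Lines copied from the excerpts quoted in the comment.
console = (
    "worker4: + get_kafka 2.6.2 2.12\n"
    "17:45:33 worker4: + version=2.6.2\n"
)
broker = "bash: /opt/kafka-2.6.3/bin/kafka-server-start.sh: No such file or directory\n"

# Versions a test tried to start but the image build never downloaded.
never_downloaded = missing_installs(broker) - requested_versions(console)
print(sorted(never_downloaded))  # prints ['2.6.3']
```

A non-empty result would point at the image contents rather than at the test logic, which matches the suspicion about the stale `Dockerfile`.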
Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]
mjsax commented on PR #14539:
URL: https://github.com/apache/kafka/pull/14539#issuecomment-1773526503

Merged to `trunk` and cherry-picked to `3.6`, `3.5`, and `3.4` branches.
Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]
mjsax commented on PR #14539:
URL: https://github.com/apache/kafka/pull/14539#issuecomment-1773521845

For example, I just re-ran:
```
$ TC_PATHS="tests/kafkatest/tests/streams/streams_broker_compatibility_test.py::StreamsBrokerCompatibility.test_compatible_brokers_eos_v2_enabled" bash tests/docker/run_tests.sh
[...]
SESSION REPORT (ALL TESTS)
ducktape version: 0.11.4
session_id:       2023-10-20--004
run time:         5 minutes 50.448 seconds
tests run:        8
passed:           7
flaky:            0
failed:           1
ignored:          0
[...]
```

Only the run for 2.6.3 failed. Looking into the test failure, the issue was that the broker did not start up in time, so the test ran into a timeout. Broker log shows:
```
Error: Exception thrown by the agent : java.rmi.server.ExportException: Port already in use: 9192; nested exception is:
	java.net.BindException: Address already in use (Bind failed)
```

So I re-ran just this single configuration for 2.6.3 and it passed afterwards.
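The `Port already in use: 9192` error suggests that something was still holding the port when the broker started. A minimal Python sketch for probing a port by hand before re-running, assuming it is executed on the node (or inside the container) where the broker runs; `port_in_use` is a hypothetical helper, not part of kafkatest or ducktape:

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if binding to (host, port) fails, i.e. something holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically EADDRINUSE: another socket already owns the port.
            return True
    return False


if __name__ == "__main__":
    # 9192 is the port named in the broker log; adjust for other setups.
    print("port 9192 in use:", port_in_use(9192))
```

The probe only binds and never listens, so it releases the port again as soon as the `with` block exits.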
Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]
mjsax merged PR #14539:
URL: https://github.com/apache/kafka/pull/14539
Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]
mjsax commented on PR #14539:
URL: https://github.com/apache/kafka/pull/14539#issuecomment-1773506747

I did not run all of them locally yet... only the upgrade tests and the cooperative rebalancing ones.

I am running them on my Mac: macOS Monterey (12.7), 2.3GHz 8-Core Intel i9, 32GB DDR4.

I often modify the test Python code to run only a single configuration and a single test case (i.e., Python method). Otherwise I don't change anything.
Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]
mimaison commented on PR #14539:
URL: https://github.com/apache/kafka/pull/14539#issuecomment-1772436472

@mjsax Can you share the TC_PATHS, ducktape options, and specs of the machines you used to run the system tests? I'm really having trouble getting any of them to pass regularly in my environment. Thanks
Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]
mjsax commented on PR #14539:
URL: https://github.com/apache/kafka/pull/14539#issuecomment-1771437783

Test failures. Far fewer than before. A few tests (i.e., "from version" configurations) seem to be "stable" while others are not -- will keep digging.

- test_upgrade_to_cooperative_rebalance
  - 0.11.0.3 (passed before)
  - 1.0.2 (failed again)
  - 1.1.1 (failed again)
  - 2.3.1 (failed again)
- test_app_upgrade
  - 2.6.3 / full (failed again)
  - 2.7.2 / full (failed again)
  - 3.3.2 / full (failed again)
- test_rolling_upgrade_with_2_bounces
  - 2.6.3 (failed again)
  - 2.7.2 (failed again)
  - 3.3.2 (failed again)
- test_compatible_brokers_eos_alpha_enabled
  - 2.6.3 (failed again)
  - 2.7.2 (failed again)
  - 3.3.2 (failed again)
- test_compatible_brokers_eos_disabled
  - 2.6.3 (failed again)
  - 2.7.2 (failed again)
  - 3.3.2 (failed again)
- test_compatible_brokers_eos_v2_enabled
  - 2.6.3 (failed again)
  - 2.7.2 (failed again)
  - 3.3.2 (failed again)

Triggered a Jenkins re-run to get a clean build. Plan to merge afterwards.
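The "failed again" / "passed before" annotations amount to a set comparison between two runs. A minimal Python sketch of that comparison, using the failing "from versions" of `test_upgrade_to_cooperative_rebalance` as listed in the two result comments of this thread; `classify` is a hypothetical helper, not something the system tests provide:

```python
# Failing "from versions" of test_upgrade_to_cooperative_rebalance, copied
# from the two result lists in this thread.
first_run = {"0.10.1.1", "0.10.2.2", "1.0.2", "1.1.1", "2.0.1", "2.3.1"}
rerun = {"0.11.0.3", "1.0.2", "1.1.1", "2.3.1"}


def classify(before, after):
    """Split two sets of failures into persistent, new, and cleared ones."""
    # sorted() here is a plain string sort, used only for stable display.
    return {
        "failed_again": sorted(before & after),
        "newly_failed": sorted(after - before),
        "now_passing": sorted(before - after),
    }


result = classify(first_run, rerun)
for label, versions in result.items():
    print(label, versions)
```

Versions under `failed_again` are the candidates for a real bug, while entries that move between `newly_failed` and `now_passing` across runs look more like noise.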
Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]
mimaison commented on PR #14539:
URL: https://github.com/apache/kafka/pull/14539#issuecomment-1770296253

I agree, it seems it may take a while to fix all these failures so let's merge these PRs.
Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]
mjsax commented on PR #14539:
URL: https://github.com/apache/kafka/pull/14539#issuecomment-1769719774

The following system tests failed:

- test_upgrade_to_cooperative_rebalance
  - 0.10.1.1
  - 0.10.2.2
  - 1.0.2
  - 1.1.1
  - 2.0.1
  - 2.3.1
- test_app_upgrade
  - 2.6.3 / full
  - 2.7.2 / full
  - 3.3.2 / full
- test_rolling_upgrade_with_2_bounces
  - 0.10.0.1
  - 0.10.1.1
  - 0.10.2.2
  - 0.11.0.3
  - 1.0.2
  - 2.6.3
  - 2.7.2
  - 3.3.2
- test_broker_type_bounce
  - "broker_type": "controller", "failure_mode": "hard_shutdown", "metadata_quorum": "ZK"
  - "broker_type": "leader", "failure_mode": "hard_shutdown", "metadata_quorum": "ISOLATED_KRAFT"
  - "broker_type": "leader", "failure_mode": "hard_shutdown", "metadata_quorum": "ZK"
- test_many_brokers_bounce
  - "failure_mode": "clean_shutdown", "metadata_quorum": "ISOLATED_KRAFT"
  - "failure_mode": "clean_shutdown", "metadata_quorum": "ZK"
- test_compatible_brokers_eos_alpha_enabled
  - 2.6.3
  - 2.7.2
  - 3.3.2
- test_compatible_brokers_eos_disabled
  - 2.6.3
  - 2.7.2
  - 3.3.2
- test_compatible_brokers_eos_v2_enabled
  - 2.6.3
  - 2.7.2
  - 3.3.2

Overall, we are not in good shape :(

Triggered a re-run to see what is noise: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/5896/

But I actually believe we might want to merge this PR as-is to unblock Mickael's PR, and tackle each of these tests one-by-one as follow-up work? Thoughts? @mimaison @guozhangwang
Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]
mimaison commented on code in PR #14539:
URL: https://github.com/apache/kafka/pull/14539#discussion_r1363597688

## tests/kafkatest/tests/streams/streams_upgrade_test.py: ##
@@ -40,11 +40,13 @@
 metadata_1_versions = [str(LATEST_0_10_0)]
 metadata_2_versions = [str(LATEST_0_10_1), str(LATEST_0_10_2), str(LATEST_0_11_0), str(LATEST_1_0), str(LATEST_1_1),
                        str(LATEST_2_4), str(LATEST_2_5), str(LATEST_2_6), str(LATEST_2_7), str(LATEST_2_8),
-                       str(LATEST_3_0)]
-# upgrading from version (2.4...3.0) is broken and only fixed later in 3.1
-# we cannot test two bounce rolling upgrade because we know it's broken
-# instead we add version 2.4...3.0 to the `metadata_2_versions` upgrade list
-fk_join_versions = [str(LATEST_3_1), str(LATEST_3_2), str(LATEST_3_3)]
+                       str(LATEST_3_0), str(LATEST_3_1), str(LATEST_3_2), str(LATEST_3_3)]
+# upgrading from version (2.4...3.3) is broken and only fixed later in 3.3.3 (unreleased) and 3.4.0
+# -> https://issues.apache.org/jira/browse/KAFKA-14646
+# thus, we cannot test two bounce rolling upgrade because we know it's broken
+# instead we add version 2.4...3.3 to the `metadata_2_versions` upgrade list
+#fk_join_versions = [str(LATEST_3_4)]

Review Comment:
Noted, I'll do that once this is merged. Thanks
Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]
mjsax commented on code in PR #14539:
URL: https://github.com/apache/kafka/pull/14539#discussion_r1363148136

## tests/kafkatest/tests/streams/streams_upgrade_test.py: ##
@@ -40,11 +40,13 @@
 metadata_1_versions = [str(LATEST_0_10_0)]
 metadata_2_versions = [str(LATEST_0_10_1), str(LATEST_0_10_2), str(LATEST_0_11_0), str(LATEST_1_0), str(LATEST_1_1),
                        str(LATEST_2_4), str(LATEST_2_5), str(LATEST_2_6), str(LATEST_2_7), str(LATEST_2_8),
-                       str(LATEST_3_0)]
-# upgrading from version (2.4...3.0) is broken and only fixed later in 3.1
-# we cannot test two bounce rolling upgrade because we know it's broken
-# instead we add version 2.4...3.0 to the `metadata_2_versions` upgrade list
-fk_join_versions = [str(LATEST_3_1), str(LATEST_3_2), str(LATEST_3_3)]
+                       str(LATEST_3_0), str(LATEST_3_1), str(LATEST_3_2), str(LATEST_3_3)]
+# upgrading from version (2.4...3.3) is broken and only fixed later in 3.3.3 (unreleased) and 3.4.0
+# -> https://issues.apache.org/jira/browse/KAFKA-14646
+# thus, we cannot test two bounce rolling upgrade because we know it's broken
+# instead we add version 2.4...3.3 to the `metadata_2_versions` upgrade list
+#fk_join_versions = [str(LATEST_3_4)]

Review Comment:
@mimaison You will need to uncomment this, add the 3.5 release to this list in your PR, and re-enable the corresponding `@matrix` annotation, too.
Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]
mjsax commented on code in PR #14539:
URL: https://github.com/apache/kafka/pull/14539#discussion_r1363147275

## tests/kafkatest/tests/streams/streams_broker_down_resilience_test.py: ##
@@ -100,7 +100,7 @@ def test_streams_runs_with_broker_down_initially(self, metadata_quorum):
         processor_3 = StreamsBrokerDownResilienceService(self.test_context, self.kafka, configs)
         processor_3.start()
-        broker_unavailable_message = "Broker may not be available"
+        broker_unavailable_message = "Node may not be available"

Review Comment:
The log message was changed via https://github.com/apache/kafka/commit/fcac880fd54efbec3fe385000cf990a19972dafa
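The change here simply swaps the expected string. As a side note, a check that tolerates both wordings could look roughly like the Python sketch below. This is an assumption-laden illustration and not what the PR does; `mentions_unavailable_broker` is a hypothetical helper, and only the two message fragments are taken from the diff:

```python
import re

# Matches the old wording ("Broker ...") as well as the new one ("Node ...").
UNAVAILABLE = re.compile(r"\b(?:Broker|Node) may not be available")


def mentions_unavailable_broker(log_text):
    """Return True if either wording of the message appears in the text."""
    return UNAVAILABLE.search(log_text) is not None


print(mentions_unavailable_broker("Broker may not be available"))  # True
print(mentions_unavailable_broker("Node may not be available"))    # True
print(mentions_unavailable_broker("connection established"))       # False
```

Matching both variants trades strictness for robustness: the test would no longer notice if the message wording regressed, which may or may not be acceptable.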
Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]
mjsax commented on PR #14539:
URL: https://github.com/apache/kafka/pull/14539#issuecomment-1767576443

Triggered a new system test build: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/5891/

Some tests seem to be flaky (cf. https://github.com/apache/kafka/pull/13860#issuecomment-1767572941) -- let's see what the system test result is, and make a call to merge or add more fixes...
Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]
mjsax commented on PR #14539:
URL: https://github.com/apache/kafka/pull/14539#issuecomment-1767575130

Rebased this PR to pick up the bug fix https://github.com/apache/kafka/pull/14555 (the bug was exposed via system test). -> Re-enable state-updater.

Also added a fix for `streams_broker_down_resilience_test`, which was broken by a recent commit (https://github.com/apache/kafka/commit/fcac880fd54efbec3fe385000cf990a19972dafa) that changed an expected log message.
Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]
mjsax commented on PR #14539:
URL: https://github.com/apache/kafka/pull/14539#issuecomment-1761796876

Seems some system tests still failed... Let me look into it and see if I can reproduce locally... I did run a few locally already which did pass... 🤔
Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]
mjsax commented on PR #14539:
URL: https://github.com/apache/kafka/pull/14539#issuecomment-1760544857

Triggered a system test run: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/5884/