Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]

2023-10-20 Thread via GitHub


mjsax commented on PR #14539:
URL: https://github.com/apache/kafka/pull/14539#issuecomment-1773534318

   Just looked into the branch builder results for `test_compatible_brokers_eos_v2_enabled` 2.6.3 in more detail.
   
   Broker log shows:
   ```
   bash: /opt/kafka-2.6.3/bin/kafka-server-start.sh: No such file or directory
   ```
   
   And the console output shows:
   ```
   worker4: + get_kafka 2.6.2 2.12
   17:45:33 worker4: + version=2.6.2
   17:45:33 worker4: + scala_version=2.12
   17:45:33 worker4: + kafka_dir=/opt/kafka-2.6.2
   17:45:33 worker4: + url=https://s3-us-west-2.amazonaws.com/kafka-packages/kafka_2.12-2.6.2.tgz
   17:45:33 worker4: + url_streams_test=https://s3-us-west-2.amazonaws.com/kafka-packages/kafka-streams-2.6.2-test.jar
   17:45:33 worker4: + '[' '!' -d /opt/kafka-2.6.2 ']'
   17:45:33 worker4: /tmp /opt/jdk/8
   17:45:33 worker4: + pushd /tmp
   17:45:33 worker4: + curl --retry 5 -O https://s3-us-west-2.amazonaws.com/kafka-packages/kafka_2.12-2.6.2.tgz
   ```
   
   The `Dockerfile` does use `2.6.3` though -- not sure where `2.6.2` comes from? Could it be that this PR should have been rebased to pick up some `Dockerfile` updates I made recently (https://github.com/apache/kafka/commit/cdf726fd358f9be3438ceefb01073ab40a31a8b4)?
   
   Maybe we should keep observing `trunk` runs and see what happens... Given that it's always 2.6.2, 2.7.3, and 3.3.2 that failed above, and those are exactly the versions the other PR bumped, I see a clear relationship.
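
One way to check this suspicion is to compare the versions the provisioning trace actually installed with the version the test expects. A minimal sketch, assuming the console output has been saved as text and that each install shows up as a `+ get_kafka <version> <scala_version>` trace line like the one quoted above; the function names are hypothetical and not part of the Kafka test tooling:

```python
import re

# Matches shell trace lines such as "worker4: + get_kafka 2.6.2 2.12".
# The line format is taken from the console output quoted in this comment.
GET_KAFKA = re.compile(r"\+ get_kafka (\d+(?:\.\d+)+) (\d+\.\d+)")


def installed_versions(console_output):
    """Return the set of Kafka versions that the trace shows being installed."""
    return {match.group(1) for match in GET_KAFKA.finditer(console_output)}


def missing_versions(console_output, expected):
    """Return the expected versions that never appear in a get_kafka call."""
    return sorted(set(expected) - installed_versions(console_output))


sample = """\
worker4: + get_kafka 2.6.2 2.12
17:45:33 worker4: + version=2.6.2
17:45:33 worker4: + kafka_dir=/opt/kafka-2.6.2
"""

# The test expects 2.6.3, but the trace only installs 2.6.2.
print(missing_versions(sample, ["2.6.3"]))
```

A non-empty result would point at the image or provisioning script being out of date rather than at the test itself.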


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]

2023-10-20 Thread via GitHub


mjsax commented on PR #14539:
URL: https://github.com/apache/kafka/pull/14539#issuecomment-1773526503

   Merged to `trunk` and cherry-picked to `3.6`, `3.5`, and `3.4` branches.





Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]

2023-10-20 Thread via GitHub


mjsax commented on PR #14539:
URL: https://github.com/apache/kafka/pull/14539#issuecomment-1773521845

   For example, I just re-ran:
   ```
   $ TC_PATHS="tests/kafkatest/tests/streams/streams_broker_compatibility_test.py::StreamsBrokerCompatibility.test_compatible_brokers_eos_v2_enabled" bash tests/docker/run_tests.sh
   
   [...]
   
   

   SESSION REPORT (ALL TESTS)
   ducktape version: 0.11.4
   session_id:       2023-10-20--004
   run time:         5 minutes 50.448 seconds
   tests run:        8
   passed:           7
   flaky:            0
   failed:           1
   ignored:          0
   

   
   [...]
   ```
   
   Only the run for 2.6.3 failed. Looking into the test failure, the issue was that the broker did not start up in time, so the test ran into a timeout. The broker log shows:
   ```
   Error: Exception thrown by the agent : java.rmi.server.ExportException: Port already in use: 9192; nested exception is:
       java.net.BindException: Address already in use (Bind failed)
   ```
   
   So I re-ran just this single configuration for 2.6.3 and it passed afterwards.
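
A quick way to tell whether a port such as 9192 is still held before re-running is to try binding to it. A minimal sketch using only the Python standard library; `port_is_free` is a hypothetical helper, not part of ducktape or kafkatest, and it would need to run on the host or container where the broker starts:

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "Address already in use": something else holds the port.
            return False
        return True


if __name__ == "__main__":
    # 9192 is the port named in the broker log above.
    print("port 9192 free:", port_is_free(9192))
```

The check is only a snapshot: another process can still grab the port between the check and the broker start, so it helps with diagnosis more than with prevention.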





Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]

2023-10-20 Thread via GitHub


mjsax merged PR #14539:
URL: https://github.com/apache/kafka/pull/14539





Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]

2023-10-20 Thread via GitHub


mjsax commented on PR #14539:
URL: https://github.com/apache/kafka/pull/14539#issuecomment-1773506747

   I have not run all of them locally yet, only the upgrade tests and the cooperative rebalancing ones.
   
   I am running them on my Mac, macOS Monterey (12.7), 2.3GHz 8-Core Intel i9 
-- 32GB DDR4
   
   I often modify the test Python code to run a single configuration only, and run a single test case (i.e., a Python method). Otherwise I don't change anything.
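
Selecting a single test method, as opposed to a single configuration, does not require code changes: the `TC_PATHS` value in the re-run command from this thread has the shape `<file>::<Class>.<method>`. A minimal sketch of building such a value; `tc_path` is a hypothetical helper, and the file-only and class-only forms reflect my understanding of ducktape's test selection rather than anything stated in this thread:

```python
def tc_path(test_file, test_class=None, test_method=None):
    """Build a TC_PATHS entry of the form <file>[::<Class>[.<method>]]."""
    path = test_file
    if test_class:
        path += "::" + test_class
        if test_method:
            path += "." + test_method
    return path


print(tc_path(
    "tests/kafkatest/tests/streams/streams_broker_compatibility_test.py",
    "StreamsBrokerCompatibility",
    "test_compatible_brokers_eos_v2_enabled",
))
```

The printed value can then be exported as `TC_PATHS` before calling `bash tests/docker/run_tests.sh`.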





Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]

2023-10-20 Thread via GitHub


mimaison commented on PR #14539:
URL: https://github.com/apache/kafka/pull/14539#issuecomment-1772436472

   @mjsax Can you share the TC_PATHS, ducktape options, and specs of the machines you used to run the system tests? I'm really having trouble getting any of them to pass regularly in my environment. Thanks





Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]

2023-10-19 Thread via GitHub


mjsax commented on PR #14539:
URL: https://github.com/apache/kafka/pull/14539#issuecomment-1771437783

   Test failures: far fewer than before. A few tests (i.e., "from version" configurations) seem to be "stable" while others are not -- will keep digging.
   
   - test_upgrade_to_cooperative_rebalance
     - 0.11.0.3 (passed before)
     - 1.0.2 (failed again)
     - 1.1.1 (failed again)
     - 2.3.1 (failed again)
   - test_app_upgrade
     - 2.6.3 / full (failed again)
     - 2.7.2 / full (failed again)
     - 3.3.2 / full (failed again)
   - test_rolling_upgrade_with_2_bounces
     - 2.6.3 (failed again)
     - 2.7.2 (failed again)
     - 3.3.2 (failed again)
   - test_compatible_brokers_eos_alpha_enabled
     - 2.6.3 (failed again)
     - 2.7.2 (failed again)
     - 3.3.2 (failed again)
   - test_compatible_brokers_eos_disabled
     - 2.6.3 (failed again)
     - 2.7.2 (failed again)
     - 3.3.2 (failed again)
   - test_compatible_brokers_eos_v2_enabled
     - 2.6.3 (failed again)
     - 2.7.2 (failed again)
     - 3.3.2 (failed again)
 
   Triggered a Jenkins re-run to get a clean build. Plan to merge afterwards.
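
The "failed again" and "passed before" labels amount to a set comparison between two runs. A minimal sketch of that bookkeeping, using the `test_upgrade_to_cooperative_rebalance` versions reported in this thread for the two runs; `compare_runs` is a hypothetical helper, not part of the test tooling:

```python
def compare_runs(previous, current):
    """Classify failing configurations by comparing two runs' failure sets."""
    return {
        "failed again": sorted(previous & current),
        "newly failing": sorted(current - previous),
        "no longer failing": sorted(previous - current),
    }


# Failing "from" versions of test_upgrade_to_cooperative_rebalance per run.
previous = {"0.10.1.1", "0.10.2.2", "1.0.2", "1.1.1", "2.0.1", "2.3.1"}
current = {"0.11.0.3", "1.0.2", "1.1.1", "2.3.1"}

for label, versions in compare_runs(previous, current).items():
    print(f"{label}: {', '.join(versions)}")
```

Configurations that land in "failed again" across several runs are the more likely real regressions, while those that move between groups are candidates for flakiness.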





Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]

2023-10-19 Thread via GitHub


mimaison commented on PR #14539:
URL: https://github.com/apache/kafka/pull/14539#issuecomment-1770296253

   I agree, it seems it may take a while to fix all these failures so let's 
merge these PRs.





Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]

2023-10-18 Thread via GitHub


mjsax commented on PR #14539:
URL: https://github.com/apache/kafka/pull/14539#issuecomment-1769719774

   The following system tests failed:
- test_upgrade_to_cooperative_rebalance
  - 0.10.1.1
  - 0.10.2.2
  - 1.0.2
  - 1.1.1
  - 2.0.1
  - 2.3.1
- test_app_upgrade
  - 2.6.3 / full
  - 2.7.2 / full
  - 3.3.2 / full 
- test_rolling_upgrade_with_2_bounces
  - 0.10.0.1
  - 0.10.1.1
  - 0.10.2.2
  - 0.11.0.3
  - 1.0.2
  - 2.6.3
  - 2.7.2
  - 3.3.2
- test_broker_type_bounce
  - "broker_type": "controller", "failure_mode": "hard_shutdown", "metadata_quorum": "ZK",
  - "broker_type": "leader", "failure_mode": "hard_shutdown", "metadata_quorum": "ISOLATED_KRAFT",
  - "broker_type": "leader", "failure_mode": "hard_shutdown", "metadata_quorum": "ZK",
- test_many_brokers_bounce
  - "failure_mode": "clean_shutdown", "metadata_quorum": "ISOLATED_KRAFT",
  - "failure_mode": "clean_shutdown", "metadata_quorum": "ZK",
- test_compatible_brokers_eos_alpha_enabled
  - 2.6.3
  - 2.7.2
  - 3.3.2
- test_compatible_brokers_eos_disabled
  - 2.6.3
  - 2.7.2
  - 3.3.2
- test_compatible_brokers_eos_v2_enabled
  - 2.6.3
  - 2.7.2
  - 3.3.2
  
   Overall, we are not in good shape :(
   
   Triggered a re-run to see what is noise: 
https://jenkins.confluent.io/job/system-test-kafka-branch-builder/5896/
   
   But I actually believe we might want to merge this PR as-is to unblock Mickael's PR, and tackle each of these tests one by one as follow-up work? Thoughts? @mimaison @guozhangwang





Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]

2023-10-18 Thread via GitHub


mimaison commented on code in PR #14539:
URL: https://github.com/apache/kafka/pull/14539#discussion_r1363597688


##
tests/kafkatest/tests/streams/streams_upgrade_test.py:
##
@@ -40,11 +40,13 @@
 metadata_1_versions = [str(LATEST_0_10_0)]
 metadata_2_versions = [str(LATEST_0_10_1), str(LATEST_0_10_2), str(LATEST_0_11_0), str(LATEST_1_0), str(LATEST_1_1),
                        str(LATEST_2_4), str(LATEST_2_5), str(LATEST_2_6), str(LATEST_2_7), str(LATEST_2_8),
-                       str(LATEST_3_0)]
-# upgrading from version (2.4...3.0) is broken and only fixed later in 3.1
-# we cannot test two bounce rolling upgrade because we know it's broken
-# instead we add version 2.4...3.0 to the `metadata_2_versions` upgrade list
-fk_join_versions = [str(LATEST_3_1), str(LATEST_3_2), str(LATEST_3_3)]
+                       str(LATEST_3_0), str(LATEST_3_1), str(LATEST_3_2), str(LATEST_3_3)]
+# upgrading from version (2.4...3.3) is broken and only fixed later in 3.3.3 (unreleased) and 3.4.0
+# -> https://issues.apache.org/jira/browse/KAFKA-14646
+# thus, we cannot test two bounce rolling upgrade because we know it's broken
+# instead we add version 2.4...3.3 to the `metadata_2_versions` upgrade list
+#fk_join_versions = [str(LATEST_3_4)]

Review Comment:
   Noted, I'll do that once this is merged. Thanks






Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]

2023-10-17 Thread via GitHub


mjsax commented on code in PR #14539:
URL: https://github.com/apache/kafka/pull/14539#discussion_r1363148136


##
tests/kafkatest/tests/streams/streams_upgrade_test.py:
##
@@ -40,11 +40,13 @@
 metadata_1_versions = [str(LATEST_0_10_0)]
 metadata_2_versions = [str(LATEST_0_10_1), str(LATEST_0_10_2), str(LATEST_0_11_0), str(LATEST_1_0), str(LATEST_1_1),
                        str(LATEST_2_4), str(LATEST_2_5), str(LATEST_2_6), str(LATEST_2_7), str(LATEST_2_8),
-                       str(LATEST_3_0)]
-# upgrading from version (2.4...3.0) is broken and only fixed later in 3.1
-# we cannot test two bounce rolling upgrade because we know it's broken
-# instead we add version 2.4...3.0 to the `metadata_2_versions` upgrade list
-fk_join_versions = [str(LATEST_3_1), str(LATEST_3_2), str(LATEST_3_3)]
+                       str(LATEST_3_0), str(LATEST_3_1), str(LATEST_3_2), str(LATEST_3_3)]
+# upgrading from version (2.4...3.3) is broken and only fixed later in 3.3.3 (unreleased) and 3.4.0
+# -> https://issues.apache.org/jira/browse/KAFKA-14646
+# thus, we cannot test two bounce rolling upgrade because we know it's broken
+# instead we add version 2.4...3.3 to the `metadata_2_versions` upgrade list
+#fk_join_versions = [str(LATEST_3_4)]

Review Comment:
   @mimaison You will need to uncomment this, add the 3.5 release to this list in your PR, and re-enable the corresponding `@matrix` annotation, too.






Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]

2023-10-17 Thread via GitHub


mjsax commented on code in PR #14539:
URL: https://github.com/apache/kafka/pull/14539#discussion_r1363147275


##
tests/kafkatest/tests/streams/streams_broker_down_resilience_test.py:
##
@@ -100,7 +100,7 @@ def test_streams_runs_with_broker_down_initially(self, metadata_quorum):
 processor_3 = StreamsBrokerDownResilienceService(self.test_context, self.kafka, configs)
 processor_3.start()
 
-broker_unavailable_message = "Broker may not be available"
+broker_unavailable_message = "Node may not be available"

Review Comment:
   Log message was changed via 
https://github.com/apache/kafka/commit/fcac880fd54efbec3fe385000cf990a19972dafa






Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]

2023-10-17 Thread via GitHub


mjsax commented on PR #14539:
URL: https://github.com/apache/kafka/pull/14539#issuecomment-1767576443

   Triggered a new system test build: 
https://jenkins.confluent.io/job/system-test-kafka-branch-builder/5891/
   
   Some tests seem to be flaky (cf. https://github.com/apache/kafka/pull/13860#issuecomment-1767572941) -- let's see what the system test result is, and make a call to merge or add more fixes...





Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]

2023-10-17 Thread via GitHub


mjsax commented on PR #14539:
URL: https://github.com/apache/kafka/pull/14539#issuecomment-1767575130

   Rebased this PR to pick up the bug fix https://github.com/apache/kafka/pull/14555 (the bug was exposed via system test). -> Re-enable state-updater.
   
   Also added a fix for `streams_broker_down_resilience_test`, which was broken by a recent commit (https://github.com/apache/kafka/commit/fcac880fd54efbec3fe385000cf990a19972dafa) that changed an expected log message.
   
   





Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]

2023-10-13 Thread via GitHub


mjsax commented on PR #14539:
URL: https://github.com/apache/kafka/pull/14539#issuecomment-1761796876

   Seems some system tests still failed... Let me look into it and see if I can reproduce locally... I did run a few locally already, which did pass... 🤔





Re: [PR] KAFKA-15378: fix streams upgrade system test [kafka]

2023-10-12 Thread via GitHub


mjsax commented on PR #14539:
URL: https://github.com/apache/kafka/pull/14539#issuecomment-1760544857

   Triggered a system test run: 
https://jenkins.confluent.io/job/system-test-kafka-branch-builder/5884/

