[jira] [Commented] (KAFKA-16943) Synchronously verify Connect worker startup failure in InternalTopicsIntegrationTest

2024-06-25 Thread zhengke zhou (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17859846#comment-17859846
 ] 

zhengke zhou commented on KAFKA-16943:
--

[~ChrisEgerton] Hi, I'm new to this community and hoping to take this on as my 
first contribution.

> Synchronously verify Connect worker startup failure in 
> InternalTopicsIntegrationTest
> 
>
> Key: KAFKA-16943
> URL: https://issues.apache.org/jira/browse/KAFKA-16943
> Project: Kafka
>  Issue Type: Improvement
>  Components: connect
>Reporter: Chris Egerton
>Priority: Minor
>  Labels: newbie
> Attachments: code-diff.png
>
>
> Created after PR discussion 
> [here|https://github.com/apache/kafka/pull/16288#discussion_r1636615220].
> In some of our integration tests, we want to verify that a Connect worker 
> cannot start under poor conditions (such as when its internal topics do not 
> yet exist and it is configured to create them with a higher replication 
> factor than the number of available brokers, or when its internal topics 
> already exist but they do not have the compaction cleanup policy).
> This is currently not possible, and presents a possible gap in testing 
> coverage, especially for the test cases 
> {{testFailToCreateInternalTopicsWithMoreReplicasThanBrokers}} and 
> {{testFailToStartWhenInternalTopicsAreNotCompacted}}. It'd be nice if we 
> could have some way of synchronously awaiting the completion or failure of 
> worker startup in our integration tests in order to guarantee that worker 
> startup fails under sufficiently adverse conditions.
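The two "poor conditions" above can be modelled as simple precondition checks that a worker would evaluate for each internal topic before it finishes starting. The sketch below is a hypothetical, simplified illustration, not Kafka Connect's actual implementation: the class `InternalTopicPreconditions`, the method `check`, and the topic names `connect-configs`, `connect-offsets`, and `connect-status` are illustrative assumptions, and the compaction check simply requires `compact` to be among the topic's `cleanup.policy` values.

```java
import java.util.Collections;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical, simplified model of the two adverse conditions described in
 * KAFKA-16943. This is NOT Kafka Connect's implementation; names are illustrative.
 */
public class InternalTopicPreconditions {

    /**
     * Throws IllegalStateException if a worker should refuse to start.
     *
     * @param topic                   internal topic name
     * @param existingCleanupPolicies cleanup.policy values if the topic exists, empty otherwise
     * @param replicationFactor       replication factor the worker would create the topic with
     * @param availableBrokers        number of brokers in the cluster
     */
    static void check(String topic, Optional<Set<String>> existingCleanupPolicies,
                      int replicationFactor, int availableBrokers) {
        if (!existingCleanupPolicies.isPresent()) {
            // Topic must be created, but there are too few brokers for the requested replication factor
            if (replicationFactor > availableBrokers) {
                throw new IllegalStateException("Cannot create " + topic + " with replication factor "
                        + replicationFactor + " on " + availableBrokers + " broker(s)");
            }
        } else if (!existingCleanupPolicies.get().contains("compact")) {
            // Topic already exists but does not use the compaction cleanup policy
            throw new IllegalStateException(topic + " exists but is not compacted: "
                    + existingCleanupPolicies.get());
        }
    }

    public static void main(String[] args) {
        // Roughly the situation in testFailToCreateInternalTopicsWithMoreReplicasThanBrokers: RF 3, one broker
        report("connect-configs", Optional.empty(), 3, 1);
        // Illustrative existing topic with cleanup.policy=delete
        report("connect-offsets", Optional.of(Collections.singleton("delete")), 1, 1);
        // Healthy case: a compacted topic already exists
        report("connect-status", Optional.of(Collections.singleton("compact")), 1, 1);
    }

    private static void report(String topic, Optional<Set<String>> policies, int rf, int brokers) {
        try {
            check(topic, policies, rf, brokers);
            System.out.println(topic + ": preconditions satisfied");
        } catch (IllegalStateException e) {
            System.out.println(topic + ": startup would fail (" + e.getMessage() + ")");
        }
    }
}
```

In a real worker, a failed check like this would abort startup; the point of the issue is that tests need a way to observe that abort directly.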



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16943) Synchronously verify Connect worker startup failure in InternalTopicsIntegrationTest

2024-06-20 Thread Chris Egerton (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17856531#comment-17856531
 ] 

Chris Egerton commented on KAFKA-16943:
---

[~ksolves.kafka] the idea here isn't to assert that no workers have started 
after a given timeout; it's to assert that one or more workers have attempted, 
failed, and aborted startup. We don't want to just wait for 30 seconds, see 
that no workers have started up, and call that good enough: startup may take 
longer than 30 seconds on our CI infrastructure (which can be pretty slow), and 
if startup does fail before the 30 seconds are up, a fixed wait still forces us 
to wait the full 30 seconds, adding bloat to test runtime.
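One way to get the behavior described above is to have each worker expose a future that settles when startup either completes or fails, and have the test block on that future with a generous upper bound. The sketch below is a minimal, self-contained illustration under that assumption: `StartupFailureAwait`, `awaitStartupFailure`, and `simulateFailingStartup` are hypothetical names, and it does not assume that `EmbeddedConnectCluster` currently exposes such a future.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class StartupFailureAwait {

    /**
     * Blocks until the startup future settles and returns the failure cause.
     * Throws AssertionError if startup succeeded or did not settle within the timeout.
     * Returns as soon as startup fails, so the timeout is only an upper bound.
     */
    static Throwable awaitStartupFailure(CompletableFuture<Void> startup, Duration timeout) {
        try {
            startup.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (ExecutionException e) {
            // Startup was attempted, failed, and aborted: the expected outcome
            return e.getCause();
        } catch (TimeoutException e) {
            throw new AssertionError("Worker startup neither completed nor failed within " + timeout);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new AssertionError("Interrupted while awaiting worker startup", e);
        }
        throw new AssertionError("Worker started successfully, but startup was expected to fail");
    }

    /** Simulates a hypothetical worker whose startup fails after delayMillis. */
    static CompletableFuture<Void> simulateFailingStartup(long delayMillis) {
        CompletableFuture<Void> startup = new CompletableFuture<>();
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            startup.completeExceptionally(
                    new IllegalStateException("replication factor exceeds number of brokers"));
        });
        worker.setDaemon(true);
        worker.start();
        return startup;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        Throwable cause = awaitStartupFailure(simulateFailingStartup(200), Duration.ofSeconds(60));
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        // Finishes shortly after the simulated failure, not after the 60-second upper bound
        System.out.println("Startup failed after ~" + elapsedMs + " ms: " + cause.getMessage());
    }
}
```

Because the wait ends on the failure event itself, a slow CI machine only stretches the wait up to the upper bound, and a fast failure does not pay for the full timeout.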



[jira] [Commented] (KAFKA-16943) Synchronously verify Connect worker startup failure in InternalTopicsIntegrationTest

2024-06-19 Thread Ksolves (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17856381#comment-17856381
 ] 

Ksolves commented on KAFKA-16943:
-

[~ChrisEgerton] Could you please confirm my understanding above?



[jira] [Commented] (KAFKA-16943) Synchronously verify Connect worker startup failure in InternalTopicsIntegrationTest

2024-06-18 Thread Ksolves (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855997#comment-17855997
 ] 

Ksolves commented on KAFKA-16943:
-

In our current test cases 
(`testFailToCreateInternalTopicsWithMoreReplicasThanBrokers` and 
`testFailToStartWhenInternalTopicsAreNotCompacted`), we attempt to verify that 
the Connect worker fails to start. However, our mechanism for verifying the 
startup failure lacks both synchronous waiting and a precise assertion.

Example Test Case: [Changes in existing test case marked in 
{color:#00875a}*green*{color}]
{code:java}
@Test
public void testFailToCreateInternalTopicsWithMoreReplicasThanBrokers() throws InterruptedException {
    workerProps.put(DistributedConfig.CONFIG_STORAGE_REPLICATION_FACTOR_CONFIG, "3");
    workerProps.put(DistributedConfig.OFFSET_STORAGE_REPLICATION_FACTOR_CONFIG, "2");
    workerProps.put(DistributedConfig.STATUS_STORAGE_REPLICATION_FACTOR_CONFIG, "1");
    int numWorkers = 0;
    int numBrokers = 1;
    connect = new EmbeddedConnectCluster.Builder().name("connect-cluster-1")
                                                  .workerProps(workerProps)
                                                  .numWorkers(numWorkers)
                                                  .numBrokers(numBrokers)
                                                  .brokerProps(brokerProps)
                                                  .build();

    // Start the brokers and Connect, but Connect should fail to create the config and offset topics
    connect.start();
    log.info("Completed startup of {} Kafka broker. Expected Connect worker to fail", numBrokers);

    // Try to start a worker
    connect.addWorker();

    // Synchronously await and verify that the worker fails during startup
    boolean workerStarted = waitForWorkerStartupFailure(connect, 30_000L); // 30-second timeout
    assertFalse(workerStarted, "Worker should not have started successfully");

    log.info("Verifying the internal topics for Connect");
    connect.assertions().assertTopicsDoNotExist(configTopic(), offsetTopic());

    // Verify that no workers are running
    assertFalse(connect.anyWorkersRunning());
}

// Polls every 500 ms. Returns false as soon as no workers are running,
// or true if at least one worker is still running after timeoutMillis.
private boolean waitForWorkerStartupFailure(EmbeddedConnectCluster connect, long timeoutMillis) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (System.currentTimeMillis() < deadline) {
        if (!connect.anyWorkersRunning()) {
            return false;
        }
        Thread.sleep(500); // wait 500 ms before checking again
    }
    return true;
}
{code}
What changes do you suggest to improve this synchronous verification mechanism? 
I'll create the PR accordingly.
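The polling loop in the example treats "no workers running" as a startup failure, but that may also be true while a worker is still starting, so the loop can return before startup has actually failed. A polling approach can avoid this by waiting for an explicit terminal state. The sketch below assumes a hypothetical tri-state `StartupState` (`STARTING`, `RUNNING`, `FAILED`) that a worker would expose to tests; `TerminalStatePoller` and `awaitTerminalState` are likewise illustrative names, not existing Connect APIs.

```java
import java.time.Duration;
import java.util.function.Supplier;

public class TerminalStatePoller {

    /** Hypothetical startup states a worker could expose to tests. */
    enum StartupState { STARTING, RUNNING, FAILED }

    /**
     * Polls until the state leaves STARTING and returns it. Unlike a check on
     * "no workers running", this does not mistake a slow startup for a failed one.
     */
    static StartupState awaitTerminalState(Supplier<StartupState> state, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            StartupState current = state.get();
            if (current != StartupState.STARTING) {
                return current; // RUNNING or FAILED: startup has finished one way or the other
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new AssertionError("Worker still starting after " + timeout);
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new AssertionError("Interrupted while awaiting worker startup", e);
            }
        }
    }

    public static void main(String[] args) {
        long failAt = System.nanoTime() + Duration.ofMillis(300).toNanos();
        // Simulated worker: STARTING for about 300 ms, then FAILED
        Supplier<StartupState> simulated =
                () -> System.nanoTime() - failAt >= 0 ? StartupState.FAILED : StartupState.STARTING;
        StartupState result = awaitTerminalState(simulated, Duration.ofSeconds(30), Duration.ofMillis(50));
        System.out.println("Terminal state: " + result); // prints "Terminal state: FAILED"
    }
}
```

A test would then assert that the returned state is `FAILED`; the 30-second value only caps the wait and is not spent when the failure arrives early.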
