No. We started using Gradle analysis on July 12th. Before that, the only data we have comes from Apache CI, which AFAIK doesn't have a per-test history view: https://ci-builds.apache.org/job/Kafka/job/kafka/job/trunk/
--
Divij Vaidya

On Wed, Aug 2, 2023 at 1:04 AM Kirk True <k...@kirktrue.pro> wrote:
>
> Hi Divij,
>
> Thanks for the pointer to Gradle Enterprise! That’s exactly what I was
> looking for.
>
> Did we track builds before July 12? I see only tiny blips of failures on the
> 90-day view.
>
> Thanks,
> Kirk
>
> > On Jul 26, 2023, at 2:08 AM, Divij Vaidya <divijvaidy...@gmail.com> wrote:
> >
> > Hi Kirk
> >
> > I have been using this new tool to analyze the trends of test
> > failures:
> > https://ge.apache.org/scans/tests?search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.timeZoneId=Europe/Berlin
> > and general build failures:
> > https://ge.apache.org/scans/failures?search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.timeZoneId=Europe/Berlin
> >
> > About the classes of build failure, if we look at the last 28 days, I
> > do not observe an increasing trend. The top causes of failure are:
> > (link [2])
> > 1. Failures due to checkstyle (193 builds)
> > 2. Timeout waiting to lock cache. It is currently in-use by another
> > Gradle instance.
> > 3. Compilation failures (116 builds)
> > 4. "Gradle Test Executor" finished with a non-zero exit value. Process
> > 'Gradle Test Executor 180' finished with non-zero exit value 1
> >
> > #4 is caused by a test failure that causes a crash of the Gradle
> > process. To debug this, I usually go to complete test output and try
> > to figure out which was the last test that 'Gradle Test Executor 180'
> > was running. As an example, consider
> > https://ge.apache.org/s/luizhogirob4e. We observe that this fails for
> > PR-14094. Now, we need to see the complete system out. To find that, I
> > will go to Kafka PR builder at
> > https://ci-builds.apache.org/job/Kafka/job/kafka-pr/view/change-requests/
> > and find the build page for PR-14094. That page is
> > https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-14094/.
> > Next, find the last failed build at
> > https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-14094/lastFailedBuild/
> > and observe that we have a failure for "Gradle Test Executor 177". Click
> > on view as plain text (it takes a long time to load) and find what the
> > Gradle Test Executor was doing. In this case, it failed with the
> > following error. I strongly believe that it is due to
> > https://github.com/apache/kafka/pull/13572 but unfortunately, this was
> > reverted and never fixed after that. Perhaps you might want to re
> >
> > Gradle Test Run :core:integrationTest > Gradle Test Executor 177 > ProducerFailureHandlingTest > testTooLargeRecordWithAckZero() STARTED
> >
> >> Task :clients:integrationTest FAILED
> > org.gradle.internal.remote.internal.ConnectException: Could not
> > connect to server [bd7b0504-7491-43f8-a716-513adb302c92 port:43321,
> > addresses:[/127.0.0.1]]. Tried addresses: [/127.0.0.1].
> >     at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:67)
> >     at org.gradle.internal.remote.internal.hub.MessageHubBackedClient.getConnection(MessageHubBackedClient.java:36)
> >     at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:103)
> >     at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:65)
> >     at worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:69)
> >     at worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74)
> > Caused by: java.net.ConnectException: Connection refused
> >     at java.base/sun.nio.ch.Net.pollConnect(Native Method)
> >     at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
> >     at java.base/sun.nio.ch.SocketChannelImpl.finishTimedConnect(SocketChannelImpl.java:1141)
> >     at java.base/sun.nio.ch.SocketChannelImpl.blockingConnect(SocketChannelImpl.java:1183)
> >     at java.base/sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:98)
> >     at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.tryConnect(TcpOutgoingConnector.java:81)
> >     at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:54)
> >     ... 5 more
> >
> > About the classes of test failure problems, if we look at the last 28
> > days, the following tests are the biggest culprits. If we fix just
> > these two, our CI would be in a much better shape. (link [1])
> > 1. https://issues.apache.org/jira/browse/KAFKA-15197 (this test passes
> > only 53% of the time)
> > 2. https://issues.apache.org/jira/browse/KAFKA-15052 (this test passes
> > only 49% of the time)
> >
> > [1]
> > https://ge.apache.org/scans/tests?search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.timeZoneId=Europe/Berlin
> > [2]
> > https://ge.apache.org/scans/failures?search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.timeZoneId=Europe/Berlin
> >
> > --
> > Divij Vaidya
> >
> > On Tue, Jul 25, 2023 at 8:09 PM Kirk True <k...@kirktrue.pro> wrote:
> >>
> >> Hi all!
> >>
> >> I’ve noticed that we’re back in the state where it’s tough to get a clean
> >> PR Jenkins test run. Spot checking the top ~10 pull request runs shows this
> >> doesn’t appear to be an issue with just my PRs :P
> >>
> >> I know we have some chronic flaky tests, but I’ve seen at least two other
> >> classes of problems:
> >>
> >> 1. Jenkins test runners hanging and eventually timing out
> >> 2. Intra Jenkins-container/pod/VM/machine/turtle communication issues
> >>
> >> How do we go about diagnosing test runs that fail in such an opaque
> >> fashion?
> >>
> >> Thanks!
> >> Kirk
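For the '"Gradle Test Executor" finished with a non-zero exit value' failures, the manual step described in the thread (open the plain-text console log and work out which test the crashed executor was last running) could be scripted. Below is a minimal Java sketch, not an existing Kafka or Gradle tool: the class name `CrashedExecutorFinder` is hypothetical, and it assumes the console log has one line per test event in the shape of the quoted excerpt (`... > Gradle Test Executor 177 > SomeTest > someMethod() STARTED`, followed later by `PASSED`, `FAILED` or `SKIPPED` for the same test), and that each executor runs one test at a time.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Kafka or Gradle: scans a saved Jenkins
// console log for tests that a Gradle Test Executor started but never finished.
final class CrashedExecutorFinder {

    // Assumed log shape, based on the excerpt in this thread:
    // "... > Gradle Test Executor 177 > SomeTest > someMethod() STARTED"
    // with a later PASSED/FAILED/SKIPPED line for the same test.
    private static final Pattern TEST_EVENT = Pattern.compile(
            "Gradle Test Executor (\\d+) > (.+) (STARTED|PASSED|FAILED|SKIPPED)\\s*$");

    private CrashedExecutorFinder() {
    }

    /** Maps executor number to the last test it started without reporting a result. */
    static Map<String, String> unfinishedTests(List<String> lines) {
        Map<String, String> inFlight = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = TEST_EVENT.matcher(line);
            if (!m.find()) {
                continue; // not a per-test event line
            }
            String executor = m.group(1);
            String test = m.group(2);
            if ("STARTED".equals(m.group(3))) {
                // Assumes one test at a time per executor, so a new STARTED replaces the old one.
                inFlight.put(executor, test);
            } else if (test.equals(inFlight.get(executor))) {
                inFlight.remove(executor); // the test reported a result, so it is not a suspect
            }
        }
        return inFlight;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CrashedExecutorFinder.java <console-log.txt>");
            return;
        }
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        unfinishedTests(lines).forEach((executor, test) ->
                System.out.println("Gradle Test Executor " + executor + " never finished: " + test));
    }
}
```

To try it, save the output of the Jenkins "view as plain text" page to a file and run `java CrashedExecutorFinder.java console.txt` (the single-file launcher needs Java 11+). Any executor it prints was still inside a test when the log stopped, which makes that test a candidate for the crash, the way ProducerFailureHandlingTest > testTooLargeRecordWithAckZero() was in the example above. It only narrows the search; the full output for that test still has to be read.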