John Roesler created KAFKA-12767:
------------------------------------
Summary: Properly set Streams system test runtime classpath
Key: KAFKA-12767
URL: https://issues.apache.org/jira/browse/KAFKA-12767
Project: Kafka
Issue Type: Task
Components: streams, system tests
Reporter: John Roesler
Some of the streams system tests started to fail recently when we stopped
exporting our transitive dependencies in the test jar.
[~lct45] was kind enough to submit [https://github.com/apache/kafka/pull/10631]
to get the system tests running again, but that PR is only a stop-gap.
The real solution is to properly package the transitive dependencies and make
them available to the system test runtime.
Here is the reason: PR#10631 gets past the issue by removing runtime usages on
Hamcrest, but Hamcrest is still present in the compiletime classpath. Tomorrow,
a new PR could add a reference to Hamcrest, and all the unit and integration
tests would pass, but we would again see the mysterious system test failures.
Only after another round of costly investigation would we realize the root
cause, and we might again decide to just patch the test to remove the reference.
It would be far better in the long run to fix the underlying condition: the
difference between the compiletime and runtime classpaths for the system tests.
To help get you started, I'll share some of the groundwork for this task, which
I put together while trying to understand the nature of the problem.
The first step is to actually copy the transitive dependencies. We had to stop
placing these dependencies in the `dependant-libs` build directory because that
results in us actually shipping those dependencies with our releases. Copying a
similar mechanism from the `:core` project, we can add a new build directory
(arbitrarily: `dependant-testlibs`), and again copy the runtime dependencies
there. Here is a commit in my fork that does just that:
[https://github.com/vvcephei/kafka/commit/8d4552dee05f2a963b8072b86aae756415ea2482]
The next step is to place those jars on the classpath of the system test code.
The mechanism for that is `kafka-run-class.sh`:
[https://github.com/apache/kafka/blob/trunk/bin/kafka-run-class.sh]
A specific example of this is the special logic for upgrade tests:
# If we are running upgrade tests, then we set the artifact directories to the
relevant version. Otherwise, we use the current build artifacts.
[https://github.com/apache/kafka/blob/fc405d792de12a50956195827eaf57bbf64444c9/bin/kafka-run-class.sh#L77-L85]
# The, here's where we specifically pull in Hamcrest from those build artifact
directories:
[https://github.com/apache/kafka/blob/fc405d792de12a50956195827eaf57bbf64444c9/bin/kafka-run-class.sh#L128-L136]
It seems to me that since Hamcrest is actually a more general dependency of the
tests, we might as well pack it up in `dependant-testlibs` and then pull it
into the classpath from there any time we're running tests. It looks like we
ought to set `streams_dependant_clients_lib_dir` to `dependant-testlibs` any
time `INCLUDE_TEST_JARS` is `true`. But if we do have
`UPGRADE_KAFKA_STREAMS_TEST_VERSION` set, then it should override the lib dir,
since those artifacts to copy over the transitive dependencies for those older
versions already.
Although the proposed fix itself is pretty small, I think the testing will take
a few days. You might want to just put some `echo` statements in
kafka-run-class.sh to see what jars are included on the classpath while you run
different tests, both locally using Docker, and remotely using Jenkins.
I marked this ticket as `newbie++` because you don't need deep knowledge of the
codebase to tackle this ticket, just a high pain tolerance for digging though
gradle/docker/bash script debugging to make sure the right bits make it into
the system tests.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)