Ignite TC Bot created IGNITE-28685:
--------------------------------------
Summary: C++ thin client test IgniteClientUserThreadPoolSize is
flaky
Key: IGNITE-28685
URL: https://issues.apache.org/jira/browse/IGNITE-28685
Project: Ignite
Issue Type: Bug
Reporter: Ignite TC Bot
h2. Problem
TeamCity intermittently fails the C++ thin client test:
* Suite: Platform C++ CMake (Linux)
* Build: https://ci2.ignite.apache.org/viewLog.html?buildId=9060050
* Test: IgniteThinClientTest: IgniteClientTestSuite:
IgniteClientUserThreadPoolSize
* Observed duration: about 14 ms
The failure was seen on PR 13121, but that PR changes only notice/license
files, so the failure is very unlikely to be caused by its changes.
h2. Investigation
The test is implemented in:
* modules/platforms/cpp/thin-client-test/src/ignite_client_test.cpp
* method: IgniteClientTestSuiteFixture::CheckThreadsNum
* test case: IgniteClientUserThreadPoolSize
The test starts a fake thin-client server on the fixed port 11110, then
measures the total process thread count through /proc/<pid>/task before and
immediately after IgniteClient::Start. It expects the exact delta to be
userThreadPoolSize + 1 on Linux.
This is brittle because:
* the assertion uses the whole process thread count, not only the client/user
pool threads;
* startup/shutdown of native worker threads is asynchronous from the test's
point of view;
* port 11110 is reused by many other thin-client tests and configs in the same
suite;
* base-branch history shows this test is flaky with a high failure rate.
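To make the first point concrete, here is a minimal sketch of a whole-process
thread count on Linux. CountProcessThreads is a hypothetical helper written for
illustration. It is not the GetThreadsCount code from concurrent_os.cpp, which
is only assumed to behave similarly because the test reads /proc/<pid>/task.

```cpp
#include <dirent.h>

#include <cctype>
#include <string>

// Hypothetical helper (not the Ignite implementation): counts the numeric
// entries under /proc/<pid>/task. Linux only. Returns -1 if the directory
// cannot be opened.
int CountProcessThreads(const std::string& pid = "self")
{
    const std::string path = "/proc/" + pid + "/task";

    DIR* dir = opendir(path.c_str());
    if (!dir)
        return -1;

    int count = 0;
    while (dirent* entry = readdir(dir))
    {
        // Every thread of the process shows up as a directory named after its
        // numeric TID; "." and ".." fail the digit check and are skipped.
        if (std::isdigit(static_cast<unsigned char>(entry->d_name[0])))
            ++count;
    }

    closedir(dir);
    return count;
}
```

A count taken this way includes every thread in the process, whoever started
it, so a single before/after delta only matches userThreadPoolSize + 1 if
nothing else starts or stops a thread between the two samples.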
Relevant code paths:
* modules/platforms/cpp/thin-client-test/src/ignite_client_test.cpp:
CheckThreadsNum / IgniteClientUserThreadPoolSize
* modules/platforms/cpp/thin-client-test/include/test_server.h: TestServer
* modules/platforms/cpp/thin-client/src/impl/data_router.cpp:
DataRouter::Connect / DataRouter::Close
* modules/platforms/cpp/common/src/common/thread_pool.cpp: ThreadPool::Start /
ThreadPool::Stop
* modules/platforms/cpp/common/os/linux/src/common/concurrent_os.cpp:
GetThreadsCount
h2. Proposed fix
Stabilize the test without changing production behavior:
# Start TestServer on an ephemeral port by constructing it with port 0.
# Expose TestServer::GetPort() and set IgniteClientConfiguration endpoints to
the actual bound port.
# Replace single immediate thread-count samples with bounded waits for the
expected count after client start and after client destruction.
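Steps 1 and 2 rely on the usual socket behavior that binding to port 0 lets the
OS pick a free port, which can then be read back. The sketch below shows that
mechanism with plain POSIX sockets. EphemeralListener and MakeEndpoint are
hypothetical names, and the real TestServer may be built on a different
networking layer, so treat this as an illustration of the idea rather than the
patch itself.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>
#include <string>

// Hypothetical stand-in for a test server listener; not the real TestServer.
class EphemeralListener
{
public:
    EphemeralListener() : fd_(socket(AF_INET, SOCK_STREAM, 0)), port_(0)
    {
        if (fd_ < 0)
            return;

        sockaddr_in addr{};
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        addr.sin_port = htons(0); // port 0: let the kernel choose a free port

        socklen_t len = sizeof(addr);
        if (bind(fd_, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0 ||
            listen(fd_, 1) != 0 ||
            getsockname(fd_, reinterpret_cast<sockaddr*>(&addr), &len) != 0)
        {
            close(fd_);
            fd_ = -1;
            return;
        }

        // getsockname reports the port that was actually bound.
        port_ = ntohs(addr.sin_port);
    }

    ~EphemeralListener()
    {
        if (fd_ >= 0)
            close(fd_);
    }

    EphemeralListener(const EphemeralListener&) = delete;
    EphemeralListener& operator=(const EphemeralListener&) = delete;

    // Returns the bound port, or 0 if setup failed.
    std::uint16_t GetPort() const { return port_; }

private:
    int fd_;
    std::uint16_t port_;
};

// Builds a "host:port" endpoint string for the client configuration.
std::string MakeEndpoint(std::uint16_t port)
{
    return "127.0.0.1:" + std::to_string(port);
}
```

Two listeners created this way get distinct ports, which is what removes the
collision with other tests that use 11110.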
A local draft patch already takes this approach:
* add uint16_t TestServer::GetPort() const in
modules/platforms/cpp/thin-client-test/include/test_server.h;
* in IgniteClientUserThreadPoolSize, use TestServer server(0), build endpoint
from server.GetPort(), and wait up to 5 seconds for thread count to settle.
h2. Expected result
The test should continue checking that the configured user thread pool size
affects the number of client worker threads, while avoiding fixed-port
collisions and scheduler-timing races.
h2. Validation
Run:
{code}
ctest -R IgniteThinClientTest --output-on-failure
{code}
or run the C++ thin-client test executable with:
{code}
ignite-thin-client-tests \
  --run_test=IgniteClientTestSuite/IgniteClientUserThreadPoolSize \
  --catch_system_errors=no --log_level=all
{code}