Re: Any ideas how to make dtests more stable and reproducible?

Dinesh Joshi Mon, 18 Mar 2019 23:23:03 -0700

Hi Stefan,

The dtests have typically been flaky, but they have been more or less stable 
recently, and we are working towards stabilizing them further. For my local dev 
workflow, I typically end up running a subset of the dtests via the pytest 
runner. I am not sure how others run them.


I believe the CircleCI results are the most consistent and accurate so far. You 
can refer to this[1] recent sample run, in which all tests passed. The CircleCI 
workflow has changed recently, so it will look different now, but the point is 
that the tests are more or less stable. A word of caution: if you are running on 
the free tier, the run will take a long time and the test results will be 
unreliable, because the free tier does not have enough resources. For comparison 
with your setup, we typically run the dtests in 100 CircleCI containers 
concurrently. Each container has 8 vCPUs and 16 GB of RAM. The run takes 20-30 
minutes, depending on resource availability.
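
To put the numbers in this thread side by side, here is a rough 
back-of-the-envelope sketch in Python. It only multiplies out the figures quoted 
in this thread (100 containers with 8 vCPUs and 16 GB each, against a single 
36-core / 64 GB machine, and ~7 hours against 20-30 minutes). The variable names 
are illustrative, and it assumes both runs cover a comparable set of tests, which 
may not be the case.

```python
# Figures quoted in this thread. The comparison is an illustration only:
# it ignores scheduling overhead and assumes both runs cover similar tests.
CI_CONTAINERS = 100
CI_VCPUS_PER_CONTAINER = 8
CI_RAM_GB_PER_CONTAINER = 16

LOCAL_CORES = 36    # c5.9xlarge, as reported in the quoted message
LOCAL_RAM_GB = 64   # as reported in the quoted message

# Aggregate resources across all CircleCI containers.
ci_vcpus = CI_CONTAINERS * CI_VCPUS_PER_CONTAINER
ci_ram_gb = CI_CONTAINERS * CI_RAM_GB_PER_CONTAINER

# How many times more CPU and RAM the CI run has in total.
cpu_ratio = ci_vcpus / LOCAL_CORES
ram_ratio = ci_ram_gb / LOCAL_RAM_GB

# Wall-clock comparison: ~7 hours locally vs. 20-30 minutes on CircleCI.
local_minutes = 7 * 60
speedup_low = local_minutes / 30   # against the slower CI run
speedup_high = local_minutes / 20  # against the faster CI run

print(f"CI total: {ci_vcpus} vCPUs, {ci_ram_gb} GB RAM")
print(f"CPU ratio: {cpu_ratio:.1f}x, RAM ratio: {ram_ratio:.1f}x")
print(f"Wall-clock ratio: {speedup_low:.0f}x to {speedup_high:.0f}x")
```

Under those assumptions, the gap in wall-clock time is of the same order as the 
gap in total resources, so a long single-machine run is not by itself a sign of 
a broken setup.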

Thanks,

Dinesh


[1] https://circleci.com/workflow-run/80804bb2-dafb-445a-acca-53401ca02806


> On Mar 18, 2019, at 5:46 PM, Stefan Miklosovic 
> <stefan.mikloso...@instaclustr.com> wrote:
> 
> Hi,
> 
> I am running the large and "simple" dtests (executed via
> cassandra-builds/build-scripts/cassandra-dtest-pytest.sh) and I find myself
> quite frustrated, as I do not know whether the errors occur because the tests
> are flaky or because there are legitimate issues behind them.
> 
> It is "simple" to check them one by one when the tests are stable and there
> are only a couple of them, but when there are hundreds of tests, the whole
> run takes ~7 hours, and it is not stable, it is like finding a needle in a
> haystack. Sometimes 15 tests fail, sometimes just 10. Sometimes there are
> timeouts, sometimes not.
> 
> For the basic dtests I am consistently getting three errors out of 900, I
> think, which is quite good. I supplied one patch here (1), so only two of
> them are failing consistently now (it is not merged yet).
> 
> Can you point me to your builds and what results you are getting there?
> Maybe something is wrong with my setup or these dtests are "expected" to be
> flaky from time to time?
> 
> What stability are you getting with official builds when it comes to
> dtests? How often are they run? As part of every pull request / change? Do
> you commit only on "0 dtests failed"?
> 
> Are there any recommendations on what setup and machine these tests
> should run on? I am running them on a c5.9xlarge (36 cores with 64 GB of
> memory) with a fairly recent Ubuntu and the latest Java 8. I am trying to
> supply all the needed parameters and libs in order to start Cassandra
> smoothly, without any warnings / errors (there are checks that verify
> whether your environment is all fine).
> 
> I am testing current trunk.
> 
> Thanks for any input on how to make them more stable, if there are any tips
> and tricks.
> 
> (1) https://github.com/apache/cassandra-dtest/pull/47
> 
> Stefan Miklosovic
