Hello Tom,
Thanks for your feedback.
I'm rather unclear about what the actual feedback is, though. I'd
interpret it as "pg does not care much about code coverage". Most client
programs are in the red on coverage.postgresql.org. I'd like pgbench at
least to be in the green, but it does not look like that will ever be
the case.
> The reason why the first iteration failed was that it was insufficiently
> insensitive to timing.
Yes.
> So the problem for a replacement patch is "how do you test fundamentally
> timing-sensitive behavior in a timing-insensitive way?".
The answer is that it depends on the test objective, and there can be no
miracle, because the test environment is not controlled; in particular,
the test cannot expect the execution to be responsive (i.e. it cannot
assume that the host is not overloaded for some reason).
As written in the added test comments, these tests mostly aim at
coverage, so the time-related checks are loose, although real.
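To illustrate, here is a minimal standalone sketch of the principle
(not the patch or pgbench code; the names and constants are mine): the
test asserts only lower bounds, which hold even on an overloaded host,
and never an upper bound that load could break.

#include <assert.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int
main(void)
{
    const double T = 2.0;        /* prescribed duration, like -T 2 */
    const double P = 1.0;        /* progress interval, like -P 1 */
    time_t start = time(NULL);
    int reports = 0;
    double elapsed;

    do
    {
        sleep((unsigned int) P); /* stand-in for one -P interval */
        reports++;
        elapsed = difftime(time(NULL), start);
    } while (elapsed < T);

    /* loose, load-insensitive checks: lower bounds only */
    assert(elapsed >= T);        /* the run never ends early */
    assert(reports >= 1);        /* at least one progress report */
    printf("ran %.0f s, %d report(s)\n", elapsed, reports);
    return 0;
}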
> It's not really clear to me that that's possible, so I don't have a lot
> of faith that this patch wouldn't fail as well.
AFAICR I think that it is pretty unlikely to fail.
Many other pg tests can fail but mostly don't: some rely on random
behavior, and you can get unlucky once in a while; other events can also
occur, such as the file system filling up.
> I'm also a bit disturbed that the patch seems to be changing pgbench's
> behavior for no reason other than to try to make the test produce the
> same results no matter what the actual machine performance is ... which
> seems, at best, rather contrary to pgbench's real purpose.
No, not exactly.
The behavior change is to remove a corner-case optimization which creates
an exception to -T/-P compliance: under a very low transaction rate (with
-R) combined with -T and -P, pgbench shortens the run (i.e. it ends
before the prescribed time) if there is nothing left to do in the future,
so that the remaining progress reports are not shown.
This optimization makes it impossible to test that pgbench complies with
-T and -P, because it does not always comply. Using -R is useful to avoid
making too many assumptions about the test host's load.
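To make the corner case concrete, here is an illustrative sketch (my
own simplification, not the actual pgbench source; the structure and
field names are invented):

#include <stdbool.h>
#include <stdio.h>

typedef struct
{
    double now;       /* current time, in seconds */
    double next_txn;  /* next scheduled transaction, from -R */
    double deadline;  /* end of run, from -T */
} sched_state;

static bool
keep_running_old(const sched_state *s)
{
    /* old: nothing scheduled before the deadline => stop early,
     * so the remaining -P progress reports are never printed */
    if (s->next_txn > s->deadline)
        return false;
    return s->now < s->deadline;
}

static bool
keep_running_new(const sched_state *s)
{
    /* new: honor -T unconditionally; idle-wait until the deadline
     * so that -P progress reports keep firing */
    return s->now < s->deadline;
}

int
main(void)
{
    /* e.g. -R 0.1 -T 10 -P 1: after an early transaction, the next
     * one may be scheduled around t = 20 s, past the 10 s deadline */
    sched_state s = { 2.0, 20.0, 10.0 };

    printf("old: %s, new: %s\n",
           keep_running_old(&s) ? "keep running" : "stop early",
           keep_running_new(&s) ? "keep running" : "stop early");
    return 0;
}

With the shortcut removed, the run always lasts the prescribed -T
duration, so a test can assert on that without guessing how fast the
host is.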
> So I wonder, are those behavioral changes a net win from a user's
> standpoint?
Would a user complain that they asked for -T 10 -P 1, the run lasted 10
seconds, and they got 10 progress reports along the way?
The net win is that the feature they asked for has been tested a little
before they actually run it. The win is small, but better than nothing.
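As a concrete example of such a loose acceptance check (a sketch under
my own assumptions; the helper and its slack value are hypothetical):
a fully compliant -T/-P run prints about T/P progress lines, so
accepting anything from one line up to that maximum plus some slack
keeps the check real but insensitive to host load.

#include <assert.h>
#include <stdbool.h>

/* hypothetical check: with -T and -P, a compliant run prints about
 * T/P progress lines; the lower bound (at least one) is the loose,
 * load-insensitive part, the upper bound needs only mild slack */
static bool
progress_count_ok(int n, double T, double P)
{
    int expected = (int) (T / P);

    return n >= 1 && n <= expected + 2; /* slack of 2 is arbitrary */
}

int
main(void)
{
    assert(progress_count_ok(10, 10.0, 1.0)); /* nominal case */
    assert(progress_count_ok(1, 10.0, 1.0));  /* slow host: still ok */
    assert(!progress_count_ok(0, 10.0, 1.0)); /* no report: broken */
    return 0;
}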
> If not we're letting the tail wag the dog. (If they are a win, they
> ought to be submitted and defended independently, and maybe there ought
> to be some documentation changes alongside.)
The shortened execution time is an undocumented corner case that nobody
really expects as a feature, IMHO. It is a side effect of the
implementation, and I do not think it is worth documenting.
> I'm not against having better test coverage numbers, but it's a means
> to an end not an end in itself.
Sure, numbers are not an end in themselves.
For me, what is not tested should not be expected to work, so I like to
have most or all lines run at least once by some test, as minimal
insurance that the code is not completely broken… which means being at
least green.
> It has to be balanced against the amount of effort to be put into
> testing (as opposed to actually improving our software).
I'm all for balanced and meaningful testing, obviously.
However, the current balance results in an abysmal coverage status, so I
do not think it is the right one.
--
Fabien.