Hello Tom,

Thanks for your feedback.

I'm rather unclear about what the actual feedback is, though. I'd
interpret it as "pg does not care much about code coverage". Most clients
are in the red on coverage.postgresql.org. I'd like pgbench at least to be
in the green, but it does not look like that will ever be the case.

> The reason why the first iteration failed was that it was insufficiently
> insensitive to timing.

Yes.

> So the problem for a replacement patch is "how do you test fundamentally timing-sensitive behavior in a timing-insensitive way?".

The answer is that it depends on the test objective, and there is no miracle solution, because the test environment is not controlled: in particular, a test cannot assume that execution is responsive (i.e., that the host is not overloaded for some reason).

As stated in the added test comments, these tests mostly aim at coverage, so the time-related checks are loose, although real.
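
To illustrate what "loose, although real" means, here is a minimal sketch in the TAP style (a hypothetical example, not the actual patch: it assumes pgbench is on PATH and can reach an initialized pgbench database):

    use strict;
    use warnings;
    use Test::More tests => 2;
    use Time::HiRes qw(time);

    my $duration = 2;    # seconds requested with -T

    my $t0      = time();
    my $out     = `pgbench -T $duration -P 1 -R 10 2>&1`;
    my $elapsed = time() - $t0;

    # The lower bound is firm: -T must be honored. The upper bound is
    # very generous, so that an overloaded host does not make it flap.
    cmp_ok($elapsed, '>=', $duration, "run lasted at least -T seconds");
    cmp_ok($elapsed, '<=', 10 * $duration, "run ended in reasonable time");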

> It's not really clear to me that that's possible, so I don't have a lot of faith that this patch wouldn't fail as well.

AFAICR, I think it is pretty unlikely to fail.

Many other pg tests can fail but mostly don't: some rely on random behavior, and you can get unlucky once in a while; other events can also occur, such as a full file system.

> I'm also a bit disturbed that the patch seems to be changing pgbench's
> behavior for no reason other than to try to make the test produce the
> same results no matter what the actual machine performance is ... which
> seems, at best, rather contrary to pgbench's real purpose.

No, not exactly.

The behavior change removes a corner-case optimization which creates an exception to -T/-P compliance: under a very low transaction rate (with -R) combined with -T and -P, pgbench shortens the run (i.e., ends before the prescribed time) if there is nothing left to do, so that progress reports are not shown.

This optimization makes it impossible to test that pgbench complies with -T and -P, because it does not always comply. Using -R is useful to avoid making too many assumptions about the test host load.
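
With the optimization removed, compliance becomes testable even at a very low rate. A hypothetical check, under the same assumptions as the sketch above (pgbench on PATH, initialized database reachable):

    use strict;
    use warnings;
    use Test::More tests => 2;
    use Time::HiRes qw(time);

    my $t0 = time();
    # -R 0.1 schedules about one transaction every 10 seconds, so after
    # the first one there is "nothing to do" until well past -T 2; the
    # run must nevertheless last the full 2 seconds and report progress.
    my $out     = `pgbench -T 2 -P 1 -R 0.1 2>&1`;
    my $elapsed = time() - $t0;

    cmp_ok($elapsed, '>=', 2, "pgbench did not end before the prescribed -T");
    like($out, qr/progress: /, "-P progress reports were shown");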

> So I wonder, are those behavioral changes a net win from a user's standpoint?

Would a user complain that they asked for -T 10 -P 1, the run lasted 10 seconds, and they got 10 progress reports along the way?

The net win is that the feature they asked for has been tested a little before they actually run it. It is a small win, but better than nothing.

> If not we're letting the tail wag the dog. (If they are a win, they ought to be submitted and defended independently, and maybe there ought to be some documentation changes alongside.)

The shortened execution time is an undocumented corner case that nobody really expects as a feature, IMHO. It is a side effect of the implementation. I do not think it is worth documenting.

> I'm not against having better test coverage numbers, but it's a means
> to an end not an end in itself.

Sure, numbers are not an end in themselves.

For me, what is not tested should not be expected to work, so I like to have most or all lines run at least once by some test, as minimal insurance that the code is not completely broken… which means being at least in the green.

> It has to be balanced against the amount of effort to be put into testing (as opposed to actually improving our software).

I'm all for balanced and meaningful testing, obviously.

However, the current balance results in an abysmal coverage status. I do not think it is the right balance.

--
Fabien.
