On Mon, 19 Feb 2024 at 15:35, Tom Lane <t...@sss.pgh.pa.us> wrote: > > I thought I'd try to acquire some actual facts here, so I compared > the code coverage shown by "make check" as of HEAD, versus "make > check" after adding numeric_big to parallel_schedule. I saw the > following lines of numeric.c as being covered in the second run > and not the first: >
I don't think that tells the whole story. Code coverage only tells you whether a particular line of code has been hit, not whether it has been properly tested with all the values that might lead to different cases. For example: > sqrt_var(): > 10073 res_ndigits = res_weight + 1 - (-rscale - 1) / DEC_DIGITS; > To get code coverage of this line, all you need to do is ensure that sqrt_var() is called with rscale < -1 (which can only happen from the range-reduction code in ln_var()). You could do that by computing ln(1e50), which results in calling sqrt_var() with rscale = -2, causing that line of code in sqrt_var() to be hit. That would satisfy code coverage, but I would argue that you've only really tested that line of code properly if you also hit it with rscale = -3, and probably a few more values, to check that the round-down division is working as intended. Similarly, looking at commit 18a02ad2a5, the crucial code change was the following in power_var(): - val = Max(val, -NUMERIC_MAX_RESULT_SCALE); - val = Min(val, NUMERIC_MAX_RESULT_SCALE); val *= 0.434294481903252; /* approximate decimal result weight */ Any test that calls numeric_power() is going to hit those lines of code, but to see a failure, it was necessary to hit them with the absolute value of "val" greater than NUMERIC_MAX_RESULT_SCALE, which is why that commit added 2 new test cases to numeric_big, calling power_var() with "val" outside that range. Code coverage is never going to tell you whether or not that is being tested, since the code change was to delete lines. Even if that weren't the case, any line of code involving Min() or Max() has 2 branches, and code coverage won't tell you if you've hit both of them. > Pretty poor return on investment for the runtime consumed. I don't > object to adding something to numeric.sql that's targeted at adding > coverage for these (or anyplace else that's not covered), but let's > not just throw cycles at the problem. > I agree that blindly performing a bunch of large computations (just throwing cycles at the problem) is not a good approach to testing. And maybe that's a fair characterisation of parts of numeric_big. However, it also contains some fairly well-targeted tests intended to test specific edge cases that only occur with particular ranges of inputs, which don't necessarily show up as increased code coverage. So I think this requires more careful analysis. Simply deleting numeric_big and adding tests to numeric.sql until the same level of code coverage is achieved will not give the same level of testing. It's also worth noting that the cost of running numeric_big has come down very significantly over the last few years, as can be seen by running the current numeric_big script against old backends. There's a lot of random variation in the timings, but the trend is very clear: 9.5 1.641s 9.6 0.856s 10 0.845s 11 0.750s 12 0.760s 13 0.672s 14 0.430s 15 0.347s 16 0.336s Arguably, this is a useful benchmark to spot performance regressions and test proposed performance-improving patches. If I run "EXTRA_TESTS=numeric_big make check | grep 'ok ' | sort -nrk5", numeric_big is not in the top 10 most expensive tests (it's usually down at around 15'th place). Looking at the script itself, the addition, subtraction, multiplication and division tests at the top are probably pointless, since I would expect those operations to be tested adequately (and probably more thoroughly) by the transcendental test cases. In fact, I think it would probably be OK to delete everything above line 650, and just keep the bottom half of the script -- the pow(), exp(), ln() and log() tests, which cover various edge cases, as well as exercising basic arithmetic operations internally. We might want to check that I/O of large numerics is still being tested properly though. If we did that, numeric_big would be even further down the list of expensive tests, and I'd say it should be run by default. Regards, Dean