On Mon, Jul 29, 2024 at 12:10 PM Luca Fascione <l.fasci...@gmail.com> wrote:
> I was actually thinking about this situation upside-down from how you're 
> seeing it,
> details below
>
> On Mon, Jul 29, 2024 at 10:30 AM Han-Wen Nienhuys <hanw...@gmail.com> wrote:
>>
>> On Mon, Jul 29, 2024 at 8:56 AM Luca Fascione <l.fasci...@gmail.com> wrote:
>> > [shifts are] going to be some random, non-integer quantity, right?
>>
>> Yes, but since the comparison works on pixel images, you can't see the
>> non-integer part of the shift.
>>
>> > Also, the rasterization that gets performed, is it anti aliased?
>>
>> Usually it is; we could turn it off and possibly make the images
>> larger, but IIRC that slows things down.
>
>
> Actually my thought here is that if you have antialiasing on and you
> translate the image by a small amount in x and/or y, you alter _all_
> the non-white pixels: this is because the renderer will account for
> the coverage of each object wrt each pixel slightly differently, and
> change all the shades of gray it generates because of this.
> If you render a square that is 1 pixel to the side, aligned with the
> raster, you get one black pixel. But if you translate it half a pixel
> right and down, you get 4 pixels, each 0.25 grey (after unapplying
> gamma, assuming you're using "box 1 1" filtering).
> If you're trying to realign one image to the other in this scenario,
> you can see it'll be quite annoying to do this and recover one image
> from the other (in my example, going from the first image to the
> second would work, but going from the second to the first won't; the
> heart of the problem is that you're double-convolving with the
> antialiasing filter instead of deconvolving).
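The half-pixel example above can be sketched numerically. This is a minimal illustration, assuming a "box 1 1" filter, where each pixel's gray value (before gamma) is simply the area of its overlap with the square; the function name is made up for the sketch:

```python
def box_coverage(x0, y0, size=1.0, grid=3):
    """Per-pixel coverage of the square [x0, x0+size] x [y0, y0+size]
    rasterized with a box 1x1 filter: each pixel's value is the area
    of its overlap with the square (0.0 = white, 1.0 = black)."""
    cov = [[0.0] * grid for _ in range(grid)]
    for py in range(grid):
        for px in range(grid):
            # Overlap of pixel [px, px+1] x [py, py+1] with the square.
            ox = max(0.0, min(px + 1, x0 + size) - max(px, x0))
            oy = max(0.0, min(py + 1, y0 + size) - max(py, y0))
            cov[py][px] = ox * oy
    return cov

# A 1-pixel square aligned with the raster: one pixel at full coverage.
aligned = box_coverage(0.0, 0.0)
# The same square shifted half a pixel right and down: four pixels
# at 0.25 coverage each, exactly as described above.
shifted = box_coverage(0.5, 0.5)
```

Note that total ink is conserved (both images sum to 1.0), but no integer re-shift of the second image reproduces the first, which is the double-convolution problem.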

I can see that antialiasing makes things needlessly difficult.
For PS/GS, we have a setting (/GraphicsAlphaBits, /TextAlphaBits)
which we can set to 1 to disable antialiasing.

For Cairo, we don't do anything special, so we get the defaults, which
include AA. You can switch it off with cairo_set_antialias(), see
https://www.cairographics.org/manual/cairo-cairo-t.html#cairo-set-antialias.

>> > It would seem that though shifts and changes in the lengths of the staves 
>> > are "common", small and relatively benign problems, rotations and scales 
>> > (magnifications) should be considered major disasters, right?
>>
>> Rotations do not generally happen. Virtually all the positioning is
>> rectilinear, and scaling is also not common. What happens is that
>> objects end up in different locations, and sometimes variable
>> objects (slurs, beams) have different sizes.
>
>
> Yes, I meant the case where this would happen as a result of new
> defects introduced in the code, not as requests from the source.
> In other words: if lilypond emits everything rotated 0.2 degrees
> clockwise, some major coding disaster has certainly happened, and
> tests should fail loudly.

It's nice if the tests complain loudly, but it's not crucial, as
spurious rotation is not a failure mode I have ever seen.

> Very much agreed with the suggestion brought up elsewhere that tests
> should be small.
> I must admit I'm not familiar with how specifically lilypond is
> tested, but the ideal situation is:
>
> - each test runs quickly
> - each test demonstrates a very small number of features (ideally
>   one or two)
> - the verification checks only specific aspects of the output (a
>   test that renders articulations should not check that the console
>   output reports the right version of lilypond, for example)
> - this is useful because in many cases it'll let you use fairly
>   coarse thresholds for accept/reject, in that the checking part
>   should have wildly different outcomes for "good" vs "bad"
>
> This will give you hundreds of tests.
> Running hundreds of tests will take a long time. This is usually not
> something most people look forward to dealing with.

We already have around 1900 tests.  They take about 6 minutes of CPU
time to compile (it parallelizes perfectly on multicore machines), and
we test every aspect (PNG image, console output etc.), because it is
too much work to administer which file needs which kind of check.

> So once you have the above, you add hierarchies to it so you can
> deploy a branch-and-bound strategy:
>
> - Make bigger tests that check several things at once (these are
>   probably approximately the tests you have now, I suspect)
> - These are checked with much tighter acceptance criteria (if they
>   pass, you're very sure it's all good)
> - When these "supertests" fail, the inner tests they cover are run,
>   and a report is made containing their outcomes
> - When these "supertests" pass, the inner tests are skipped: this is
>   where you get major time savings
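The branch-and-bound scheme sketched above could look roughly like this. All names are illustrative; the callables stand in for whatever actually invokes lilypond and produces an image-difference score (0 = identical), and the thresholds are arbitrary placeholders:

```python
def check_hierarchy(supertest, inner_tests, tight=0.001, coarse=0.05):
    """Run a 'supertest' against a tight threshold; only when it fails
    are the inner tests run, each against a coarser per-feature
    threshold.  Returns {} when the supertest passes (inner tests
    skipped), otherwise a {name: failed?} report for the inner tests.
    Tests are callables returning a difference score (0 = identical)."""
    if supertest() <= tight:
        return {}  # supertest passed: skip all inner tests
    # Supertest failed: drill down and report each inner test's outcome.
    return {name: test() > coarse for name, test in inner_tests.items()}

# Usage sketch with stubbed-out difference scores:
report = check_hierarchy(
    supertest=lambda: 0.02,  # small difference, but above the tight threshold
    inner_tests={"articulations": lambda: 0.10, "slurs": lambda: 0.001},
)
# report == {"articulations": True, "slurs": False}
```

The time savings come entirely from the first branch: in the common all-good case, only one comparison runs per supertest.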

On the contrary: these are all extra invocations that need to be
coordinated. To make things go fast, you want to run lilypond once on
a ton of files, so it can use parallelism to blaze through all of
them.

-- 
Han-Wen Nienhuys - hanw...@gmail.com - http://www.xs4all.nl/~hanwen
