Sorry for the delay in getting back.
To answer your questions: the convolution is a box filter because it
can be implemented efficiently: for an NxM image, you do a single N*M
precomputation, after which each convolution takes N*M operations
regardless of the window size.
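For reference, the usual way to get a size-independent per-pixel cost for
a box filter is a summed-area (integral) image; here is a minimal Go
sketch, assuming that is the scheme in use (the function names are mine,
not from the actual code):

```go
package main

import "fmt"

// integralImage returns the summed-area table s, where s[y][x] holds the
// sum of img over the rectangle [0,x) x [0,y). Building it is the single
// N*M precomputation.
func integralImage(img [][]float64) [][]float64 {
	h, w := len(img), len(img[0])
	s := make([][]float64, h+1)
	for y := range s {
		s[y] = make([]float64, w+1)
	}
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			s[y+1][x+1] = img[y][x] + s[y][x+1] + s[y+1][x] - s[y][x]
		}
	}
	return s
}

// boxSum returns the sum of the image over the window [x0,x1) x [y0,y1)
// using four table lookups, independent of the window size.
func boxSum(s [][]float64, x0, y0, x1, y1 int) float64 {
	return s[y1][x1] - s[y0][x1] - s[y1][x0] + s[y0][x0]
}

func main() {
	img := [][]float64{
		{1, 2, 3},
		{4, 5, 6},
		{7, 8, 9},
	}
	s := integralImage(img)
	fmt.Println(boxSum(s, 0, 0, 2, 2)) // 1+2+4+5 = 12
	fmt.Println(boxSum(s, 1, 1, 3, 3)) // 5+6+8+9 = 28
}
```

Since each window sum is four lookups, convolving the whole image with
any box size is N*M such sums, which is what makes the multi-resolution
version feasible.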
Regarding performance, consider the following (CPU @ 2 GHz).
MAE:
$ go run ./cmd/compare/ -algorithm=mae -gs_jobs=1 -cmp_jobs=1
-file_regexp 'eps' ../lilypond/{d1,d2} /tmp/tmp.UULz7I9RKh/output/
2025/06/20 14:12:38 Convert 2 EPS files using 1 cores (batch=true)
to PNG in 369.2581ms (184.62905ms/file)
2025/06/20 14:12:38 compared 1 PNG image pairs using 1 cores
(imagemagick=false) in 51.97695ms (51.97695ms / pair)
$ file ../lilypond/les-nereides.png
../lilypond/les-nereides.png: PNG image data, 835 x 1181,
8-bit/color RGB, non-interlaced
There are ~1M pixels in the image, so that works out to roughly 100
cycles per pixel, which seems like a bit much; but a significant share
of that time is actually spent decoding the PNG.
The filter version convolves at 10 resolutions (window = 3 ... 2049),
and is roughly 10 times slower:
$ go run ./cmd/compare/ -algorithm=filter -gs_jobs=1 -cmp_jobs=1
-file_regexp 'eps' ../lilypond/{d1,d2} /tmp/tmp.UULz7I9RKh/output/
2025/06/20 14:14:55 Convert 2 EPS files using 1 cores (batch=true)
to PNG in 378.991521ms (189.49576ms/file)
2025/06/20 14:14:55 compared 1 PNG image pairs using 1 cores
(imagemagick=false) in 448.789022ms (448.789022ms / pair)
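To see why ten-ish scales cost about ten single convolutions, here is a
sketch of a window schedule. The doubling rule (w -> 2w-1) and the exact
endpoints are my guess from "window = 3 ... 2049", not taken from the
actual code:

```go
package main

import "fmt"

// windowSchedule returns a guessed doubling schedule of box-filter
// window sizes, from 3 up to 2049.
func windowSchedule() []int {
	var ws []int
	for w := 3; w <= 2049; w = 2*w - 1 {
		ws = append(ws, w)
	}
	return ws
}

func main() {
	ws := windowSchedule()
	fmt.Println(ws)
	// With a summed-area table, each scale costs the same N*M lookups no
	// matter how large the window is, so the whole pyramid costs about
	// len(ws) single convolutions, matching "roughly 10 times slower".
	n, m := 1181, 835 // les-nereides.png dimensions
	fmt.Println(len(ws) * n * m) // total window sums across all scales
}
```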
I am sure there are numeric libraries that could compute this more
quickly, but extra dependencies are a nightmare for packaging, so we
should avoid them if possible.
Our pixels are mostly blank, so this isn't terribly efficient.
Converting the image to vectors instead:
$ time ps2ps -dNoOutputFonts les-nereides.ps vec.ps
real 0m0.181s
$ time inkscape --export-filename=out.dxf vec.ps
real 0m3.139s
$ grep VERTEX out.dxf | wc -l
9998
10k points probably represent ~80kb of data (say, two 4-byte
coordinates per point)? I agree that you'll be able to process that
data more quickly than the 1mb of pixels we process using pixmaps.
There are some considerations though:
* Inkscape seems pretty slow for the ps -> dxf conversion. This can
probably be sped up, but how? It would need some nontrivial hacking
to interpret the PS and make it generate something that a vector
algorithm can handle.
* If you convert to a vector format, changes in the representation of
the score (eg. how drawing routines are called) can generate
spurious differences. It is also not guaranteed that the Cairo and
PS backends generate something you can compare. This is also why "for
font-based elements the font/glyph id pair" is suspect: changes to
glyph shapes are also in scope for regtesting, and unless you expand
the glyphs into their outline curves, you'd miss them.
So in short, you are right that a vector-based approach is potentially much
faster. But without a prototype, it's hard to evaluate either
performance or how well it works.
Bitmaps are straightforward to manipulate, and we already have the
dependencies (zlib, libpng) to read them. Altogether, this was about
half a day of work, and seems like it could be an improvement over
what we have.
HTH