Han-Wen Nienhuys <hanw...@gmail.com> writes:

> I looked a bit through the GUILE source code to see what is going on.
>
> I believe our current hypothesis (LilyPond's slowdown is caused by
> expensive unicode transcoding into 32-bit strings) is incorrect.
>
> If you look into the source code, you can see that the UTF-8 -> SCM
> conversion checks if there are any code points over 255
>
> https://git.savannah.nongnu.org/cgit/guile.git//tree/libguile/strings.c/?id=1b8e9ca0e37fab366435436995248abdfc780a10#n1620
>
> if there aren't, it uses Latin1 encoding ("narrow == 1") to encode the
> string as a normal byte array. This code walks the string twice, but that
> is very cheap due to CPU cache locality, so it should be
> essentially equivalent to whatever GUILE 1.8 was doing.

GUILE 1.8 did not walk the string even once.

> LilyPond internally doesn't use any Unicode strings, as all our
> identifiers are pure ascii, as well as internal strings (eg. font
> glyph names). This means that files that do not use Unicode characters
> at all should have the same overhead for strings as GUILE 1.8.

We already use the latin1 calls for LilyPond internals.

> Even so, if the input file does use UTF-8, there should be little
> overhead, because the number of texts that we process is always
> small. LilyPond is not a text processor.
>
> So, what hard data do we have on GUILE 2/3 slowness, and what does
> that data say?

That data says "humongous slowdown".  As far as I know, there is not
much more than speculation about what causes it.

-- 
David Kastrup
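
For readers following the thread, the narrow/wide decision Han-Wen describes can be sketched roughly as below. This is only an illustration in Python, not GUILE's actual code (which is C, in libguile/strings.c); the function name `utf8_to_string_sketch` and the use of UTF-32 as the "wide" storage are assumptions made for the example. It shows the shape of the check and says nothing about how much it costs.

```python
def utf8_to_string_sketch(data: bytes):
    """Hypothetical stand-in for a UTF-8 -> SCM string conversion.

    Returns a (kind, payload) pair, where kind is "narrow" or "wide".
    """
    # Decode the UTF-8 input into code points.
    text = data.decode("utf-8")

    # First walk: does every code point fit into one byte (<= 255)?
    narrow = all(ord(ch) <= 0xFF for ch in text)

    # Second walk: copy into the chosen storage.
    if narrow:
        # One byte per character (Latin-1), i.e. a plain byte array.
        return "narrow", text.encode("latin-1")
    # Assumption: four bytes per character stands in for the 32-bit form.
    return "wide", text.encode("utf-32-le")


if __name__ == "__main__":
    for sample in ("noteheads.s2", "Pr\u00e9lude", "C\u266f minor"):
        kind, payload = utf8_to_string_sketch(sample.encode("utf-8"))
        print(sample, kind, len(payload))
```

In this sketch, pure-ASCII names and text limited to code points up to 255 end up in the one-byte form, and a single character above 255 (such as the sharp sign) switches the whole string to the four-byte form.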