Re: GUILE 2/3 and string encoding cost

David Kastrup Wed, 22 Jan 2020 03:02:54 -0800

Han-Wen Nienhuys <hanw...@gmail.com> writes:

> I looked a bit through the GUILE source code to see what is going on.
>
> I believe our current hypothesis (LilyPond's slowdown is caused by
> expensive unicode transcoding into 32-bit strings) is incorrect.
>
> If you look into the source code, you can see that the UTF-8 -> SCM
> conversion checks if there are any code points over 255
>
>
> https://git.savannah.nongnu.org/cgit/guile.git//tree/libguile/strings.c/?id=1b8e9ca0e37fab366435436995248abdfc780a10#n1620
>
> if there aren't, it uses Latin1 encoding ("narrow == 1") to encode the
> string as a normal byte array. This code walks the string twice, but that
> is very cheap due to CPU cache locality, so it should be
> essentially equivalent to whatever GUILE 1.8 was doing.


GUILE 1.8 did not walk the string even once.

> LilyPond internally doesn't use any Unicode strings, as all our
> identifiers are pure ascii, as well as internal strings (eg. font
> glyph names). This means that files that do not use Unicode characters
> at all should have the same overhead for strings as GUILE 1.8.

We already use the latin1 calls for LilyPond internals.

> Even so, if the input flie does use UTF-8, there should be little
> overhead, because the number of texts that we process is always
> small. LilyPond is not a text processor.
>
> So, what hard data do we have on GUILE 2/3 slowness, and what does
> that data say?

That data says "humongous slowdown".  There is not much more than
speculation what this is caused by as far as I know.

-- 
David Kastrup

Re: GUILE 2/3 and string encoding cost

Reply via email to