On Sat, Nov 22, 2025 at 02:48:30PM +0100, Patrice Dumas wrote:
> > The difference is in the "_0028_0029" section of the header 'id' attribute.
> > This gives the ASCII values for "()". On GNU/Linux it is _0028_003f_0029
> > which corresponds to "(?)" - here, "?" is evidently used as a replacement
> > character for the right arrow character.
> >
> > Neither is good output.
>
> The sectioning commands id are not specified. They are only supposed to
> be consistent "internally", ie it should be the right id which is used
> in generated HTML (section commands id are not often used) or available
> in user-defined code using the HTML customization API.
>
> Therefore, in normal runs, right now speed is favored over consistency
> and the iconv "us-ascii//TRANSLIT" output is used.
OK, got it: there is no promise of stability for section anchors.
How does running UTF-8 into "us-ascii//TRANSLIT" iconv conversion
increase speed? Could we not just skip that step?
For example, commenting out the call to unicode_to_transliterate in
normalize_transliterate_texinfo changes the output for @expansion{}
in the 'id' from _003f to _21a6:
diff --git a/tta/C/main/node_name_normalization.c b/tta/C/main/node_name_normalization.c
index d8f922edea..762edf2734 100644
--- a/tta/C/main/node_name_normalization.c
+++ b/tta/C/main/node_name_normalization.c
@@ -372,13 +372,14 @@ normalize_transliterate_texinfo (const ELEMENT *e, int external_translit,
{
char *converted_name = convert_to_normalized (e);
char *normalized_name = normalize_NFC (converted_name);
- char *transliterated = unicode_to_transliterate (normalized_name,
- external_translit, in_test, no_unidecode);
- char *result = unicode_to_protected (transliterated);
+ // char *transliterated = unicode_to_transliterate (normalized_name,
+ // external_translit, in_test, no_unidecode);
+ //char *result = unicode_to_protected (transliterated);
+ char *result = unicode_to_protected (normalized_name);
free (converted_name);
free (normalized_name);
- free (transliterated);
+ // free (transliterated);
return result;
}
(Obviously this function is called elsewhere as well, so a new function
would probably have to be added, named something like 'normalize_texinfo'.)
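
To illustrate why skipping the transliteration step yields _21a6 rather
than _003f, here is a minimal standalone sketch of the protection step
applied directly to UTF-8 input. It is not the actual unicode_to_protected
code: 'protect_id' and 'decode_utf8' are hypothetical names, only ASCII
letters and digits are passed through (a simplification), and the
"__xxxxxx" form for code points above U+FFFF is my assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: decode one UTF-8 sequence starting at S into *CP
   and return the number of bytes consumed.  A malformed or truncated
   sequence is consumed as a single byte, so the caller escapes it. */
static size_t
decode_utf8 (const unsigned char *s, unsigned long *cp)
{
  size_t len, i;

  if (s[0] < 0x80) { *cp = s[0]; return 1; }
  else if ((s[0] & 0xE0) == 0xC0) { len = 2; *cp = s[0] & 0x1F; }
  else if ((s[0] & 0xF0) == 0xE0) { len = 3; *cp = s[0] & 0x0F; }
  else if ((s[0] & 0xF8) == 0xF0) { len = 4; *cp = s[0] & 0x07; }
  else { *cp = s[0]; return 1; }

  for (i = 1; i < len; i++)
    {
      /* Stops at the terminating NUL too, since NUL is not a
         continuation byte. */
      if ((s[i] & 0xC0) != 0x80) { *cp = s[0]; return 1; }
      *cp = (*cp << 6) | (s[i] & 0x3F);
    }
  return len;
}

static int
is_ascii_alnum (unsigned long cp)
{
  return (cp >= '0' && cp <= '9') || (cp >= 'A' && cp <= 'Z')
         || (cp >= 'a' && cp <= 'z');
}

/* Hypothetical stand-in for the protection step: write the escaped form
   of the UTF-8 string IN to OUT.  Returns 0 on success, -1 if OUT is
   too small. */
static int
protect_id (const char *in, char *out, size_t outsize)
{
  const unsigned char *s = (const unsigned char *) in;
  size_t used = 0;

  assert (outsize > 0);
  out[0] = '\0';
  while (*s)
    {
      unsigned long cp;
      int n;

      s += decode_utf8 (s, &cp);
      if (is_ascii_alnum (cp))
        n = snprintf (out + used, outsize - used, "%c", (int) cp);
      else if (cp <= 0xFFFF)
        n = snprintf (out + used, outsize - used, "_%04lx", cp);
      else /* assumed form for code points beyond U+FFFF */
        n = snprintf (out + used, outsize - used, "__%06lx", cp);

      if (n < 0 || (size_t) n >= outsize - used)
        return -1;
      used += (size_t) n;
    }
  return 0;
}
```

With this sketch, the UTF-8 bytes for "(", U+21A6, ")" come out as
_0028_21a6_0029, while the already-transliterated "(?)" comes out as
_0028_003f_0029, matching the two outputs discussed above.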
> In tests, the Perl code is called.
>
> > If I run with TEXINFO_XS=omit, the output is different: _0028_21a6_0029.
> > Here _21a6 refers to the correct character. This is the same on both
> > Solaris 11 and GNU/Linux.
>
> And with TEST set.
>
I've checked and the error message isn't output with '-c TEST=1', as you
say. I see from reading the code that with TEST, a call is made into
Perl to do the transliteration.
> At that time it was simply used as a C replacement for Text::Unidecode.
> Later on, I added the possibility to call Perl to do the transliteration
> reproducibly. But as I said above, reproducibility is not offered for
> sectioning commands identifiers, so it remained as is in that case.
It seems that we don't need transliteration for sectioning commands,
regardless of whether it is in the C code or the Perl code.