On Sat, Nov 22, 2025 at 10:29:24AM +0100, Patrice Dumas wrote:
> On Fri, Nov 21, 2025 at 11:04:36PM +0000, Gavin Smith wrote:
> > Is there some internal conversion done on section titles that doesn't show
> > up in the output?
>
> Indeed, there is, as shown by the trace in your other email, the
> normalization of 'HTML Cross-references' is used for section arguments
> to get a string that can be used as target. This would also happen with
> @expansion on @node line.

Indeed, there is a difference between Solaris 11 output and GNU/Linux.

On Solaris 11:

  <h2 class="chapter subsection-level-set-chapter" id="g_t_0040expansion_007b_007d-_0028_0029_003a-Indicating-an-Expansion"

On GNU/Linux:

  <h2 class="chapter subsection-level-set-chapter" id="g_t_0040expansion_007b_007d-_0028_003f_0029_003a-Indicating-an-Expansion">

The difference is in the "_0028_0029" section of the header 'id' attribute.
This gives the ASCII values for "()".  On GNU/Linux it is _0028_003f_0029
which corresponds to "(?)" - here, "?" is evidently used as a replacement
character for the right arrow character.  Neither is good output.

If I run with TEXINFO_XS=omit, the output is different: _0028_21a6_0029.
Here _21a6 refers to the correct character.  This is the same on both
Solaris 11 and GNU/Linux.

Hence there is a clear bug here with inconsistent output between XS and
pure Perl code, with the pure Perl output being superior.

> Expansion to UTF-8 does not happen in the remaining of the output
> presumably because textual entities are used.
>
> You could set OUTPUT_CHARACTERS customization variable to have
> characters output instead of textual entities. It could help determine
> if the issue is only with cross references normalization, or more
> general. There are tests that test OUTPUT_CHARACTERS in the test suite.
>
> I had a look at the opencsw test results and there is no messages/errors
> like that. Maybe the iconv library used is different?

It appears to be from the use of the "us-ascii//TRANSLIT" encoding in
'unicode_to_transliterate' in main/node_name_normalization.c.  My guess
is that this system either doesn't have such an encoding or doesn't support
some characters for transliteration.

I tried with '-c OUTPUT_CHARACTERS=1' and it made no difference to the
error messages.

I found that the use of "us-ascii//TRANSLIT" was introduced in commit
1c9a5f283:

    Author: Patrice Dumas <[email protected]>
    Date:   2023-10-11 15:11:11 +0200

        * tp/Texinfo/Convert/HTML.pm (_set_root_commands_targets_node_files):
        remove unused $output_units argument.  Remove unused $no_unidecode.
        Put $extension in if.

        * tp/Texinfo/XS/main/errors.c (reallocate_error_messages)
        (message_list_line_error_internal)
        (message_list_document_error_internal, message_list_document_error)
        (message_list_document_warn), tp/Texinfo/XS/main/get_perl_info.c
        (html_converter_initialize): add message_list_document_warn and
        message_list_document_error and add error messages in converter.

        * tp/Texinfo/XS/main/convert_utils.c, tp/Texinfo/XS/main/utils.c
        (output_conversions, input_conversions, decode_string, encode_string):
        move output_conversions, input_conversions, decode_string,
        encode_string to utils.c.

        * tp/Texinfo/XS/parsetexi/input.c (parser_input_conversions): rename
        input_conversions as parser_input_conversions.

        * tp/Texinfo/XS/convert/convert_html.c (normalized_to_id)
        (normalized_label_id_file, unique_target)
        (new_sectioning_command_target, set_root_commands_targets_node_files)
        (html_prepare_conversion_units_targets),
        tp/Texinfo/XS/convert/converter.c (id_to_filename)
        (normalized_sectioning_command_filename, node_information_filename),
        tp/Texinfo/XS/main/call_perl_function.c
        (call_file_id_setting_label_target_name)
        (call_file_id_setting_node_file_name)
        (call_file_id_setting_sectioning_command_target_name),
        tp/Texinfo/XS/main/node_name_normalization.c
        (unicode_to_transliterate, normalize_transliterate_texinfo)
        (normalize_transliterate_texinfo_contents): implement
        set_root_commands_targets_node_files.

Looking at that commit does not tell me why "us-ascii//TRANSLIT" was
used.  It seems likely that such an encoding wouldn't be supported, or
wouldn't work identically, on different systems.
