> Date: Sun, 23 Oct 2022 20:22:29 +0200
> From: [email protected]
> Cc: [email protected], [email protected]
>
> On Sun, Oct 23, 2022 at 07:10:57PM +0300, Eli Zaretskii wrote:
> > >
> > > Which is problematic, it means that with a correctly setup input file
> > > with latin1 encoded character in the name, something wrong is going on.
> >
> > The character is supposed to be encoded in Latin1, but I don't think
> > it is, because Latin1 is not the locale's encoding here.
>
> I think that it is encoded in Latin1, as discussed in another mail.

The Perl program which creates the file wanted a Latin1 encoding, but
Windows has its own ideas about that, as I explained in my other mail.

> > And some
> > programs involved in these tests could decide they don't understand
> > the character and replace it with a '?'.
>
> There aren't other programs in these tests than texi2any. The ?
> appearing in the message may come from perl Encode which does not
> know how to encode from the perl internal encoding to the C locale
> sets up in the test as
> LC_ALL=C; export LC_ALL

I think it could come from the Bash I'm using here.

> The failure of manual_include_accented_file_name_latin1_explicit_encoding
> is more surprising to me, as in that case INPUT_FILE_NAME_ENCODING is
> set to ISO-8859-1, so I do not understand why the test fails, the
> reverse encoding from UTF-8 to ISO-8859-1 should lead to a path that can
> be found. The function where paths are looked for is
> locate_include_file() in input.c, it could be where something unexpected
> happens, maybe if stat() on Windows does some kind of conversion.

'stat' doesn't do any conversions, it uses the bytestream in the
'char *' file-name argument we feed it. What is expected to be in
that file-name argument? Where did the UTF-8 encoded input file name
come from in that case? Did we read it from the filesystem, from some
file in the source tree, or from somewhere else?

> Debugging further the
> manual_include_accented_file_name_latin1_explicit_encoding
> test would require showing the string bytes before and after the call to
> encode_file_name() in end_line.c, and then, if the string bytes seem to
> match the expected latin1 string with \xEE for î, check if something
> unexpected happens in locate_include_file, maybe checking what are the
> values of filename to check if there is indeed one that should lead to
> stat giving a 0 return value.
>
> I do not know if it is practical for you to do that Eli?

Would printf's to stderr in input.c be visible when running tests?
If so, I can show these byte streams.
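
A minimal sketch of the kind of instrumentation this would need, in case
it helps to agree on the output format first. The helper names
hex_bytes and debug_file_name are hypothetical and do not exist in the
Texinfo sources; where exactly to call them in end_line.c and input.c,
and whether the test harness lets stderr through, are open questions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper (not part of Texinfo): write the bytes of S
   into BUF as space-separated uppercase hex, truncating if BUF is
   too small.  Returns BUF.  */
static char *
hex_bytes (const char *s, char *buf, size_t buflen)
{
  const unsigned char *p;
  size_t pos = 0;

  assert (buf != NULL && buflen > 0);
  buf[0] = '\0';
  /* Each step writes at most a separator, two hex digits and a NUL.  */
  for (p = (const unsigned char *) s; *p && pos + 4 <= buflen; p++)
    pos += (size_t) snprintf (buf + pos, buflen - pos, "%s%02X",
                              pos ? " " : "", *p);
  return buf;
}

/* Hypothetical helper: report on stderr the exact bytes handed to
   'stat' and the value 'stat' returned for them.  */
static void
debug_file_name (const char *label, const char *file_name)
{
  char buf[1024];
  struct stat st;
  int status = stat (file_name, &st);

  fprintf (stderr, "%s: len=%lu bytes=[%s] stat=%d\n", label,
           (unsigned long) strlen (file_name),
           hex_bytes (file_name, buf, sizeof buf), status);
}
```

Calling debug_file_name on the string before and after
encode_file_name(), and again on each candidate in
locate_include_file(), would show whether î reaches 'stat' as the
single byte EE (Latin1) or as the two bytes C3 AE (UTF-8).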
