On Wed, Oct 26, 2022 at 03:14:26PM +0300, Eli Zaretskii wrote:
> > Date: Wed, 26 Oct 2022 11:03:53 +0200
> > From: [email protected]
> > Cc: [email protected], [email protected]
> >
> > Lets call LOC your locale. The setup is a manual encoded in Latin1, and
> > an include file included_latîn1.texi. On your computer, the î in the
> > include file is stored as 0x05DE, which is the conversion of 0xEE in the
> > LOC codepage.
>
> The manual has the @documentencoding which declares Latin1 encoding.
> So I believe included_latîn1.texi is converted to UTF-8 correctly.
>
> The problem happens when accessing this file via 'stat' etc., because
> the locale's encoding cannot encode î.
>
> For this to work, the non-ASCII character we use should be encodable
> both by Latin1 and by the Windows codepages. This is a tough
> requirement, but if you look at the tables of these encodings, you
> will see that some codepoints between 0xA1 and 0xAF are identical
> between many Windows codepages and Latin1. For example, 0xAB is
> identical in many codepages. So maybe we could try such a character,
> for these tests?
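A quick way to check which bytes in the 0xA1..0xAF range would qualify is to decode each byte with Latin-1 and with each Windows codepage and keep only the bytes that map to the same character everywhere. This is a rough sketch, not part of texi2any: the function names are hypothetical, and it assumes that Python's cp1250..cp1258 codecs stand in for "the Windows codepages" (cp874 and the multibyte East Asian codepages are not checked).

```python
"""Hypothetical helper: list bytes that decode to the same character in
Latin-1 and in each Windows single-byte ANSI codepage checked."""

# Assumption: cp1250..cp1258 represent the Windows codepages of interest.
WINDOWS_CODEPAGES = [f"cp{n}" for n in range(1250, 1259)]


def decode_or_none(byte: int, codec: str):
    """Return the character `byte` maps to in `codec`, or None if undefined."""
    try:
        return bytes([byte]).decode(codec)
    except UnicodeDecodeError:
        # Some codepages leave a few bytes in this range unassigned.
        return None


def shared_with_latin1(lo: int = 0xA1, hi: int = 0xAF, codepages=None):
    """Bytes in [lo, hi] whose Latin-1 character is identical in all codepages."""
    if codepages is None:
        codepages = WINDOWS_CODEPAGES
    shared = []
    for byte in range(lo, hi + 1):
        latin1_char = bytes([byte]).decode("latin-1")  # never fails
        if all(decode_or_none(byte, cp) == latin1_char for cp in codepages):
            shared.append(byte)
    return shared


if __name__ == "__main__":
    for byte in shared_with_latin1():
        ch = bytes([byte]).decode("latin-1")
        print(f"0x{byte:02X}  U+{ord(ch):04X}  {ch!r}")
```

A byte on this list, used in a Latin1-encoded file name, should read back as the same character under any of the checked ANSI codepages, which is the property needed for 'stat' to find the file.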
We could do that, but to me the tests are not that important: if they
are skipped on Windows, it is not a big deal. What matters to me is
making sure that setting DOC_ENCODING_FOR_INPUT_FILE_NAME to 0 on
Windows is the best choice.

-- 
Pat
