Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On Mon, May 18, 2009 at 01:41:28PM +0800, Lenik wrote:
> The expr error is fixed, and I can build cygpath from source now.
> Though I don't have NTDDK in hand, I'm surprised how it could be compiled.

The Cygwin build is fairly self-contained. We certainly don't need anything like a DDK to build.

> I can get the correct result from the new cygpath now, without -C option.
> Thank you guys.

I think the main person you should be thanking isn't a guy.

cgf

--
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Problem reports: http://cygwin.com/problems.html
Documentation: http://cygwin.com/docs.html
FAQ: http://cygwin.com/faq/
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On 2009-5-18 14:09, Christopher Faylor wrote:
> I think the main person you should be thanking isn't a guy.

Ok. Thank you gods.

Lenik
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
Lenik wrote:
> On 2009-5-18 14:09, Christopher Faylor wrote:
>> I think the main person you should be thanking isn't a guy.
>
> Ok. Thank you gods.

Hey Corinna? Congrats! You just got a promotion!

cheers,
DaveK
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On Mon, May 18, 2009 at 9:17 AM, Dave Korn wrote:
> Lenik wrote:
>> On 2009-5-18 14:09, Christopher Faylor wrote:
>>> I think the main person you should be thanking isn't a guy.
>>
>> Ok. Thank you gods.
>
> Hey Corinna? Congrats! You just got a promotion!

All praise to the great Corinna!

--
Mark J. Reed markjr...@gmail.com
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On 2009-5-17 10:09, IWAMURO Motonori wrote:
> 2009/5/17 Lenik le...@bodz.net:
>> Thanks, but where can I get this patch?
>
> You can checkout it from CVS HEAD.

Thanks for the information. Well, I did not expect to have to build from source; that really frustrates me and brings me even more problems.

Is there any mirror site for nightly builds, so I can use rsync to get it (in case this patch is too minor to increase any of the version numbers)? I've just looked at the snapshots, but the last update time is 2009-05-13.

I can't build from source. Here are some errors, and I guess there will be more, so I hope someone will compile cygpath as soon as possible; 6 weeks to the next release may be too long to wait.

1, cvs update failed:

... (ignored)
cvs update: Updating src/winsup/testsuite/winsup.api/samples
cvs update: Updating src/winsup/utils
cvs update: Updating src/winsup/w32api
cvs update: Updating src/winsup/w32api/include
cvs update: Updating src/winsup/w32api/include/GL
cvs update: Updating src/winsup/w32api/include/ddk
cvs update: Updating src/winsup/w32api/include/directx
cvs update: Updating src/winsup/w32api/lib
cvs update: Updating src/winsup/w32api/lib/ddk
cvs update: Updating src/winsup/w32api/lib/directx
cvs update: closing down connection to cygwin.com: Transport endpoint is not connected

2, configure failed:

bash-3.2$ ./configure
5 [main] expr 952 _cygtls::handle_exceptions: Error while dumping state (probably corrupted stack)
./configure: line 56: 952 Segmentation fault (core dumped) expr a : '\(a\)' >/dev/null 2>&1
4 [main] expr 2808 _cygtls::handle_exceptions: Error while dumping state (probably corrupted stack)
5 [main] expr 3516 _cygtls::handle_exceptions: Error while dumping state (probably corrupted stack)
5 [main] expr 3328 _cygtls::handle_exceptions: Error while dumping state (probably corrupted stack)
5 [main] expr 2648 _cygtls::handle_exceptions: Error while dumping state (probably corrupted stack)
5 [main] expr 900 _cygtls::handle_exceptions: Error while dumping state (probably corrupted stack)
5 [main] expr 1840 _cygtls::handle_exceptions: Error while dumping state (probably corrupted stack)
5 [main] expr 2972 _cygtls::handle_exceptions: Error while dumping state (probably corrupted stack)

Thanks,
Lenik
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On May 17 11:09, IWAMURO Motonori wrote:
> 2009/5/17 Lenik le...@bodz.net:
>> Thanks, but where can I get this patch?
>
> You can checkout it from CVS HEAD.

It occurred to me that, if you're using a charset which differs from your current ANSI or OEM codepage, you might run into trouble with native Windows tools. Therefore I also added a new -C/--codepage option to cygpath to specify the codepage used to create a Windows path from a Cygwin path. For instance:

  cygpath -C ANSI -aw .

creates the full path of the CWD in the current ANSI codepage. The -C/--codepage option takes the following parameters:

- ANSI  to specify the current ANSI codepage (for interaction with GUI tools).
- OEM   to specify the current OEM codepage (for interaction with CLI tools).
- UTF8  just guess... UTF-8
- n     A decimal codepage number according to the following table: http://msdn.microsoft.com/en-us/library/dd317756(VS.85).aspx

Note that not all installations support all codepages.

I hope that helps. Please note that the -C option doesn't work yet for the -p option. That's something I'll do after my vacation.

Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
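As a rough illustration of why the choice between ANSI and OEM matters, the following Python sketch (not part of cygpath) encodes one path in two codepages. The codepage numbers 1252 (ANSI) and 437 (OEM) are assumptions that fit a typical US English installation, and the sample path is made up.

```python
# Illustration only; cygpath itself is not involved here.
# Assumed codepages: 1252 as ANSI, 437 as OEM (typical US English Windows).
ANSI_CODEPAGE = "cp1252"
OEM_CODEPAGE = "cp437"

path = "C:\\caf\u00e9"  # hypothetical path containing U+00E9 (e with acute)

ansi_bytes = path.encode(ANSI_CODEPAGE)  # bytes a GUI tool would expect
oem_bytes = path.encode(OEM_CODEPAGE)    # bytes a console tool would expect
utf8_bytes = path.encode("utf-8")

for label, data in (("ANSI", ansi_bytes), ("OEM", oem_bytes), ("UTF-8", utf8_bytes)):
    print(label, data.hex(" "))

# Bytes produced for one codepage do not round-trip through the other.
misread = ansi_bytes.decode(OEM_CODEPAGE)
print("round trip ok:", misread == path)
```

The same character ends up as a different byte in each codepage, which is presumably why a path handed to a native tool has to be produced in the codepage that tool expects.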
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On May 17 15:52, Lenik wrote:
> On 2009-5-17 10:09, IWAMURO Motonori wrote:
>> 2009/5/17 Lenik le...@bodz.net:
>>> Thanks, but where can I get this patch?
>>
>> You can checkout it from CVS HEAD.
> [...]
> 6 weeks to the next release maybe too long to wait.

We have about 2 weeks between the test releases.

Corinna
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On 2009-5-17 19:53, Corinna Vinschen wrote:
> On May 17 15:52, Lenik wrote:
>> On 2009-5-17 10:09, IWAMURO Motonori wrote:
>>> 2009/5/17 Lenik le...@bodz.net:
>>>> Thanks, but where can I get this patch?
>>>
>>> You can checkout it from CVS HEAD.
>> [...]
>> 6 weeks to the next release maybe too long to wait.
>
> We have about 2 weeks between the test releases.
>
> Corinna

Thank you. I'll be very happy if I can apply your great patch tomorrow morning, if not earlier. I'd rather hope I could get everything immediately when I read your reply, and IMHO that should be very easy: all you have to do is make your working directory public and accessible. Stupid idea, heh? :)

For now I worked around it with a simple function:

function _u2w() {
    local p=$(cygpath -au "$1")
    if [ "${p:0:5}" = /mnt/ -o "${p:0:10}" = /cygdrive/ ]; then
        # /mnt/c/dir -> c:/dir
        p=${p:1}
        p=${p#*/}
        p=${p/\//:/}
    else
        # /usr/bin and /usr/lib map to /bin and /lib under the Cygwin root
        if [ "${p:0:9}" = /usr/bin/ ]; then p=${p:4}; fi
        if [ "${p:0:9}" = /usr/lib/ ]; then p=${p:4}; fi
        p=$(cygpath -am /)$p
    fi
    # forward slashes -> backslashes
    p=${p//\//\\}
    echo "$p"
}
path=$(_u2w "$path")

Lenik
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On 2009-5-17 15:52, Lenik wrote:
> 2, configure failed:
>
> bash-3.2$ ./configure
> 5 [main] expr 952 _cygtls::handle_exceptions: Error while dumping state (probably corrupted stack)
> ./configure: line 56: 952 Segmentation fault (core dumped) expr a : '\(a\)' >/dev/null 2>&1
> 4 [main] expr 2808 _cygtls::handle_exceptions: Error while dumping state (probably corrupted stack)
> 5 [main] expr 3516 _cygtls::handle_exceptions: Error while dumping state (probably corrupted stack)
> 5 [main] expr 3328 _cygtls::handle_exceptions: Error while dumping state (probably corrupted stack)
> 5 [main] expr 2648 _cygtls::handle_exceptions: Error while dumping state (probably corrupted stack)
> 5 [main] expr 900 _cygtls::handle_exceptions: Error while dumping state (probably corrupted stack)
> 5 [main] expr 1840 _cygtls::handle_exceptions: Error while dumping state (probably corrupted stack)
> 5 [main] expr 2972 _cygtls::handle_exceptions: Error while dumping state (probably corrupted stack)

The expr error is fixed, and I can build cygpath from source now. Though I don't have NTDDK in hand, I'm surprised how it could be compiled.

I can get the correct result from the new cygpath now, without -C option.

Thank you guys.

Lenik
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On May 16 13:17, Lenik wrote:
> (This mail is encoded in utf-8)
>
> After testing with 1.7.0-48, many problems are eliminated. But cygpath doesn't return good pathnames, see:

Looks like cygpath gets the wcstombs system call from ntdll rather than from cygwin1.dll due to a linking order problem. Unfortunately ntdll exports a couple of convenient C functions like wcstombs, or even sprintf.

I applied a patch so the next version of cygpath should do the conversion more correctly.

Corinna
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On 2009-5-16 23:49, Corinna Vinschen wrote:
> Looks like cygpath gets the wcstombs system call from ntdll rather than from cygwin1.dll due to a linking order problem. Unfortunately ntdll exports a couple of convenient C functions like wcstombs, or even sprintf.
>
> I applied a patch so the next version of cygpath should do the conversion more correctly.
>
> Corinna

Thanks, but where can I get this patch?

Lenik
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
2009/5/17 Lenik le...@bodz.net:
> Thanks, but where can I get this patch?

You can checkout it from CVS HEAD.

--
IWAMURO Motonori http://vmi.jp/
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
2009/5/15 Corinna Vinschen corinna-cyg...@cygwin.com:
> I have just trouble with SJIS, but that's not something I can easily test. Maybe you can look into that in the next couple of days?

Maybe I can. Please explain details of the trouble.

--
IWAMURO Motonori http://vmi.jp/
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On May 15 20:34, IWAMURO Motonori wrote:
> 2009/5/15 Corinna Vinschen corinna-cyg...@cygwin.com:
>> I have just trouble with SJIS, but that's not something I can easily test. Maybe you can look into that in the next couple of days?
>
> Maybe I can. Please explain details of the trouble.

Probably I only fall over my own feet. I was surprised to see the filenames using Chinese characters (from Lenik's examples) using SO/UTF sequences. I didn't expect that, but maybe that was correct.

The whole problem already starts with me not being able to see non-western chars in the console window. The two available console fonts simply don't provide them, so I only see squares, even if the characters are printed correctly.

It would be cool if you could simply use SJIS for testing and see if everything looks basically ok.

For the record: The internationalization stuff is a heck of a lot of effort. Even if my replies might sound mean sometimes, I'm glad for your input and help coding.

Thanks,
Corinna
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
(This mail is encoded in utf-8)

After testing with 1.7.0-48, many problems are eliminated. But cygpath doesn't return good pathnames, see:

1, Get absolute path of current directory:

C:\Profiles\Shecti\桌面
  set LANG=zh_CN.GBK
  cygpath -am .
  C:/Profiles/Shecti/桌面 (good)

C:\Profiles\Shecti\桌面
  set LANG=zh_CN.GBK
  cygpath -au .
  /mnt/c/Profiles/Shecti/桌面/ (good)

C:\Profiles\Shecti\桌面
  set LANG=zh_CN.UTF-8
  cygpath -am .
  C:/Profiles/Shecti/ (bad)

C:\Profiles\Shecti\桌面
  set LANG=zh_CN.UTF-8
  cygpath -au .
  /mnt/c/Profiles/Shecti/桌面/ (good)

C:\Profiles\Shecti\桌面
  set LANG=C
  cygpath -am .
  C:/Profiles/Shecti/ (bad)

C:\Profiles\Shecti\桌面
  set LANG=C
  cygpath -au .
  /mnt/c/Profiles/Shecti/桌面/ (good)

Conclusion:
1.1 only GBK works for `cygpath -am .' (also -aw)
1.2 all work for `cygpath -au .'

2, Get absolute path of specified path

C:\Profiles\Shecti\桌面
  set LANG=zh_CN.GBK
  cygpath -am C:\Profiles\Shecti\桌面
  C:/Profiles/Shecti/妗岄潰 (bad)

C:\Profiles\Shecti\桌面
  set LANG=zh_CN.GBK
  cygpath -au C:\Profiles\Shecti\桌面
  /mnt/c/Profiles/Shecti/妗岄潰 (bad)

C:\Profiles\Shecti\桌面
  set LANG=zh_CN.UTF-8
  cygpath -am C:\Profiles\Shecti\桌面
  C:/Profiles/Shecti/ (bad)

C:\Profiles\Shecti\桌面
  set LANG=zh_CN.UTF-8
  cygpath -au C:\Profiles\Shecti\桌面
  /mnt/c/Profiles/Shecti/桌面 (good)

C:\Profiles\Shecti\桌面
  set LANG=C
  cygpath -am C:\Profiles\Shecti\桌面
  C:/Profiles/Shecti/ (bad)

C:\Profiles\Shecti\桌面
  set LANG=C
  cygpath -au C:\Profiles\Shecti\桌面
  /mnt/c/Profiles/Shecti/桌面 (good)

Conclusion:
2.1 none works for `cygpath -am PathContainsNonascii'
2.2 GBK doesn't work for `cygpath -au PathContainsNonascii'

Now the problem is that I must use GBK for 1.1, and I cannot use GBK for 2.2, and there is no other choice. -_-||...

Lenik
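The garbled name in the GBK rows of section 2 has the shape one would expect when UTF-8 bytes are interpreted as GBK. The Python sketch below only reproduces that byte-level effect with Python's built-in codecs; it is an illustration, not a statement about where inside cygpath the mix-up happened.

```python
# Illustration only: UTF-8 bytes read back with the GBK codec.
name = "桌面"  # the directory name from the report

utf8_bytes = name.encode("utf-8")
gbk_bytes = name.encode("gbk")

# Decoding the UTF-8 bytes as if they were GBK yields a different,
# three-character string instead of the original two characters.
garbled = utf8_bytes.decode("gbk")

print("UTF-8 bytes:", utf8_bytes.hex(" "))
print("GBK bytes:  ", gbk_bytes.hex(" "))
print("matches the (bad) output above:", garbled == "妗岄潰")
```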
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On May 13 23:49, Matthias Andree wrote:
> On 13.05.2009, 17:17, Corinna Vinschen corinna-cyg...@cygwin.com wrote:
>> I followed the suggestion to use UTF-8 for internal conversions when the locale is set to C. This will also be used as default conversion when converting the Windows environment from UTF-16 to multibyte, unless the environment contains a valid LC_ALL/LC_CTYPE/LANG setting.
>>
>> The current working directory was also potentially unusable, if an application switched the locale. Now the CWD is re-evaluated after a setlocale call.
>
> Is Unicode normalization an issue here?

Not really, I guess. Either a character is available or it isn't. We don't have composition or decomposition capabilities for most multibyte character sets anyway. If at all, we could only do that for SJIS, EUCJP, and GBK. None of them would profit a lot from that.

Corinna
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
2009/5/14 Corinna Vinschen corinna-cyg...@cygwin.com:
>> Should the following part not be modified?
>>
>> winsup/cygwin/fhandler_console.cc:
>>   dev_state->con_mbtowc = __mbtowc;
>>   dev_state->con_wctomb = __wctomb;
>
> I'd rather not. It only affects the console and if LANG=C I'd rather see the single bytes which make up the path instead of the corresponding UTF-8 character.
>
> Hm, maybe I misunderstood. In which manner should this be modified?

I think:

  dev_state->con_mbtowc = __mbtowc == __ascii_mbtowc ? __utf8_mbtowc : __mbtowc;
  dev_state->con_wctomb = __wctomb == __ascii_wctomb ? __utf8_wctomb : __wctomb;

--
IWAMURO Motonori http://vmi.jp/
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On May 14 21:39, IWAMURO Motonori wrote:
> 2009/5/14 Corinna Vinschen corinna-cyg...@cygwin.com:
>>> Should the following part not be modified?
>>>
>>> winsup/cygwin/fhandler_console.cc:
>>>   dev_state->con_mbtowc = __mbtowc;
>>>   dev_state->con_wctomb = __wctomb;
>>
>> I'd rather not. It only affects the console and if LANG=C I'd rather see the single bytes which make up the path instead of the corresponding UTF-8 character.
>>
>> Hm, maybe I misunderstood. In which manner should this be modified?
>
> I think:
>
>   dev_state->con_mbtowc = __mbtowc == __ascii_mbtowc ? __utf8_mbtowc : __mbtowc;
>   dev_state->con_wctomb = __wctomb == __ascii_wctomb ? __utf8_wctomb : __wctomb;

Oh, ok. So I understood right. But that's exactly what I didn't want to do. The idea is that, even though UTF-8 is used for the filename conversion, the console should default to standard ASCII behaviour, unless you specify another charset before starting the first Cygwin process in the console.

I'm also wondering if we should perhaps only allow either ASCII or UTF-8 as console charsets, but for now I don't want to touch this more than necessary.

I just found that the console I/O doesn't work well for non-ASCII chars anyway. The core function which echoes input to the terminal only handles single-byte chars, which can be easily reproduced using copy/paste. Oh well.

Corinna
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
2009/5/14 Corinna Vinschen corinna-cyg...@cygwin.com:
> I see a couple of potential problems.

What problems are those?

> And have some time to discuss whether these are something the user can or even should fix or workaround alone.

I think that an application which takes its locale from the environment variables and an application which uses no locale should be able to read and write the same byte sequence.

However, I don't strongly request it because the applications work correctly in UTF-8.

--
IWAMURO Motonori http://vmi.jp/
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On May 14 23:06, IWAMURO Motonori wrote:
> 2009/5/14 Corinna Vinschen corinna-cyg...@cygwin.com:
>> I see a couple of potential problems.
>
> What problems are those?

I have no example off-hand. When I thought about it I always got sick thinking about scenarios where the library is using, say, UTF-8, and the application is using SJIS, and what happens to the filenames in this case. In theory the lib should provide what the application thinks is right.

>> And have some time to discuss whether these are something the user can or even should fix or workaround alone.
>
> I think that the application that use locale by the environment variable and the application that use no locale should be able to read and write the same byte sequence.

Ok, you got a point there. Assuming we leave out any application which deliberately uses a non-C locale which differs from the setting in the environment. Then we're getting into trouble.

If Cygwin uses internally always the environment setting, we have a valid, identical byte stream for all applications using setlocale(LC_ALL, ""), *and* for non locale-aware applications.

But if the application uses a deliberately different setting via setlocale, ..., hmm. It should get what it asks for.

Maybe you're right. I have to test this a bit.

Thanks for your input,
Corinna
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On May 14 16:42, Corinna Vinschen wrote:
> On May 14 23:06, IWAMURO Motonori wrote:
>> 2009/5/14 Corinna Vinschen corinna-cyg...@cygwin.com:
>>> I see a couple of potential problems.
>>
>> What problems are those?
>
> I have no example off-hand. When I thought about it I always got sick thinking about scenarios where the library is using, say, UTF-8, and the application is using SJIS, and what happens to the filenames in this case. In theory the lib should provide what the application thinks is right.

Here's one problem. What if an application uses setenv("LANG", ...)?

Do you want Cygwin to intercept all calls to setenv() to check for setting $LC_ALL/LC_CTYPE/LANG?

Right now, the only time these variables are read by Cygwin is at the start of the first Cygwin process in a Cygwin process tree.

Corinna
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
2009/5/15 Corinna Vinschen corinna-cyg...@cygwin.com:
> Here's one problem. What if an application uses setenv("LANG", ...)?

Oh. Hmmm, I think nothing should happen then.

> Do you want Cygwin to intercept all calls to setenv() to check for setting $LC_ALL/LC_CTYPE/LANG?

No. I think that only setlocale() has to do the check.

The reason:
- setlocale(LC_CTYPE, "C") is called from Cygwin startup.
- The following code becomes valid.

  setenv("LANG", ...);
  setlocale(LC_ALL, "");

--
IWAMURO Motonori http://vmi.jp/
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On May 15 01:34, IWAMURO Motonori wrote:
> 2009/5/15 Corinna Vinschen corinna-cyg...@cygwin.com:
>> Here's one problem. What if an application uses setenv("LANG", ...)?
>
> Oh. Hmmm, I think nothing should happen then.
>
>> Do you want Cygwin to intercept all calls to setenv() to check for setting $LC_ALL/LC_CTYPE/LANG?
>
> No. I think that only setlocale() has to do the check.
>
> The reason:
> - setlocale(LC_CTYPE, "C") is called from Cygwin startup.
> - The following code becomes valid.
>
>   setenv("LANG", ...);
>   setlocale(LC_ALL, "");

Ok, that makes sense. I'm just testing a patch which uses the environment settings internally as you proposed (still keeping UTF-8 the default for the C locale).

So far it works fine, I have just trouble with SJIS, but that's not something I can easily test. I'm simply lacking a real understanding of the SJIS encoding. Maybe you can look into that in the next couple of days? I'll be mostly offline next week and the week after so there's a lot of time for testing ;-)

Corinna
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On May 14 19:23, Corinna Vinschen wrote:
> On May 15 01:34, IWAMURO Motonori wrote:
>> 2009/5/15 Corinna Vinschen corinna-cyg...@cygwin.com:
>>> Here's one problem. What if an application uses setenv("LANG", ...)?
>>
>> Oh. Hmmm, I think nothing should happen then.
>>
>>> Do you want Cygwin to intercept all calls to setenv() to check for setting $LC_ALL/LC_CTYPE/LANG?
>>
>> No. I think that only setlocale() has to do the check.
>>
>> The reason:
>> - setlocale(LC_CTYPE, "C") is called from Cygwin startup.
>> - The following code becomes valid.
>>
>>   setenv("LANG", ...);
>>   setlocale(LC_ALL, "");
>
> Ok, that makes sense. I'm just testing a patch which uses the environment settings internally as you proposed (still keeping UTF-8 the default for the C locale).
>
> So far it works fine, I have just trouble with SJIS, but that's not something I can easily test. I'm simply lacking a real understanding of the SJIS encoding. Maybe you can look into that in the next couple of days? I'll be mostly offline next week and the week after so there's a lot of time for testing ;-)

I applied the patch. The charset used by Cygwin now only depends on the last call to setlocale in an application and eventually on the setting of the internationalization environment variables LC_ALL/LC_CTYPE/LANG.

A side effect of this change is that the console charset depends solely on the environment setting again. So you can change the console charset used in an application on the fly again by changing the LC_ALL/LC_CTYPE/LANG setting in the environment (*), instead of setting it only once at startup of the first Cygwin process in the console.

The (in)famous "ssh to a remote machine from a UTF-8 console doesn't work" problem (**) should be unaffected by this change and still work now, if, for instance, LANG is set to en_US.UTF-8 in the calling shell.

Just the documentation needs to be updated again.

I really hope this change finally makes it easier to use Cygwin with native characters. I'll create a new Cygwin package tomorrow.

Corinna

(*) Still depends on a call to setlocale, but the Cygwin shells do that anyway.
(**) http://cygwin.com/ml/cygwin/2009-04/msg00055.html
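The lookup order among the three variables can be sketched as follows. This is a small Python model of the usual POSIX rule for setlocale with an empty locale name (LC_ALL first, then LC_CTYPE, then LANG, with "C" as fallback); the function name ctype_locale is made up for this example, and what Cygwin does internally may differ in details.

```python
def ctype_locale(env, requested=""):
    """Return the locale name that setlocale(LC_CTYPE, requested) would select.

    An explicit name wins. The empty string means "ask the environment":
    the first non-empty variable of LC_ALL, LC_CTYPE, LANG is used,
    and "C" is the fallback (the usual POSIX rule).
    """
    if requested:
        return requested
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if value:
            return value
    return "C"


# LANG alone decides when nothing more specific is set.
print(ctype_locale({"LANG": "en_US.UTF-8"}))
# LC_ALL overrides LANG.
print(ctype_locale({"LANG": "en_US.UTF-8", "LC_ALL": "zh_CN.GBK"}))
# An application asking for a specific locale gets what it asks for.
print(ctype_locale({"LANG": "en_US.UTF-8"}, requested="ja_JP.SJIS"))
# Nothing set at all: the C locale.
print(ctype_locale({}))
```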
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
> http://cygwin.com/ml/cygwin/2009-05/msg00245.html?

I found that this web page doesn't display UTF-8 characters correctly. BTW, I'm using Thunderbird as my news reader.

Lenik
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On May 12 19:37, Corinna Vinschen wrote:
> On May 13 02:29, IWAMURO Motonori wrote:
>> I propose that the filename encoding in C locale uses UTF-8 instead of SO/UTF-8. There are three reasons:
>
> That's an interesting thought. Do you have a patch and, if so, did you try it? Does it, for instance, help for the issue reported in the thread starting at http://cygwin.com/ml/cygwin/2009-05/msg00245.html?

After examining the issue Lenik reported in the above thread, I'm at a loss how to solve this problem in a generic way.

The problem is that the filename changes dependent on the character set used in $LANG. The reason is that every time a multibyte filename has to be generated, it has to be converted from UTF-16 to multibyte.

For instance, taking one of the filenames from Lenik's example. It's stored on the filesystem as the UTF-16 sequence

  \u684c \u9762

If I set LANG to en_US.UTF-8, a readdir(2) call returns the multibyte sequence

  0xe6 0xa1 0x8c 0xe9 0x9d 0xa2

If I set LANG to en_US.GBK, `ls' returns the filename

  0xd7 0xc0 0xc3 0xe6

And in case LANG=C, `ls' returns

  0x0e 0xe6 0xa1 0x8c 0x0e 0xe9 0x9d 0xa2

So, dependent on the character set setting in the application, the idea of the filename differs. That's not exactly helpful for interoperability between different applications.

I can think of two potential solutions to fix this problem:

(1) Always return filenames in UTF-8 encoding and pretend that UTF-8 is the way files are stored on disk. That results in unchangeable filenames which are always valid. But what if an application sets LANG=.SJIS and tries to create a file using SJIS character encoding? Should the file be created using the SJIS-UTF-16 conversion or should open fail with EILSEQ? That's not good.

(2) If none of $LC_ALL/$LC_CTYPE/$LANG is set in the environment, then Cygwin uses the LC_CTYPE setting which corresponds to the current codepage. If one of $LC_ALL/$LC_CTYPE/$LANG is set in the environment, Cygwin uses that to convert pathnames. If the application uses setlocale, Cygwin uses that setting to convert pathnames.

One problem can't be solved this way: If an application fetches and stores a filename, then switches the locale, and then tries to use the filename in another system call, the filename is potentially broken.

Any better ideas?

Corinna
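The three byte sequences quoted above can be reproduced with a short Python sketch. The UTF-8 and GBK lines use Python's built-in codecs. The so_utf8_encode helper is a hypothetical reconstruction based only on the LANG=C bytes shown in this message (a 0x0e "shift out" byte in front of each non-ASCII character's UTF-8 bytes); it is not taken from the Cygwin sources.

```python
SO = 0x0E  # ASCII "shift out" control byte


def so_utf8_encode(name: str) -> bytes:
    """Hypothetical model of the SO/UTF-8 layout seen for LANG=C:
    ASCII passes through, every other character becomes SO + UTF-8 bytes."""
    out = bytearray()
    for ch in name:
        if ord(ch) < 0x80:
            out.append(ord(ch))
        else:
            out.append(SO)
            out += ch.encode("utf-8")
    return bytes(out)


name = "\u684c\u9762"  # the UTF-16 code units 0x684c 0x9762 from the example

encodings = [
    ("UTF-8", name.encode("utf-8")),
    ("GBK", name.encode("gbk")),
    ("SO/UTF-8", so_utf8_encode(name)),
]
for label, data in encodings:
    print(f"{label:9}", data.hex(" "))
```

One on-disk name, three different byte strings: an application that stored the name under one setting cannot expect it to match under another.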
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On 13.05.2009 at 16:29, Corinna Vinschen corinna-cyg...@cygwin.com wrote:

> On May 12 19:37, Corinna Vinschen wrote:
>> On May 13 02:29, IWAMURO Motonori wrote:
>>> I propose that the filename encoding in C locale uses UTF-8 instead of SO/UTF-8. There are three reasons:
>> That's an interesting thought. Do you have a patch and, if so, did you try it? Does it, for instance, help for the issue reported in the thread starting at http://cygwin.com/ml/cygwin/2009-05/msg00245.html?
>
> After examining the issue Lenik reported in the above thread, I'm at a loss how to solve this problem in a generic way.
>
> The problem is that the filename changes depending on the character set used in $LANG. The reason is that every time a multibyte filename has to be generated, it has to be converted from UTF-16 to multibyte.
>
> For instance, take one of the filenames from Lenik's example. It's stored on the filesystem as the UTF-16 sequence \u684c \u9762.
>
> If I set LANG to en_US.UTF-8, a readdir(2) call returns the multibyte sequence
>
>   0xe6 0xa1 0x8c 0xe9 0x9d 0xa2
>
> If I set LANG to en_US.GBK, `ls' returns the filename
>
>   0xd7 0xc0 0xc3 0xe6
>
> And in case LANG=C, `ls' returns
>
>   0x0e 0xe6 0xa1 0x8c 0x0e 0xe9 0x9d 0xa2
>
> So, depending on the character set setting in the application, the idea of the filename differs. That's not exactly helpful for interoperability between different applications.
>
> I can think of two potential solutions to fix this problem:
>
> (1) Always return filenames in UTF-8 encoding and pretend that UTF-8 is the way files are stored on disk. That results in unchangeable filenames which are always valid. But what if an application sets LANG=.SJIS and tries to create a file using SJIS character encoding? Should the file be created using the SJIS-UTF-16 conversion or should open fail with EILSEQ? That's not good.

Why would it have to be interpreted at all? Aren't filenames just opaque strings - with exceptions, say, for / and NUL to UNIX kernels?

> (2) If none of $LC_ALL/$LC_CTYPE/$LANG is set in the environment, then Cygwin uses the LC_CTYPE setting which corresponds to the current codepage. If one of $LC_ALL/$LC_CTYPE/$LANG is set in the environment, Cygwin uses that to convert pathnames. If the application uses setlocale, Cygwin uses that setting to convert pathnames.
>
> One problem can't be solved this way: If an application fetches and stores a filename, then switches the locale, and then tries to use the filename in another system call, the filename is potentially broken.
>
> Any better ideas?

Just questions to kindle some brainstorming:

- why do you need to touch the filename at all? I haven't read all of it. Is the UTF-16 on disk and we need to work around UTF-16 being intractable as C string?

- some applications in the GNOME ballpark, for instance Gnumeric, do something like "treat as Unicode and fall back to the encoding specified by SOME_ENVIRONMENT_VARIABLE" (perhaps as a colon-separated list - not sure)

- adding to my interspersed comment above: isn't the issue more about *presentation* of filenames to the user than internal workings? To me the main issue appears to be that filenames should look alike in a Cygwin application and in a native Windows application. I'd assume that applications can get really confused if you change file names behind their back.

- if you speak of UTF-8, do you want to normalize file names? (I'd think you do.) Which normalization form will you choose? NFC (canonical composition) or NFD (canonical decomposition)?

-- 
Matthias Andree
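Editor's sketch: the three byte sequences quoted above can be reproduced with a few lines of Python. The `so_utf8` helper is a hypothetical model of the SO/UTF-8 scheme (ASCII passes through, anything else becomes 0x0e followed by the character's UTF-8 bytes), written only to match the LANG=C output shown in the message; it is not Cygwin's actual code.

```python
# Filename from Lenik's example: U+684C U+9762.
NAME = "\u684c\u9762"


def so_utf8(name: str) -> bytes:
    """Hypothetical model of SO/UTF-8 in the C locale: ASCII characters are
    kept, every other character is emitted as SO (0x0e) + its UTF-8 bytes."""
    out = bytearray()
    for ch in name:
        if ord(ch) < 0x80:
            out += ch.encode("ascii")
        else:
            out += b"\x0e" + ch.encode("utf-8")
    return bytes(out)


def hexdump(data: bytes) -> str:
    """Format bytes the way they are written in the thread."""
    return " ".join(f"0x{b:02x}" for b in data)


utf8_bytes = NAME.encode("utf-8")  # LANG=en_US.UTF-8
gbk_bytes = NAME.encode("gbk")     # LANG=en_US.GBK
c_bytes = so_utf8(NAME)            # LANG=C (SO/UTF-8 model)

for label, data in (("UTF-8", utf8_bytes), ("GBK", gbk_bytes), ("C", c_bytes)):
    print(f"{label:5s} {hexdump(data)}")
```

The same two UTF-16 code units yield three different byte strings, which is why a filename stored by an application under one locale may not resolve under another.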
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
> - why do you need to touch the filename at all? I haven't read all of it. Is the UTF-16 on disk and we need to work around UTF-16 being intractable as C string?

Yes. If you simply treated each UTF-16 symbol as two chars, you'd get unintended NULs and slashes. For starters, the upper halves of all ISO-8859-1 characters are NUL in UTF-16. And even without that, the resulting filenames would be completely unusable.

Andy
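Editor's sketch of Andy's point, assuming little-endian UTF-16 as used on Windows: dumping the raw UTF-16 bytes of a name shows the embedded NUL bytes and a stray slash byte. The sample names are made up for illustration.

```python
# An ISO-8859-1 name: every character's upper byte is NUL in UTF-16.
latin1_name = "caf\u00e9"
raw = latin1_name.encode("utf-16-le")
nul_count = raw.count(0)

# U+012F encodes as the bytes 0x2f 0x01, and 0x2f is '/'.
tricky = "\u012f".encode("utf-16-le")
has_slash = b"/" in tricky

print(raw, nul_count)
print(tricky, has_slash)
```

A C string would end at the first NUL, and the path parser would split at the 0x2f byte, so the raw bytes cannot be passed through as a filename.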
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On May 13 15:54, Andy Koppe wrote:
>> - why do you need to touch the filename at all? I haven't read all of it. Is the UTF-16 on disk and we need to work around UTF-16 being intractable as C string?
>
> Yes. If you simply treated each UTF-16 symbol as two chars, you'd get unintended NULs and slashes. For starters, the upper halves of all ISO-8859-1 characters are NUL in UTF-16. And even without that, the resulting filenames would be completely unusable.

Right. That's the crux when using UTF-16 filenames but many different multibyte codepages. In contrast to a system in which the filename is just a byte stream, we have to perform widechar to multibyte conversion, and outside of the UTF-8 domain, every other conversion is lossy.

For the time being, I applied a patch to Cygwin which should ease the pain. I followed the suggestion to use UTF-8 for internal conversions when the locale is set to C. This will also be used as the default conversion when converting the Windows environment from UTF-16 to multibyte, unless the environment contains a valid LC_ALL/LC_CTYPE/LANG setting.

The current working directory was also potentially unusable if an application switched the locale. Now the CWD is re-evaluated after a setlocale call.

I'm sure this change doesn't fix all problems, but this worked much better in my environment when using Japanese and Chinese characters in filenames. There are a few other changes to the Cygwin DLL in the loop, but I will update Cygwin 1.7 at the end of the week.

Corinna

-- 
Corinna Vinschen, Cygwin Project Co-Leader, Red Hat
Please send mails regarding Cygwin to cygwin AT cygwin DOT com
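Editor's sketch of the "lossy conversion" remark, using Python's codecs as a stand-in for the widechar to multibyte step (an assumption; Cygwin uses its own converters). The name round-trips through UTF-8 but not through ISO-8859-1.

```python
name = "\u684c\u9762"

# UTF-8 can represent every Unicode character, so the round trip is exact.
utf8_roundtrip = name.encode("utf-8").decode("utf-8")

# ISO-8859-1 has no slot for these characters: a strict conversion fails...
try:
    name.encode("iso-8859-1")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# ...and a substituting conversion loses the original characters for good.
lossy = name.encode("iso-8859-1", errors="replace")
lossy_roundtrip = lossy.decode("iso-8859-1")

print(utf8_roundtrip == name, strict_ok, lossy, lossy_roundtrip == name)
```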
RE: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
Corinna Vinschen wrote on Wednesday, May 13, 2009 10:30:
> On May 12 19:37, Corinna Vinschen wrote:
>> On May 13 02:29, IWAMURO Motonori wrote:
>>> I propose that the filename encoding in C locale uses UTF-8 instead of SO/UTF-8. There are three reasons:
>> That's an interesting thought. Do you have a patch and, if so, did you try it? Does it, for instance, help for the issue reported in the thread starting at http://cygwin.com/ml/cygwin/2009-05/msg00245.html?
>
> After examining the issue Lenik reported in the above thread, I'm at a loss how to solve this problem in a generic way.

I may be dense, as all of my internationalization experience was from the late 90's. But in my experience the only solution for this is a cognizant effort on behalf of the user (or admin).

> The problem is that the filename changes depending on the character set used in $LANG. The reason is that every time a multibyte filename has to be generated, it has to be converted from UTF-16 to multibyte.
>
> For instance, take one of the filenames from Lenik's example. It's stored on the filesystem as the UTF-16 sequence \u684c \u9762.
>
> If I set LANG to en_US.UTF-8, a readdir(2) call returns the multibyte sequence
>
>   0xe6 0xa1 0x8c 0xe9 0x9d 0xa2
>
> If I set LANG to en_US.GBK, `ls' returns the filename
>
>   0xd7 0xc0 0xc3 0xe6
>
> And in case LANG=C, `ls' returns
>
>   0x0e 0xe6 0xa1 0x8c 0x0e 0xe9 0x9d 0xa2
>
> So, depending on the character set setting in the application, the idea of the filename differs. That's not exactly helpful for interoperability between different applications.
>
> I can think of two potential solutions to fix this problem:
>
> (1) Always return filenames in UTF-8 encoding and pretend that UTF-8 is the way files are stored on disk. That results in unchangeable filenames which are always valid. But what if an application sets LANG=.SJIS and tries to create a file using SJIS character encoding? Should the file be created using the SJIS-UTF-16 conversion or should open fail with EILSEQ? That's not good.
>
> (2) If none of $LC_ALL/$LC_CTYPE/$LANG is set in the environment, then Cygwin uses the LC_CTYPE setting which corresponds to the current codepage. If one of $LC_ALL/$LC_CTYPE/$LANG is set in the environment,

If nothing is set, use UTF-8, as it will work in existing code.

> Cygwin uses that to convert pathnames. If the application uses setlocale, Cygwin uses that setting to convert pathnames.
>
> One problem can't be solved this way: If an application fetches and stores a filename, then switches the locale, and then tries to use the filename in another system call, the filename is potentially broken.

This is the user's problem to resolve.

> Any better ideas?

Not necessarily better, but here is a chart:

  Sys    App    function expects/returns
  NULL   NULL   UTF-8
  C/UA   NULL   UTF-8
  NULL   C/UA   UTF-8
  C/UA   C/UA   UTF-8
  SPEC   NULL   System Locale
  SPEC   C/UA   UTF-8
  NULL   SPEC   Application Locale
  C/UA   SPEC   Application Locale
  SPEC   SPEC   Application Locale

Key:
  Sys  = System's current locale
  App  = Application's current locale
  NULL = No setting
  C/UA = C or any Unicode aware locale
  SPEC = Some other locale (e.g. SJIS)

-jason

-- 
Jason Pyeron, Principal Consultant, PD Inc. http://www.pdinc.us
10 West 24th Street #100, Baltimore, Maryland 21218
+1 (443) 269-1555 x333
This message is copyright PD Inc, subject to license 20080407P00.
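Editor's sketch of the EILSEQ concern in option (1), quoted above: if the system insists that every filename is UTF-8, a name that an application encoded as SJIS is simply not a valid byte sequence. Python's `shift_jis` codec and the sample name are assumptions for illustration; the `UnicodeDecodeError` plays the role of EILSEQ.

```python
# A Japanese name as an application running under LANG=.SJIS would pass it.
sjis_name = "\u65e5\u672c\u8a9e".encode("shift_jis")

# Interpreting those bytes as UTF-8 fails: SJIS lead bytes are not valid
# UTF-8 start bytes.
try:
    sjis_name.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

# Interpreting them with the encoding the application actually used works.
recovered = sjis_name.decode("shift_jis")

print(decodes_as_utf8, recovered == "\u65e5\u672c\u8a9e")
```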
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On May 13 11:41, Jason Pyeron wrote:
> Corinna Vinschen wrote on Wednesday, May 13, 2009 10:30:
>> On May 12 19:37, Corinna Vinschen wrote:
>>> On May 13 02:29, IWAMURO Motonori wrote:
>>>> I propose that the filename encoding in C locale uses UTF-8 instead of SO/UTF-8. There are three reasons:
>>> That's an interesting thought. Do you have a patch and, if so, did you try it? Does it, for instance, help for the issue reported in the thread starting at http://cygwin.com/ml/cygwin/2009-05/msg00245.html?
>>
>> After examining the issue Lenik reported in the above thread, I'm at a loss how to solve this problem in a generic way.
>
> I may be dense, as all of my internationalization experience was from the late 90's. But in my experience the only solution for this is a cognizant effort on behalf of the user (or admin).
>
> [...]
>> Any better ideas?
>
> Not necessarily better, but here is a chart:
>
>   Sys    App    function expects/returns
>   NULL   NULL   UTF-8
>   C/UA   NULL   UTF-8
>   NULL   C/UA   UTF-8
>   C/UA   C/UA   UTF-8
>   SPEC   NULL   System Locale
>   SPEC   C/UA   UTF-8
>   NULL   SPEC   Application Locale
>   C/UA   SPEC   Application Locale
>   SPEC   SPEC   Application Locale

What I just implemented basically matches the above, except for

  SPEC   NULL   System Locale

This will also use UTF-8.

Corinna
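Editor's sketch: Jason's chart collapses to two rules, which the function below encodes. The function name and the `as_implemented` flag are hypothetical; the flag models Corinna's one stated deviation (SPEC/NULL also uses UTF-8), nothing more.

```python
UTF8 = "UTF-8"
SYSTEM = "System Locale"
APPLICATION = "Application Locale"


def filename_conversion(sys_locale: str, app_locale: str,
                        as_implemented: bool = False) -> str:
    """Look up the chart. Each argument is 'NULL', 'C/UA' or 'SPEC'."""
    # Rows 7-9: a specific application locale always wins.
    if app_locale == "SPEC":
        return APPLICATION
    # Row 5: specific system locale, application has no setting.
    if sys_locale == "SPEC" and app_locale == "NULL":
        return UTF8 if as_implemented else SYSTEM
    # Every remaining row of the chart is UTF-8.
    return UTF8


for s in ("NULL", "C/UA", "SPEC"):
    for a in ("NULL", "C/UA", "SPEC"):
        print(f"{s:5s} {a:5s} {filename_conversion(s, a)}")
```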
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
Hi.

My idea is as follows:

1) Separate the mbtowc/wctomb function entries into library usage and system usage (__mbtowc/__wctomb and __sys_mbtowc/__sys_wctomb).
2) If setlocale(LC_CTYPE) is called with locale != "C", then lib == sys.
3) If setlocale(LC_CTYPE) is called with locale == "C", then sys is set from LC_ALL/LC_CTYPE/LANG. If LC_ALL/LC_CTYPE/LANG are not set, use the UTF-8 converter.

Cygwin startup calls setlocale(LC_CTYPE, "C") in winsup/cygwin/dcrt0.cc.

I think that the result is as follows:

1) LANG=C: lib = ASCII converter, sys = UTF-8 converter.
2) LANG=xx_XX.ENCODING, setlocale not called: lib = ASCII converter, sys = ENCODING converter.
3) LANG=xx_XX.ENCODING, setlocale(LC_ALL, "") called: lib = ENCODING converter, sys = ENCODING converter.

I think that [cat `read_dir_entry_and_print_app`] works correctly in all of the above cases.

I am writing this patch and test code now.

> One problem can't be solved this way: If an application fetches and stores a filename, then switches the locale, and then tries to use the filename in another system call, the filename is potentially broken.

If the application switches the encoding while processing, I think that the problem is the responsibility of the application.

2009/5/13 Corinna Vinschen corinna-cyg...@cygwin.com:
> On May 12 19:37, Corinna Vinschen wrote:
>> On May 13 02:29, IWAMURO Motonori wrote:
>>> I propose that the filename encoding in C locale uses UTF-8 instead of SO/UTF-8. There are three reasons:
>> That's an interesting thought. Do you have a patch and, if so, did you try it? Does it, for instance, help for the issue reported in the thread starting at http://cygwin.com/ml/cygwin/2009-05/msg00245.html?
>
> After examining the issue Lenik reported in the above thread, I'm at a loss how to solve this problem in a generic way.
>
> The problem is that the filename changes depending on the character set used in $LANG. The reason is that every time a multibyte filename has to be generated, it has to be converted from UTF-16 to multibyte.
>
> For instance, take one of the filenames from Lenik's example. It's stored on the filesystem as the UTF-16 sequence \u684c \u9762.
>
> If I set LANG to en_US.UTF-8, a readdir(2) call returns the multibyte sequence
>
>   0xe6 0xa1 0x8c 0xe9 0x9d 0xa2
>
> If I set LANG to en_US.GBK, `ls' returns the filename
>
>   0xd7 0xc0 0xc3 0xe6
>
> And in case LANG=C, `ls' returns
>
>   0x0e 0xe6 0xa1 0x8c 0x0e 0xe9 0x9d 0xa2
>
> So, depending on the character set setting in the application, the idea of the filename differs. That's not exactly helpful for interoperability between different applications.
>
> I can think of two potential solutions to fix this problem:
>
> (1) Always return filenames in UTF-8 encoding and pretend that UTF-8 is the way files are stored on disk. That results in unchangeable filenames which are always valid. But what if an application sets LANG=.SJIS and tries to create a file using SJIS character encoding? Should the file be created using the SJIS-UTF-16 conversion or should open fail with EILSEQ? That's not good.
>
> (2) If none of $LC_ALL/$LC_CTYPE/$LANG is set in the environment, then Cygwin uses the LC_CTYPE setting which corresponds to the current codepage. If one of $LC_ALL/$LC_CTYPE/$LANG is set in the environment, Cygwin uses that to convert pathnames. If the application uses setlocale, Cygwin uses that setting to convert pathnames.
>
> One problem can't be solved this way: If an application fetches and stores a filename, then switches the locale, and then tries to use the filename in another system call, the filename is potentially broken.
>
> Any better ideas?
>
> Corinna

-- 
IWAMURO Motnori http://vmi.jp/
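Editor's sketch of the lib/sys split proposed in this message, as a small Python model rather than the C code that would live in Cygwin. All function names are hypothetical; `setlocale_arg=None` stands for "the application never calls setlocale", and `""` for `setlocale(LC_ALL, "")`.

```python
def env_locale(env):
    """First non-empty value of LC_ALL, LC_CTYPE, LANG, or None."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        if env.get(var):
            return env[var]
    return None


def charset_of(locale_name):
    """'xx_XX.ENCODING' -> 'ENCODING'; 'C' or an unset locale -> None."""
    if locale_name is None or "." not in locale_name:
        return None
    return locale_name.split(".", 1)[1]


def converters(env, setlocale_arg=None):
    """Return (lib, sys) converter names under the proposed rules."""
    requested = env_locale(env) if setlocale_arg == "" else setlocale_arg
    app_charset = charset_of(requested)
    if app_charset is not None:
        # Rule 2: application locale is not "C", so lib == sys.
        return app_charset, app_charset
    # Rule 3: locale is "C" (the startup default); sys follows the
    # environment and falls back to UTF-8.
    return "ASCII", charset_of(env_locale(env)) or "UTF-8"


print(converters({"LANG": "C"}))
print(converters({"LANG": "ja_JP.SJIS"}))
print(converters({"LANG": "ja_JP.SJIS"}, setlocale_arg=""))
```

The three calls correspond to result cases 1) to 3) in the message.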
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
> Not necessarily better, but here is a chart:
>
>   Sys    App    function expects/returns
>   NULL   NULL   UTF-8
>   C/UA   NULL   UTF-8
>   NULL   C/UA   UTF-8
>   C/UA   C/UA   UTF-8
>   SPEC   NULL   System Locale
>   SPEC   C/UA   UTF-8
>   NULL   SPEC   Application Locale
>   C/UA   SPEC   Application Locale
>   SPEC   SPEC   Application Locale

What is the System Locale in this context? Asking because Windows doesn't have locales as such, although it does have a default ANSI codepage.

Andy
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On May 14 01:03, IWAMURO Motonori wrote:
> Hi.
>
> My idea is as follows:
>
> 1) Separate the mbtowc/wctomb function entries into library usage and system usage (__mbtowc/__wctomb and __sys_mbtowc/__sys_wctomb).
> 2) If setlocale(LC_CTYPE) is called with locale != "C", then lib == sys.
> 3) If setlocale(LC_CTYPE) is called with locale == "C", then sys is set from LC_ALL/LC_CTYPE/LANG. If LC_ALL/LC_CTYPE/LANG are not set, use the UTF-8 converter.

That's basically how my patch works.

> Cygwin startup calls setlocale(LC_CTYPE, "C") in winsup/cygwin/dcrt0.cc.

Yes, it does already.

> I am writing this patch and test code now.

Btw., if you plan to write more and bigger patches for Cygwin, it would be necessary to sign a copyright assignment form. That's explained on http://cygwin.com/contrib.html.

Corinna
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
2009/5/14 Corinna Vinschen corinna-cyg...@cygwin.com:
> That's basically how my patch works.

Sorry, I can't parse this sentence because of my poor English parser... Are you writing the patch for this problem?

> Btw., if you plan to write more and bigger patches for Cygwin, it would be necessary to sign a copyright assignment form. That's explained on http://cygwin.com/contrib.html.

Ummm, it seems to take a lot of time...

-- 
IWAMURO Motnori http://vmi.jp/
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On May 14 02:25, IWAMURO Motonori wrote:
> 2009/5/14 Corinna Vinschen corinna-cyg...@cygwin.com:
>> That's basically how my patch works.
>
> Sorry, I can't parse this sentence because of my poor English parser...

No worries.

> Are you writing the patch for this problem?

I already wrote that patch, see http://cygwin.com/ml/cygwin-cvs/2009-q2/msg00066.html

It seems to do what you are proposing.

>> Btw., if you plan to write more and bigger patches for Cygwin, it would be necessary to sign a copyright assignment form. That's explained on http://cygwin.com/contrib.html.
>
> Ummm, it seems to take a lot of time...

Yes, unfortunately. But it's really required for non-trivial patches. Sorry.

Corinna
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
2009/5/14 Corinna Vinschen corinna-cyg...@cygwin.com:
> I already wrote that patch, see http://cygwin.com/ml/cygwin-cvs/2009-q2/msg00066.html
>
> It seems to do what you are proposing.

I read it and built cygwin1.dll. It seems to work correctly.

Should the following part not be modified?

winsup/cygwin/fhandler_console.cc:
  dev_state->con_mbtowc = __mbtowc;
  dev_state->con_wctomb = __wctomb;

But I think the patch solves only the UTF-8 case of the problem in the thread starting at http://cygwin.com/ml/cygwin/2009-05/msg00245.html.

To support encodings other than UTF-8, it is necessary to separate the following variables into one set for the library and one for the system:

- __mb_cur_max
- lc_ctype_charset
- __mbtowc
- __wctomb

And these variables are set from LC_ALL/LC_CTYPE/LANG if setlocale(LC_CTYPE, "C") is called.

-- 
IWAMURO Motnori http://vmi.jp/
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On May 14 04:13, IWAMURO Motonori wrote:
> 2009/5/14 Corinna Vinschen corinna-cyg...@cygwin.com:
>> I already wrote that patch, see http://cygwin.com/ml/cygwin-cvs/2009-q2/msg00066.html
>>
>> It seems to do what you are proposing.
>
> I read it and built cygwin1.dll. It seems to work correctly.
>
> Should the following part not be modified?
>
> winsup/cygwin/fhandler_console.cc:
>   dev_state->con_mbtowc = __mbtowc;
>   dev_state->con_wctomb = __wctomb;

I'd rather not. It only affects the console and if LANG=C I'd rather see the single bytes which make up the path instead of the corresponding UTF-8 character.

> But I think the patch solves only the UTF-8 case of the problem in the thread starting at http://cygwin.com/ml/cygwin/2009-05/msg00245.html.
>
> To support encodings other than UTF-8, it is necessary to separate the following variables into one set for the library and one for the system:
>
> - __mb_cur_max
> - lc_ctype_charset
> - __mbtowc
> - __wctomb

I understand what you're up to, but right now I'm not really sure that this is the way to go. I had this idea as well at one point, but, thinking about it, I see a couple of potential problems. I don't want to decouple the libraries' idea of a string from the application's idea.

I tried various scenarios with the current solution and they all worked ok, one way or the other. I'm sure there are still some which don't work, but before doing what you propose, I'd rather see explicit failures, and have some time to discuss whether these are something the user can or even should fix or work around alone.

Corinna
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On May 13 21:46, Corinna Vinschen wrote:
> On May 14 04:13, IWAMURO Motonori wrote:
>> 2009/5/14 Corinna Vinschen corinna-cyg...@cygwin.com:
>>> I already wrote that patch, see http://cygwin.com/ml/cygwin-cvs/2009-q2/msg00066.html
>>>
>>> It seems to do what you are proposing.
>>
>> I read it and built cygwin1.dll. It seems to work correctly.
>>
>> Should the following part not be modified?
>>
>> winsup/cygwin/fhandler_console.cc:
>>   dev_state->con_mbtowc = __mbtowc;
>>   dev_state->con_wctomb = __wctomb;
>
> I'd rather not. It only affects the console and if LANG=C I'd rather see the single bytes which make up the path instead of the corresponding UTF-8 character.

Hm, maybe I misunderstood. In which manner should this be modified?

Corinna
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On 13.05.2009 at 17:17, Corinna Vinschen corinna-cyg...@cygwin.com wrote:
> I followed the suggestion to use UTF-8 for internal conversions when the locale is set to C. This will also be used as the default conversion when converting the Windows environment from UTF-16 to multibyte, unless the environment contains a valid LC_ALL/LC_CTYPE/LANG setting.
>
> The current working directory was also potentially unusable if an application switched the locale. Now the CWD is re-evaluated after a setlocale call.

Is Unicode normalization an issue here?

-- 
Matthias Andree
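Editor's sketch of why normalization could matter for filenames: the same visible character can be stored precomposed (NFC) or as base letter plus combining mark (NFD), and the two forms convert to different UTF-8 byte strings. This only illustrates the question; the thread does not say how Cygwin treats normalization.

```python
import unicodedata

# "é" written as 'e' + COMBINING ACUTE ACCENT, then composed (NFC).
composed = unicodedata.normalize("NFC", "e\u0301")
# "é" written as the single code point U+00E9, then decomposed (NFD).
decomposed = unicodedata.normalize("NFD", "\u00e9")

nfc_bytes = composed.encode("utf-8")
nfd_bytes = decomposed.encode("utf-8")

# Same text to a reader, different names to a byte-wise comparison.
print(nfc_bytes, nfd_bytes, nfc_bytes == nfd_bytes)
```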
[1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
Hi.

I propose that the filename encoding in C locale uses UTF-8 instead of SO/UTF-8. There are three reasons:

1. For interoperability between Cygwin and various UNIX-like systems (Linux, *BSD, Solaris, and so on). UNIX-like systems treat the filename as an 8-bit byte array, and many applications on those systems (mercurial, git, rsync, and so on) send or receive filename information without locale information.

2. UTF-8 is the only encoding that can handle multiple languages.

3. Today, the default encoding of modern UNIX-like systems is UTF-8.

Please examine it. Thanks.

-- 
IWAMURO Motnori http://vmi.jp/
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On May 13 02:29, IWAMURO Motonori wrote:
> Hi.
>
> I propose that the filename encoding in C locale uses UTF-8 instead of SO/UTF-8. There are three reasons:
>
> 1. For interoperability between Cygwin and various UNIX-like systems (Linux, *BSD, Solaris, and so on). UNIX-like systems treat the filename as an 8-bit byte array, and many applications on those systems (mercurial, git, rsync, and so on) send or receive filename information without locale information.
>
> 2. UTF-8 is the only encoding that can handle multiple languages.
>
> 3. Today, the default encoding of modern UNIX-like systems is UTF-8.

That's an interesting thought. Do you have a patch and, if so, did you try it? Does it, for instance, help for the issue reported in the thread starting at http://cygwin.com/ml/cygwin/2009-05/msg00245.html?

Corinna
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On May 13 02:29, IWAMURO Motonori wrote:
> Hi.
>
> I propose that the filename encoding in C locale uses UTF-8 instead of SO/UTF-8

What the heck is SO/UTF-8?

-- 
Mark J. Reed markjr...@gmail.com
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On May 12 15:13, Mark J. Reed wrote:
> On May 13 02:29, IWAMURO Motonori wrote:
>> Hi.
>>
>> I propose that the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
>
> What the heck is SO/UTF-8?

http://cygwin.com/1.7/cygwin-ug-net/using-specialnames.html#pathnames-unusual

Corinna
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On Tue, May 12, 2009 at 3:22 PM, Corinna Vinschen wrote:
> http://cygwin.com/1.7/cygwin-ug-net/using-specialnames.html#pathnames-unusual

OK, got it. So Mr. Iwamuro's proposal is that Cygwin ignore the locale setting, and just automatically convert the Windows UTF-16 filenames to UTF-8 (and back) no matter what.

That seems rife with possible confusion, though. If I have my codepage set to ISO-2022 and paste in a filename, I expect it to be interpreted as ISO-2022, not as UTF-8 (which will probably fail with an invalid encoding sequence).

OTOH, the SO/UTF-8 hack would seem to bode ill for the portability of, say, tar archives created under Cygwin.

-- 
Mark J. Reed markjr...@gmail.com
Re: [1.7] Proposal: the filename encoding in C locale uses UTF-8 instead of SO/UTF-8
On May 12 15:53, Mark J. Reed wrote:
> On Tue, May 12, 2009 at 3:22 PM, Corinna Vinschen wrote:
>> http://cygwin.com/1.7/cygwin-ug-net/using-specialnames.html#pathnames-unusual
>
> OK, got it. So Mr. Iwamuro's proposal is that Cygwin ignore the locale setting, and just automatically convert the Windows UTF-16 filenames to UTF-8 (and back) no matter what.

No. Only if LANG=C.

> That seems rife with possible confusion, though. If I have my codepage set to ISO-2022 and paste in a filename, I expect it to be interpreted

Cygwin 1.7 doesn't use the codepage. It uses what $LANG says. See http://cygwin.com/1.7/cygwin-ug-net/setup-locale.html

> as ISO-2022, not as UTF-8 (which will probably fail with an invalid encoding sequence).
>
> OTOH, the SO/UTF-8 hack would seem to bode ill for the portability of, say, tar archives created under Cygwin.

The filenames potentially look weird, but they are valid filenames. If anybody has a better idea how to work around the problem of UTF-16 chars which don't translate into the current singlebyte or multibyte charset, feel free to suggest one.

Corinna