Re: Bug#467249: FW by [EMAIL PROTECTED] : Bug#467249: man-db: over sensitive on the spell of locale
On Thu, Feb 28, 2008 at 10:10:32PM +, brian m. carlson wrote:
> On Thu, Feb 28, 2008 at 09:30:55PM +, Colin Watson wrote:
> >On Thu, Feb 28, 2008 at 09:21:41PM +0100, Adam Borowski wrote:
> >man-db really does have some special-casing here. Trust me. It was
> >necessary at the time. There are a finite number of known aliases for
> >the very small number of locales in question, and until it becomes
> >unnecessary I will simply support those.
> >
> >(And I agree that it should go away, but can't easily just yet.)
>
> Is there some way to query what character set a locale uses?

Yes, nl_langinfo (CODESET).

> If not, I think that man-db should default to UTF-8 (since that *is*
> the standard on Debian) and handle exceptions to that. Processing an
> ASCII manpage as UTF-8 is a no-op. And it's pretty easy to tell if
> something isn't valid UTF-8, and man-db can handle that as it normally
> would.

Please review the changes that I made in man-db 2.5.0 and 2.5.1, which I
think make this speculation unnecessary.

> AIUI, PostScript doesn't have UTF-8 support either, yet it seems to work
> just fine. Anyway, newer versions of groff have a conversion tool that
> maps UTF-8 (or any arbitrary character set) input into glyph names. But
> Debian's groff has been very heavily patched with support for kinsoku
> shori (prohibition character handling) and so we cannot simply update to
> a newer version. Believe me, if it were that easy, I'm sure Colin would
> have done it.

Indeed so (I have tried before). I've had it with special-cased hacks to
groff - I want either something that goes upstream, or else to stick
with what we have until something *can* go upstream. I'm finished with
nasty typographically-unsound workarounds.

> >Are you working with Brian M. Carlson on this? He has been working on a
> >solution acceptable to groff upstream, which is, frankly, the only way I
> >want to go now. He has already made substantial progress with character
> >class support.
> >
> Please be aware that I have little time with school right now, so this
> may not be implemented soon. In fact, it may not be ready in time for
> lenny's release. I will sit down and work on it some more soon, but my
> time is limited. If people want more information on my plan of attack,
> please do let me know, and I'll be happy to share.

Drat. Understood, though. I do follow the groff list (when my spam
filters haven't decided that it's statistically all spam ...) and do
hope to find time to build something useful on top of the work you've
posted there already.

Cheers,

-- 
Colin Watson [EMAIL PROTECTED]

-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
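For readers who want to try the nl_langinfo (CODESET) query mentioned in the message above, here is a minimal sketch in Python, whose standard locale module wraps the same C call on Unix-like systems. The helper name codeset_of is an assumption made for illustration; it is not part of man-db, and the result depends on which locales are generated on the machine.

```python
import locale


def codeset_of(locale_name):
    """Return the character set the C library reports for locale_name.

    Returns None if the C library refuses the locale. codeset_of is a
    hypothetical helper for illustration, not anything from man-db.
    """
    # Remember the current LC_CTYPE setting so it can be restored afterwards.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
        # Same query as nl_langinfo(CODESET) in C.
        return locale.nl_langinfo(locale.CODESET)
    except locale.Error:
        return None
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    for name in ("C", "zh_CN.UTF-8", "zh_CN.utf8", "zz_ZZ.utf8"):
        print(name, "->", codeset_of(name))
```

The point of the design is that the answer comes from the locale data itself, so the spelling of the locale name never has to be parsed by the caller.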
Re: FW by [EMAIL PROTECTED] : Bug#467249: man-db: over sensitive on the spell of locale
On Fri, Feb 29, 2008 at 08:52:12AM +0100, Petter Reinholdtsen wrote:
> [Michelle Konzack]
> > It seems there is a common problem while setting up the correct UNICODE
> > locale in systems. As the poster in the attached message has written,
> > he has set up his locale to "zh_CN.utf8" which is wrong, but as he has
> > written too, the output of "locale -a" shows it.
>
> 'locale -a' does not show that the locale is working, it just shows what
> is set in the environment.

locale will indicate whether a locale is working by means of the
messages it prints on standard error if it isn't:

  $ LANG=zz_ZZ.utf8 LC_ALL=zz_ZZ.utf8 locale
  locale: Cannot set LC_CTYPE to default locale: No such file or directory
  locale: Cannot set LC_MESSAGES to default locale: No such file or directory
  locale: Cannot set LC_ALL to default locale: No such file or directory
  LANG=zz_ZZ.utf8
  [...]
  $ LANG=zh_CN.utf8 LC_ALL=zh_CN.utf8 locale
  LANG=zh_CN.utf8
  [...]

(The same goes for 'locale -a', which in fact does not show what is set
in the environment at all, but does what it is documented to do: "Write
names of available locales.")

> Use 'locale charmap' to check that the locale is working and that the
> correct character set is selected. If it returns 'ANSI_X3.4-1968'
> (which is ASCII), the locale isn't working (unless it is a locale that
> uses ASCII, not very likely). If it shows 'UTF-8', the locale settings
> are working.

  $ LANG=zh_CN.utf8 locale charmap
  UTF-8

> In the case you describe, I believe the only fix is to get the user to
> stop using an invalid and non-existing locale,

It is neither an invalid nor a non-existing locale. It is not in the
form documented in /usr/share/i18n/SUPPORTED, but it is in the canonical
form into which glibc normalises it internally. See the
_nl_normalize_codeset function in glibc/intl/l10nflist.c.

> and instead use the correct locale string, which I would suspect is
> 'zh_CN.UTF-8'. The only workaround to this would be to rewrite glibc
> and locales, and it does not seem useful to me.

This is not a glibc bug. This is not a locales bug. This is a man-db
bug. I am both the Debian maintainer and the upstream maintainer and I
have accepted the bug (see the bug log). I wish people would stop trying
to argue that it isn't a bug or that it is a bug somewhere else.

Cheers,

-- 
Colin Watson [EMAIL PROTECTED]
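The normalisation referred to above can be sketched roughly as follows. This is a Python approximation written from a reading of _nl_normalize_codeset, not a copy of glibc's code: it drops everything except ASCII letters and digits, lowercases the letters, and prefixes "iso" when only digits remain. The names normalize_codeset and same_locale are hypothetical, and the handling of the "@modifier" part is a simplifying assumption.

```python
def normalize_codeset(codeset):
    """Approximate glibc's codeset normalisation: 'UTF-8' becomes 'utf8'."""
    # Keep only ASCII letters and digits, lowercasing the letters.
    kept = [ch.lower() for ch in codeset if ch.isascii() and ch.isalnum()]
    normalized = "".join(kept)
    # A purely numeric codeset such as '8859-1' gets an 'iso' prefix.
    if normalized.isdigit():
        normalized = "iso" + normalized
    return normalized


def same_locale(a, b):
    """Compare two locale names while ignoring codeset spelling."""
    def split(name):
        base, _, rest = name.partition(".")
        codeset, _, modifier = rest.partition("@")
        return base, normalize_codeset(codeset), modifier
    return split(a) == split(b)


if __name__ == "__main__":
    print(normalize_codeset("UTF-8"), normalize_codeset("utf8"))
    print(same_locale("zh_CN.UTF-8", "zh_CN.utf8"))
```

Under this sketch both spellings from the bug report compare equal, which is why a program that matches the locale name literally sees a difference that the C library does not.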
Re: FW by [EMAIL PROTECTED] : Bug#467249: man-db: over sensitive on the spell of locale
[Michelle Konzack]
> Hello Maintainers,
>
> It seems there is a common problem while setting up the correct UNICODE
> locale in systems. As the poster in the attached message has written,
> he has set up his locale to "zh_CN.utf8" which is wrong, but as he has
> written too, the output of "locale -a" shows it.

'locale -a' does not show that the locale is working, it just shows what
is set in the environment.

Use 'locale charmap' to check that the locale is working and that the
correct character set is selected. If it returns 'ANSI_X3.4-1968'
(which is ASCII), the locale isn't working (unless it is a locale that
uses ASCII, not very likely). If it shows 'UTF-8', the locale settings
are working.

In the case you describe, I believe the only fix is to get the user to
stop using an invalid and non-existing locale, and instead use the
correct locale string, which I would suspect is 'zh_CN.UTF-8'. The only
workaround to this would be to rewrite glibc and locales, and it does
not seem useful to me.

Happy hacking,
-- 
Petter Reinholdtsen
Re: Bug#467249: FW by [EMAIL PROTECTED] : Bug#467249: man-db: over sensitive on the spell of locale
On Thu, Feb 28, 2008 at 09:30:55PM +, Colin Watson wrote:
>On Thu, Feb 28, 2008 at 09:21:41PM +0100, Adam Borowski wrote:
>man-db really does have some special-casing here. Trust me. It was
>necessary at the time. There are a finite number of known aliases for
>the very small number of locales in question, and until it becomes
>unnecessary I will simply support those.
>
>(And I agree that it should go away, but can't easily just yet.)

Is there some way to query what character set a locale uses? If not, I
think that man-db should default to UTF-8 (since that *is* the standard
on Debian) and handle exceptions to that. Processing an ASCII manpage as
UTF-8 is a no-op. And it's pretty easy to tell if something isn't valid
UTF-8, and man-db can handle that as it normally would.

Of course, I'm not contributing code, so my opinion is worth what you
paid for it.

>>Too bad, groff doesn't have real Unicode support, and supports only several
>>special-cased locales (which may then be transcoded as UTF-8, but they still
>>get wrapped into their old-style charsets).

AIUI, PostScript doesn't have UTF-8 support either, yet it seems to work
just fine. Anyway, newer versions of groff have a conversion tool that
maps UTF-8 (or any arbitrary character set) input into glyph names. But
Debian's groff has been very heavily patched with support for kinsoku
shori (prohibition character handling) and so we cannot simply update to
a newer version. Believe me, if it were that easy, I'm sure Colin would
have done it.

>Are you working with Brian M. Carlson on this? He has been working on a
>solution acceptable to groff upstream, which is, frankly, the only way I
>want to go now. He has already made substantial progress with character
>class support.

Please be aware that I have little time with school right now, so this
may not be implemented soon. In fact, it may not be ready in time for
lenny's release. I will sit down and work on it some more soon, but my
time is limited. If people want more information on my plan of attack,
please do let me know, and I'll be happy to share.

In fact, I'm off to hack some more on groff right now.

-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 713 440 7475 | http://crustytoothpaste.ath.cx/~bmc | My opinion only
troff on top of XML: http://crustytoothpaste.ath.cx/~bmc/code/thwack
OpenPGP: RSA v4 4096b 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187
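The "default to UTF-8 and handle exceptions" idea in the message above can be illustrated with a short Python sketch. It is only an illustration of the suggestion, not how man-db decides on an encoding (Colin's reply points at the man-db 2.5.0 and 2.5.1 changes for that). The function name decode_page and the ISO-8859-1 fallback are assumptions chosen for the example.

```python
def decode_page(raw, fallback="iso-8859-1"):
    """Decode manual page bytes, trying UTF-8 first.

    Returns (text, encoding_used). Pure ASCII input is always valid
    UTF-8, so it takes the first branch unchanged. The fallback
    encoding is a hypothetical default for this sketch.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8, so treat it as a legacy 8-bit page.
        return raw.decode(fallback), fallback


if __name__ == "__main__":
    print(decode_page(b"ls - list directory contents"))
    print(decode_page("ls \u5217\u51fa".encode("utf-8")))
    print(decode_page(b"caf\xe9"))
```

A strict UTF-8 decoder rejects most legacy 8-bit text, which is what makes the check cheap, though a short legacy string can occasionally happen to be valid UTF-8 as well.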
Re: Bug#467249: FW by [EMAIL PROTECTED] : Bug#467249: man-db: over sensitive on the spell of locale
On Thu, Feb 28, 2008 at 09:21:41PM +0100, Adam Borowski wrote:
> On Thu, Feb 28, 2008 at 10:42:30AM +0100, Michelle Konzack wrote:
> > It seems there is a common problem while setting up the correct UNICODE
> > locale in systems. As the poster in the attached message has written,
> > he has set up his locale to "zh_CN.utf8" which is wrong, but as he has
> > written too, the output of "locale -a" shows it.
>
> No matter which way the _locale_ is spelt (including "vi_VI" without even
> the word "utf" inside),

Irrelevant to this bug, as you'll see if you look at the code.

> the _charset_ is UTF-8. No program ever should look at the locale's
> name, as it has more quirks like this. Checking the charset will get
> you what you want.
>
> > I think there should be a global solution for this, since patching
> > man-db is worthless.
>
> Actually, it's groff that's at fault here. Mostly.

man-db really does have some special-casing here. Trust me. It was
necessary at the time. There are a finite number of known aliases for
the very small number of locales in question, and until it becomes
unnecessary I will simply support those.

(And I agree that it should go away, but can't easily just yet.)

Please don't drag groff into this bug. I really hate it when bugs drift
wildly off their original (accurately-constrained) topic despite
attempts to haul them back. It makes them impossible to keep organised.

> > $ LANG=zh_CN.UTF-8 man --warnings -l ls.zh_CN.1 > /dev/null
> > $ LANG=zh_CN.utf8 man --warnings -l ls.zh_CN.1 > /dev/null
> > :9: warning: can't find special character `u013F'
> > :9: warning: can't find special character `u011A'
> > :9: warning: can't find special character `u021D'
> > :11: warning: can't find special character `u0321'
> > :11: warning: can't find special character `u04AA'
> > :12: warning: can't find special character `u0461'
> > // snip
>
> Too bad, groff doesn't have real Unicode support, and supports only several
> special-cased locales (which may then be transcoded as UTF-8, but they still
> get wrapped into their old-style charsets).
>
> Instead of changing the special-case recognition, I would instead completely
> skip special-casing and just treat all characters equally. Including, but
> not limited to, u013F and u0461.

Are you working with Brian M. Carlson on this? He has been working on a
solution acceptable to groff upstream, which is, frankly, the only way I
want to go now. He has already made substantial progress with character
class support.

Treating all characters equally will absolutely not be acceptable to
groff upstream. groff is a typesetter and needs to know about properties
of characters.

Cheers,

-- 
Colin Watson [EMAIL PROTECTED]
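To make the closing point about character properties concrete, here is a small Python sketch using the standard unicodedata module. It only illustrates that characters differ in ways a typesetter cares about (display width, combining behaviour, punctuation class); it says nothing about how groff or its character class support represents them. The NO_LINE_START set is a hypothetical, deliberately tiny stand-in for a kinsoku shori rule.

```python
import unicodedata

# Hypothetical, deliberately tiny set of characters that should not
# begin a line (a toy version of a kinsoku shori prohibition list).
NO_LINE_START = set("\u3002\u3001\uff09\u300d")


def describe(ch):
    """Collect a few Unicode properties relevant to line layout."""
    return {
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),
        "width": unicodedata.east_asian_width(ch),
        "combining": unicodedata.combining(ch),
    }


def may_break_before(ch):
    """Toy rule: a line break is allowed unless ch must not start a line."""
    return ch not in NO_LINE_START


if __name__ == "__main__":
    for ch in ("a", "\u4e2d", "\u3002", "\u0301"):
        print(repr(ch), describe(ch), may_break_before(ch))
```

Even this toy shows why "treat all characters equally" loses information: a wide ideograph, a combining accent and an ideographic full stop each need different handling when filling and breaking lines.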
Re: FW by [EMAIL PROTECTED] : Bug#467249: man-db: over sensitive on the spell of locale
On Thu, Feb 28, 2008 at 10:42:30AM +0100, Michelle Konzack wrote:
> Hello Maintainers,
>
> It seems there is a common problem while setting up the correct UNICODE
> locale in systems. As the poster in the attached message has written,
> he has set up his locale to "zh_CN.utf8" which is wrong, but as he has
> written too, the output of "locale -a" shows it.

No matter which way the _locale_ is spelt (including "vi_VI" without even
the word "utf" inside), the _charset_ is UTF-8. No program ever should
look at the locale's name, as it has more quirks like this. Checking the
charset will get you what you want.

> I think there should be a global solution for this, since patching
> man-db is worthless.

Actually, it's groff that's at fault here. Mostly.

> $ LANG=zh_CN.UTF-8 man --warnings -l ls.zh_CN.1 > /dev/null
> $ LANG=zh_CN.utf8 man --warnings -l ls.zh_CN.1 > /dev/null
> :9: warning: can't find special character `u013F'
> :9: warning: can't find special character `u011A'
> :9: warning: can't find special character `u021D'
> :11: warning: can't find special character `u0321'
> :11: warning: can't find special character `u04AA'
> :12: warning: can't find special character `u0461'
> // snip

Too bad, groff doesn't have real Unicode support, and supports only several
special-cased locales (which may then be transcoded as UTF-8, but they still
get wrapped into their old-style charsets).

Instead of changing the special-case recognition, I would instead completely
skip special-casing and just treat all characters equally. Including, but
not limited to, u013F and u0461.

I've done some initial work on this, but unfortunately I'm dead busy right
now. For "show me the code": working, but not good enough to even submit to
Colin, pan-Unicode groff and man-db are at
  deb-src http://angband.pl/debian sid main
(just don't look inside, they're too ugly to live).

On the upside, on tty everything but RTL (Hebrew/Arabic) works just fine,
including CJK, Vietnamese, Devanagari and cuneiform, even all together in
one manpage (try "man utf8test"). What's lacking is support for html
(should be trivial), ps (aargh...) and other devices.

I'm afraid I can do nothing at least until late Friday... but it looks like
we may be able to help Colin squash at least this bastion of locale
dependency.

-- 
1KB		// Microsoft corollary to Hanlon's razor:
		//	Never attribute to stupidity what can be
		//	adequately explained by malice.
FW by [EMAIL PROTECTED] : Bug#467249: man-db: over sensitive on the spell of locale
Hello Maintainers,

It seems there is a common problem while setting up the correct UNICODE
locale in systems. As the poster in the attached message has written,
he has set up his locale to "zh_CN.utf8" which is wrong, but as he has
written too, the output of "locale -a" shows it.

I have many customers with the same problem...

I think there should be a global solution for this, since patching
man-db is worthless.

Please discuss this problem and let me stay in the MFT.

Thanks, Greetings and nice Day
    Michelle Konzack
    Systemadministrator
    Tamay Dogan Network
    Debian GNU/Linux Consultant

----- Forwarded message from LI Daobing <[EMAIL PROTECTED]> -----

Date: Sun, 24 Feb 2008 11:51:44 +0800
From: LI Daobing <[EMAIL PROTECTED]>
To: Debian Bug Tracking System <[EMAIL PROTECTED]>
Subject: Bug#467249: man-db: over sensitive on the spell of locale
X-PTS-Package: man-db
X-Debian-PR-Package: man-db

Package: man-db
Version: 2.5.1-2
Severity: important

When the locale is set to zh_CN.UTF-8, I can view a Chinese manpage with
"man -l ls.zh_CN.1", but when the locale is set to zh_CN.utf8, I get many
rubbish characters on the screen.

The information generated by "locale -a" is zh_CN.utf8, so many users
set the locale to utf8 instead of UTF-8.

The ls.zh_CN.1 is in the attachment. Please fix it, thanks.

You can reproduce this bug with the following commands. Thanks.

$ LANG=zh_CN.UTF-8 man --warnings -l ls.zh_CN.1 > /dev/null
$ LANG=zh_CN.utf8 man --warnings -l ls.zh_CN.1 > /dev/null
:9: warning: can't find special character `u013F'
:9: warning: can't find special character `u011A'
:9: warning: can't find special character `u021D'
:11: warning: can't find special character `u0321'
:11: warning: can't find special character `u04AA'
:12: warning: can't find special character `u0461'
// snip

----- End forwarded message -----

-- 
Linux-User #280138 with the Linux Counter, http://counter.li.org/
# Debian GNU/Linux Consultant #
Michelle Konzack   Apt. 917                  ICQ #328449886
+49/177/9351947    50, rue de Soultz         MSN LinuxMichi
+33/6/61925193     67100 Strasbourg/France   IRC #Debian (irc.icq.com)