Re: Emacs 21.2 non-ASCII keysym problems
Kenichi Handa <[EMAIL PROTECTED]> writes:

> At first, please post this kind of bug report to
> [EMAIL PROTECTED] (or to [EMAIL PROTECTED] if you
> are using a pretest version).

Indeed (but _after reading the documentation_ about International features). Also, the description seems to concern a modified version which presumably RedHat should support, perhaps including whatever messed up and put emacs-mule-encoded stuff in the message.

> I think this problem is fixed in the HEAD branch.
>
> And, as emacs-unicode branch was made earlier, this problem
> is not yet fixed in emacs-unicode.

It is, but the treatment of keysyms has been revamped anyway, so the issue wouldn't arise in this case. If I remember correctly, a workaround for Emacs 21.2 is to set the keyboard coding system.

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
Re: Emacs and UTF-8 locale
Kenichi Handa writes:

> It seems that it doesn't have a major problem,

I hope not, because I've basically used your customization hooks or similar ones and done the sort of things you'd talked about at some time!

> but I found one problem related to handling unibyte case.

I didn't expect it to do anything sensible with unibyte, but if there's an easy way to improve it, that would be fine.

> If unify-8859-on-decoding-mode is on, for instance, in
> latin-2 lang. env., 8859-2 characters files are decoded into
> latin-iso8859-1 and mule-unicode-0100-24ff. But, C-q XXX
> still inserts latin-iso8859-2 characters.

Yes. I'm not sure that should change, but the relevant primitives could now use `translation-table-for-input'. It wasn't the sort of thing I could control in user-level customization anyway, without kludging it with a post-command hook.

> And, when we paste mule-unicode-0100-24ff characters into
> unibyte buffer, or paste unibyte string into a multibyte
> buffer, they are not correctly converted.

What would be correct? Is general Unicode text any different to, say, JISX-based Japanese in that respect?
Re: Emacs and UTF-8 locale
Tomohiro KUBOTA writes:

> Thus, portable softwares should check environment variables when
> nl_langinfo() is not available, though this method can result in
> wrong encoding.

Emacs will be able to do that anyhow, even if nl_langinfo is available.

> From users' viewpoint, to declare encoding more clearly, it is a
> good idea to define LANG variable including codeset part.

Probably, in principle, but that doesn't work generally (outside Emacs), e.g. try LC_CTYPE=en_GB.iso8859-15 on Debian testing.
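For readers who want to see what the environment-variable fallback amounts to, here is a minimal C sketch. It assumes the usual POSIX precedence for the character-type category (LC_ALL, then LC_CTYPE, then LANG) and the conventional `language_TERRITORY.codeset@modifier` layout of locale names. The function names `ctype_locale_name' and `codeset_from_locale_name' are invented for this illustration; this is not how Emacs does it.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Return the locale name that governs LC_CTYPE, using the POSIX
   precedence LC_ALL > LC_CTYPE > LANG, or NULL if none is set.
   (Hypothetical helper, named for this example only.) */
const char *ctype_locale_name(void)
{
    static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };
    size_t i;

    for (i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *value = getenv(vars[i]);
        if (value != NULL && *value != '\0')
            return value;
    }
    return NULL;
}

/* Copy the codeset part of a name such as "en_GB.ISO-8859-15@euro"
   into BUF (of SIZE bytes).  Returns 1 if a codeset part was found
   and fits, 0 otherwise, e.g. for an unqualified name like "vi_VN". */
int codeset_from_locale_name(const char *name, char *buf, size_t size)
{
    const char *start, *end;
    size_t len;

    if (name == NULL || size == 0)
        return 0;
    start = strchr(name, '.');
    if (start == NULL)
        return 0;                 /* no codeset part at all */
    start++;
    end = strchr(start, '@');     /* a modifier, if any, ends the codeset */
    len = end != NULL ? (size_t)(end - start) : strlen(start);
    if (len == 0 || len >= size)
        return 0;
    memcpy(buf, start, len);
    buf[len] = '\0';
    return 1;
}
```

A caller would pass the result of `ctype_locale_name()' to `codeset_from_locale_name()' and, when that returns 0, fall back on per-language defaults. That is exactly where the wrong-encoding risk mentioned above comes in.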
Re: Emacs and UTF-8 locale
Markus Kuhn writes:

> Another example for why using "nl_langinfo(CODESET)" or "locale
> charmap" is far better than looking at the environment variables
> with obscure rules:

On the contrary, as I tried to explain. In particular, I added Latin-9 support to Emacs long ago, and Latin-9 v. Latin-1 is just the sort of thing I was concerned about. Anyhow, I'm surprised any system has to change today and doubt it would be a good idea to change things under users' feet. Presumably they've already been using @euro or whatever.

Emacs needs those `obscure rules', whether or not it can use nl_langinfo. If that returns something meaning ASCII, we probably want to look for a sensible language environment using the current code.

> The mapping between locale names and encodings should really
> be left to where it belongs, namely the C library.

If it changes there, Emacs users could be messed up. Apart from the effect on using existing data, they may be surprised when creating new files and they'd probably find themselves using the wrong input method.
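To make the "returns something meaning ASCII" case concrete, here is a rough C sketch of that kind of fallback. The names `codeset_means_ascii' and `guess_codeset' are hypothetical, the list of ASCII spellings is illustrative rather than complete, and the single @euro rule merely stands in for the larger table of locale-name rules a real program would need.

```c
#include <stddef.h>
#include <string.h>

/* Spellings that may mean plain ASCII.  "ANSI_X3.4-1968" is what glibc
   reports in the C locale; the others are illustrative assumptions. */
static const char *const ascii_names[] = {
    "ANSI_X3.4-1968", "US-ASCII", "ASCII", "646"
};

/* Return 1 if CODESET is missing, empty, or one of the ASCII spellings. */
int codeset_means_ascii(const char *codeset)
{
    size_t i;

    if (codeset == NULL || *codeset == '\0')
        return 1;
    for (i = 0; i < sizeof ascii_names / sizeof ascii_names[0]; i++)
        if (strcmp(codeset, ascii_names[i]) == 0)
            return 1;
    return 0;
}

/* Trust REPORTED (e.g. from nl_langinfo) unless it only says ASCII;
   in that case look at the locale name for a better hint.  Only the
   @euro modifier is handled here, as an example of such a rule. */
const char *guess_codeset(const char *reported, const char *locale_name)
{
    if (!codeset_means_ascii(reported))
        return reported;
    if (locale_name != NULL && strstr(locale_name, "@euro") != NULL)
        return "ISO-8859-15";     /* Latin-9, the usual @euro charset */
    return reported;              /* nothing better to offer */
}
```

The point of the design is only that the C library's answer and the locale-name rules are complementary: the first is preferred when it is informative, the second fills in when it is not.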
Re: Emacs and UTF-8 locale
Eli Zaretskii writes:

> I think the decision to leave it disabled in v21.1 was correct,
> since the application code, written by Dave, to make that support
> reasonably complete was only recently added to the CVS tree.

I don't know what that means. The trivial utf-8 language environment I offered could easily have been installed to fix the bug of not honouring the locale. The only way I've significantly improved the support of utf-8 encoding recently is by additions to characters.el and providing experimental level 2 support for some scripts. I don't think that's too important.

> The changes for which this addition is useful are installed only on
> the development trunk,

What changes? I don't know why it would have been any different in 21.1, which is what I'm basically using anyhow, and on which I've tested things. [To use nl_langinfo, I added a single function and changed `locale-name' to `(or locale-name (locale-codeset))' once.]

> Without those Dave's additions, turning on the UTF support by
> default would screw users.

I don't know why, and I'm the only one who tested it as far as I know.

> I believe some of those who tried to do that with stock Emacs 21.1
> complained about problems on gnu.emacs.bug,

I don't know what that refers to, so any such problems probably haven't been addressed by anything I've done.

> the same kind of problems whose anticipation was the reason for
> leaving UTF disabled in the last release.

I can address actual test cases. I'm running a 21.1-based Emacs so I can't necessarily reproduce problems, but I might well be able to spot causes. All I've heard is vague suggestions of problems and statements about what I've implemented that are wrong by demonstration.
Re: Emacs and UTF-8 locale
Paul Eggert writes:

> I think it's reasonable to use this kind of approach, but I suggest
> deferring it for a major release,

Of course. I'm not convinced it's a big issue anyhow, especially compared with actually providing support for the relevant locales.

> I believe I used Solaris 7; could have been an earlier version.

I had a quick look at Solaris 8, and I think it wasn't fully consistent with glibc. I assumed Emacs should take its cue from glibc, but if that actually causes a problem, maybe the list could be adjusted depending on the system for which Emacs is built.
Re: Emacs and UTF-8 locale
Roozbeh Pournader writes:

> Will you accept 'fa_IR' as another example?

Do you mean that there is Farsi support for Emacs?
Re: Emacs and UTF-8 locale
Eli Zaretskii writes:

> this was disabled previously because stock Emacs 21.1 lacked some
> user-level features which are required for a decent support of
> UTF-8 locales.

It lacked _any_ built-in support for utf-8 at the time the locale work was done. That's why eggert special-cased it, according to the commentary and what I recall in mail. I didn't hear a good reason for maintaining the exclusion in 21.1.
Re: Emacs and UTF-8 locale
Markus Kuhn writes:

> There are UTF-8 locales in use (e.g., vi_VI), which do NOT have
> UTF-8 in their name,

That looks like a bad example since, at least in glibc 2.2.4, the locale is listed as `vi_VN.UTF-8'. That's fortunate, since Emacs uses VISCII for the unqualified Vietnamese language environment. (Similarly for Devanagari.) I documented other cases from glibc in the code. Apparently they aren't all consistent with the source that eggert originally used.

> therefore the direct test of the locale environment
> variables is just a less reliable fallback option.

> It is my understanding that elisp currently has no direct access to
> the output of the API function nl_langinfo(CODESET), and I hope
> this can be fixed.

I've implemented it, but it's not installed, partly _because_ people may not end up with the coding they expect. I haven't yet tried to check compatibility properly.

> Fortunately, there exists only one single standard string that
> nl_langinfo(CODESET) returns in a UTF-8 locale, and that is
> "UTF-8". (For ISO 8859-1, both "ISO-8859-1" and "ISO8859-1" are
> used by different manufacturers.)

Emacs already deals with that sort of issue and DTRT with the environment variables, including matching something more general than `UTF-8'. Please see the code in mule-cmds.el.

>   if (strstr(s, "UTF-8"))
>     utf8_mode = 1;

Testing solely for utf-8 isn't useful.

--
$ locale -c charmap
LC_CTYPE
ISO-8859-1
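As an illustration of matching "something more general" than a literal `UTF-8', here is a small C sketch that compares codeset names while ignoring case and punctuation, so that "ISO-8859-1" and "ISO8859-1" (or "UTF-8" and "utf8") are treated as the same name. The helpers `codeset_names_equal' and `current_codeset' are names made up for this example, and the comparison rule is an assumption about what a tolerant matcher might do, not a description of the Lisp code in mule-cmds.el.

```c
#include <ctype.h>
#include <langinfo.h>
#include <locale.h>

/* Compare two codeset names, ignoring letter case and every character
   that is not a letter or digit.  Returns 1 if they match, else 0. */
int codeset_names_equal(const char *a, const char *b)
{
    for (;;) {
        /* Skip separators such as '-', '_' and '.' on both sides. */
        while (*a != '\0' && !isalnum((unsigned char)*a))
            a++;
        while (*b != '\0' && !isalnum((unsigned char)*b))
            b++;
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        if (*a == '\0')
            return 1;             /* both names ended together */
        a++;
        b++;
    }
}

/* Ask the C library for the codeset of the user's locale.  The
   setlocale call is needed first; otherwise the program stays in the
   "C" locale and nl_langinfo reports that locale's codeset. */
const char *current_codeset(void)
{
    setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);
}
```

With these, `codeset_names_equal(current_codeset(), "UTF-8")' is a somewhat more forgiving version of the quoted strstr test, and the same function can be used to recognise any other codeset rather than utf-8 alone.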
Re: unicode in emacs 21
"EZ" == Eli Zaretskii <[EMAIL PROTECTED]> writes:

EZ> Unless you refer to the CNS plane and Japanese Han characters,
EZ> which were deliberately left ununified (in addition to the
EZ> Unicode codepoints for those characters), I think you are
EZ> mistaken.

I.e., he's right. Someone needs to give a cogent argument why it's a problem in practice to have multiple representations if you can canonicalize as required, especially why this should be any different for Western scripts than for CJK. Note that I have some practical experience of this in Emacs.
Re: unicode in emacs 21
"EZ" == Eli Zaretskii <[EMAIL PROTECTED]> writes:

EZ> The current plan for Unicode was discussed at length 3 years ago, and
EZ> the result was what I described. I don't think it's wise for us to
EZ> reopen that discussion again

Well I, at least, don't understand why it's necessary, at least for technical reasons. I have a fair amount of experience as a user and implementor.
Re: unicode in emacs 21
"MK" == Markus Kuhn <[EMAIL PROTECTED]> writes:

MK> If you can edit the UTF-8 test file
MK> http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt

That's what I mean by test cases. I can't remember which ones fail, but I suspect it's non-BMP ones. There are a couple of ways to fix it, but I don't think it's important.
Re: unicode in emacs 21
"JK" == Jimmy Kaplowitz <[EMAIL PROTECTED]> writes:

JK> It's the only editor I've used (including Yudit) that could
JK> display the sequence U+0283 U+034D correctly.

[With what font?] Note that character composition (combination) is a user-level feature in Emacs, so if rules are implemented which you don't like, you can change them.

JK> Well, Emacs does have more features (including some that are less
JK> essential, such as doctor mode :), but vim has quite enough for
JK> most purposes.

I assumed the point was specifically about the display, tty v. X.
Re: unicode in emacs 21
"MK" == Markus Kuhn <[EMAIL PROTECTED]> writes:

MK> Using UTF-8 as the internal Emacs encoding is one way of achieving
MK> continued guaranteed binary transparency,

I.e., maintain a malformed internal representation??

MK> coming up with a tricky encoding for malformed UTF-8 sequences is
MK> another one.

We can maintain arbitrary byte sequences now. It's not terribly tricky, just not too robust through the use of the eight-bit-x charsets. I don't think it's very important that reading and writing malformed sequences by utf-8.el isn't always idempotent. Presumably the three or four relevant test cases could be addressed in the CCL, but I think there are better things to spend the time on.
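For anyone unsure what "malformed" covers here, the following C sketch classifies input bytes into well-formed UTF-8 sequences and stray bytes that a binary-transparent decoder would have to carry through as raw bytes. It is only an illustration: it assumes the present-day definition of UTF-8 (at most four bytes, no overlong forms, no surrogates, nothing above U+10FFFF), the function names are invented, and it says nothing about how utf-8.el or the CCL decoder actually work.

```c
#include <stddef.h>

/* Return the length (1 to 4) of the well-formed UTF-8 sequence at the
   start of S, which has N bytes available, or 0 if the first byte does
   not begin a well-formed sequence. */
size_t utf8_sequence_length(const unsigned char *s, size_t n)
{
    unsigned long cp;
    size_t len, i;

    if (n == 0)
        return 0;
    if (s[0] < 0x80)
        return 1;                               /* ASCII */
    else if (s[0] >= 0xC2 && s[0] <= 0xDF) { len = 2; cp = s[0] & 0x1F; }
    else if (s[0] >= 0xE0 && s[0] <= 0xEF) { len = 3; cp = s[0] & 0x0F; }
    else if (s[0] >= 0xF0 && s[0] <= 0xF4) { len = 4; cp = s[0] & 0x07; }
    else
        return 0;    /* stray continuation byte, 0xC0/0xC1, or 0xF5..0xFF */

    if (n < len)
        return 0;                               /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                           /* not a continuation byte */
        cp = (cp << 6) | (unsigned long)(s[i] & 0x3F);
    }
    if (len == 3 && cp < 0x800)
        return 0;                               /* overlong 3-byte form */
    if (len == 4 && (cp < 0x10000 || cp > 0x10FFFF))
        return 0;                               /* overlong or out of range */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                               /* surrogate code point */
    return len;
}

/* Count the bytes in S that are not part of any well-formed sequence,
   i.e. the bytes a transparent decoder must keep as raw bytes. */
size_t count_malformed_bytes(const unsigned char *s, size_t n)
{
    size_t i = 0, bad = 0;

    while (i < n) {
        size_t len = utf8_sequence_length(s + i, n - i);
        if (len == 0) {
            bad++;
            i++;
        } else {
            i += len;
        }
    }
    return bad;
}
```

Reading and writing is idempotent only if every byte counted by `count_malformed_bytes' is written back exactly as it was read, which is the property under discussion.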
Re: unicode in emacs 21
"MK" == Markus Kuhn <[EMAIL PROTECTED]> writes:

MK> CJK Greek/Cyrillic characters are traditionally displayed as
MK> double-width, whereas ISO 8859/ISO 10646 Greek & Cyrillic
MK> characters are traditionally displayed single-width.

Yes, but...

MK> But surely all the European encodings such as ISO 8859, KOI,
MK> etc. should be urgently unified with Unicode.

The implementation you may recall hearing about earlier in the year is now available (posted to gnu.emacs.sources).
Re: unicode in emacs 21
"EZ" == Eli Zaretskii <[EMAIL PROTECTED]> writes:

EZ> The problem is that characters are still not unified in Emacs 21.

A package was contributed to do that for ISO 8859 characters. It's been posted to gnu.emacs.sources, so that shouldn't be an issue for anyone who's bothered by it.

EZ> So we have two versions of Cyrillic characters, two versions of
EZ> Greek characters, two versions of Hebrew characters, etc.: one
EZ> version in the new Unicode set, the other version in the old Mule
EZ> set.

There are more than two, at least for Greek and Cyrillic. Those in the Far Eastern charsets could be unified too if anyone cared. This issue clearly doesn't apply only to the Unicode charsets, and, as a user, I don't think it's much of a problem in practice.

EZ> What can I say except ``volunteers are welcome...'' etc.? I can't
EZ> believe no one wants Unicode badly enough to work on its support in
EZ> Emacs, but what do I do with facts which fly in my face?

That view is unfair to the people who have done lots of work, himi in particular. `Working on Unicode support' in my book isn't restricted to implementing an apparently-unnecessary, disruptive, incompatible change to the internal encoding, even if it's what one wants ideally.
Re: unicode in emacs 21
"OD" == Oliver Doepner <[EMAIL PROTECTED]> writes:

OD> There is vim 6.x now with full utf-8 support on the xterm.

[Does `full utf-8 support' mean level 3?] Emacs can do utf-8 i/o under ttys that support it, though you don't _need_ such support -- either input or output -- to edit utf-8 text.

OD> It is much faster than emacs on x11 of course.

I'm surprised that's much of an issue. I assume Emacs under X is much more capable.

OD> I was happy to see Emacs 21 announced. but the unicode support
OD> does not seem to have moved forward very much

It's moved from zero to the state where it's perfectly fine for editing at least the Western technical text that interests me. E.g., Kuhn's UTF-8-demo.utf works modulo the level 2 text, for which one can add support straightforwardly at the Lisp level. It also allowed producing coding systems for all the 8-bit charsets for GNUish locales, which perhaps matters more in the wide world than utf-8 per se. With some customization, I can also at least _display_ utf-8-encoded CJK text. I can send and receive utf-8-encoded mail and browse utf-8-encoded web sites (with the development W3 package). The Mule-UCS package provides more if necessary, specifically better coverage of the BMP.

OD> Is the internal representation still the special MULE format ??

Yes. So what? [There has been much mis-representation of Mule, some of it malicious.] There is a yet-unimplemented scheme for coverage up to U+10 within that encoding. Even now, with Lisp-level changes one could build an (incompatible) Emacs to cover the BMP, sacrificing some of the standard charsets.

--
Bragging about Unicode support: ‘2d sinθ = nλ’ is plain text. ☺
http://www.unicode.org/