Re: Emacs 21.2 non-ASCII keysym problems

2002-11-27 Thread Dave Love
Kenichi Handa <[EMAIL PROTECTED]> writes:

> First, please post this kind of bug report to
> [EMAIL PROTECTED] (or to [EMAIL PROTECTED] if you
> are using a pretest version).

Indeed (but _after reading the documentation_ about International
features).  Also, the description seems to concern a modified version
which presumably RedHat should support, perhaps including whatever
messed up and put emacs-mule-encoded stuff in the message.

> I think this problem is fixed in the HEAD branch.
> 
> And, as emacs-unicode branch was made earlier, this problem
> is not yet fixed in emacs-unicode.

It is, but the treatment of keysyms has been revamped anyway, so the
issue wouldn't arise in this case.  If I remember correctly, a
workaround for Emacs 21.2 is to set the keyboard coding system.
--
Linux-UTF8:   i18n of Linux on all levels
Archive:  http://mail.nl.linux.org/linux-utf8/




Re: Emacs and UTF-8 locale

2002-01-01 Thread Dave Love

> Kenichi Handa writes:

 > It seems that it doesn't have a major problem,

I hope not, because I've basically used your customization hooks or
similar ones and done the sort of things you'd talked about at some
time!

 > but I found one problem related to handling unibyte case.

I didn't expect it to do anything sensible with unibyte, but if
there's an easy way to improve it, that would be fine.

 > If unify-8859-on-decoding-mode is on, for instance, in
 > latin-2 lang. env., 8859-2 characters files are decoded into
 > latin-iso8859-1 and mule-unicode-0100-24ff.  But, C-q XXX
 > still inserts latin-iso8859-2 characters.

Yes.  I'm not sure that should change, but the relevant primitives
could now use `translation-table-for-input'.  It wasn't the sort of
thing I could control in user-level customization anyway, without
kludging it with a post-command hook.

 > And, when we paste mule-unicode-0100-24ff characters into
 > unibyte buffer, or paste unibyte string into a multibyte
 > buffer, they are not correctly converted.

What would be correct?  Is general Unicode text any different to, say,
JISX-based Japanese in that respect?




Re: Emacs and UTF-8 locale

2002-01-01 Thread Dave Love

> Tomohiro KUBOTA writes:

 > Thus, portable software should check environment variables when
 > nl_langinfo() is not available, though this method can result in
 > wrong encoding.

Emacs will be able to do that anyhow, even if nl_langinfo is
available.

 > From users' viewpoint, to declare encoding more clearly, it is a
 > good idea to define LANG variable including codeset part.

Probably, in principle, but that doesn't work generally (outside
Emacs) e.g. try LC_CTYPE=en_GB.iso8859-15 on Debian testing.
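
As a rough illustration of what "checking environment variables" amounts
to, here is a minimal C sketch. It is not code from Emacs or glibc. The
function names locale_name_codeset and ctype_locale_name are hypothetical,
and the sketch assumes the usual POSIX precedence of LC_ALL, then LC_CTYPE,
then LANG, with locale names of the form language_TERRITORY.codeset@modifier.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Copy the codeset part of a locale name such as
   "en_GB.ISO-8859-15@euro" into buf.  Returns 1 on success, 0 if the
   name carries no codeset part or buf is too small.  (Hypothetical
   helper, for illustration only.) */
int locale_name_codeset(const char *name, char *buf, size_t bufsize)
{
    const char *start, *end;
    size_t len;

    if (name == NULL || (start = strchr(name, '.')) == NULL)
        return 0;
    start++;                          /* skip the '.' */
    end = strchr(start, '@');         /* an optional modifier ends the codeset */
    len = end ? (size_t)(end - start) : strlen(start);
    if (len == 0 || len >= bufsize)
        return 0;
    memcpy(buf, start, len);
    buf[len] = '\0';
    return 1;
}

/* Return the locale name that governs LC_CTYPE: the first non-empty
   value among LC_ALL, LC_CTYPE and LANG, or NULL if none is set. */
const char *ctype_locale_name(void)
{
    static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };
    size_t i;

    for (i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *value = getenv(vars[i]);
        if (value != NULL && *value != '\0')
            return value;
    }
    return NULL;
}
```

With LC_CTYPE=en_GB.iso8859-15 this would extract "iso8859-15". A name
without a codeset part, such as fa_IR, yields nothing, so a table of
per-locale defaults is still needed in that case.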




Re: Emacs and UTF-8 locale

2002-01-01 Thread Dave Love

> Markus Kuhn writes:

 > Another example for why using "nl_langinfo(CODESET)" or "locale
 > charmap" is far better than looking at the environment variables
 > with obscure rules:

On the contrary, as I tried to explain.  In particular, I added
Latin-9 support to Emacs long ago, and Latin-9 v. Latin-1 is just the
sort of thing I was concerned about.  Anyhow, I'm surprised any system
has to change today and doubt it would be a good idea to change things
under users' feet.  Presumably they've already been using @euro or
whatever.

Emacs needs those `obscure rules', whether or not it can use
nl_langinfo.  If that returns something meaning ASCII, we probably
want to look for a sensible language environment using the current
code.
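
The fallback described above might be sketched in C roughly as follows.
This is only an illustration, not what Emacs does internally. The function
names are hypothetical, the second argument stands for whatever the
existing name-based rules would have chosen, and the list of names treated
as "meaning ASCII" is an assumption that is probably incomplete.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the codeset name is missing or is one of the names
   assumed here to denote plain ASCII (the list is an assumption). */
int codeset_means_ascii(const char *codeset)
{
    static const char *const ascii_names[] = {
        "ANSI_X3.4-1968", "US-ASCII", "ASCII", "646"
    };
    size_t i;

    if (codeset == NULL || *codeset == '\0')
        return 1;
    for (i = 0; i < sizeof ascii_names / sizeof ascii_names[0]; i++)
        if (strcmp(codeset, ascii_names[i]) == 0)
            return 1;
    return 0;
}

/* Prefer the codeset reported by nl_langinfo(CODESET), but when it
   only says "ASCII", fall back to the guess derived from the locale
   name, if there is one. */
const char *pick_codeset(const char *langinfo_codeset,
                         const char *locale_name_guess)
{
    if (!codeset_means_ascii(langinfo_codeset))
        return langinfo_codeset;
    return locale_name_guess != NULL ? locale_name_guess : langinfo_codeset;
}
```

Here pick_codeset("ANSI_X3.4-1968", "ISO-8859-1") returns the name-based
guess, while a specific answer such as "ISO-8859-15" is passed through
unchanged.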

 > The mapping between locale names and encodings should really
 > be left to where it belongs, namely the C library.

If it changes there, Emacs users could be messed up.  Apart from the
effect on using existing data, they may be surprised when creating new
files and they'd probably find themselves using the wrong input
method.




Re: Emacs and UTF-8 locale

2001-12-21 Thread Dave Love

> Eli Zaretskii writes:

 > I think the decision to leave it disabled in v21.1 was correct,
 > since the application code, written by Dave, to make that support
 > reasonably complete was only recently added to the CVS tree.

I don't know what that means.  The trivial utf-8 language environment
I offered could easily have been installed to fix the bug of not
honouring the locale.  The only way I've significantly improved the
support of utf-8 encoding recently is by additions to characters.el
and providing experimental level 2 support for some scripts.  I don't
think that's too important.

 > The changes for which this addition is useful are installed only on
 > the development trunk,

What changes?  I don't know why it would have been any different in
21.1, which is what I'm basically using anyhow, and on which I've
tested things.  [To use nl_langinfo, I added a single function and
changed `locale-name' to `(or locale-name (locale-codeset))'  once.]

 > Without Dave's additions, turning on the UTF support by
 > default would screw users.

I don't know why, and I'm the only one who tested it as far as I know.

 > I believe some of those who tried to do that with stock Emacs 21.1
 > complained about problems on gnu.emacs.bug,

I don't know what that refers to, so any such problems probably
haven't been addressed by anything I've done.

 > the same kind of problems whose anticipation was the reason for
 > leaving UTF disabled in the last release.

I can address actual test cases.  I'm running a 21.1-based Emacs so I
can't necessarily reproduce problems, but I might well be able to spot
causes.  All I've heard is vague suggestions of problems and
statements about what I've implemented that are demonstrably
wrong.




Re: Emacs and UTF-8 locale

2001-12-21 Thread Dave Love

> Paul Eggert writes:

 > I think it's reasonable to use this kind of approach, but I suggest
 > deferring it for a major release, 

Of course.  I'm not convinced it's a big issue anyhow, especially
compared with actually providing support for the relevant locales.

 > I believe I used Solaris 7; could have been an earlier version.

I had a quick look at Solaris 8, and I think it wasn't fully
consistent with glibc.  I assumed Emacs should take its cue from
glibc, but if that actually causes a problem, maybe the list could be
adjusted depending on the system for which Emacs is built.




Re: Emacs and UTF-8 locale

2001-12-21 Thread Dave Love

> Roozbeh Pournader writes:

 > Will you accept 'fa_IR' as another example?

Do you mean that there is Farsi support for Emacs?




Re: Emacs and UTF-8 locale

2001-12-19 Thread Dave Love

> Eli Zaretskii writes:

 > this was disabled previously because stock Emacs 21.1 lacked some
 > user-level features which are required for a decent support of
 > UTF-8 locales.

It lacked _any_ built-in support for utf-8 at the time the locale work
was done.  That's why eggert special-cased it, according to the
commentary and what I recall in mail.

I didn't hear a good reason for maintaining the exclusion in 21.1.




Re: Emacs and UTF-8 locale

2001-12-19 Thread Dave Love

> Markus Kuhn writes:

 > There are UTF-8 locales in use (e.g., vi_VI), which do NOT have
 > UTF-8 in their name,

That looks like a bad example since, at least in glibc 2.2.4, the
locale is listed as `vi_VN.UTF-8'.  That's fortunate, since Emacs uses
VISCII for the unqualified Vietnamese language environment.
(Similarly for Devanagari.)  I documented other cases from glibc in
the code.  Apparently they aren't all consistent with the source that
eggert originally used.

 > therefore the direct test of the locale environment
 > variables is just a less reliable fallback option.

 > It is my understanding that elisp currently has no direct access to
 > the output of the API function nl_langinfo(CODESET), and I hope
 > this can be fixed.

I've implemented it, but it's not installed, partly _because_ people
may not end up with the coding they expect.  I haven't yet tried to
check compatibility properly.

 > Fortunately, there exists only one single standard string that
 > nl_langinfo(CODESET) returns in a UTF-8 locale, and that is
 > "UTF-8".  (For ISO 8859-1, both "ISO-8859-1" and "ISO8859-1" are
 > used by different manufacturers.)

Emacs already deals with that sort of issue and DTRT with the
environment variables, including matching something more general than
`UTF-8'.  Please see the code in mule-cmds.el.

 > if (strstr(s, "UTF-8"))
 >   utf8_mode = 1;

Testing solely for utf-8 isn't useful.
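
To illustrate the spelling problem mentioned in the quote ("ISO-8859-1"
versus "ISO8859-1"), here is a minimal C sketch of a looser comparison
than a plain strstr test. It is not the matching done in mule-cmds.el.
The function name codeset_names_match is hypothetical, and ignoring case
plus '-' and '_' is an assumed normalisation rule.

```c
#include <ctype.h>

/* Compare two codeset names, ignoring letter case and any '-' or '_'
   characters, so that "ISO-8859-1" and "iso8859_1" compare equal.
   Returns 1 on a match, 0 otherwise.  (Hypothetical helper.) */
int codeset_names_match(const char *a, const char *b)
{
    for (;;) {
        /* Skip separators on both sides before comparing. */
        while (*a == '-' || *a == '_')
            a++;
        while (*b == '-' || *b == '_')
            b++;
        if (*a == '\0' || *b == '\0')
            return *a == *b;          /* match only if both ended */
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
}
```

The same comparison can then be applied to every supported codeset, for
example Latin-1 and Latin-9 as well as UTF-8, rather than to a single
special case.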

-- 
$ locale -c charmap
LC_CTYPE
ISO-8859-1




Re: unicode in emacs 21

2001-11-04 Thread Dave Love

> "EZ" == Eli Zaretskii <[EMAIL PROTECTED]> writes:

 EZ> Unless you refer to the CNS plane and Japanese Han characters,
 EZ> which were deliberately left ununified (in addition to the
 EZ> Unicode codepoints for those characters), I think you are
 EZ> mistaken.

I.e., he's right.

Someone needs to give a cogent argument why it's a problem in practice
to have multiple representations if you can canonicalize as required,
especially why this should be any different for Western scripts than
for CJK.  Note that I have some practical experience of this in Emacs.




Re: unicode in emacs 21

2001-11-04 Thread Dave Love

> "EZ" == Eli Zaretskii <[EMAIL PROTECTED]> writes:

 EZ> The current plan for Unicode was discussed at length 3 years ago, and
 EZ> the result was what I described.  I don't think it's wise for us to
 EZ> reopen that discussion again

Well I, at least, don't understand why it's necessary, at least for
technical reasons.  I have a fair amount of experience as a user and
implementor.




Re: unicode in emacs 21

2001-11-04 Thread Dave Love

> "MK" == Markus Kuhn <[EMAIL PROTECTED]> writes:

 MK> If you can edit the UTF-8 test file

 MK>   http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt

That's what I mean by test cases.  I can't remember which ones fail,
but I suspect it's non-BMP ones.  There are a couple of ways to fix
it, but I don't think it's important.




Re: unicode in emacs 21

2001-11-04 Thread Dave Love

> "JK" == Jimmy Kaplowitz <[EMAIL PROTECTED]> writes:

 JK> It's the only editor I've used (including Yudit) that could
 JK> display the sequence U+0283 U+034D correctly.

[With what font?]

Note that character composition (combination) is a user-level feature
in Emacs, so if rules are implemented which you don't like, you can
change them.

 JK> Well, Emacs does have more features (including some that are less
 JK> essential, such as doctor mode :), but vim has quite enough for
 JK> most purposes.

I assumed the point was specifically about the display, tty v. X.




Re: unicode in emacs 21

2001-10-29 Thread Dave Love

> "MK" == Markus Kuhn <[EMAIL PROTECTED]> writes:

 MK> Using UTF-8 as the internal Emacs encoding is one way of achieving
 MK> continued guaranteed binary transparency, 

I.e., maintain a malformed internal representation??

 MK> coming up with a tricky encoding for malformed UTF-8 sequences is
 MK> another one.

We can maintain arbitrary byte sequences now.  It's not terribly
tricky, just not too robust through the use of the eight-bit-x
charsets.

I don't think it's very important that reading and writing malformed
sequences by utf-8.el isn't always idempotent.  Presumably the three
or four relevant test cases could be addressed in the CCL, but I think
there are better things to spend the time on.
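
For readers unfamiliar with what counts as a malformed sequence, the
following C sketch counts the bytes in a buffer that do not belong to any
well-formed UTF-8 sequence. These are the bytes a decoder would have to
keep in some raw form in order to write the file back unchanged. It is
not the CCL code in utf-8.el. The function names are hypothetical, and
the byte ranges follow the Unicode definition of well-formed UTF-8
(no overlong forms, no surrogates, nothing above U+10FFFF).

```c
#include <stddef.h>

/* Length (1..4) of the well-formed UTF-8 sequence starting at s, or 0
   if the bytes at s are malformed or truncated. */
static size_t utf8_sequence_length(const unsigned char *s, size_t n)
{
    unsigned char lo = 0x80, hi = 0xBF;   /* allowed range of byte 2 */
    size_t len, i;

    if (n == 0)
        return 0;
    if (s[0] <= 0x7F)
        return 1;
    if (s[0] >= 0xC2 && s[0] <= 0xDF) {
        len = 2;
    } else if (s[0] >= 0xE0 && s[0] <= 0xEF) {
        len = 3;
        if (s[0] == 0xE0) lo = 0xA0;      /* reject overlong forms */
        if (s[0] == 0xED) hi = 0x9F;      /* reject surrogates */
    } else if (s[0] >= 0xF0 && s[0] <= 0xF4) {
        len = 4;
        if (s[0] == 0xF0) lo = 0x90;      /* reject overlong forms */
        if (s[0] == 0xF4) hi = 0x8F;      /* reject above U+10FFFF */
    } else {
        return 0;                         /* stray continuation, C0, C1, F5..FF */
    }
    if (n < len || s[1] < lo || s[1] > hi)
        return 0;
    for (i = 2; i < len; i++)
        if (s[i] < 0x80 || s[i] > 0xBF)
            return 0;
    return len;
}

/* Count the bytes that are not part of any well-formed sequence. */
size_t count_malformed_bytes(const unsigned char *s, size_t n)
{
    size_t bad = 0, i = 0;

    while (i < n) {
        size_t len = utf8_sequence_length(s + i, n - i);
        if (len == 0) {
            bad++;                        /* treat this byte as raw */
            i++;
        } else {
            i += len;
        }
    }
    return bad;
}
```

A result of zero means the buffer could be decoded and re-encoded without
any special handling. Anything else marks bytes that need a raw
representation if reading and writing are to be idempotent.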



Re: unicode in emacs 21

2001-10-29 Thread Dave Love

> "MK" == Markus Kuhn <[EMAIL PROTECTED]> writes:

 MK> CJK Greek/Cyrillic characters are traditionally displayed as
 MK> double-width, whereas ISO 8859/ISO 10646 Greek & Cyrillic
 MK> characters are traditionally displayed single-width.

Yes, but...

 MK> But surely all the European encodings such as ISO 8859, KOI,
 MK> etc. should be urgently unified with Unicode.

The implementation you may recall hearing about earlier in the year is
now available (posted to gnu.emacs.sources).



Re: unicode in emacs 21

2001-10-28 Thread Dave Love

> "EZ" == Eli Zaretskii <[EMAIL PROTECTED]> writes:

 EZ> The problem is that characters are still not unified in Emacs 21.

A package was contributed to do that for ISO 8859 characters.  It's
been posted to gnu.emacs.sources, so that shouldn't be an issue for
anyone who's bothered by it.

 EZ> So we have two versions of Cyrillic characters, two versions of
 EZ> Greek characters, two versions of Hebrew characters, etc.:  one
 EZ> version in the new Unicode set, the other version in the old Mule
 EZ> set.

There are more than two, at least for Greek and Cyrillic.  Those in
the Far Eastern charsets could be unified too if anyone cared.  This
issue clearly doesn't apply only to the Unicode charsets, and, as a
user, I don't think it's much of a problem in practice.

 EZ> What can I say except ``volunteers are welcome...'' etc.?  I can't 
 EZ> believe no one wants Unicode badly enough to work on its support in 
 EZ> Emacs, but what do I do with facts which fly in my face?

That view is unfair to the people who have done lots of work, himi in
particular.  `Working on Unicode support' in my book isn't restricted
to implementing an apparently-unnecessary, disruptive, incompatible
change to the internal encoding, even if it's what one wants ideally.



Re: unicode in emacs 21

2001-10-28 Thread Dave Love

> "OD" == Oliver Doepner <[EMAIL PROTECTED]> writes:

 OD> There is vim 6.x now with full utf-8 support on the xterm.

[Does `full utf-8 support' mean level 3?]

Emacs can do utf-8 i/o under ttys that support it, though you don't
_need_ such support -- either input or output -- to edit utf-8 text.

 OD> It is much faster than emacs on x11 of course.

I'm surprised that's much of an issue.  I assume Emacs under X is much
more capable.

 OD> I was happy to see Emacs 21 announced, but the unicode support
 OD> does not seem to have moved forward very much

It's moved from zero to the state where it's perfectly fine for
editing at least the Western technical text that interests me.  E.g.,
Kuhn's UTF-8-demo.utf works modulo the level 2 text, for which one can
add support straightforwardly at the Lisp level.  It also allowed
producing coding systems for all the 8-bit charsets for GNUish
locales, which perhaps matters more in the wide world than utf-8 per
se.  With some customization, I can also at least _display_
utf-8-encoded CJK text.  I can send and receive utf-8-encoded mail and
browse utf-8-encoded web sites (with the development W3 package).

The Mule-UCS package provides more if necessary, specifically better
coverage of the BMP.

 OD> Is the internal representation still the special MULE format ??

Yes.  So what?  [There has been much mis-representation of Mule, some
of it malicious.]  There is a yet-unimplemented scheme for coverage up
to U+10 within that encoding.  Even now, with Lisp-level changes
one could build an (incompatible) Emacs to cover the BMP, sacrificing
some of the standard charsets.

-- 
Bragging about Unicode support: ‘2d sinθ = nλ’ is plain text. ☺
<http://www.unicode.org/>