rwar (was RE: unicode in emacs 21)

2001-10-28 Thread Edward Cherlin

Please subside. If this is a real issue we can arrange a fair side-by-side
test.


Edward Cherlin
There are lies, damned lies, and benchmarks.

>-Original Message-
>From: [EMAIL PROTECTED]
>[mailto:[EMAIL PROTECTED]]On Behalf Of Jimmy Kaplowitz
>Sent: Sun, October 28, 2001 9:19 AM
>To: [EMAIL PROTECTED]
>Cc: Oliver Doepner; [EMAIL PROTECTED]
>Subject: Re: unicode in emacs 21
>
>
>On Sun, Oct 28, 2001 at 05:04:22PM +0000, Dave Love wrote:
>> > "OD" == Oliver Doepner <[EMAIL PROTECTED]> writes:
>>
>>  OD> There is vim 6.x now with full utf-8 support on the xterm.
>>
>> [Does `full utf-8 support' mean level 3?]
>
>Well, it handles double-width characters as well as up to two combining
>characters. It's the only editor I've used (including Yudit) that could
>display the sequence U+0283 U+034D correctly.
>
>> Emacs can do utf-8 i/o under ttys that support it, though you don't
>> _need_ such support -- either input or output -- to edit utf-8 text.
>>
>>  OD> It is much faster than emacs on x11 of course.
>>
>> I'm surprised that's much of an issue.  I assume Emacs under X is much
>> more capable.
>
>Well, Emacs does have more features (including some that are less
>essential, such as doctor mode :), but vim has quite enough for most
>purposes.
>
>>  OD> I was happy to see Emacs 21 announced. but the unicode support
>>  OD> does not seem to have moved forward very much
>>
>> It's moved from zero to the state where it's perfectly fine for
>> editing at least the Western technical text that interests me.  E.g.,
>> Kuhn's UTF-8-demo.utf works modulo the level 2 text, for which one can
>> add support straightforwardly at the Lisp level.  It also allowed
>> producing coding systems for all the 8-bit charsets for GNUish
>> locales, which perhaps matters more in the wide world than utf-8 per
>> se.  With some customization, I can also at least _display_
>> utf-8-encoded CJK text.  I can send and receive utf-8-encoded mail and
>> browse utf-8-encoded web sites (with the development W3 package).
>
>Vim can display the UTF-8-demo file perfectly, with no exceptions. Also,
>although I haven't tested this, I am told it can write as well as
>display utf-8 CJK text.
>
>- Jimmy Kaplowitz
>[EMAIL PROTECTED] / [EMAIL PROTECTED]
>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:  http://mail.nl.linux.org/linux-utf8/



Re: Unicode in Emacs again

2001-10-28 Thread Kenichi Handa

Eli Zaretskii <[EMAIL PROTECTED]> writes:
> On Sat, 27 Oct 2001, David Starner wrote:
>>  On Fri, Oct 26, 2001 at 01:35:26PM +0200, Oliver Doepner wrote:
>>  >   Emacs-Unicode-990824
>>  > --
>>  > Internal Character code:
>>  > 
>>  >   00      Unicode U+0000 - U+FFFF
>>  >   00      Unicode 20bit (via surrogate pair)
>>  >   01      Unicode 20bit (via surrogate pair)
>>  
>>  Why are astral characters going to be supported by surrogate pairs?
>>  That's just ugly, especially if elisp coders have to deal with
>>  surrogates. Considering the binary transparency demands, you also need
>>  to round-trip surrogate pairs in UTF-8 back to surrogate pairs, not
>>  astral characters.
>>  
>>  On second glance, it doesn't look like you're using surrogate pairs at
>>  all. Then why do you mention them? They're just an encoding trick; if
>>  you aren't using UTF-16, you can forget about them. The characters above
>>  U+FFFF are U+10000, U+10001, etc., not U+D800 U+DC00, U+D800 U+DC01,
>>  etc.

> Handa-san, could you please comment on that?  David is one of a few
> people who replied to my (quite desperate ;-) message posted to 
> gnu.emacs.bug a few days ago.

Please ignore the text "(via surrogate pair)".  It means
nothing, and I don't remember why I wrote that part.  :-(
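
David's point about surrogates can be made concrete.  The following is a
minimal Python sketch (the function names are illustrative and come from
neither Emacs nor the draft) of the UTF-16 arithmetic that maps a code
point above U+FFFF to a surrogate pair and back.  It suggests that
U+D800 U+DC00 is only the UTF-16 spelling of U+10000, which a
representation not based on UTF-16 never has to deal with.

```python
def to_surrogate_pair(code_point):
    """Split a code point in U+10000..U+10FFFF into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF are written as surrogate pairs")
    offset = code_point - 0x10000            # a 20-bit value
    high = 0xD800 + (offset >> 10)           # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)          # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a (high, low) surrogate pair into the code point it spells."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


for cp in (0x10000, 0x10001, 0x10FFFF):
    high, low = to_surrogate_pair(cp)
    print(f"U+{cp:04X} is written in UTF-16 as U+{high:04X} U+{low:04X}")
```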

Florian Weimer <[EMAIL PROTECTED]> writes:
> What does 'via surrogate pair' mean?  I guess the second line should
> read:

>>00      Unicode 20bit (U+10000 - U+FFFFF)

Yes.  That's correct, and the third line should read as below:

   01      Unicode 20bit (U+100000 - U+10FFFF)

>>01 0ppp     7 64kByte planes reserved for Emacs
>>01 1ppp     8 64kByte planes for private use
>>1x      for private use, CNS 3-16, and CCCII
>>  
>>  Private area is 180000h - 3087FFh

> These are the characters from
>  1 1000  
> to
> 11  1111  .
> Is this range intentional?  It looks rather strange.

I don't remember well.  :-(  Perhaps to fill code-space for
CNS 3-16 and CCCII from the tail.   They require #xF7800
code points (== (96*96*14) + (96*96*96)), and #x3087FF ==
#x3FFFFF - #xF7800.
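
The arithmetic above can be checked mechanically.  This is a small
Python sketch; reading the first term as the 14 CNS planes 3 to 16 and
the second as CCCII is an assumption, as are the variable names, and the
top of the code space is taken to be #x3FFFFF (the largest 22-bit
value).

```python
CELLS_PER_PLANE = 96 * 96                 # one 96x96 plane

cns_3_to_16 = CELLS_PER_PLANE * 14        # assumed: CNS planes 3..16
cccii = CELLS_PER_PLANE * 96              # assumed: CCCII as a 96x96x96 cube
needed = cns_3_to_16 + cccii              # code points required in total

top_of_code_space = 0x3FFFFF              # assumed: largest 22-bit code
last_private = top_of_code_space - needed # last code before the filled tail

print(f"needed       = #x{needed:X}")
print(f"last private = #x{last_private:X}")
```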

> Anyway, what does 'private use' mean? Reserved for GNU Emacs, for Lisp
> packages, for the end user?

It seems that we have not yet discussed it in detail.
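
For orientation, the layout discussed in this message can be written
down as a lookup table.  This Python sketch assumes the ranges are
U+0000 to U+10FFFF for Unicode, seven 64K planes (#x110000 to #x17FFFF)
reserved for Emacs, a private area of #x180000 to #x3087FF, and CNS 3-16
plus CCCII filling the rest up to #x3FFFFF.  The region names and the
function `region_of` are illustrative only; the draft itself may have
drawn the boundaries differently.

```python
# (first code, last code, label); every boundary here is an assumption
# reconstructed from the ranges quoted in the message.
REGIONS = [
    (0x000000, 0x10FFFF, "Unicode"),
    (0x110000, 0x17FFFF, "reserved for Emacs"),
    (0x180000, 0x3087FF, "private use"),
    (0x308800, 0x3FFFFF, "CNS 3-16 and CCCII"),
]


def region_of(code):
    """Return the label of the region that contains a 22-bit code value."""
    for first, last, label in REGIONS:
        if first <= code <= last:
            return label
    raise ValueError(f"#x{code:X} is outside the 22-bit code space")


for sample in (0x41, 0x10FFFF, 0x110000, 0x180000, 0x3087FF, 0x308800):
    print(f"#x{sample:06X}: {region_of(sample)}")
```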

---
Ken'ichi HANDA
[EMAIL PROTECTED]



Re: unicode in emacs 21

2001-10-28 Thread Markus Kuhn

On 28 Oct 2001, Dave Love wrote:
>  EZ> So we have two versions of Cyrillic characters, two versions of
>  EZ> Greek characters, two versions of Hebrew characters, etc.:  one
>  EZ> version in the new Unicode set, the other version in the old Mule
>  EZ> set.
>
> There are more than two, at least for Greek and Cyrillic.  Those in
> the Far Eastern charsets could be unified too if anyone cared.

Full unification here would have the disadvantage that CJK Greek/Cyrillic
characters are traditionally displayed as double-width, whereas ISO
8859/ISO 10646 Greek & Cyrillic characters are traditionally displayed
single-width. Some CJK users might be quite happy about a lack of
unification here to preserve the display width of these characters. Same
for the block graphics characters, which xterm with ISO10646 fonts
displays single-width whereas kterm with JIS/etc. fonts displays in
double-width.
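
The width question shows up directly in the Unicode character database,
where Greek, Cyrillic, and box-drawing characters carry the East Asian
Width class "A" (ambiguous).  A minimal Python sketch of a width lookup
follows; the function name `cell_width` and its `cjk_context` flag are
made up for illustration, and real terminals use more elaborate tables.

```python
import unicodedata


def cell_width(char, cjk_context=False):
    """Return how many terminal cells one character is assumed to occupy."""
    width_class = unicodedata.east_asian_width(char)
    if width_class in ("W", "F"):          # wide or fullwidth: always 2
        return 2
    if width_class == "A":                 # ambiguous: depends on context
        return 2 if cjk_context else 1
    return 1                               # narrow, halfwidth, neutral


samples = (
    "\u03b1",   # GREEK SMALL LETTER ALPHA
    "\u042f",   # CYRILLIC CAPITAL LETTER YA
    "\u2500",   # BOX DRAWINGS LIGHT HORIZONTAL
    "\u4e2d",   # a CJK ideograph
    "a",
)
for char in samples:
    western = cell_width(char)
    cjk = cell_width(char, cjk_context=True)
    print(f"U+{ord(char):04X}: western={western} cjk={cjk}")
```

Unifying the CJK and Western copies of these letters into one code point
leaves only the context to decide the width, which is the loss described
above.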

But surely all the European encodings such as ISO 8859, KOI, etc. should
be urgently unified with Unicode. The relevant standards have already been
(re)written to represent these encodings just as single-byte encodings of
ISO 10646 subsets.
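
To illustrate the point about single-byte encodings: the same Cyrillic
letter has a different byte value in KOI8-R and in ISO 8859-5, yet both
decode to one ISO 10646 code point.  The sketch below uses Python's
standard codecs purely as a convenient stand-in; Emacs has its own
conversion tables.

```python
LETTER = "\u0430"  # CYRILLIC SMALL LETTER A

for codec in ("koi8_r", "iso8859_5"):
    raw = LETTER.encode(codec)             # one byte in either charset
    assert raw.decode(codec) == LETTER     # and it round-trips
    print(f"{codec}: byte 0x{raw[0]:02X} <-> U+{ord(LETTER):04X}")
```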

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: 




Re: unicode in emacs 21

2001-10-28 Thread Jimmy Kaplowitz

On Sun, Oct 28, 2001 at 05:04:22PM +0000, Dave Love wrote:
> > "OD" == Oliver Doepner <[EMAIL PROTECTED]> writes:
> 
>  OD> There is vim 6.x now with full utf-8 support on the xterm.
> 
> [Does `full utf-8 support' mean level 3?]

Well, it handles double-width characters as well as up to two combining
characters. It's the only editor I've used (including Yudit) that could
display the sequence U+0283 U+034D correctly.
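
What "double-width plus combining characters" means for a terminal
editor can be sketched in a few lines of Python.  This is a simplified
model and not Vim's actual code: combining marks take no cell, wide East
Asian characters take two, and everything else takes one.  The function
name `display_cells` is hypothetical.

```python
import unicodedata


def display_cells(text):
    """Count terminal cells for text under a simplified width model."""
    cells = 0
    for char in text:
        if unicodedata.combining(char):    # nonzero class: overlays the base
            continue
        if unicodedata.east_asian_width(char) in ("W", "F"):
            cells += 2                     # double-width character
        else:
            cells += 1
    return cells


sample = "\u0283\u034d"  # LATIN SMALL LETTER ESH + COMBINING LEFT RIGHT ARROW BELOW
print(f"{len(sample)} code points, {display_cells(sample)} cell(s)")
```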

> Emacs can do utf-8 i/o under ttys that support it, though you don't
> _need_ such support -- either input or output -- to edit utf-8 text.
> 
>  OD> It is much faster than emacs on x11 of course.
> 
> I'm surprised that's much of an issue.  I assume Emacs under X is much
> more capable.

Well, Emacs does have more features (including some that are less
essential, such as doctor mode :), but vim has quite enough for most
purposes.

>  OD> I was happy to see Emacs 21 announced. but the unicode support
>  OD> does not seem to have moved forward very much
> 
> It's moved from zero to the state where it's perfectly fine for
> editing at least the Western technical text that interests me.  E.g.,
> Kuhn's UTF-8-demo.utf works modulo the level 2 text, for which one can
> add support straightforwardly at the Lisp level.  It also allowed
> producing coding systems for all the 8-bit charsets for GNUish
> locales, which perhaps matters more in the wide world than utf-8 per
> se.  With some customization, I can also at least _display_
> utf-8-encoded CJK text.  I can send and receive utf-8-encoded mail and
> browse utf-8-encoded web sites (with the development W3 package).

Vim can display the UTF-8-demo file perfectly, with no exceptions. Also,
although I haven't tested this, I am told it can write as well as
display utf-8 CJK text.

- Jimmy Kaplowitz
[EMAIL PROTECTED] / [EMAIL PROTECTED]



Re: unicode in emacs 21

2001-10-28 Thread Dave Love

> "EZ" == Eli Zaretskii <[EMAIL PROTECTED]> writes:

 EZ> The problem is that characters are still not unified in Emacs 21.

A package was contributed to do that for ISO 8859 characters.  It's
been posted to gnu.emacs.sources, so that shouldn't be an issue for
anyone who's bothered by it.

 EZ> So we have two versions of Cyrillic characters, two versions of
 EZ> Greek characters, two versions of Hebrew characters, etc.:  one
 EZ> version in the new Unicode set, the other version in the old Mule
 EZ> set.

There are more than two, at least for Greek and Cyrillic.  Those in
the Far Eastern charsets could be unified too if anyone cared.  This
issue clearly doesn't apply only to the Unicode charsets, and, as a
user, I don't think it's much of a problem in practice.

 EZ> What can I say except ``volunteers are welcome...'' etc.?  I can't 
 EZ> believe no one wants Unicode badly enough to work on its support in 
 EZ> Emacs, but what do I do with facts which fly in my face?

That view is unfair to the people who have done lots of work, himi in
particular.  `Working on Unicode support' in my book isn't restricted
to implementing an apparently-unnecessary, disruptive, incompatible
change to the internal encoding, even if it's what one wants ideally.



Re: unicode in emacs 21

2001-10-28 Thread Dave Love

> "OD" == Oliver Doepner <[EMAIL PROTECTED]> writes:

 OD> There is vim 6.x now with full utf-8 support on the xterm.

[Does `full utf-8 support' mean level 3?]

Emacs can do utf-8 i/o under ttys that support it, though you don't
_need_ such support -- either input or output -- to edit utf-8 text.

 OD> It is much faster than emacs on x11 of course.

I'm surprised that's much of an issue.  I assume Emacs under X is much
more capable.

 OD> I was happy to see Emacs 21 announced. but the unicode support
 OD> does not seem to have moved forward very much

It's moved from zero to the state where it's perfectly fine for
editing at least the Western technical text that interests me.  E.g.,
Kuhn's UTF-8-demo.utf works modulo the level 2 text, for which one can
add support straightforwardly at the Lisp level.  It also allowed
producing coding systems for all the 8-bit charsets for GNUish
locales, which perhaps matters more in the wide world than utf-8 per
se.  With some customization, I can also at least _display_
utf-8-encoded CJK text.  I can send and receive utf-8-encoded mail and
browse utf-8-encoded web sites (with the development W3 package).

The Mule-UCS package provides more if necessary, specifically better
coverage of the BMP.

 OD> Is the internal representation still the special MULE format ??~

Yes.  So what?  [There has been much mis-representation of Mule, some
of it malicious.]  There is a yet-unimplemented scheme for coverage up
to U+10FFFF within that encoding.  Even now, with Lisp-level changes
one could build an (incompatible) Emacs to cover the BMP, sacrificing
some of the standard charsets.

-- 
Bragging about Unicode support: ‘2d sinθ = nλ’ is plain text. ☺
<http://www.unicode.org/>



Re: Unicode in Emacs

2001-10-28 Thread Edmund GRIMLEY EVANS

Bram Moolenaar <[EMAIL PROTECTED]>:

> Richard Stallman wrote:
> 
> > I have no comments on vim from a technical standpoint, but its license
> > includes a restriction that makes it not free software.  Unless the
> > license is changed, please don't use vim.
> 
> Please don't project your interpretation of the term "free software"
> onto the rest of the world.  In my opinion the license that Vim uses
> makes it more free than the GNU public license.  That is because the GPL
> enforces the source code of modified versions to be published; the Vim
> license does not always enforce that.  Therefore the Vim license gives
> more freedom.  Otherwise the licenses are practically the same.

This slightly misrepresents the GPL, because the GPL only forces you
to make source code available to the people you distribute binaries
to, which is not quite the same thing as forcing it to be published.

I note that Debian considers Vim to be free software, as it is in main
rather than non-free. Debian is rather strict about "free".

Vim's licence would appear to be GPL-incompatible because of this bit
(taken from Debian's /usr/doc/vim/copyright):

> When the maintainer asks for it (in any way) you must make your changes,
> including source code, available to him.
> 
> The maintainer reserves the right to include any changes in the official
> version of Vim.  This is negotiable.  You are not allowed to distribute a
> modified version of Vim when you are not willing to make the source code
> available to the maintainer.

You can distribute a modified version of a GPL program without giving
the changes to the original author or the official maintainer.

Being GPL-incompatible is not a crime, but it can be expected to annoy
some people ... and it can cause a lot of trouble if you make a
library (or code that might end up in a library) GPL-incompatible.

Edmund



Re: Unicode in Emacs

2001-10-28 Thread Bram Moolenaar


Richard Stallman wrote:

> I have no comments on vim from a technical standpoint, but its license
> includes a restriction that makes it not free software.  Unless the
> license is changed, please don't use vim.

Please don't project your interpretation of the term "free software"
onto the rest of the world.  In my opinion the license that Vim uses
makes it more free than the GNU public license.  That is because the GPL
enforces the source code of modified versions to be published; the Vim
license does not always enforce that.  Therefore the Vim license gives
more freedom.  Otherwise the licenses are practically the same.

-- 
ARTHUR:What?
BLACK KNIGHT:  None shall pass.
ARTHUR:I have no quarrel with you, good Sir knight, but I must cross
   this bridge.
BLACK KNIGHT:  Then you shall die.
  The Quest for the Holy Grail (Monty Python)

 ///  Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.moolenaar.net  \\\
(((   Creator of Vim -- http://vim.sf.net -- ftp://ftp.vim.org/pub/vim   )))
 \\\  Help me helping AIDS orphans in Uganda - http://iccf-holland.org  ///



Re: Unicode in Emacs

2001-10-28 Thread Roozbeh Pournader

On Sun, 28 Oct 2001, Richard Stallman wrote:

> I have no comments on vim from a technical standpoint, but its license
> includes a restriction that makes it not free software.  Unless the
> license is changed, please don't use vim.

What is this restriction? Would you please provide details or pointers?

On the other hand, there are vim developers here. If you (developers)
know the reason for vim being 'non-free', please tell us why you want the
'offending' clause in the license. (The phrases in quotation marks are in
GNU terms of course... ;)

> We want to make Emacs use Unicode internally and have designed some of
> the data representations.  But it is a substantial amount of work.
> We're looking for people to work on it; would any of you like to help?

The current Emacs codebase is really big. Some time ago some colleagues
and I wished to help with bidi-enabling Emacs, but found that it's very
hard to start. Starting with a smaller codebase like vim's really
helps.

roozbeh




Re: Unicode in Emacs

2001-10-28 Thread Richard Stallman

I have no comments on vim from a technical standpoint, but its license
includes a restriction that makes it not free software.  Unless the
license is changed, please don't use vim.

We want to make Emacs use Unicode internally and have designed some of
the data representations.  But it is a substantial amount of work.
We're looking for people to work on it; would any of you like to help?





Re: unicode in emacs 21

2001-10-28 Thread Florian Weimer

Eli Zaretskii <[EMAIL PROTECTED]> writes:

>> Why can't you continue to use the MULE code and just change the
>> character sets to reflect certain aspects of Unicode?
> 
> The current plan for Unicode was discussed at length 3 years ago, and
> the result was what I described.

Is the discussion archived somewhere, or are there some design
documents which resulted from the discussion?

> I don't think it's wise for us to reopen that discussion again,
> unless you think the UTF-8-based representation is a terribly wrong
> design.

Of course, it's hard to come up with constructive criticism when you
don't know what's already there. ;-)

> So I don't see any reason for the unnamed Unicode people to get
> annoyed by a term they themselves coined.

Me neither, but I got flamed in the past. :-/

> Conceivably, changing the internal representation doesn't mean we need
> to rewrite all of the existing code, just the low-level parts of it
> that deal with code conversions (i.e. subroutines of encoding and
> decoding functions).

I still don't understand the need for such a change.  In theory, the
internal representation of characters should be invisible to the
higher levels.



Re: unicode in emacs 21

2001-10-28 Thread Eli Zaretskii


On 28 Oct 2001, Janusz S. Bień wrote:

> On Eli Zaretskii <[EMAIL PROTECTED]>  wrote:
> 
> [...]
> 
> > Lately, the emacs-unicode mailing list was revived, in the hope that it 
> > will boost the activity.  Sadly, the traffic on that list is nil.
> 
> Was the list properly announced? I've seen a mention of it, but no
> instructions on how to subscribe. Where is the list hosted? It is not
> accessible from the emacs pages at http://savannah.gnu.org.

It's not a public list (and, given the traffic, I'm not convinced it's 
worth the hassle to make it a public one).  However, the people who 
subscribe to that list know they are subscribed (they've asked for that 
explicitly), so no announcement seems to be necessary.

I can subscribe you if you want.



Re: unicode in emacs 21

2001-10-28 Thread Janusz S. Bień

On Eli Zaretskii <[EMAIL PROTECTED]>  wrote:

[...]

> Lately, the emacs-unicode mailing list was revived, in the hope that it 
> will boost the activity.  Sadly, the traffic on that list is nil.

Was the list properly announced? I've seen a mention of it, but no
instructions on how to subscribe. Where is the list hosted? It is not
accessible from the emacs pages at http://savannah.gnu.org.

Best regards

Janusz

-- 
 ,   
dr hab. Janusz S. Bien, prof. UW
Prof. Janusz S. Bien, Warsaw University
http://www.orient.uw.edu.pl/~jsbien/
-
Na tym koncie czytam i wysylam poczte i wiadomosci offline.
On this account I read/post mail/news offline.



Re: unicode in emacs 21

2001-10-28 Thread Florian Weimer

"H. Peter Anvin" <[EMAIL PROTECTED]> writes:

> Does that mean you're painting yourself into a corner, though,
> requiring manual work to integrate the increasingly Unicode-based
> infrastructure support that is becoming available?  Odds are pretty
> good that they are.

I don't think it is a good idea to use operating system Unicode
support.  This would mean that GNU Emacs behaves differently on
different operating systems, depending on the installed locale
descriptions, for example.

OTOH, the character encodings posted earlier to this list are as
incompatible with existing Unicode support as the current emacs-mule
internal encoding.  In effect, just one Emacs-specific internal
encoding is replaced by another.



Re: unicode in emacs 21

2001-10-28 Thread Colin Paul Adams

> "Eli" == Eli Zaretskii <[EMAIL PROTECTED]> writes:

Eli> Would you like to be subscribed to emacs-unicode?

I would, please.
-- 
Colin Paul Adams
Preston Lancashire



Re: Unicode in Emacs again

2001-10-28 Thread Eli Zaretskii


On Sat, 27 Oct 2001, David Starner wrote:

> On Fri, Oct 26, 2001 at 01:35:26PM +0200, Oliver Doepner wrote:
> > Emacs-Unicode-990824
> > --
> > Internal Character code:
> > 
> >   00      Unicode U+0000 - U+FFFF
> >   00      Unicode 20bit (via surrogate pair)
> >   01      Unicode 20bit (via surrogate pair)
> 
> Why are astral characters going to be supported by surrogate pairs?
> That's just ugly, especially if elisp coders have to deal with
> surrogates. Considering the binary transparency demands, you also need
> to round-trip surrogate pairs in UTF-8 back to surrogate pairs, not
> astral characters.
> 
> On second glance, it doesn't look like you're using surrogate pairs at
> all. Then why do you mention them? They're just an encoding trick; if
> you aren't using UTF-16, you can forget about them. The characters above
> U+FFFF are U+10000, U+10001, etc., not U+D800 U+DC00, U+D800 U+DC01,
> etc.

Handa-san, could you please comment on that?  David is one of a few
people who replied to my (quite desperate ;-) message posted to 
gnu.emacs.bug a few days ago.

In any case, let's continue discussing this on [EMAIL PROTECTED] 
(CC'ed).



Re: unicode in emacs 21

2001-10-28 Thread Eli Zaretskii

[I suggest having this discussion on the emacs-unicode mailing list, so I
added it to the list of addressees.]

> From: Florian Weimer <[EMAIL PROTECTED]>
> Date: Sun, 28 Oct 2001 00:58:29 +0200
> 
> "Eli Zaretskii" <[EMAIL PROTECTED]> writes:
> 
> > Emacs cannot use a pure UTF-8 encoding, since some cultures don't want
> > unification, and it was decided that Emacs should not force
> > unification on those cultures.
> 
> Why can't you continue to use the MULE code and just change the
> character sets to reflect certain aspects of Unicode?

The current plan for Unicode was discussed at length 3 years ago, and
the result was what I described.  I don't think it's wise for us to
reopen that discussion again, unless you think the UTF-8-based
representation is a terribly wrong design.
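
One way a UTF-8-based representation could still hold non-unified
characters is to keep the UTF-8 byte pattern but allow code values above
U+10FFFF.  The Python sketch below is only an assumption about what such
a scheme might look like, not the design that was agreed on: it extends
the standard bit layout to five bytes, which is enough for 22-bit
internal codes.  The name `encode_extended_utf8` is invented for this
example.

```python
# (sequence length, exclusive upper limit, lead-byte marker)
_FORMS = (
    (2, 0x800, 0xC0),
    (3, 0x10000, 0xE0),
    (4, 0x200000, 0xF0),
    (5, 0x4000000, 0xF8),
)


def encode_extended_utf8(code):
    """Encode a code value with the UTF-8 bit pattern, up to five bytes."""
    if not 0 <= code < 0x4000000:
        raise ValueError("code value does not fit in five bytes")
    if code < 0x80:
        return bytes([code])               # ASCII stays one byte
    for length, limit, marker in _FORMS:
        if code < limit:
            lead = marker | (code >> (6 * (length - 1)))
            trail = [0x80 | ((code >> (6 * i)) & 0x3F)
                     for i in range(length - 2, -1, -1)]
            return bytes([lead] + trail)
    raise AssertionError("unreachable: range was checked above")


for value in (0x41, 0x20AC, 0x10FFFF, 0x180000, 0x3FFFFF):
    print(f"#x{value:06X} -> {encode_extended_utf8(value).hex(' ')}")
```

For values that are valid Unicode scalar values the output is ordinary
UTF-8, so only the codes beyond U+10FFFF would be Emacs-specific.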

> One such aspect
> is Latin "unification", for example.  (The Unicode people get very
> annoyed if you talk about "unification", "source separation rule" etc.
> in the context of non-Han scripts...)

IIRC, the term "unification" appears early in the Unicode standard,
not necessarily in conjunction with ``Han unification''.  It is cited
as one of the principles of the Unicode approach.  So I don't see any
reason for the unnamed Unicode people to get annoyed by a term they
themselves coined.

> In a second step, support for normalization, combining characters
> etc. would have to be added, but this could be based on the reliable
> foundation of the old MULE code.

Conceivably, changing the internal representation doesn't mean we need
to rewrite all of the existing code, just the low-level parts of it
that deal with code conversions (i.e. subroutines of encoding and
decoding functions).



Re: unicode in emacs 21

2001-10-28 Thread Eli Zaretskii

> From: David Starner <[EMAIL PROTECTED]>
> Date: Sat, 27 Oct 2001 14:34:04 -0500
> 
> On Thu, Oct 25, 2001 at 07:11:30PM +0200, Eli Zaretskii wrote:
> > What can I say except ``volunteers are welcome...'' etc.?  I can't 
> > believe no one wants Unicode badly enough to work on its support in 
> > Emacs, but what do I do with facts which fly in my face?
> 
> I've spent several years trawling the net for Unicode information. I've
> heard about the emacs-unicode list, but never seen archives or
> subscription information.

Would you like to be subscribed to emacs-unicode?

> Nor have I ever seen the plans to support
> Unicode; looking in the Emacs 21 etc directory of junk provides neither
> these plans nor any idea that they exist.

See etc/TODO.