Re: [Bug 2077] - Abiword mangles a UTF-8 file open importing

2001-10-30 Thread D. Dale Gulledge

The default encoding should certainly be based on the locale.  However,
there are any number of reasons why someone would edit text with
multiple encodings.  The most obvious is when you are editing something
that has been sent to you by someone else who is using a different
encoding.  For a single language this would probably involve UTF-8 and a
single 8-bit encoding.  For multilingual people, which includes a
sizeable number of open source developers, it could involve multiple
8-bit encodings as well, which make automatic detection impossible.

My own preference would be very simple.  Assume a default based on the
locale, but allow selection of a different encoding on the fly.  This is
just a gentle suggestion because I am not currently using Abiword.
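
A minimal sketch of that preference in Python: default to the locale's encoding, but let an explicit choice override it. The helper name read_text and its encoding parameter are hypothetical, standing in for whatever an import dialog would pass along; this is not AbiWord code.

```python
import locale


def read_text(path, encoding=None):
    """Decode a file with an explicit encoding if given, else the locale default."""
    # The explicit argument stands in for the user's on-the-fly selection;
    # when it is absent, fall back to the encoding implied by the locale.
    chosen = encoding or locale.getpreferredencoding(False)
    with open(path, "rb") as f:
        raw = f.read()
    # Strict decoding: a wrong guess raises UnicodeDecodeError for encodings
    # that can reject input (such as UTF-8) instead of silently mangling it.
    return raw.decode(chosen), chosen
```

Calling read_text(path) follows the locale, while read_text(path, "koi8-r") is the on-the-fly override for a file received from someone using a different encoding.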

David Starner wrote:
> 
> On Tue, Oct 30, 2001 at 10:54:01AM -0600, [EMAIL PROTECTED] wrote:
> > http://bugzilla.abisource.com/show_bug.cgi?id=2077
> >
> > --- Additional Comments From [EMAIL PROTECTED]  2001-10-30 10:54 ---
> > No, abiword shouldn't assume that your text is UTF-8 just because you're
> > running in a UTF-8 locale.
> 
> Huh? That's part of the definition of a locale. Under a locale, the text
> encoding is the same as the terminal encoding, which is the same as the
> locale encoding. If the text encoding isn't the same as the terminal
> encoding, you can't use cat or more or grep or any other console program
> without recoding the output to screen. You couldn't redirect output to
> disk without recoding it. If the locale encoding differs from both of
> them, then what does it mean and why is it useful? Gettext, for one,
> uses the locale encoding for the terminal/text encoding.
> 
> If I'm wrong, then someone please clarify, but I don't understand where
> you're coming from at all.
> 
> --
> David Starner - [EMAIL PROTECTED]
> Pointless website: http://dvdeug.dhis.org
> "I saw a daemon stare into my face, and an angel touch my breast; each
> one softly calls my name . . . the daemon scares me less."
> - "Disciple", Stuart Davis

-- 
D. Dale Gulledge, Sr. Programmer,
[EMAIL PROTECTED]
C, C++, Perl, Unix (AIX, Linux), Oracle, Java,
Internationalization (i18n), Awk.
-
Linux-UTF8:   i18n of Linux on all levels
Archive:  http://mail.nl.linux.org/linux-utf8/



Re: [Bug 2077] - Abiword mangles a UTF-8 file open importing

2001-10-30 Thread Dom Lachowicz

>On Tue, Oct 30, 2001 at 12:30:34PM -0500, Dom Lachowicz wrote:
> > One cannot make the assumption that the files on disk are written in the
> > same locale that you're running under.
>
>There's no other assumption you can make. There are hundreds of
>encodings out there, and the only way Abiword can know what my preferred
>encoding is is by asking me - i.e. the locale.

Your locale doesn't mean anything other than perhaps you'd like to save new 
text documents in this locale. Running `LOCALE=utf-8 ./abiword` will *not* 
give *any* insight as to the locale of /etc/passwd or any other pre-existing 
file. At best it might be a hint that should be taken very lightly.
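
For reference, POSIX does not define a variable named LOCALE; the character-encoding category is taken from LC_ALL, then LC_CTYPE, then LANG. A rough Python sketch of that lookup follows. Both function names are made up for illustration, and the result only describes the user's environment, not any file already on disk.

```python
def effective_ctype_locale(env):
    """Return the locale name POSIX would use for the LC_CTYPE category."""
    # Precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    # Empty values count as unset.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"  # assumed default when nothing is set


def codeset_of(locale_name):
    """Extract the codeset from language_TERRITORY.codeset@modifier, if present."""
    base = locale_name.split("@", 1)[0]   # drop an optional @modifier
    _, dot, codeset = base.partition(".")
    return codeset if dot else None       # None: no codeset was named
```

With os.environ passed as env, codeset_of(effective_ctype_locale(os.environ)) gives the hint discussed here; whether to trust it for pre-existing files is exactly the point in dispute.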

> > Hell, you might've gotten them from
> > someone who wrote it in iso-8859-1 (god forbid anyone use that locale
> > any more...).
>
>God forbid anyone use ISO-8859-3, or KOI8-R, or EUC-JP, or ... Dealing
>with multiple encodings on the same disk has always been the problem of
>the user.

Precisely. Hence the encoded-text dialog box. Since you seem to have made my 
point for me, I'll stop here...

> > AbiWord will assume ASCII text by default, just like it always has,
>
>No one's arguing the interpretation of octets 00-7F. But what do you do
>when an octet > 7F appears?

That's where the "auto-detection" attempt that I talked about earlier kicks 
in, but it's certainly not foolproof. It is, however, a helluva lot smarter 
than iconv_open(getenv("LOCALE")) ...
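
To make the trade-off concrete, here is a rough Python sketch of one possible detection order: pure ASCII first, then strictly valid UTF-8, then a caller-supplied fallback. The name guess_encoding and its fallback parameter are hypothetical, and this is not AbiWord's actual detection logic.

```python
def guess_encoding(data: bytes, fallback: str = "iso-8859-1") -> str:
    """Guess an encoding for raw bytes; a heuristic, not a guarantee."""
    # Octets 00-7F only: nothing to disambiguate.
    if all(b < 0x80 for b in data):
        return "ascii"
    # Text in an 8-bit encoding rarely happens to be well-formed UTF-8,
    # so a successful strict decode is a strong signal.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Cannot tell ISO-8859-1 from KOI8-R from EUC-JP here; defer to
        # the caller's fallback (the locale, or the user's explicit choice).
        return fallback
    return "utf-8"
```

The sketch cannot tell one 8-bit encoding from another, so everything that is not ASCII or UTF-8 lands on the fallback. That is why a manual override remains necessary.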

> > If you want Abi to read encoded text
> > via some mechanism and Abi *can't* auto-detect the encoding, then use
> > the encoded-text dialog and specify the encoding manually.
>
>As I said in my reply, I tried that. It worked once, and now it doesn't
>work anymore. If I start up Abiword, go to file, open, change the
>selector to Encoded Text, select the file, select UTF-8, then Abiword opens
>the file up with garbage.

Then please file a different bug saying that "even if I specify utf-8 as my 
encoding things are screwed up" - don't complain about abi not using your 
LOCALE as the end-all-be-all determiner. It's my job to fix bugs (if they
are indeed bugs, which I determined your original post was not), not to
back-derive meaning from incomplete bug reports and split things up into
"what you meant"... Feel free to submit patches to correct the erroneous
behavior.

Oh, and this is what the "Comments" box in bugzilla is for. Emailing me 
personally basically flaming my decision is both rude and inappropriate.

Dom




Re: [Bug 2077] - Abiword mangles a UTF-8 file open importing

2001-10-30 Thread David Starner

On Tue, Oct 30, 2001 at 12:30:34PM -0500, Dom Lachowicz wrote:
> One cannot make the assumption that the files on disk are written in the 
> same locale that you're running under. 

There's no other assumption you can make. There are hundreds of
encodings out there, and the only way Abiword can know what my preferred
encoding is is by asking me - i.e. the locale.

> Hell, you might've gotten them from 
> someone who wrote it in iso-8859-1 (god forbid anyone use that locale any 
> more...). 

God forbid anyone use ISO-8859-3, or KOI8-R, or EUC-JP, or ... Dealing
with multiple encodings on the same disk has always been the problem of
the user.

> AbiWord will assume ASCII text by default, just like it always has, 

No one's arguing the interpretation of octets 00-7F. But what do you do
when an octet > 7F appears?

> If you want Abi to read encoded text 
> via some mechanism and Abi *can't* auto-detect the encoding, then use the 
> encoded-text dialog and specify the encoding manually.

As I said in my reply, I tried that. It worked once, and now it doesn't
work anymore. If I start up Abiword, go to file, open, change the
selector to Encoded Text, select the file, select UTF-8, then Abiword opens
the file up with garbage.

Again, there are two bugs here. If you won't fix the first one, please
realize that I can't open up a UTF-8 file by using the encoded-text
dialog, either.

-- 
David Starner - [EMAIL PROTECTED]
Pointless website: http://dvdeug.dhis.org
"I saw a daemon stare into my face, and an angel touch my breast; each 
one softly calls my name . . . the daemon scares me less."
- "Disciple", Stuart Davis



Re: [Bug 2077] - Abiword mangles a UTF-8 file open importing

2001-10-30 Thread Dom Lachowicz

>On Tue, Oct 30, 2001 at 10:54:01AM -0600, [EMAIL PROTECTED] wrote:
> > http://bugzilla.abisource.com/show_bug.cgi?id=2077
> >
> > --- Additional Comments From [EMAIL PROTECTED]  2001-10-30 10:54 ---
> > No, abiword shouldn't assume that your text is UTF-8 just because you're
> > running in a UTF-8 locale.
>
>Huh? That's part of the definition of a locale. Under a locale, the text
>encoding is the same as the terminal encoding, which is the same as the
>locale encoding. If the text encoding isn't the same as the terminal
>encoding, you can't use cat or more or grep or any other console program
>without recoding the output to screen. You couldn't redirect output to
>disk without recoding it. If the locale encoding differs from both of
>them, then what does it mean and why is it useful? Gettext, for one,
>uses the locale encoding for the terminal/text encoding.
>
>If I'm wrong, then someone please clarify, but I don't understand where
>you're coming from at all.

One cannot make the assumption that the files on disk are written in the 
same locale that you're running under. Hell, you might've gotten them from 
someone who wrote it in iso-8859-1 (god forbid anyone use that locale any 
more...). Changing your locale to utf-8 will *not*, for instance, change the
actual encoding of /etc/passwd to utf-8: it's still in iso-latin-1 or
whatever.

The locale you're running under doesn't mean shit as to what encoding your
documents have.

AbiWord will assume ASCII text by default, just like it always has, because 
that's the fscking definition of text. If you want Abi to read encoded text 
via some mechanism and Abi *can't* auto-detect the encoding, then use the 
encoded-text dialog and specify the encoding manually.

My associates and I are closing this bug as QA:WONTFIX. Flame us all you'd
like, but the flames all go to /dev/null.

Dom




Re: [Bug 2077] - Abiword mangles a UTF-8 file open importing

2001-10-30 Thread David Starner

On Tue, Oct 30, 2001 at 10:54:01AM -0600, [EMAIL PROTECTED] wrote:
> http://bugzilla.abisource.com/show_bug.cgi?id=2077
> 
> --- Additional Comments From [EMAIL PROTECTED]  2001-10-30 10:54 ---
> No, abiword shouldn't assume that your text is UTF-8 just because you're 
> running in a UTF-8 locale.

Huh? That's part of the definition of a locale. Under a locale, the text
encoding is the same as the terminal encoding, which is the same as the
locale encoding. If the text encoding isn't the same as the terminal
encoding, you can't use cat or more or grep or any other console program
without recoding the output to screen. You couldn't redirect output to
disk without recoding it. If the locale encoding differs from both of
them, then what does it mean and why is it useful? Gettext, for one,
uses the locale encoding for the terminal/text encoding.

If I'm wrong, then someone please clarify, but I don't understand where
you're coming from at all.

-- 
David Starner - [EMAIL PROTECTED]
Pointless website: http://dvdeug.dhis.org
"I saw a daemon stare into my face, and an angel touch my breast; each 
one softly calls my name . . . the daemon scares me less."
- "Disciple", Stuart Davis



Re: unicode in emacs 21

2001-10-30 Thread Eli Zaretskii

> From: Florian Weimer <[EMAIL PROTECTED]>
> Date: Tue, 30 Oct 2001 08:09:20 +0100
> 
> Richard Stallman <[EMAIL PROTECTED]> writes:
> 
> > Supporting Unicode superficially while retaining the current internal
> > representation raises a number of problems, one of them being that the
> > internal representation has several alternatives for the same character
> > which correspond to the same code point in Unicode.
> 
> The GNU Emacs/Unicode proposal I've seen seems to have this property,
> too.  (At least the proposal is ambiguous, and one interpretation is
> that you can encode a single character in multiple ways.)

Unless you refer to the CNS plane and Japanese Han characters, which
were deliberately left ununified (in addition to the Unicode
codepoints for those characters), I think you are mistaken.  Could you
please point out where in the proposal you see that a character can
be encoded in multiple ways?



Re: freedom

2001-10-30 Thread Edmund GRIMLEY EVANS

I think this is an interesting and important discussion, but it's off
topic for linux-utf8, so I've set some header lines in an attempt to
divert the discussion to [EMAIL PROTECTED]

Bram Moolenaar <[EMAIL PROTECTED]>:

> I'm sure some people will pick it up and use it that way.  Still, a term
> in daily use only means what the general public thinks of it.  And
> that's still for-free software.  That's hard to change.

Bram, there's no need to retreat to a position like that. It seems
that many people who understand and care about free software think
that Vim's licence is a free software licence.

Richard Stallman <[EMAIL PROTECTED]>:

> Our definition of free software is the foundation on which the free
> software community has grown for two decades.  You can formulate your
> own criterion if you wish, but we will continue to present ours as
> "the" definition of free software.

> The license says that if you distribute a modified version to anyone
> you must distribute a copy to the maintainer.  It is this singling out
> of one particular person for special privileges that makes the license
> fail to qualify as free software.  Free software includes the freedom
> to decide when you want to redistribute.

RMS, you are probably aware of the following document even if you
don't agree with it:

The Debian Free Software Guidelines (DFSG)
http://www.debian.org/social_contract#guidelines

Do you agree that Vim's licence qualifies as "free" according to the
criteria listed in that document?

Do you have a reference to a document that lists your criteria for
software to qualify as free?

Do you want to propose a change to the DFSG?

I'm not an expert on this stuff by any means, but my impression is
that licences that give special rights to one particular person are
often considered obnoxious, but can still qualify as free as far as
Debian is concerned.

Edmund



Re: unicode in emacs 21

2001-10-30 Thread Florian Weimer

Richard Stallman <[EMAIL PROTECTED]> writes:

> Supporting Unicode superficially while retaining the current internal
> representation raises a number of problems, one of them being that the
> internal representation has several alternatives for the same character
> which correspond to the same code point in Unicode.

The GNU Emacs/Unicode proposal I've seen seems to have this property,
too.  (At least the proposal is ambiguous, and one interpretation is
that you can encode a single character in multiple ways.)



Re: Unicode in Emacs

2001-10-30 Thread Bram Moolenaar


Richard Stallman wrote:

> The license says that if you distribute a modified version to anyone
> you must distribute a copy to the maintainer.

Small correction: You only need to send the maintainer a copy if he asks
for it.  This is the text:

-
If you distribute a modified version of Vim, you are encouraged to send the
maintainer a copy, including the source code.  Or make it available to the
maintainer through ftp; let him know where it can be found.  If the number of
changes is small (e.g., a modified Makefile) e-mailing the diffs will do.
When the maintainer asks for it (in any way) you must make your changes,
including source code, available to him.  The e-mail address to be used is
<[EMAIL PROTECTED]>
-

Obviously I did this to make sure that I get all the useful patches and
can include them in the official version if I want to.

> It is this singling out of one particular person for special
> privileges that makes the license fail to qualify as free software.
> Free software includes the freedom to decide when you want to
> redistribute.

Well, that's your definition of "free software".  Most people have
another one, depending on their background.  When an advertisement says
"you get free software with this computer", you know it means for-free
software.  Thus at least the term "free software" is context-dependent.

Note that the GPL also doesn't give you the freedom to decide when you
want to redistribute: you must supply the source code.  My opinion is
that it's fair to send your changes back to the person you got the
software from.  If I understand the GPL correctly, it requires you to
send your changes to the person who gets the binary.  Sounds like a
minor detail to me, not something you would want to waste time on
discussing (unless that's your job, perhaps).  I have a few bugs to
fix...

-- 
TIM:   That is not an ordinary rabbit ... 'tis the most foul cruel and
   bad-tempered thing you ever set eyes on.
ROBIN: You tit.  I soiled my armour I was so scared!
 "Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD

 ///  Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.moolenaar.net  \\\
(((   Creator of Vim -- http://vim.sf.net -- ftp://ftp.vim.org/pub/vim   )))
 \\\  Help me helping AIDS orphans in Uganda - http://iccf-holland.org  ///



Re: Unicode in Emacs

2001-10-30 Thread Bram Moolenaar


Richard Stallman wrote:

> Our definition of free software is the foundation on which the free
> software community has grown for two decades.  You can formulate your
> own criterion if you wish, but we will continue to present ours as
> "the" definition of free software.

I'm sure some people will pick it up and use it that way.  Still, a term
in daily use only means what the general public thinks of it.  And
that's still for-free software.  That's hard to change.

-- 
ARTHUR: Go on, Bors, chop its head off.
BORS:   Right.  Silly little bleeder.  One rabbit stew coming up.
 "Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD

 ///  Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.moolenaar.net  \\\
(((   Creator of Vim -- http://vim.sf.net -- ftp://ftp.vim.org/pub/vim   )))
 \\\  Help me helping AIDS orphans in Uganda - http://iccf-holland.org  ///