Re: charsets in debian/control

2004-12-11 Thread Shot (Piotr Szotkowski)
Hello. Paul Hampson: The email address isn't important, since that has to be a subset of ASCII anyway. Are the Unicode-encoded domain names supported in (modern) browsers only? I can surf to http://.pl/ (with, e.g., Firefox) - can I send mail to [EMAIL PROTECTED], or should I always use the

Re: charsets in debian/control

2004-12-11 Thread Marco d'Itri
On Dec 11, Shot (Piotr Szotkowski) [EMAIL PROTECTED] wrote: I can surf to http://?.pl/ (with, e.g., Firefox) - can I send mail to [EMAIL PROTECTED], or should I always use the [EMAIL PROTECTED] equivalent, as the Unicode in domain names is restricted to WWW only? It depends on your MUA. With

Re: charsets in debian/control

2004-12-11 Thread Michal Politowski
On Sat, 11 Dec 2004 16:08:12 +0100, Shot (Piotr Szotkowski) wrote: Hello. Paul Hampson: The email address isn't important, since that has to be a subset of ASCII anyway. Are the Unicode-encoded domain names supported in (modern) browsers only? I can surf to http://.pl/ (with,

Re: charsets in debian/control

2004-12-11 Thread Paul Hampson
On Sat, Dec 11, 2004 at 04:08:12PM +0100, Shot (Piotr Szotkowski) wrote: Hello. Paul Hampson: The email address isn't important, since that has to be a subset of ASCII anyway. Are the Unicode-encoded domain names supported in (modern) browsers only? I can surf to http://.pl/ (with,

Re: charsets in debian/control

2004-12-08 Thread Steve Langasek
On Tue, Dec 07, 2004 at 05:56:54PM +, Thaddeus H. Black wrote: But yes, non-ASCII Latin-1 chars should not be given special status over the national chars found in other languages spoken by project members. Debian should be using either ASCII, or Unicode; standardizing on Latin-1

Re: charsets in debian/control

2004-12-08 Thread Thaddeus H. Black
It is one thing spiritedly to argue a point against friends and allies. It is another to be obstinate. I do not wish the latter, and I admit that I am both outnumbered and outreasoned today. Please permit me without malice to conform my position, which now might be stated as follows. Unicode

Re: charsets in debian/control

2004-12-07 Thread Peter Samuelson
[Roger Leigh] I've been using Debian with UTF-8 only locales for over 12 months now. I now consider it fine for general use, with respect to terminal and application support. Unlike a couple of years ago, most things work perfectly. Some apps like 'screen' do not just configure themselves

Re: charsets in debian/control

2004-12-07 Thread Adrian 'Dagurashibanipal' von Bidder
On Tuesday 07 December 2004 00.19, Roger Leigh wrote: I think going to UTF-8 as the default locale charmap for all locales is a feasible goal for etch, as is recoding everything to UTF-8 (where it makes sense). Yep. My biggest problem right now is 'lpr sometextfile' to a PostScript printer

Re: charsets in debian/control

2004-12-07 Thread Andreas Barth
* Roger Leigh ([EMAIL PROTECTED]) [041207 00:40]: I think going to UTF-8 as the default locale charmap for all locales is a feasible goal for etch, as is recoding everything to UTF-8 (where it makes sense). "feasible goal" and "etch" are the magic words, I think: I agree on that, but I don't want

Re: charsets in debian/control

2004-12-07 Thread Maciej Dems
I look at the screen, and there is Roger Leigh writing to me: - No UTF-8 console keymaps - Some broken libraries e.g. GTK+ 1.2 [obsolete] - I can't paste UTF-8 into emacs (perhaps a problem in my .emacs) - mc making a mess of its frames Maciek -- M.Sc. Maciej Dems [EMAIL PROTECTED]

Re: charsets in debian/control

2004-12-07 Thread Eugeniy Meshcheryakov
07.12.2004 13:33 +0100 Maciej Dems (-): I look at the screen, and there is Roger Leigh writing to me: - No UTF-8 console keymaps - Some broken libraries e.g. GTK+ 1.2 [obsolete] - I can't paste UTF-8 into emacs (perhaps a problem in my .emacs) - mc making a mess of its frames Add dselect and

Re: charsets in debian/control

2004-12-07 Thread Daniel Burrows
On Tuesday 07 December 2004 12:44 am, Peter Samuelson wrote: Defining the character set as utf-8 means that any non-unicode capable application is going to have issues, yes. Postulate an app that is ignorant of character sets - we'll call it aptitude. Fixing it to make it accept utf-8 and

Re: charsets in debian/control

2004-12-07 Thread Daniel Burrows
On Tuesday 07 December 2004 10:17 am, Daniel Burrows wrote: complex replacement string class Admittedly, complex might (hypothetically) be a bit of an exaggeration. :P Daniel -- /--- Daniel Burrows [EMAIL PROTECTED] --\ | You are in a maze of

Re: charsets in debian/control

2004-12-07 Thread Richard Atterer
On Tue, Dec 07, 2004 at 10:17:17AM -0500, Daniel Burrows wrote: On Tuesday 07 December 2004 12:44 am, Peter Samuelson wrote: And if the app already deals with charset conversions but assumes iso-8859-1 input, then it's trivial to fix it to assume utf-8 input. This is not true.

Re: charsets in debian/control

2004-12-07 Thread Matthew Garrett
Daniel Burrows [EMAIL PROTECTED] wrote: iso-8859-1 is an 8-bit charset, while Unicode is a 32-bit [0] charset. Storing and manipulating iso-8859-1 strings requires no changes to internal datatypes (only conversions for input and output); storing and manipulating Unicode means

Re: charsets in debian/control

2004-12-07 Thread Daniel Burrows
On Tuesday 07 December 2004 10:40 am, Richard Atterer wrote: No, you do not have to do this. You can keep working with char, the changes when switching to UTF-8 will mostly have to deal with the fact that one Unicode character is represented by more than one char. This means that you need to
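The point quoted above, that one Unicode character may occupy more than one byte ("char") in UTF-8, can be illustrated with a minimal sketch. Python is used here only for brevity; the sample name and the helper `truncate_utf8` are hypothetical and do not come from the thread.

```python
# Hypothetical sample: a name whose first letter is outside ASCII.
name = "Łukasz"
encoded = name.encode("utf-8")

char_count = len(name)      # number of Unicode characters
byte_count = len(encoded)   # number of bytes ("chars" in the C sense)


def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Return at most max_bytes of data without splitting a multi-byte sequence."""
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    # Continuation bytes look like 10xxxxxx. If the first excluded byte is
    # one, the cut falls inside a character, so back up to its lead byte.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


cut_inside = truncate_utf8(encoded, 1)   # would split "Ł", so nothing is kept
cut_after = truncate_utf8(encoded, 2)    # keeps the whole two-byte "Ł"
```

Code that only copies or compares whole strings can stay byte-oriented; it is operations such as truncation or column counting that need this kind of boundary check.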

Re: charsets in debian/control

2004-12-07 Thread Thaddeus H. Black
Steve Langasek writes, ... most of the letters you listed here are specific to the IPA, which would have no use at all in a control file as they're not part of the writing system of any natural language. Ok. Encodings and charsets are distinct concepts. Just because the file is specified

Re: charsets in debian/control

2004-12-07 Thread Marco d'Itri
On Dec 07, Thaddeus H. Black [EMAIL PROTECTED] wrote: UTF-8 is neat, but I do not really like Unicode (you may Actually you do not even understand it, because this sentence is meaningless. -- ciao, | Marco | [9639 coubl1Ib61SmA]

Re: charsets in debian/control

2004-12-07 Thread Petter Reinholdtsen
[Thaddeus H. Black] UTF-8 is neat, but I do not really like Unicode (you may [Marco d'Itri] Actually you do not even understand it, because this sentence is meaningless. Perhaps he is aware of the difference between Unicode and ISO-10646? UTF-8 is an encoding of ISO-10646.

RE: charsets in debian/control

2004-12-07 Thread Julian Mehnle
Thaddeus H. Black wrote: However, the typical roster of skills one masters in contributing broadly to Debian development is already awesome: C, C++, CPP, Make, Perl, Python, Autoconf, CVS, Shell, Glibc, System calls, /proc, IPC, sockets, Sed, Awk, Vi, Emacs, locales, Libdb, GnuPG, Readline,

Re: charsets in debian/control

2004-12-06 Thread Adrian 'Dagurashibanipal' von Bidder
On Sunday 05 December 2004 20.11, Goswin von Brederlow wrote: Any parser that accepts 8bit non-ascii chars will accept UTF-8 then. What remains is just making the UTF-8 chars visually correct then. And make sure that, where character strings are modified, the multibyte sequences are counted

Re: charsets in debian/control

2004-12-06 Thread Goswin von Brederlow
Daniel Burrows [EMAIL PROTECTED] writes: On Sunday 05 December 2004 03:32 pm, Jose Carlos Garcia Sogo wrote: Would Peter permit me a mild dissent?  I prefer Latin-1.  Reason: I can recognize and distinguish Latin-1 characters, even when I do not always understand the words they spell.  

Re: charsets in debian/control

2004-12-06 Thread Thaddeus H. Black
I would not disagree with Peter or Daniel. They are right in my view. However, consider the following Unicode characters: 025A LATIN SMALL LETTER SCHWA WITH HOOK 025E LATIN SMALL LETTER CLOSED REVERSED OPEN E 0261 LATIN SMALL LETTER SCRIPT G 0264 LATIN SMALL LETTER RAMS HORN 0267

Re: charsets in debian/control

2004-12-06 Thread Bruce Perens
Thaddeus H. Black wrote: 025A LATIN SMALL LETTER SCHWA WITH HOOK 025E LATIN SMALL LETTER CLOSED REVERSED OPEN E 0261 LATIN SMALL LETTER SCRIPT G 0264 LATIN SMALL LETTER RAMS HORN 0267 LATIN SMALL LETTER HENG WITH HOOK 027A LATIN SMALL LETTER TURNED R WITH LONG LEG 027F LATIN SMALL LETTER

Re: charsets in debian/control

2004-12-06 Thread Matthew Garrett
Thaddeus H. Black [EMAIL PROTECTED] wrote: We are not speaking of a stricken Polish L, a double-accented Magyar O, or a euro sign. We are speaking of... well, to tell the truth I have no idea what these letters are. Have you? More to the point, should you and I learn to recognize such

Re: charsets in debian/control

2004-12-06 Thread Roger Leigh
Andreas Barth [EMAIL PROTECTED] writes: Though I agree on your last statement (and please, remember, I'm from Germany where non-ASCII-characters are also in common use), I still consider that UTF-8-not-ASCII has not finally reached ok, but it's on

Re: charsets in debian/control

2004-12-06 Thread Steve Langasek
On Mon, Dec 06, 2004 at 06:58:10PM +, Thaddeus H. Black wrote: I would not disagree with Peter or Daniel. They are right in my view. However, consider the following Unicode characters: 025A LATIN SMALL LETTER SCHWA WITH HOOK 025E LATIN SMALL LETTER CLOSED REVERSED OPEN E 0261

Re: charsets in debian/control

2004-12-06 Thread Mike Hommey
On Mon, Dec 06, 2004 at 06:53:42PM -0800, Steve Langasek [EMAIL PROTECTED] wrote: But yes, non-ASCII Latin-1 chars should not be given special status over the national chars found in other languages spoken by project members. Debian should be using either ASCII, or Unicode; standardizing on

Re: charsets in debian/control

2004-12-06 Thread Steve Langasek
On Tue, Dec 07, 2004 at 12:04:56PM +0900, Mike Hommey wrote: On Mon, Dec 06, 2004 at 06:53:42PM -0800, Steve Langasek [EMAIL PROTECTED] wrote: But yes, non-ASCII Latin-1 chars should not be given special status over the national chars found in other languages spoken by project members.

Re: charsets in debian/control

2004-12-06 Thread Mike Hommey
On Mon, Dec 06, 2004 at 07:10:21PM -0800, Steve Langasek [EMAIL PROTECTED] wrote: On Tue, Dec 07, 2004 at 12:04:56PM +0900, Mike Hommey wrote: On Mon, Dec 06, 2004 at 06:53:42PM -0800, Steve Langasek [EMAIL PROTECTED] wrote: But yes, non-ASCII Latin-1 chars should not be given special

Re: charsets in debian/control

2004-12-06 Thread Peter Samuelson
[Matthew Garrett] Defining the character set as utf-8 means that any non-unicode capable application is going to have issues, yes. Postulate an app that is ignorant of character sets - we'll call it aptitude. Fixing it to make it accept utf-8 and spit out the correct encoding for its LC_CTYPE

charsets in debian/control

2004-12-05 Thread Peter Samuelson
We seem to be moving to a de facto standard of UTF-8 for non-ASCII characters in debian/control files. This is not specified in Policy [1], but for hopefully obvious reasons, consistency is a Good Thing, and UTF-8 seems to be the best solution for this sort of thing. In my sid control files, I

Re: charsets in debian/control

2004-12-05 Thread Peter Samuelson
[Peter Samuelson] I suggest that the affected source packages[3] be run through the command 'iconv -f ORIGINAL_CHARSET -t utf-8' as soon as convenient. Ehhh, I see I have already ruined my credibility by pasting the wrong source package list. The real list is much shorter. Apologies, Peter
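As a rough illustration of what the suggested 'iconv -f ORIGINAL_CHARSET -t utf-8' step does, here is a minimal Python sketch. It assumes the original charset is known to be ISO-8859-1; the function name `recode_to_utf8` and the sample Maintainer line are hypothetical.

```python
def recode_to_utf8(data: bytes, original_charset: str) -> bytes:
    """Decode bytes from the original charset and re-encode them as UTF-8."""
    return data.decode(original_charset).encode("utf-8")


# Hypothetical Maintainer line as stored in a Latin-1 encoded control file.
latin1_line = "Maintainer: José Example <jose@example.org>\n".encode("iso-8859-1")
utf8_line = recode_to_utf8(latin1_line, "iso-8859-1")

# ASCII bytes are unchanged; only the non-ASCII character grows to two bytes.
size_difference = len(utf8_line) - len(latin1_line)
```

The conversion is only correct if the source charset is identified correctly, which is why the command takes ORIGINAL_CHARSET as an explicit argument.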

Re: charsets in debian/control

2004-12-05 Thread Petter Reinholdtsen
[Peter Samuelson] We seem to be moving to a de facto standard of UTF-8 for non-ASCII characters in debian/control files. This is not specified in Policy [1], but for hopefully obvious reasons, consistency is a Good Thing, and UTF-8 seems to be the best solution for this sort of thing. Some

Re: charsets in debian/control

2004-12-05 Thread Andreas Barth
* Petter Reinholdtsen ([EMAIL PROTECTED]) [041205 11:30]: [Peter Samuelson] We seem to be moving to a de facto standard of UTF-8 for non-ASCII characters in debian/control files. This is not specified in Policy [1], but for hopefully obvious reasons, consistency is a Good Thing, and

Re: charsets in debian/control

2004-12-05 Thread Josselin Mouette
On Sunday 05 December 2004 at 11:43 +0100, Andreas Barth wrote: I think most of us agree that non-UTF-8-characters are not a good idea (please note the UTF-8-characters is a superset of ASCII). For some places (like package names), I think most of us even agree that only ASCII-characters

Re: charsets in debian/control

2004-12-05 Thread Andreas Barth
* Josselin Mouette ([EMAIL PROTECTED]) [041205 13:05]: On Sunday 05 December 2004 at 11:43 +0100, Andreas Barth wrote: I think most of us agree that non-UTF-8-characters are not a good idea (please note the UTF-8-characters is a superset of ASCII). For some places (like package names),

Re: charsets in debian/control

2004-12-05 Thread Steinar H. Gunderson
On Sun, Dec 05, 2004 at 01:01:16PM +0100, Josselin Mouette wrote: Many of us have names that can't be written using ASCII. Well, they usually can be transliterated, can't they? Transliterating is somewhat of a kludge (and I think in most cases UTF-8 is a much better solution); OTOH I'd rapidly

Re: charsets in debian/control

2004-12-05 Thread Marco d'Itri
On Dec 05, Peter Samuelson [EMAIL PROTECTED] wrote: Would people support a mass bug at minor severity? Make it normal. -- ciao, | Marco | [9589 inOGrPyJFNKhM]

Re: charsets in debian/control

2004-12-05 Thread Marco d'Itri
On Dec 05, Steinar H. Gunderson [EMAIL PROTECTED] wrote: Transliterating is somewhat of a kludge (and I think in most cases UTF-8 is a much better solution); OTOH I'd rapidly get confused in the list of Japanese maintainers if their names weren't transliterated. This is a different issue: in

Re: charsets in debian/control

2004-12-05 Thread Peter Samuelson
[Steinar H. Gunderson] Transliterating is somewhat of a kludge (and I think in most cases UTF-8 is a much better solution); OTOH I'd rapidly get confused in the list of Japanese maintainers if their names weren't transliterated. I think it's a valid choice for a maintainer who natively

Re: charsets in debian/control

2004-12-05 Thread Peter Samuelson
[Marco d'Itri] Would people support a mass bug at minor severity? Make it normal. Given that Policy recommends debian/changelog to be utf-8, coupled with the observation (which I had not thought of) that various tools may require a maintainer's name in debian/control and debian/changelog to

Re: charsets in debian/control

2004-12-05 Thread Denis Barbier
[Peter Samuelson] I suggest that the affected source packages[3] be run through the command 'iconv -f ORIGINAL_CHARSET -t utf-8' as soon as convenient. No, as you noticed this list is short and can be processed in a more elegant manner, e.g. sympa description uses a no-break space where a

Re: charsets in debian/control

2004-12-05 Thread Goswin von Brederlow
Josselin Mouette [EMAIL PROTECTED] writes: On Sunday 05 December 2004 at 11:43 +0100, Andreas Barth wrote: I think most of us agree that non-UTF-8-characters are not a good idea (please note the UTF-8-characters is a superset of ASCII). For some places (like package names), I think most

Re: charsets in debian/control

2004-12-05 Thread Bart Schuller
On Sun, Dec 05, 2004 at 06:40:52PM +0100, Goswin von Brederlow wrote: On that note, how likely is it to hit a UTF-8 character encoding that contains a '\n'? Any non UTF-8 aware parser would assume a new line has started and get parse errors. 0% likely, guaranteed. UTF-8 is *designed* to be
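The guarantee discussed here can be checked directly: in UTF-8, every byte of a multi-byte sequence has its high bit set, so none of them can equal '\n' or any other ASCII byte. The following is a minimal Python sketch; the function name and the sample line are hypothetical, and the check covers only the Basic Multilingual Plane to keep it quick.

```python
def multibyte_sequences_avoid_ascii(code_points) -> bool:
    """Return True if no multi-byte UTF-8 sequence contains a byte below 0x80."""
    for cp in code_points:
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not encodable as UTF-8
        encoded = chr(cp).encode("utf-8")
        if len(encoded) > 1 and any(byte < 0x80 for byte in encoded):
            return False
    return True


# Check every non-ASCII code point in the Basic Multilingual Plane.
bmp_is_safe = multibyte_sequences_avoid_ascii(range(0x80, 0x10000))

# A byte-oriented parser therefore sees exactly one newline in this line.
line = "Maintainer: Łukasz Example\n".encode("utf-8")
newline_count = line.count(b"\n")
```

This is why a parser that splits on '\n' bytes without knowing about UTF-8 still finds the correct line boundaries.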

Re: charsets in debian/control

2004-12-05 Thread Goswin von Brederlow
Bart Schuller [EMAIL PROTECTED] writes: On Sun, Dec 05, 2004 at 06:40:52PM +0100, Goswin von Brederlow wrote: On that note, how likely is it to hit a UTF-8 character encoding that contains a '\n'? Any non UTF-8 aware parser would assume a new line has started and get parse errors. 0%

Re: charsets in debian/control

2004-12-05 Thread Bernd Eckenfels
On Sun, Dec 05, 2004 at 06:40:52PM +0100, Goswin von Brederlow wrote: On that note, how likely is it to hit a UTF-8 character encoding that contains a '\n'? Any non UTF-8 aware parser would assume a new line has started and get parse errors. That's no problem. The only problem you have with

Re: charsets in debian/control

2004-12-05 Thread Thaddeus H. Black
Peter Samuelson writes, We seem to be moving to a de facto standard of UTF-8 for non-ASCII characters in debian/control files. This is not specified in Policy [1], but for hopefully obvious reasons, consistency is a Good Thing, and UTF-8 seems to be the best solution for this sort of thing.

Re: charsets in debian/control

2004-12-05 Thread Jose Carlos Garcia Sogo
On Sun, 05-12-2004 at 20:16 +, Thaddeus H. Black wrote: Peter Samuelson writes, We seem to be moving to a de facto standard of UTF-8 for non-ASCII characters in debian/control files. This is not specified in Policy [1], but for hopefully obvious reasons, consistency is a Good

Re: charsets in debian/control

2004-12-05 Thread Daniel Burrows
On Sunday 05 December 2004 03:32 pm, Jose Carlos Garcia Sogo wrote: Would Peter permit me a mild dissent?  I prefer Latin-1.  Reason: I can recognize and distinguish Latin-1 characters, even when I do not always understand the words they spell.  Recognizing and distinguishing the

Re: charsets in debian/control

2004-12-05 Thread Paul Hampson
On Sun, Dec 05, 2004 at 04:42:24PM -0500, Daniel Burrows wrote: On Sunday 05 December 2004 03:32 pm, Jose Carlos Garcia Sogo wrote: Would Peter permit me a mild dissent?  I prefer Latin-1.  Reason: I can recognize and distinguish Latin-1 characters, even when I do not always understand

Re: charsets in debian/control

2004-12-05 Thread Mike Hommey
On Mon, Dec 06, 2004 at 09:54:36AM +1100, Paul Hampson [EMAIL PROTECTED] wrote: Isn't there a proposal around for Description#en: English text Description#ja: Japanese text And you'd advocate to write the English text in latin1 and the japanese text in euc-jp ? Let's make it clear: 1 text

Re: charsets in debian/control

2004-12-05 Thread Josselin Mouette
On Monday 06 December 2004 at 09:26 +0900, Mike Hommey wrote: On Mon, Dec 06, 2004 at 09:54:36AM +1100, Paul Hampson [EMAIL PROTECTED] wrote: Isn't there a proposal around for Description#en: English text Description#ja: Japanese text And you'd advocate to write the English text in

Re: charsets in debian/control

2004-12-05 Thread Peter Samuelson
[Thaddeus H. Black] Would Peter permit me a mild dissent? I prefer Latin-1. Dissents are fine. (: The reason to go with UTF-8 is for consistency. Tools that wish to render text onto the screen ought to be able to depend on knowing the encoding that text is in. See below for why I (and many

RE: charsets in debian/control

2004-12-05 Thread Julian Mehnle
Thaddeus H. Black wrote: I do not deny that Latin-1 represents all the languages I can read, and that this fact may color my view. Nevertheless to me a source written in Chinese is effectively non-free. It might as well be a compiled binary blob. So Emacs is effectively non-free, because I

Re: charsets in debian/control

2004-12-05 Thread Andrew Suffield
On Sun, Dec 05, 2004 at 09:32:00PM +0100, Jose Carlos Garcia Sogo wrote: But the only field in UTF8 should be Maintainer, and that field should have (IMHO) also a roman transliteration for the name, if you don't use a latin charset (Greek, Arabic, Japanese, Chinese...) The transliterated field

Re: charsets in debian/control

2004-12-05 Thread Paul Hampson
On Mon, Dec 06, 2004 at 09:26:57AM +0900, Mike Hommey wrote: On Mon, Dec 06, 2004 at 09:54:36AM +1100, Paul Hampson [EMAIL PROTECTED] wrote: Isn't there a proposal around for Description#en: English text Description#ja: Japanese text And you'd advocate to write the English text in

Re: charsets in debian/control

2004-12-05 Thread Paul Hampson
On Mon, Dec 06, 2004 at 01:40:27AM +, Andrew Suffield wrote: On Sun, Dec 05, 2004 at 09:32:00PM +0100, Jose Carlos Garcia Sogo wrote: But the only field in UTF8 should be Maintainer, and that field should have (IMHO) also a roman transliteration for the name, if you don't use a latin