Enrico Forestieri wrote:
On Wed, Oct 31, 2007 at 03:41:50PM +0100, Abdelrazak Younes wrote:
Jean-Marc Lasgouttes wrote:
Enrico Forestieri <[EMAIL PROTECTED]> writes:

Not really. UCS-4 code points from 0x0000 to 0x00ff exactly correspond
to latin-1 code points. So

   unsigned char c = 0xb5;
   os.put(c);

gives me a 'µ' character, and the assertion above is wrong.
Maybe you have utf8 in mind.
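
For illustration, here is a minimal sketch of the distinction being drawn. The helper names latin1_to_ucs4 and utf8_encode_small are hypothetical and are not part of the docstream code. Widening a latin-1 byte yields the same numeric value as its UCS-4 code point, whereas UTF-8 matches only in the ASCII range and needs two bytes for 0xb5.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: a latin-1 byte widens directly to its UCS-4 code point,
// because code points 0x00..0xff coincide with latin-1.
inline char32_t latin1_to_ucs4(unsigned char c)
{
    return static_cast<char32_t>(c);
}

// Hypothetical helper: UTF-8 encoding for code points below 0x800 only.
// Code points below 0x80 (ASCII) stay one byte; the rest take two bytes.
inline std::string utf8_encode_small(char32_t cp)
{
    assert(cp < 0x800);  // this sketch does not handle 3- and 4-byte forms
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With these helpers, latin1_to_ucs4(0xb5) is 0xb5 (the code point of 'µ'), while utf8_encode_small(0xb5) is the two-byte sequence 0xC2 0xB5.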
No, I am really talking about ASCII. We should not special-case latin1 even if it happens to have some commonalities with utf8 (which is true only for historical reasons).

It seems you are a bit confused. There's no need to special-case latin1;
it was already special-cased when it was decided that ucs4 code points
up to 0xff would coincide with latin1. Latin1 has nothing to do with
utf8. It is ASCII that happens to have its code points in common with
utf8.

OK, right, sorry, I meant UCS4 of course. I know this business pretty well by now, believe me.


We should not have a method that interprets raw characters as latin1.
So we should either

1/ find some way to forbid passing 8bit chars to <<
or
2/ only allow the ascii range, like Abdel proposed.
Exactly.
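
A rough sketch of what the two options could look like, under stated assumptions: Ucs4Sink is a hypothetical stand-in for a UCS-4 output stream, not the real docstream class, and the choice of an exception rather than an assertion is only for illustration.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a UCS-4 output stream (not the real docstream).
struct Ucs4Sink {
    std::u32string data;
};

// Option 2: accept a narrow char only when it is in the ASCII range.
inline Ucs4Sink & operator<<(Ucs4Sink & os, char c)
{
    if (static_cast<unsigned char>(c) > 0x7f)
        throw std::invalid_argument("non-ASCII narrow char passed to UCS-4 sink");
    os.data.push_back(static_cast<char32_t>(c));
    return os;
}

// Option 1: reject raw 8-bit values at compile time.
Ucs4Sink & operator<<(Ucs4Sink &, unsigned char) = delete;
Ucs4Sink & operator<<(Ucs4Sink &, signed char) = delete;

// Full UCS-4 code points are always accepted.
inline Ucs4Sink & operator<<(Ucs4Sink & os, char32_t c)
{
    os.data.push_back(c);
    return os;
}
```

In this sketch, writing 'a' or a char32_t value works, passing an unsigned char such as 0xb5 fails to compile, and a plain char outside the ASCII range is rejected at run time.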

Sorry, I disagree.

Anyway, this is about a delimiter, right? Why do you care that the complete latin1 subset is taken care of? Is there any delimiter you'd like to support there? If yes, then fix the code so that it takes a full ucs4 character, and don't limit the implementation to Latin1.

If somebody uses the docstream with latin8 characters, we will have bugs for sure.

Abdel.
