On Wed, Oct 31, 2007 at 06:18:57PM +0100, Abdelrazak Younes wrote:
> Enrico Forestieri wrote:
> > On Wed, Oct 31, 2007 at 03:41:50PM +0100, Abdelrazak Younes wrote:
> >> Jean-Marc Lasgouttes wrote:
> >>> Enrico Forestieri <[EMAIL PROTECTED]> writes:
> >>>
> >>>> Not really. UCS-4 code points from 0x0000 to 0x00ff exactly correspond
> >>>> to latin-1 code points. So
> >>>>
> >>>>    unsigned char c = 0xb5;
> >>>>    os.put(c);
> >>>>
> >>>> gives me a 'µ' character, and the assertion above is wrong.
> >>>> Maybe you have utf8 in mind.
> >> No, I am really talking about ASCII. We should not special case latin1 
> >> even if it happens to have some commonalities with utf8 (which is true 
> >> only for historical reason).
> > 
> > Seems you are a bit confused. There's no need to special case latin1,
> > it was already special cased when it was decided that ucs4 code points
> > up to 0xff would coincide with latin1. Latin1 has nothing to do with
> > utf8. It is ASCII that happens to have its code points in common with
> > utf8.
> 
> OK, right, sorry I meant UCS4 of course. I know pretty well this 
> business right now, believe me.
> 
> > 
> >>> We should not have a method that interprets raw characters as latin1.
> >>> So we should either
> >>>
> >>> 1/ find some way to forbid passing 8bit chars to <<
> >>> or
> >>> 2/ only allow the ascii range, like Abdel proposed.
> >> Exactly.
> > 
> > Sorry, I disagree.
> 
> Anyway, this is about a delimiter right? Why do you care that the 
> complete latin1 subset is taken care of? Is there any delimiter you'd 
> like to support there? If yes, then fix the code so that it needs a full 
> ucs4 character and don't limit the implementation to Latin1.
> 
> If somebody uses the docstream with latin8 characters we will have bugs 
> for sure.

I don't care about covering the complete latin1 subset. I was only
disputing that the assert was right. Only ucs4 encoded characters
should be output to a docstream, and you would have the same hypothetical
bug you are speaking about with either an unsigned char or a char_type if
you tried to output a latin8 encoded character to a docstream.

For sure, we now have a bug because an unsigned char was not taken
into account. And I can predict that we will have others if it keeps
not being taken into account. That's all.
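
To make the 0xb5 case concrete, here is a minimal sketch of the widening
involved. It is not the docstream code: it assumes char32_t and
std::u32string as stand-ins for a UCS-4 character type and string, and
the helper names (latin1_to_ucs4, widen_signed, put_latin1, self_check)
are hypothetical. It only shows that an unsigned char keeps its value
when widened, so it lands on the matching code point in 0x00..0xff,
while a negative signed char does not.

```cpp
#include <cassert>
#include <string>

// Widen a byte to a UCS-4 code point. Code points 0x00..0xff coincide
// with latin1, so the numeric value is kept unchanged.
constexpr char32_t latin1_to_ucs4(unsigned char c)
{
    return static_cast<char32_t>(c);
}

// Same cast, but from a signed char. A negative value such as -75
// (the bit pattern 0xb5) wraps around to a huge unsigned value
// instead of yielding 0xb5.
constexpr char32_t widen_signed(signed char c)
{
    return static_cast<char32_t>(c);
}

// Hypothetical helper: append one byte, widened as above, to a
// UCS-4 string standing in for a docstream buffer.
inline void put_latin1(std::u32string & out, unsigned char c)
{
    out.push_back(latin1_to_ucs4(c));
}

// Small self-check mirroring the two-line example quoted above.
inline void self_check()
{
    std::u32string out;
    unsigned char c = 0xb5;
    put_latin1(out, c);
    assert(out.size() == 1);
    assert(out[0] == U'\u00b5');  // MICRO SIGN
}
```

Under these assumptions, the unsigned char path is safe for any byte
value, and the only way to get a wrong code point is to start from data
that was not latin1 (or ucs4) in the first place, which is the same
hypothetical bug regardless of the character type used.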

-- 
Enrico
