On Wed, Oct 31, 2007 at 06:18:57PM +0100, Abdelrazak Younes wrote:

> Enrico Forestieri wrote:
> > On Wed, Oct 31, 2007 at 03:41:50PM +0100, Abdelrazak Younes wrote:
> >> Jean-Marc Lasgouttes wrote:
> >>> Enrico Forestieri <[EMAIL PROTECTED]> writes:
> >>>
> >>>> Not really. UCS-4 code points from 0x0000 to 0x00ff exactly correspond
> >>>> to latin-1 code points. So
> >>>>
> >>>> unsigned char c = 0xb5;
> >>>> os.put(c);
> >>>>
> >>>> gives me a 'µ' character, and the assertion above is wrong.
> >>>> Maybe you have utf8 in mind.
> >> No, I am really talking about ASCII. We should not special case latin1
> >> even if it happens to have some commonalities with utf8 (which is true
> >> only for historical reason).
> >
> > Seems you are a bit confused. There's no need to special case latin1,
> > it was already special cased when it was decided that ucs4 code points
> > until 0xff would coincide with latin1. Latin1 has nothing to do with
> > utf8. It is ASCII that happens to have its code points in common with
> > utf8.
>
> OK, right, sorry I meant UCS4 of course. I know pretty well this
> business right now, believe me.
>
> >
> >>> We should not have a method that interprets raw characters as latin1.
> >>> So we should either
> >>>
> >>> 1/ find some way to forbid passing 8bit chars to <<
> >>> or
> >>> 2/ only allow the ascii range, like Abdel proposed.
> >> Exactly.
> >
> > Sorry, I disagree.
>
> Anyway, this is about a delimiter right? Why do you care that the
> complete latin1 subset is taken care of? Is there any delimiter you'd
> like to support there? If yes, then fix the code so that it needs a full
> ucs4 character and don't limit the implementation to Latin1.
>
> If somebody use the docstream with latin8 character we will have bugs
> for sure.

I don't care about covering the complete latin1 subset. I was only
disputing that the assert was right. Only ucs4 encoded characters should
be output to a docstream, and you would have the same hypothetical bug
you are speaking about with either an unsigned char or a char_type if
you tried to output a latin8 encoded character to a docstream.

For sure, we now have a bug because an unsigned char was not taken into
account. And I can predict that we will have others if that is not
taken into account. That's all.

-- 
Enrico
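
To make the code point argument concrete, here is a minimal sketch. It is not LyX code: `ucs4_char`, `latin1_to_ucs4` and `utf8_encode_small` are hypothetical names chosen for illustration, and `char32_t` is only assumed to stand in for the docstream character type. It shows that a latin1 byte such as 0xb5 keeps its numeric value as a ucs4 code point, while its utf8 encoding is a different, two-byte sequence. Only the ASCII range looks the same in all three.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a 32-bit UCS-4 character type.
using ucs4_char = char32_t;

// Widen a Latin-1 byte to its UCS-4 code point. Taking the argument as
// unsigned char avoids sign extension on platforms where char is signed.
inline ucs4_char latin1_to_ucs4(unsigned char c)
{
    return static_cast<ucs4_char>(c);
}

// Encode one code point below 0x800 as UTF-8 (one or two bytes).
// That range is enough to cover all of Latin-1.
inline std::string utf8_encode_small(ucs4_char cp)
{
    assert(cp < 0x800);
    std::string out;
    if (cp < 0x80) {
        // ASCII: the single byte equals the code point.
        out += static_cast<char>(cp);
    } else {
        // Two-byte form: 110xxxxx 10xxxxxx.
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With these helpers, `latin1_to_ucs4(0xb5)` is the code point U+00B5, whereas `utf8_encode_small` turns that code point into the two bytes 0xC2 0xB5. For an ASCII character such as 'A' the latin1 byte, the ucs4 code point and the utf8 byte all coincide.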