That is weird indeed; I would say they mixed up Latin1 and UTF8. BTW, if you do

#[40 57 49 52 41 160 52 54 57 45] inspect.

in Pharo 3, you'll see that it is actually a Latin1 encoded ByteArray, not a UTF8 one. In Latin1 (http://en.wikipedia.org/wiki/Latin1) 160 is NBSP (non-breaking space). In UTF8, this would be encoded differently:

ZnUTF8Encoder new encodeString: 160 asCharacter asString.

#[194 160]

But if the file starts with the Unicode BOM, that is really confusing (http://en.wikipedia.org/wiki/Unicode_Byte-Order_Mark#Usage), since in UTF8 the BOM would be different (#[239 187 191]), and from your byte sequence it can't be UTF16 either.

On 16 May 2014, at 22:06, Sean P. DeNigris <s...@clipperadams.com> wrote:

> Two issues:
>
> 1. Exports for Outlook
> ============
> Gmail seems to have inserted "160 asCharacter" in a few places in a gmail
> contact csv export, so for example this partial phone number:
> '(914) 469-'
> is written to file as:
> #[40 57 49 52 41 160 52 54 57 45]
> And then "aFile readStream contents" signals "ZnInvalidUTF8: Invalid utf8
> input detected" when it encounters the 160
>
> 2. Google csv Exports
> ============
> The file starts with the BOM #[255 254], which generates the same exception
>
> =============
>
> Are these problems with gmail or Pharo?
> Thanks
>
> -----
> Cheers,
> Sean
> --
> View this message in context:
> http://forum.world.st/Gmail-Contact-Export-and-readStream-tp4759306.html
> Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.
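
The Latin1 versus UTF8 point above can be checked in a workspace. This is only a sketch: it assumes that the Zinc encoders in your image answer to `ZnCharacterEncoder newForEncoding: 'iso-8859-1'` and understand `decodeBytes:`, which may not hold in every Zinc version, so treat the selectors as assumptions rather than a confirmed API for Pharo 3.

```smalltalk
"Assumption: Zinc character encoders are loaded and respond to decodeBytes:."
| bytes latin1 utf8 |
bytes := #[40 57 49 52 41 160 52 54 57 45].

"Read as Latin1 (ISO-8859-1), every byte maps to one character,
 so byte 160 simply becomes a non-breaking space."
latin1 := ZnCharacterEncoder newForEncoding: 'iso-8859-1'.
latin1 decodeBytes: bytes.

"Read as UTF8, a lone 160 is a continuation byte with no lead byte before it,
 so the decoder is expected to signal ZnInvalidUTF8, the error reported in the thread."
utf8 := ZnUTF8Encoder new.
[ utf8 decodeBytes: bytes ]
	on: ZnInvalidUTF8
	do: [ :error | error messageText ].

"Going the other way, NBSP encoded as UTF8 takes two bytes: #[194 160]."
utf8 encodeString: 160 asCharacter asString.
```

If the Latin1 decode succeeds where the UTF8 decode fails, the file was most likely written as Latin1, and reading it with a Latin1 encoder instead of the default UTF8 one should avoid the exception.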