That is weird indeed. I would say they mixed up Latin1 and UTF8. BTW, if you do

  #[40 57 49 52 41 160 52 54 57 45] inspect.

in Pharo 3, you'll see that it is actually a Latin1 encoded ByteArray, not a 
UTF8 one.

In Latin1 (http://en.wikipedia.org/wiki/Latin1) 160 is NBSP (non-breaking 
space). In UTF8, this would be encoded differently:

  ZnUTF8Encoder new encodeString: 160 asCharacter asString. 

  #[194 160]
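
Going the other way, here is a minimal sketch of decoding your bytes as Latin1 
instead of UTF8. It assumes that ZnCharacterEncoder newForEncoding: accepts 
'iso-8859-1' in your image, which I did not verify in Pharo 3:

  "assumption: 'iso-8859-1' is a known encoding identifier in this image"
  | bytes |
  bytes := #[40 57 49 52 41 160 52 54 57 45].
  (ZnCharacterEncoder newForEncoding: 'iso-8859-1') decodeBytes: bytes.

That should give you the phone number with a NBSP where you expected a plain 
space.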

But if the file starts with the BOM #[255 254], that is really confusing 
(http://en.wikipedia.org/wiki/Unicode_Byte-Order_Mark#Usage): that is the 
UTF16 little-endian BOM, a UTF8 BOM would be three different bytes, and from 
your byte sequence it can't be UTF16 either.
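
For comparison, the UTF8 BOM can be computed the same way as the NBSP above. 
This is only a sketch, assuming 16rFEFF asCharacter answers the U+FEFF 
character in your image:

  "assumption: the encoder encodes U+FEFF like any other character"
  ZnUTF8Encoder new encodeString: 16rFEFF asCharacter asString.

That should answer #[239 187 191], not #[255 254].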

On 16 May 2014, at 22:06, Sean P. DeNigris <s...@clipperadams.com> wrote:

> Two issues:
> 
> 1. Exports for Outlook
> ============
> Gmail seems to have inserted "160 asCharacter" in a few places in a gmail
> contact csv export, so for example this partial phone number:
>  '(914) 469-'
> is written to file as:
>  #[40 57 49 52 41 160 52 54 57 45]
> And then "aFile readStream contents" signals "ZnInvalidUTF8: Invalid utf8
> input detected" when it encounters the 160
> 
> 2. Google csv Exports
> ============
> The file starts with the BOM #[255 254], which generates the same exception
> 
> =============
> 
> Are these problems with gmail or Pharo?
> Thanks
> 
> 
> 
> -----
> Cheers,
> Sean
> --
> View this message in context: 
> http://forum.world.st/Gmail-Contact-Export-and-readStream-tp4759306.html
> Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.
> 

