On Wednesday, October 5, 2016 at 12:01:32 AM UTC-4, josh...@fastmail.com 
wrote:
>
> OK, I understand now: they're continuation bytes for UTF-8 and can't 
> appear in that context so they get stripped from the string representation.
>

They don't get stripped: the invalid data is still stored in the String. 
However, anything that iterates over Unicode characters skips them, and that 
includes length, which counts Unicode code points rather than bytes.

julia> s = String([0x82,0x82,0x82,0x82,0x82])
5-byte String of invalid UTF-8 data:
 0x82
 0x82
 0x82
 0x82
 0x82

julia> length(s)
0

julia> sizeof(s)
5
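
To see why length comes out as 0 while sizeof stays at 5, here is a rough sketch of the counting rule working on the raw bytes. It assumes that a character count amounts to counting the bytes that are not UTF-8 continuation bytes (bit pattern 10xxxxxx); is_continuation and count_leading_bytes are hypothetical helpers written for illustration, not functions from Base, and this is not necessarily how length is implemented internally.

```julia
# Hypothetical helpers for illustration only; these are not part of Base.

# A UTF-8 continuation byte has the bit pattern 10xxxxxx.
is_continuation(b::UInt8) = (b & 0xc0) == 0x80

# Count the bytes that could start a character, i.e. the non-continuation bytes.
count_leading_bytes(bytes::Vector{UInt8}) = count(b -> !is_continuation(b), bytes)

bytes = [0x82, 0x82, 0x82, 0x82, 0x82]
n_chars = count_leading_bytes(bytes)   # 0: every byte is a continuation byte
n_bytes = length(bytes)                # 5: the bytes themselves are all still there

# For comparison, a valid sequence: "h" (0x68) followed by "é" (0xc3 0xa9).
valid = [0x68, 0xc3, 0xa9]
n_valid = count_leading_bytes(valid)   # 2: one leading byte per character
```

Working on a Vector{UInt8} instead of a String keeps the sketch independent of how a particular Julia version treats invalid data inside strings.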
