Hi,

I just saw this and similar remarks (too late) on #parrotsketch:

  21:52 < NotFound> So unicode:"\xab" and utf8:unicode:"\xab" is also the same result?

In my opinion (and AFAIK still in the implementation), the encoding part of a PIR string literal specifies how the possibly escaped bytes denote the codepoint in the _source code_. That codepoint then belongs to some charset. Alas, the above example is illegal.

The source encoding of the mentioned file t/op/stringu.t is utf8:

  :set fenc?
    fileencoding=utf-8

  pasm_output_is( <<'CODE', <<OUTPUT, "UTF8 literals" );
      set S0, utf8:unicode:"«"

and ...

  pasm_output_is( <<'CODE', <<OUTPUT, "UTF8 literals" );
      set S0, utf8:unicode:"\xc2\xab"

This is valid UTF8 encoding too, as there is no collision between escaped and non-escaped UTF8 chars.

unicode:"\xab" is illegal, as there is no encoding in unicode that would make this a codepoint (all the more so since the default encoding of charset unicode is utf8). Or, IOW, if this were valid, then the escaped char syntax would be ambiguous.

  21:51 < pmichaud> so unicode:"«" and unicode:"\xab" would produce exactly the same result.
  21:51 < pmichaud> even down to being the same .pbc output.
  21:51 < allison> pmichaud: exactly

The former is a valid char in a UTF8/iso-8859-1 encoded source file, and only there, while the latter is a single, invalid part of a UTF8 char.

How would you interpret unicode:"\xab\x65" then?

I think there is still some confusion between the encoding of the source code (together with the desired meaning in the charset) and the internal encoding of parrot, which might be UCS2 or anything.

my 2 ¢
leo
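
PS: The byte-level claims above can be checked outside Parrot. What follows is only a minimal sketch in Python, not PIR, and it assumes nothing beyond the standard UTF-8 rules. The helper names utf8_bytes and is_valid_utf8 are made up for this illustration and say nothing about how Parrot itself parses string literals.

```python
# Illustration only: plain Python, not Parrot/PIR code.
# Helper names are hypothetical and exist just for this sketch.

GUILLEMET = "\u00ab"  # the codepoint U+00AB (left-pointing double angle quote)


def utf8_bytes(text):
    """Return the UTF-8 byte sequence that encodes the given text."""
    return text.encode("utf-8")


def is_valid_utf8(raw):
    """Return True if the byte string is a well-formed UTF-8 sequence."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(utf8_bytes(GUILLEMET))       # the two bytes 0xc2 0xab
    print(is_valid_utf8(b"\xc2\xab"))  # escaped form of the same two bytes
    print(is_valid_utf8(b"\xab"))      # a lone 0xab
    print(is_valid_utf8(b"\xab\x65"))  # 0xab followed by an ASCII byte
```

A lone 0xab is a UTF-8 continuation byte, so it can only follow a lead byte such as 0xc2. Read as iso-8859-1 instead, the same single byte is the codepoint U+00AB, which is exactly the ambiguity in question.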