Yesterday on IRC there was some discussion [1] about the default string representation. The consensus was that strings should have the "ascii" charset by default, not "iso-8859-1" as has been the case for some weeks.

Autrijus prepared a patch, "parrot-broken-ascii.patch", which did part of this change. I've now completed and extended it. It's probably still a bit rough, but the tests pass.

There are a lot of "string_make" calls in the interpreter. I've replaced most of the explicit charsets with NULL, which now selects the default "ascii" charset. For correctness, though, when no charset is given, the string should be inspected to determine whether it really is ASCII.

Mixed-charset operations such as append, substr, or bitwise_{and,or,xor} on fixed8-encoded strings should also work now, but there are no tests for them yet; tests would be very welcome.

Folks, please grep through the sources and check each string creation to see whether it's plain ASCII or not.

Thanks,
leo

[1] I missed the first part of it; it would be great if such discussions could be brought to the list.