I just created a new branch, string_checks, that adds more thorough checks on the contents of strings in various encodings.

First of all, there are many places where strings are created in the default ASCII encoding but filled with binary data afterwards. The new branch fixes this by always checking the contents of ASCII strings in Parrot_str_new_init and changing the encoding to binary where appropriate.
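
To illustrate the kind of check described above, here is a minimal sketch, not the actual code in the branch. The encoding_t enum and classify_ascii_or_binary are hypothetical names made up for this example. It assumes that "binary data" means any byte with the high bit set, which may not match the exact rule the branch applies.

```c
#include <stddef.h>

/* Hypothetical encoding tags for illustration only;
 * Parrot's real encoding objects look different. */
typedef enum { ENC_ASCII, ENC_BINARY } encoding_t;

/* Scan a buffer that was nominally created as ASCII.
 * Returns ENC_ASCII if every byte fits in 7 bits,
 * otherwise ENC_BINARY (assumed fallback, see note above). */
static encoding_t
classify_ascii_or_binary(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] >= 0x80)
            return ENC_BINARY;  /* high bit set: not valid ASCII */
    }

    return ENC_ASCII;
}
```

Doing this once at string creation time means later code can rely on the encoding tag instead of re-checking the bytes.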

The checks for Unicode strings are also improved and moved to Parrot_str_new_init. Along the way, I rewrote the UTF-16 support to work without ICU.
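
As a rough idea of what a UTF-16 check can look like without ICU, here is a sketch of surrogate pair validation. The function name utf16_is_well_formed is hypothetical, and this only covers surrogate pairing. I'm assuming lone surrogates should be rejected; the checks in the branch may be stricter or differ in detail.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the code units form well-formed UTF-16, 0 otherwise.
 * Only surrogate pairing is checked here (illustrative sketch). */
static int
utf16_is_well_formed(const uint16_t *units, size_t count)
{
    size_t i = 0;

    while (i < count) {
        uint16_t u = units[i];

        if (u >= 0xD800 && u <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (i + 1 >= count)
                return 0;
            if (units[i + 1] < 0xDC00 || units[i + 1] > 0xDFFF)
                return 0;
            i += 2;
        }
        else if (u >= 0xDC00 && u <= 0xDFFF) {
            /* Low surrogate without a preceding high surrogate. */
            return 0;
        }
        else {
            /* Ordinary BMP code unit. */
            i++;
        }
    }

    return 1;
}
```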

This branch breaks reading UTF-8 data with Rakudo's IO::Socket, but it was only a coincidence that this worked at all. Currently, Parrot doesn't support different encodings for sockets the way it does for file handles. I'm not sure whether this is a desired feature.

Nick
_______________________________________________
http://lists.parrot.org/mailman/listinfo/parrot-dev
