Hey all, I recently ran across this situation. The simple fact is that NUL (character 0; not NULL, which is a pointer) is nowhere stated to be an invalid Unicode character in the Unicode spec (g_unichar_validate(0) returns TRUE, btw), the UTF-8 spec doesn't prohibit 0, and following its wording literally, Unicode character 0 transforms to a single byte 0.
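To make the distinction concrete, here is a rough sketch in plain C (no GLib) of a length-based check that follows the UTF-8 wording literally and so accepts a 0 byte as character 0, next to a stricter check that also rejects embedded 0 bytes, which is what a NUL-terminated C string needs. The names utf8_validate_data() and utf8_validate_string() are hypothetical and are not existing GLib functions; treat this as an illustration under those assumptions, not as how GLib does it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of GLib.
 * Returns 1 if the `size` bytes at `data` are well-formed UTF-8, else 0.
 * A 0 byte is accepted as the one-byte encoding of U+0000. */
static int
utf8_validate_data (const char *data, size_t size)
{
  const unsigned char *p = (const unsigned char *) data;
  size_t i = 0;

  while (i < size)
    {
      unsigned char c = p[i];
      unsigned long cp, min;
      size_t extra, k;

      if (c < 0x80)
        {
          i++;                      /* ASCII, including the 0 byte */
          continue;
        }
      else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
      else
        return 0;                   /* stray continuation or invalid lead byte */

      if (size - i <= extra)
        return 0;                   /* sequence runs past the end of the buffer */

      for (k = 1; k <= extra; k++)
        {
          if ((p[i + k] & 0xC0) != 0x80)
            return 0;               /* not a continuation byte */
          cp = (cp << 6) | (p[i + k] & 0x3F);
        }

      if (cp < min)
        return 0;                   /* overlong form, e.g. C0 80 for U+0000 */
      if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                   /* UTF-16 surrogates are not characters */
      if (cp > 0x10FFFF)
        return 0;                   /* beyond the Unicode range */

      i += extra + 1;
    }

  return 1;
}

/* Hypothetical helper, not part of GLib.
 * Same check, but additionally rejects any 0 byte inside the buffer, so the
 * result is safe to hand to code that expects a NUL-terminated string. */
static int
utf8_validate_string (const char *data, size_t size)
{
  return memchr (data, 0, size) == NULL && utf8_validate_data (data, size);
}
```

With these two, the 3-byte buffer "a\0b" passes utf8_validate_data() but fails utf8_validate_string(), while the overlong 2-byte form C0 80 fails both.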
Nonetheless, I think g_utf8_validate() should be kept as is, at least for a long time. It is misnamed, but it serves such a useful purpose that it is widely deployed. I think it should have been named g_utf8_validate_string(), b/c that's a more accurate name. I think it's fair to say that strings are NUL-terminated in C (e.g. the str* functions and string literals), but there's no standard saying what a string is, so who knows. The simple fact is that MOST strings in structures, param-lists etc. in C are simply:

    char *name;

not

    guint name_len;
    char *name;

so you definitely want a function like g_utf8_validate_string() to ensure that a string doesn't contain NUL in a situation where an embedded NUL cannot actually be used.

It would be nice if a

    g_utf8_validate_data (const char *str, gsize size, GError **error)

could be added... it should follow the UTF-8 spec, permitting character 0. Perhaps g_utf8_validate_string() could be added too (identical to the current g_utf8_validate(), or maybe without the size param), with g_utf8_validate() possibly deprecated as confusing. But replacing g_utf8_validate() with the new semantics should probably wait a long time.

-------

This is all rather tangential, I believe, to the original problem with gedit. It should do its own UTF-8 validation, b/c a text editor likes to handle invalid UTF-8 specially. UTF-8 is a spec that will not change, and is about 10 lines of code; you can afford to include your own version. It should first do something smarter to handle other encodings, i.e. detect Latin1, obey the locale, etc. And it could default to markup like <red>HEX</red> for non-UTF-8 bytes. That's a lot different from the handling you want from, say, a configuration parser.

- dave

_______________________________________________
gtk-devel-list mailing list
gtk-devel-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gtk-devel-list