Hi Behdad,

> Well, there's a bit more to it. Just because some bytes in a file are invalid
> according to the spec doesn't mean your text editor should refuse to open the
> file. While g_utf8_get_char() and friends do assume valid UTF-8 data, it's an
> unwritten assumption that for invalid bytes they simply skip the byte and
> return -1. And I want to keep it that way and perhaps even document it. I
> think I use that in Pango IIRC.

I'd like to bring this up for discussion as a separate matter, because I
think it's a dangerously wrong way of handling things.

First and foremost, if your text editor uses g_utf8_get_char() on data
read from an external file without any validation, then that's a glaring
and serious bug. Even if you rely on the incomplete checks that are
currently in place, they are nowhere near robust enough to deal with
untrusted input. There are dedicated functions provided for reading data
which may not be valid UTF-8, and only those should be used. There is no
need to reject the entire file. Also, I believe that GIOChannel
conveniently does the UTF-8 validation for you on the fly.

Second, it is simply *impossible* for g_utf8_get_char() to handle invalid
UTF-8 sequences correctly, because it does not know where the buffer
ends. If the first byte is already bogus and does not actually start a
UTF-8 sequence, you will have read farther than you were supposed to by
the time you discover that it isn't followed by a proper continuation
byte. If it was the last byte in the buffer, you have already read past
the end at that point.

Third, g_utf8_get_char() does not actually skip anything. The skipping
is done in the calling code, usually by means of g_utf8_next_char() to
advance to the next code point after each iteration. The implementation
of g_utf8_get_char() has no influence whatsoever on that iteration or on
how much is being skipped.

That being said, it would be a trivial matter to add the same checks to
the glibmm implementation. However, I'd rather not do so, because all
they would provide is a false sense of security.

--Daniel
_______________________________________________
gtk-devel-list mailing list
gtk-devel-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gtk-devel-list
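
To illustrate the second and third points above, here is a minimal sketch
in plain C. It is not the GLib or glibmm implementation, and the names
decode_bounded() and count_valid() are hypothetical. It only shows that a
decoder needs an explicit end pointer before it can reject a truncated
sequence without reading past the buffer, and that the decision to skip a
bad byte belongs to the calling loop. In GLib itself, the functions that
accept a maximum length are g_utf8_get_char_validated() and
g_utf8_validate().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of GLib: decode one code point from the
 * range [p, end).  Returns the number of bytes consumed (1 to 4) and
 * stores the code point in *out, or returns 0 if the sequence is invalid
 * or truncated.  It never reads at or beyond `end`. */
static size_t
decode_bounded (const unsigned char *p, const unsigned char *end,
                uint32_t *out)
{
  size_t len, i;
  uint32_t c, min;

  if (p >= end)
    return 0;

  if (p[0] < 0x80)
    {
      *out = p[0];
      return 1;
    }
  else if ((p[0] & 0xE0) == 0xC0) { len = 2; c = p[0] & 0x1F; min = 0x80; }
  else if ((p[0] & 0xF0) == 0xE0) { len = 3; c = p[0] & 0x0F; min = 0x800; }
  else if ((p[0] & 0xF8) == 0xF0) { len = 4; c = p[0] & 0x07; min = 0x10000; }
  else
    return 0; /* stray continuation byte or invalid lead byte */

  /* Only possible because the caller told us where the buffer ends. */
  if ((size_t) (end - p) < len)
    return 0;

  for (i = 1; i < len; i++)
    {
      if ((p[i] & 0xC0) != 0x80)
        return 0;
      c = (c << 6) | (p[i] & 0x3F);
    }

  /* Reject overlong forms, surrogates, and values above U+10FFFF. */
  if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    return 0;

  *out = c;
  return len;
}

/* Hypothetical caller: the skipping policy lives here, not in the
 * decoder.  Invalid input is skipped one byte at a time. */
static size_t
count_valid (const unsigned char *buf, size_t n, size_t *n_invalid)
{
  const unsigned char *p = buf;
  const unsigned char *end = buf + n;
  size_t valid = 0;

  assert (buf != NULL || n == 0);
  *n_invalid = 0;

  while (p < end)
    {
      uint32_t c;
      size_t used = decode_bounded (p, end, &c);

      if (used == 0)
        {
          ++*n_invalid;
          p += 1;
        }
      else
        {
          ++valid;
          p += used;
        }
    }

  return valid;
}
```

With a buffer that ends in a lone lead byte such as 0xC3, decode_bounded()
returns 0 without touching the byte after the buffer, which a decoder that
takes only a pointer cannot guarantee.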