Re: [libvirt] [PATCH] avoid one more ctype vs. sign-extension problem
On Thu, May 08, 2008 at 04:44:38PM +0200, Jim Meyering wrote: This change demonstrates that the new syntax-check rule's regexp can be improved. It missed the unsafe tolower use, since there was already a to_uchar use on that line. Since this shows up again, let me argue a bit more why I dislike using is* and to* in libvirt. pedantic topic=string encodings We use to* only for comparing A-F/a-f in ethernet addresses comparisons, so those use are safe. The use of XML as the front end for all text parsed in libvirt also safeguards against what I consider a nasty problems with the use of strings in system code i.e. their dependance to locale. I think it's not acceptable to have a risk of different behaviour if the user locale has changed, this can lead to untractable bugs, and also mean we allowed at a given point the problem of string whose encoding is undefined to occur in the library (in the libxml2 docs I point to [1] and [2] as introductions to the problem). As long as we use XML for input, we are safe IMHO, because any string input will automatically come converted as UTF-8 by the way of libxml2, that means if someone want to use japanese characters for the name of their domain or files, at the libvirt level they can and we should not misparse or break those (it's likely to break down the line in xend, but i would hope that lxc and QEmu would be better at handling this though I don't know how we could enforce the exec'ec programs args to be interpreted as UTF-8 in the forked process, as a library we just can't touch the locale value, but maybe we can do that after fork() and before exec()). But each time we extract a string from libxml2 we should know and keep in mind it is an UTF-8 string. In practice the ethernet MAC addresses strings used with to_lower and to_upper in util.c are UTF-8, not strings in the user locale. We are safe here because most locales match with ASCII on the first 128 values, UTF-8 included, and the characters for the MAC address range cannot conflict for UTF-8 encoding, but from a pure correctness POV using those convenience macros on those UTF-8 strings is a mistake. And I would rather see automated checks on the code look for those mistakes, rather than doing a lookup for a wrong cast to then use a wrong macro with a false sense of security. /pedantic okay now I feel better :-) Daniel [1] http://www.joelonsoftware.com/articles/Unicode.html [2] http://www.tbray.org/ongoing/When/200x/2003/04/06/Unicode -- Red Hat Virtualization group http://redhat.com/virtualization/ Daniel Veillard | virtualization library http://libvirt.org/ [EMAIL PROTECTED] | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/ -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] avoid one more ctype vs. sign-extension problem
Daniel Veillard [EMAIL PROTECTED] wrote: On Thu, May 08, 2008 at 04:44:38PM +0200, Jim Meyering wrote: This change demonstrates that the new syntax-check rule's regexp can be improved. It missed the unsafe tolower use, since there was already a to_uchar use on that line. Since this shows up again, let me argue a bit more why I dislike using is* and to* in libvirt. I agree completely, especially since I've just fixed three similar bugs in coreutils: http://thread.gmane.org/gmane.comp.gnu.coreutils.bugs/13512 http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=209850fd7 I'm in the process of removing all of them from libvirt, since there aren't many. Actually, I've just realized that the 'right' way to do this is to change each use of isspace to c_isspace, isdigit to c_isdigit, etc., relying on the gnulib c-ctype module: http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/c-ctype.h Looking into that, I see more room for improvement, i.e., where things like isdigit and isalnum have been open-coded. With these c_is* functions, we can have portability, readability, *and* efficiency: For example, compare these: while ((*cur = '0') (*cur = '9')) { while (c_isdigit(*cur)) { and these: char_ok = (model[i] = '0' model[i] = '9') || (model[i] = 'a' model[i] = 'z') || (model[i] = 'A' model[i] = 'Z') || model[i] == '_'; char_ok = c_isalnum (model[i]) || model[i] == '_'; So I'm making changes like these, too. -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] avoid one more ctype vs. sign-extension problem
On Fri, May 09, 2008 at 12:13:50PM +0200, Jim Meyering wrote: Daniel Veillard [EMAIL PROTECTED] wrote: On Thu, May 08, 2008 at 04:44:38PM +0200, Jim Meyering wrote: This change demonstrates that the new syntax-check rule's regexp can be improved. It missed the unsafe tolower use, since there was already a to_uchar use on that line. Since this shows up again, let me argue a bit more why I dislike using is* and to* in libvirt. I agree completely, especially since I've just fixed three similar bugs in coreutils: Ah, good, it's not just the Frenchman being overly stubborn then :-) http://thread.gmane.org/gmane.comp.gnu.coreutils.bugs/13512 http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=209850fd7 I'm in the process of removing all of them from libvirt, since there aren't many. Actually, I've just realized that the 'right' way to do this is to change each use of isspace to c_isspace, isdigit to c_isdigit, etc., relying on the gnulib c-ctype module: http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/c-ctype.h That looks fine, yes. Looking into that, I see more room for improvement, i.e., where things like isdigit and isalnum have been open-coded. With these c_is* functions, we can have portability, readability, *and* efficiency: For example, compare these: while ((*cur = '0') (*cur = '9')) { while (c_isdigit(*cur)) { and these: char_ok = (model[i] = '0' model[i] = '9') || (model[i] = 'a' model[i] = 'z') || (model[i] = 'A' model[i] = 'Z') || model[i] == '_'; char_ok = c_isalnum (model[i]) || model[i] == '_'; So I'm making changes like these, too. great, thanks ! Daniel -- Red Hat Virtualization group http://redhat.com/virtualization/ Daniel Veillard | virtualization library http://libvirt.org/ [EMAIL PROTECTED] | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/ -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH] avoid one more ctype vs. sign-extension problem
This change demonstrates that the new syntax-check rule's regexp can be improved. It missed the unsafe tolower use, since there was already a to_uchar use on that line. From 5fc8de9825215e28773f2230ac6c1e1b3d724602 Mon Sep 17 00:00:00 2001 From: Jim Meyering [EMAIL PROTECTED] Date: Thu, 8 May 2008 16:11:55 +0200 Subject: [PATCH] avoid one more ctype vs. sign-extension problem * src/util.c (TOLOWER): Also convert tolower argument. --- src/util.c |3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/src/util.c b/src/util.c index 8f3cef9..4cef6d2 100644 --- a/src/util.c +++ b/src/util.c @@ -57,7 +57,8 @@ #define MAX_ERROR_LEN 1024 -#define TOLOWER(Ch) (isupper (to_uchar(Ch)) ? tolower (Ch) : (Ch)) +#define TOLOWER(Ch) (isupper (to_uchar(Ch)) \ +? tolower (to_uchar (Ch)) : (to_uchar (Ch))) #define virLog(msg...) fprintf(stderr, msg) -- 1.5.5.1.148.g4c99ee -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] avoid one more ctype vs. sign-extension problem
On Thu, May 08, 2008 at 04:44:38PM +0200, Jim Meyering wrote: This change demonstrates that the new syntax-check rule's regexp can be improved. It missed the unsafe tolower use, since there was already a to_uchar use on that line. ACK Dan -- |: Red Hat, Engineering, Boston -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :| -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list