Re: [libvirt] [PATCH] avoid one more ctype vs. sign-extension problem

2008-05-09 Thread Daniel Veillard
On Thu, May 08, 2008 at 04:44:38PM +0200, Jim Meyering wrote:
 This change demonstrates that the new syntax-check rule's
 regexp can be improved.  It missed the unsafe tolower use,
 since there was already a to_uchar use on that line.

  Since this shows up again, let me argue a bit more why I dislike
using is* and to* in libvirt.

pedantic topic=string encodings

  We use to* only for comparing A-F/a-f in ethernet addresses
comparisons, so those use are safe. The use of XML as the front
end for all text parsed in libvirt also safeguards against what
I consider a nasty problems with the use of strings in system code
i.e. their dependance to locale. I think it's not acceptable to 
have a risk of different behaviour if the user locale has changed,
this can lead to untractable bugs, and also mean we allowed at
a given point the problem of string whose encoding is undefined
to occur in the library (in the libxml2 docs I point to [1] and [2]
as introductions to the problem).

  As long as we use XML for input, we are safe IMHO, because 
any string input will automatically come converted as UTF-8 
by the way of libxml2, that means if someone want to use japanese
characters for the name of their domain or files, at the libvirt level 
they can and we should not misparse or break those (it's likely to
break down the line in xend, but i would hope that lxc and QEmu
would be better at handling this though I don't know how we could enforce
the exec'ec programs args to be interpreted as UTF-8 in the forked
process, as a library we just can't touch the locale value, but maybe
we can do that after fork() and before exec()).

  But each time we extract a string from libxml2 we should know and keep
in mind it is an UTF-8 string. In practice the ethernet MAC addresses
strings used with to_lower and to_upper in util.c are UTF-8, not strings
in the user locale. We are safe here because most locales match
with ASCII on the first 128 values, UTF-8 included, and the 
characters for the MAC address range cannot conflict for UTF-8
encoding, but from a pure correctness POV using those convenience
macros on those UTF-8 strings is a mistake.

  And I would rather see automated checks on the code look for those
mistakes, rather than doing a lookup for a wrong cast to then use 
a wrong macro with a false sense of security.

/pedantic

  okay now I feel better :-)

Daniel

[1] http://www.joelonsoftware.com/articles/Unicode.html
[2] http://www.tbray.org/ongoing/When/200x/2003/04/06/Unicode

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard  | virtualization library  http://libvirt.org/
[EMAIL PROTECTED]  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] avoid one more ctype vs. sign-extension problem

2008-05-09 Thread Jim Meyering
Daniel Veillard [EMAIL PROTECTED] wrote:
 On Thu, May 08, 2008 at 04:44:38PM +0200, Jim Meyering wrote:
 This change demonstrates that the new syntax-check rule's
 regexp can be improved.  It missed the unsafe tolower use,
 since there was already a to_uchar use on that line.

   Since this shows up again, let me argue a bit more why I dislike
 using is* and to* in libvirt.

I agree completely, especially since I've just fixed three
similar bugs in coreutils:

  http://thread.gmane.org/gmane.comp.gnu.coreutils.bugs/13512
  http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=209850fd7

I'm in the process of removing all of them from libvirt, since there
aren't many.  Actually, I've just realized that the 'right' way to do
this is to change each use of isspace to c_isspace, isdigit to c_isdigit,
etc., relying on the gnulib c-ctype module:

  http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/c-ctype.h

Looking into that, I see more room for improvement, i.e., where things
like isdigit and isalnum have been open-coded.  With these c_is* functions,
we can have portability, readability, *and* efficiency:

For example, compare these:

while ((*cur = '0')  (*cur = '9')) {

while (c_isdigit(*cur)) {


and these:

char_ok =
(model[i] = '0'  model[i] = '9') ||
(model[i] = 'a'  model[i] = 'z') ||
(model[i] = 'A'  model[i] = 'Z') || model[i] == '_';

char_ok = c_isalnum (model[i]) || model[i] == '_';

So I'm making changes like these, too.

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] avoid one more ctype vs. sign-extension problem

2008-05-09 Thread Daniel Veillard
On Fri, May 09, 2008 at 12:13:50PM +0200, Jim Meyering wrote:
 Daniel Veillard [EMAIL PROTECTED] wrote:
  On Thu, May 08, 2008 at 04:44:38PM +0200, Jim Meyering wrote:
  This change demonstrates that the new syntax-check rule's
  regexp can be improved.  It missed the unsafe tolower use,
  since there was already a to_uchar use on that line.
 
Since this shows up again, let me argue a bit more why I dislike
  using is* and to* in libvirt.
 
 I agree completely, especially since I've just fixed three
 similar bugs in coreutils:

  Ah, good, it's not just the Frenchman being overly stubborn then :-)

   http://thread.gmane.org/gmane.comp.gnu.coreutils.bugs/13512
   http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=209850fd7

 I'm in the process of removing all of them from libvirt, since there
 aren't many.  Actually, I've just realized that the 'right' way to do
 this is to change each use of isspace to c_isspace, isdigit to c_isdigit,
 etc., relying on the gnulib c-ctype module:
 
   http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/c-ctype.h

  That looks fine, yes.
  
 Looking into that, I see more room for improvement, i.e., where things
 like isdigit and isalnum have been open-coded.  With these c_is* functions,
 we can have portability, readability, *and* efficiency:
 
 For example, compare these:
 
 while ((*cur = '0')  (*cur = '9')) {
 
 while (c_isdigit(*cur)) {
 
 
 and these:
 
 char_ok =
 (model[i] = '0'  model[i] = '9') ||
 (model[i] = 'a'  model[i] = 'z') ||
 (model[i] = 'A'  model[i] = 'Z') || model[i] == '_';
 
 char_ok = c_isalnum (model[i]) || model[i] == '_';
 
 So I'm making changes like these, too.

  great, thanks !

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard  | virtualization library  http://libvirt.org/
[EMAIL PROTECTED]  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH] avoid one more ctype vs. sign-extension problem

2008-05-08 Thread Jim Meyering
This change demonstrates that the new syntax-check rule's
regexp can be improved.  It missed the unsafe tolower use,
since there was already a to_uchar use on that line.

From 5fc8de9825215e28773f2230ac6c1e1b3d724602 Mon Sep 17 00:00:00 2001
From: Jim Meyering [EMAIL PROTECTED]
Date: Thu, 8 May 2008 16:11:55 +0200
Subject: [PATCH] avoid one more ctype vs. sign-extension problem

* src/util.c (TOLOWER): Also convert tolower argument.
---
 src/util.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/src/util.c b/src/util.c
index 8f3cef9..4cef6d2 100644
--- a/src/util.c
+++ b/src/util.c
@@ -57,7 +57,8 @@

 #define MAX_ERROR_LEN   1024

-#define TOLOWER(Ch) (isupper (to_uchar(Ch)) ? tolower (Ch) : (Ch))
+#define TOLOWER(Ch) (isupper (to_uchar(Ch)) \
+? tolower (to_uchar (Ch)) : (to_uchar (Ch)))

 #define virLog(msg...) fprintf(stderr, msg)

--
1.5.5.1.148.g4c99ee

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] avoid one more ctype vs. sign-extension problem

2008-05-08 Thread Daniel P. Berrange
On Thu, May 08, 2008 at 04:44:38PM +0200, Jim Meyering wrote:
 This change demonstrates that the new syntax-check rule's
 regexp can be improved.  It missed the unsafe tolower use,
 since there was already a to_uchar use on that line.

ACK

Dan
-- 
|: Red Hat, Engineering, Boston   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list