Hi Eric,

On 3 Apr 2007, at 03:53, Eric Blake wrote:

According to Gary V. Vaughan on 4/2/2007 4:37 PM:

Cast the subscript to unsigned char before using it as index. Otherwise, on a system where char is signed, and its high bit is set, and you haven't adjusted the array range to allow for negative values, fun will ensue.

If the table value for META-^A is held at element 128 of the array (since the table was built assuming char is unsigned by default), and we compile on a host with signed chars, does the signed char value of META-^A still become 128 when cast to unsigned char? Or does 2's complement come into play and scramble the order of the negative signed char values when casting them before doing a table lookup?

As long as the table is handled consistently (in other words, as long as ALL uses of characters as indices occur as unsigned char or within to_uchar), then META-^A (usually encoded as -128 in signed char) will always appear at the same index, regardless of whether that index is 128 (as it will be on a 2's complement machine; the bulk of what exists today), or 255 (which is what (unsigned char) -128 might become on a 1's complement machine, mostly theoretical). You only run into the bug that you were describing if you also reference the array based on a given integer encoding of characters.
My point exactly. Here's a violation of that consistency in syntax.c:
m4_syntax_table *
m4_syntax_create (void)
{
  m4_syntax_table *syntax = xzalloc (sizeof *syntax);
  int ch;

  /* Set up default table.  This table never changes during
     operation.  */
  for (ch = 256; --ch >= 0;)
    switch (ch)
      {
      case '(':
        syntax->orig[ch] = M4_SYNTAX_OPEN;
        break;
In this case, we let a possibly signed char literal self-promote
to an int, but assume that the values with the high bit set will still
map correctly when characters are manually fed through to_uchar for
lookups in that table.
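To make the mismatch concrete, here is a minimal sketch (not code from
syntax.c) of the two index computations side by side. The to_uchar
below is a hypothetical stand-in for the helper mentioned above, and
index_when_building / index_when_looking_up are names invented purely
for illustration; 8-bit chars are assumed.

```c
#include <limits.h>

/* Hypothetical stand-in for the to_uchar helper: convert a char to
   unsigned char so that it is always a valid, non-negative index.  */
static inline unsigned char
to_uchar (char ch)
{
  return ch;
}

/* The index we get when a char value is simply promoted to int, as
   happens with a case label such as '\200'.  On a host where char is
   signed this is negative for high-bit-set characters.  */
static int
index_when_building (char ch)
{
  return ch;
}

/* The index we get when the lookup side goes through to_uchar.  */
static int
index_when_looking_up (char ch)
{
  return to_uchar (ch);
}
```

For '(' both functions agree on any host. For '\200' the lookup side
yields 128, while the building side yields a negative value wherever
char is signed, so an entry stored under the promoted value would
never be found by the lookup.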
In practice, we don't have any case statements for high-bit-set chars
inside the switch, so it hasn't caught us out. Even so, with portable
defensive coding style, it seems better to use the same method of
dereferencing indices when building the table as when looking up entries
in it... I've probably made this same bad assumption in a few other
places where I wrote code to do table lookups for char values :-(
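Something along these lines is what I have in mind: a rough sketch,
not a patch against syntax.c. Every name in it (sketch_table, the
SKETCH_* codes, sketch_table_set, sketch_table_get, and this to_uchar
stand-in) is made up for illustration, and the '\200' entry exists
only to exercise a high-bit-set character.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical category codes, standing in for the M4_SYNTAX_* values.  */
enum { SKETCH_OTHER = 0, SKETCH_OPEN, SKETCH_CLOSE, SKETCH_HIGH };

/* One slot per possible unsigned char value.  */
static unsigned short sketch_table[UCHAR_MAX + 1];

/* Hypothetical stand-in for the to_uchar helper.  */
static inline unsigned char
to_uchar (char ch)
{
  return ch;
}

/* Store and fetch through the same conversion, so an entry always
   lands in the slot that the lookup will read later.  */
static void
sketch_table_set (char ch, unsigned short code)
{
  sketch_table[to_uchar (ch)] = code;
}

static unsigned short
sketch_table_get (char ch)
{
  return sketch_table[to_uchar (ch)];
}

/* Build the default table using character literals, never raw
   integer encodings, as the keys.  */
static void
sketch_table_init (void)
{
  size_t i;

  for (i = 0; i <= UCHAR_MAX; i++)
    sketch_table[i] = SKETCH_OTHER;

  sketch_table_set ('(', SKETCH_OPEN);
  sketch_table_set (')', SKETCH_CLOSE);
  sketch_table_set ('\200', SKETCH_HIGH);  /* illustration only */
}
```

Since both the set and get sides go through to_uchar, it no longer
matters whether the host treats char as signed or unsigned.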
Cheers,
Gary
--
())_. Email me: [EMAIL PROTECTED]
( '/ Read my blog: http://blog.azazil.net
/ )= ...and my book: http://sources.redhat.com/autobook
`(_~)_ Join my AGLOCO Network: http://www.agloco.com/r/BBBS7912
