On Jan 12, 2012, at 3:57 PM, Dave Thompson wrote:
>> From: [email protected] On Behalf Of Marshall Clow
>> Sent: Thursday, 12 January, 2012 09:48
>
>> On Jan 12, 2012, at 6:14 AM, Andy Polyakov via RT wrote:
>
>>> This actually goes beyond just warning. toupper accepts 'int' as
>>> argument and if you pass 'char' from upper half of ASCII table, it will
>>> be passed sign-expanded, and if you pass 'unsigned char', then it will
>>> be passed zero-expanded. What is right? Solaris manual page says: <snip>
>
>> It's not just Linux.
>>
>> The C99 standard says (Section 7.4/1): <snip>
>> See also http://msdn.microsoft.com/en-us/library/ms245348.aspx
>>
> Yes. (And C89/90/95, and very-recently-adopted C11.)
We're wandering even farther away from openssl here, but ….
C++ has two different touppers.
There's
int std::toupper (int);
which is C99's toupper hoisted into namespace std. That has all the problems
above.
There's also the facet based uppercasing, which is templated on the character
type:
charT std::ctype<charT>::toupper (charT ch) const;
(plus the convenience wrapper std::toupper (charT ch, const std::locale &loc))
which does not take an int as input - and has to take all values of charT as
input.
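For what it's worth, here is a minimal sketch of the two routes side by side. The helper names (upper_c, upper_loc, upper_facet) are made up for illustration, and the classic "C" locale is only an assumed default; this is not anything OpenSSL itself does.

```cpp
#include <cctype>
#include <locale>

// C-style route: convert to unsigned char first, so the int argument that
// std::toupper(int) receives is never negative.
inline char upper_c(char ch) {
    return static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
}

// Locale route: the two-argument std::toupper template takes the character
// type itself, so no int conversion is involved at the call site.
inline char upper_loc(char ch, const std::locale &loc = std::locale::classic()) {
    return std::toupper(ch, loc);
}

// The same thing, asking the ctype facet directly.
inline char upper_facet(char ch, const std::locale &loc = std::locale::classic()) {
    return std::use_facet<std::ctype<char> >(loc).toupper(ch);
}
```

All three should agree on plain ASCII input; the difference only matters for char values outside the 7-bit range.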
>
> Note that 'char' *may* have this problem depending on the C implementation;
> the C standard allows '(plain) char' to be the same as either signed char
> or unsigned char. (Some implementations give the programmer the choice,
> e.g. gcc -f[no-]unsigned-char.) 'signed char' always has the problem.
> Note this differs from other integer types, where e.g. '(plain) short'
> is 'signed short' on all implementations. Although for *bitfields*,
> default signedness is again implementation dependent.
>
> Nit: it's not the upper half of *ASCII*. ASCII is 7bits, and the only
> body with the authority to change that is X3-now-INCITS. There are *many*
> different 8bit *extensions* of ASCII, and the upper half of any of them,
> on an 8bit-byte machine (which Internet and SSL/TLS pretty much requires,
> although Standard C does not), has the problem. And OpenSSL at least tries
> to support EBCDIC, an 8bit code (on 8bit machines) where the high half
> includes the most used characters (letters and digits and a few others)
> but to my understanding those C implementations make plain char unsigned
> to avoid this problem. (And it's more 'natural' for the ISA anyway.)
Agreed.
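A small sketch of the sign-extension point, in case it helps anyone following along. The function names are hypothetical, and the only assumption is a conforming C++ compiler.

```cpp
#include <climits>

// The value an int parameter ends up with for each character type.
inline int as_int_signed(signed char c)     { return c; }  // sign-extended
inline int as_int_unsigned(unsigned char c) { return c; }  // zero-extended

// True when plain char behaves like signed char on this implementation
// (the standard leaves the choice to the implementation).
inline bool plain_char_is_signed() { return CHAR_MIN < 0; }
```

A negative value such as the one as_int_signed can return is exactly what the int-taking toupper is not required to handle (unless it happens to equal EOF), which is why the usual advice is to convert to unsigned char before the call.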
-- Marshall
Marshall Clow Idio Software <mailto:[email protected]>
A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly
moderated down to (-1, Flamebait).
-- Yu Suzuki
______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List [email protected]
Automated List Manager [email protected]