Re: somebody dropped a (warning) bomb

2007-02-19 Thread Sergei Organov
Linus Torvalds <[EMAIL PROTECTED]> writes:
> On Thu, 15 Feb 2007, Sergei Organov wrote:
>> 
>> I agree that if the warning has no true positives, it sucks. The problem
>> is that somehow I doubt it has none. And the reasons for the doubt are:
>
> Why do you harp on "no true positives"?

Because if somebody is capable of proving that a warning has no true
positives, I'll immediately agree it's useless. I just wanted to check
if that's indeed the case. It seems it is not.

> That's a pointless thing. You can make *any* warning have "true 
> positives". My personal favorite is the unconditional warning:
>
>   warning: user is an idiot
>
> and I _guarantee_ you that it has a lot of true positives.

Yes, but there is obviously zero correlation between the warning and the
user being an idiot (except that a non-idiot will probably find a way
either to turn this warning off or to filter it out).

I already agreed that a warning may have true positives and still be
bad; a warning having "no false negatives" is obviously a good one,
though unfortunately not that common in practice; and a warning that has
no true positives is useless. What we are arguing about are the
intermediate cases that are most common in practice.

As programmers, we are in fact interested in the correlation between a
warning and an actual problem in the software, but for the kind of
warnings we are discussing (sign-correctness and type safety in
general), the correlation depends on the style the program is written
in, making it difficult to use as a criterion in the discussion.

> It's the "no false negatives" angle you should look at.

I'd like to, but then I'll need to classify most of (all?) the GCC
warnings as bad, I'm afraid.

> THAT is what matters. The reason we don't see a lot of warnings about 
> idiot users is not that people don't do stupid things, but that 
> *sometimes* they actually don't do something stupid.
>
> Yeah, I know, it's far-fetched, but still.
>
> In other words, you're barking up *exactly* the wrong tree. You're looking 
> at it the wrong way around.
>
> Think of it this way: in science, a theory is proven to be bad by a single 
> undeniable fact just showing that it's wrong.
>
> The same is largely true of a warning. If the warning sometimes happens 
> for code that is perfectly fine, the warning is bad.

Consider:

int find_first_zero(const int *a)
{
	int index = 0;
	while (*a++ != 0)
		index++;
	return index;
}

unsigned int array[] = { 1, 2, 3, 0, 25, 14 };
int index = find_first_zero(array);  /* WARNING */

So, by your logic, the -Wpointer-sign warning is bad in general? If not,
then how will you "fix" the code above? And what is the upside of the
fix?

IMHO, the "problem" is that this warning is about a violation of type
safety. Type safety by itself doesn't guarantee the code is perfectly
fine, and a violation of type safety doesn't necessarily mean the code
is broken. A warning about violations of type safety has the same
properties. Now, it's up to you to decide whether you want warnings on
type safety or not.

What you are saying, on the other hand, reads to me like this: "I do
want warnings on violations of type safety in general, except for
"signed char" or "unsigned char" being used instead of "char"". I think
I do understand your reasons, -- I just don't agree with them.

-- Sergei.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: somebody dropped a (warning) bomb

2007-02-19 Thread Sergei Organov
Bodo Eggert <[EMAIL PROTECTED]> writes:
> On Fri, 16 Feb 2007, Sergei Organov wrote:

[...]

> I'll say it again: Either the code using unspecified chars is correct, or 
> it isn't. If it's correct, neither using with signed nor with unsigned 
> chars is a bug and you should not warn at all, and if it's not correct, 
> you should always warn. Instead, gcc warns on "code compiles for
> $arch".

Here is where we disagree. In my opinion, no matter what the sign of
char is for a given architecture, it's incorrect to pass either a
"signed char*" or an "unsigned char*" argument to a function expecting
"char*".

>> > Therefore it's either always wrong to call your char* function with
>> > char*, unsigned char* _and_ signed char* unless you can guarantee
>> > not to overflow any of them, or it's always correct to call char*
>> > functions with any kind of these.
>> 
>> How are you sure that those who wrote foo(char*) agree with your
>> opinion or even understand all the involved issues?
>
> Let's assume we have this piece of buggy code. We compile it on an
> unsigned char architecture. No warning. *BOOM*

There should be a warning, -- that's my point. "char" is different.
"char" is distinct from both "signed char" and "unsigned char". Always.
At least that's how C is defined.

> Let's assume there is correct code, and we use it as designed:
> Warning: Wrong arch
> Warning: Wrong arch
> Warning: Wrong arch
> Warning: real issue 
> Warning: Wrong arch
> Warning: Wrong arch
> Warning: Wrong arch
> Warning: Wrong arch
> <warnings scroll off screen/>
> Warning: Wrong arch
> Warning: Wrong arch
> Warning: Wrong arch
> Warning: Wrong arch
> Warning: Wrong arch
>
> You don't see "real issue". *BOOM*
>
>
> What can you do about this warning? Let's assume we cast everywhere:

I already gave an answer in response to Linus:

static inline size_t ustrlen(const unsigned char *s)
{
	return strlen((const char *)s);
}

>
> struct foo *p;
> printf(strlen((char *)p)); *BOOM*

printf(ustrlen(p)); *NO BOOM*
unsigned char* u;
printf(ustrlen(u)); *NO WARNING*

-- Sergei.


Re: somebody dropped a (warning) bomb

2007-02-19 Thread Sergei Organov
Rene Herman <[EMAIL PROTECTED]> writes:
[...]
> Given char's special nature, shouldn't the conclusion of this thread
> have long been simply that gcc needs -Wno-char-pointer-sign? (with
> whatever default, as far as I'm concerned).

I entirely agree that all the char business in C is messy enough to
justify separate warning switch(es) in GCC.

However, I still insist that the problem with the code:

   void foo(char *c);
   unsigned char *u;
   signed char *s;
   ...
   foo(u);
   foo(s);

is not (only) in signedness: neither 'u' nor 's' has a type compatible
with "char*", no matter what the sign of "char" is, so if one cares
about type safety, one needs warnings on both invocations of foo().

-- Sergei.


Re: somebody dropped a (warning) bomb

2007-02-19 Thread Sergei Organov
Bodo Eggert <[EMAIL PROTECTED]> writes:
> On Fri, 16 Feb 2007, Sergei Organov wrote:
>> Bodo Eggert <[EMAIL PROTECTED]> writes:
>> > Sergei Organov <[EMAIL PROTECTED]> wrote:
>> >> Linus Torvalds <[EMAIL PROTECTED]> writes:
>
[...]
>> If we start talking about the C language, my opinion is that it's C
>> problem that it allows numeric comparison of "char" variables at
>> all. If one actually needs to compare alphabetic characters numerically,
>> he should first cast to required integer type.
>
> Char values are indexes in a character set. Taking the difference of
> indexes is legal. Other languages have ord('x'), but it's only an
> expensive typecast (byte('x') does exactly the same thing).

Why should this typecast be expensive? Is

   char c1, c2;
   ...
   if ((unsigned char)c1 < (unsigned char)c2)

any more expensive than if (c1 < c2)?

[...]

>> Comparison of characters being numeric is not a very good property of
>> the C language.
>
> NACK, as above

The point was not to disable it entirely or to make it any more
expensive, but to somehow force the programmer to be explicit that he
indeed needs a numeric comparison of the characters' indexes. Well, I do
realize that the above is not in the C "spirit", which is to allow
implicit conversions almost everywhere.

>> > I repeat: Thanks to using signed chars, the programs only
>> > work /by/ /accident/! Promoting the use of signed char strings is promoting
>> > bugs and endangering the stability of all our systems. You should stop this
>> > bullshit now, instead of increasing the pile.
>> 
Where did you see I promoted the use of "signed char strings"?!
>
> char strings are often signed char strings, only using unsigned char 
> prevents this in a portable way. If you care about ordinal values of
> your strings, use unsigned.

It means that (since I don't want to change the type of my
strings/characters in case I suddenly start to care about the ordinal
values of the characters) I must in practice always use "unsigned char"
for storing characters. Right?

There are at least three problems then:

1. The type of "abc" remains "char*", so effectively I'll have two
   different representations of strings.

2. I can't mix two different representations of strings in C without
   compromising type safety. In C, the type for characters is "char",
   that by definition is different type than "unsigned char".

3. As I now represent characters by unsigned tiny integers, I lose the
   distinct type for unsigned tiny integers. Effectively this brings me
   back in time to the old days when C didn't have a distinct type for
   signed tiny integers.

Therefore, your "solution" to one problem creates a bunch of others.
Worse yet, it in fact doesn't solve the initial problem of comparing
strings. Take, for example, the KOI8-R encoding. Comparing characters
from this encoding using the "unsigned char" representation of character
indexes makes no more sense than comparing "signed char"
representations: both comparisons give meaningless results, because the
order of (some of) the characters in the table doesn't match their
alphabetical order.

>> If you don't like the fact that in C language characters are "char",
>> strings are "char*", and the sign of char is implementation-defined,
>> please argue with the C committee, not with me.
>> 
>> Or use -funsigned-char to get dialect of C that fits your requirements
>> better.
>
> If you restrict yourself to a specific C compiler, you are doomed, and 
> that's not a good start.

Isn't it you who insists that "char" should always be unsigned? It's
your problem then to find a compiler that understands that imaginary
language which meets the requirement.

-- Sergei.



Re: somebody dropped a (warning) bomb

2007-02-16 Thread Bodo Eggert
On Fri, 16 Feb 2007, Sergei Organov wrote:
> Bodo Eggert <[EMAIL PROTECTED]> writes:
> > Sergei Organov <[EMAIL PROTECTED]> wrote:
> >> Linus Torvalds <[EMAIL PROTECTED]> writes:

> > If you don't code for a specific compiler with specific settings, there is
> > no implementation defining the signedness of char, and each part of the code
> > using char* will be wrong unless it handles both cases correctly.
> 
> The problem here is that due to historical reasons, there could be code
> out there that abuses "char" for "signed char" (not sure about "unsigned
> char"). Old code and old habits are rather persistent.

There could be code using trigraphs ... and gcc has an option for that.

If this code uses signed chars, using it on unsigned-char archs is
broken and should be warned about, but the compiler will not warn,
because this code does not spell out "signed char" and is therefore
bug-for-bug syntax-compatible, waiting for a semantic breakdown.

I'll say it again: Either the code using unspecified chars is correct, or 
it isn't. If it's correct, neither using with signed nor with unsigned 
chars is a bug and you should not warn at all, and if it's not correct, 
you should always warn. Instead, gcc warns on "code compiles for $arch".

> > Therefore it's either always wrong to call your char* function with char*,
> > unsigned char* _and_ signed char unless you can guarantee not to overflow 
> > any
> > of them, or it's always correct to call char* functions with any kind
> > of these.
> 
> How are you sure that those who wrote foo(char*) agree with your
> opinion or even understand all the involved issues?

Let's assume we have this piece of buggy code. We compile it on an
unsigned char architecture. No warning. *BOOM*


Let's assume there is correct code, and we use it as designed:
Warning: Wrong arch
Warning: Wrong arch
Warning: Wrong arch
Warning: real issue 
Warning: Wrong arch
Warning: Wrong arch
Warning: Wrong arch
Warning: Wrong arch
<warnings scroll off screen/>
Warning: Wrong arch
Warning: Wrong arch
Warning: Wrong arch
Warning: Wrong arch
Warning: Wrong arch

You don't see "real issue". *BOOM*


What can you do about this warning? Let's assume we cast everywhere:

struct foo *p;
printf(strlen((char *)p)); *BOOM*


Let's assume we disable this warning:

int f(unsigned short x)
{
	if (!x)
		return 0;
	return (int)x + f(x - 1);
}

f(-1); *BOOM*


Therefore, unless you program for one arch with one set of compiler
flags, this warning is useless, and I have not seen much code explicitly
designed to be non-portable.

Warning on wrong signedness is good, but if you can't enable it on
portable code, it's useless.

-- 
Funny quotes:
39. Ever wonder about those people who spend $2.00 apiece on those little
bottles of Evian water? Try spelling Evian backwards: NAIVE


Re: somebody dropped a (warning) bomb

2007-02-16 Thread Bodo Eggert
On Fri, 16 Feb 2007, Sergei Organov wrote:
> Bodo Eggert <[EMAIL PROTECTED]> writes:
> > Sergei Organov <[EMAIL PROTECTED]> wrote:
> >> Linus Torvalds <[EMAIL PROTECTED]> writes:

> [...]
> > Using signed chars for strings is wrong in most countries on earth. It was
> > wrong when the first IBM PC came out in 1981, and creating a compiler in
> > 1987 defaulting to signed char is a sure sign of originating from an
> > isolated country and knowing nothing about this planet.
> 
> Maybe I'm wrong, but wasn't it the case that C had no "signed char" type
> that time, so "char" had to be abused for the "tiny signed integer"?

Mentioning the availability of unsigned char was a change in K&R's
book "The C Programming Language" from 1987 to 1988:

"For some time, type unsigned char has been available. The standard 
introduces the signed keyword to make signedness explicit for char and 
other integral objects."

So if there was no "signed char" before, the blame passes on to those who
didn't introduce it alongside "unsigned char".

> > Using signed chars in comparisons is especially wrong, and casting
> > each char to unsigned before comparing them is likely to be
> > forgotten.
> 
> If we start talking about the C language, my opinion is that it's C
> problem that it allows numeric comparison of "char" variables at
> all. If one actually needs to compare alphabetic characters numerically,
> he should first cast to required integer type.

Char values are indexes in a character set. Taking the difference of
indexes is legal. Other languages have ord('x'), but it's only an
expensive typecast (byte('x') does exactly the same thing).

> > Unsigned character strings are useless because there is no such thing
> > as char(-23), and if these strings weren't casted to signed inside all
> > IO functions, they wouldn't work correctly.
> 
> Didn't you swap "signed" and "unsigned" by mistake in this phrase? Or
> do you indeed think that using "unsigned char*" for strings is useless?

ACK

> > Only because many programmers don't compare chars, most programs will
> > work outside the USA.
> 
> Comparison of characters being numeric is not a very good property of
> the C language.

NACK, as above

> > I repeat: Thanks to using signed chars, the programs only
> > work /by/ /accident/! Promoting the use of signed char strings is promoting
> > bugs and endangering the stability of all our systems. You should stop this
> > bullshit now, instead of increasing the pile.
> 
> Where did you see I promoted using of "singed char strings"?!

char strings are often signed char strings, only using unsigned char 
prevents this in a portable way. If you care about ordinal values of
your strings, use unsigned.

> If you
> don't like the fact that in C language characters are "char", strings
> are "char*", and the sign of char is implementation-defined, please
> argue with the C committee, not with me.
> 
> Or use -funsigned-char to get dialect of C that fits your requirements
> better.

If you restrict yourself to a specific C compiler, you are doomed, and 
that's not a good start.
-- 
What about those red balls they have on car aerials so you can spot your car
in a park. I think all cars should have them!
-- Homer Simpson


Re: somebody dropped a (warning) bomb

2007-02-16 Thread Sergei Organov
Bodo Eggert <[EMAIL PROTECTED]> writes:

> Sergei Organov <[EMAIL PROTECTED]> wrote:
>> Linus Torvalds <[EMAIL PROTECTED]> writes:
>
>>> Exactly because "char" *by*definition* is "indeterminate sign" as far as
>>> something like "strlen()" is concerned.
>> 
>> Thanks, I now understand that you either don't see the difference
>> between "indeterminate" and "implementation-defined" in this context or
>> consider it non-essential, so I think I've got your point.
>
> If you don't code for a specific compiler with specific settings, there is
> no implementation defining the signedness of char, and each part of the code
> using char* will be wrong unless it handles both cases correctly.

The problem here is that due to historical reasons, there could be code
out there that abuses "char" for "signed char" (not sure about "unsigned
char"). Old code and old habits are rather persistent.

> Therefore it's either always wrong to call your char* function with char*,
> unsigned char* _and_ signed char* unless you can guarantee not to overflow any
> of them, or it's always correct to call char* functions with any kind
> of these.

How are you sure that those who wrote foo(char*) agree with your
opinion or even understand all the involved issues?

> Of course, if you happen to code for specific compiler settings, one
> signedness of char will become real and one warning will be legit. And
> if pigs fly, they should wear goggles to protect their eyes ...

And if people were writing perfect portable code all the time, there
would be no need for warnings in the first place...

-- Sergei.


Re: somebody dropped a (warning) bomb

2007-02-16 Thread Sergei Organov
Bodo Eggert <[EMAIL PROTECTED]> writes:
> Sergei Organov <[EMAIL PROTECTED]> wrote:
>> Linus Torvalds <[EMAIL PROTECTED]> writes:
[...]
> Using signed chars for strings is wrong in most countries on earth. It was
> wrong when the first IBM PC came out in 1981, and creating a compiler in
> 1987 defaulting to signed char is a sure sign of originating from an
> isolated country and knowing nothing about this planet.

Maybe I'm wrong, but wasn't it the case that C had no "signed char"
type at that time, so "char" had to be abused for the "tiny signed
integer"?

> Using signed chars in comparisons is especially wrong, and casting
> each char to unsigned before comparing them is likely to be
> forgotten.

If we start talking about the C language, my opinion is that it's a
problem of C that it allows numeric comparison of "char" variables at
all. If one actually needs to compare alphabetic characters numerically,
he should first cast to the required integer type.

> Unsigned character strings are useless because there is no such thing
> as char(-23), and if these strings weren't casted to signed inside all
> IO functions, they wouldn't work correctly.

Didn't you swap "signed" and "unsigned" by mistake in this phrase? Or
do you indeed think that using "unsigned char*" for strings is useless?

> Only because many programmers don't compare chars, most programs will
> work outside the USA.

Comparison of characters being numeric is not a very good property of
the C language.

> I repeat: Thanks to using signed chars, the programs only
> work /by/ /accident/! Promoting the use of signed char strings is promoting
> bugs and endangering the stability of all our systems. You should stop this
> bullshit now, instead of increasing the pile.

Where did you see I promoted using of "signed char strings"?! If you
don't like the fact that in C language characters are "char", strings
are "char*", and the sign of char is implementation-defined, please
argue with the C committee, not with me.

Or use -funsigned-char to get a dialect of C that fits your requirements
better.

-- Sergei.


Re: somebody dropped a (warning) bomb

2007-02-16 Thread Sergei Organov
Bodo Eggert <[EMAIL PROTECTED]> writes:

> Sergei Organov <[EMAIL PROTECTED]> wrote:
>> Linus Torvalds <[EMAIL PROTECTED]> writes:

>>> Exactly because "char" *by*definition* is "indeterminate sign" as far as
>>> something like "strlen()" is concerned.
>> 
>> Thanks, I now understand that you either don't see the difference
>> between "indeterminate" and "implementation-defined" in this context or
>> consider it non-essential, so I think I've got your point.

> If you don't code for a specific compiler with specific settings, there is
> no implementation defining the signedness of char, and each part of the code
> using char* will be wrong unless it handles both cases correctly.

The problem here is that due to historical reasons, there could be code
out there that abuses "char" for "signed char" (not sure about "unsigned
char"). Old code and old habits are rather persistent.

> Therefore it's either always wrong to call your char* function with char*,
> unsigned char* _and_ signed char* unless you can guarantee not to overflow any
> of them, or it's always correct to call char* functions with any kind
> of these.

How are you sure those who wrote foo(char*) agree with your opinion or
even understand all the involved issues?

> Of course, if you happen to code for specific compiler settings, one signedness
> of char will become real and one warning will be legit. And if pigs fly, they
> should wear goggles to protect their eyes ...

And if people were writing perfect portable code all the time, there
would be no need for warnings in the first place...

-- Sergei.


Re: somebody dropped a (warning) bomb

2007-02-16 Thread Bodo Eggert
On Fri, 16 Feb 2007, Sergei Organov wrote:
> Bodo Eggert <[EMAIL PROTECTED]> writes:
>> Sergei Organov <[EMAIL PROTECTED]> wrote:
>>> Linus Torvalds <[EMAIL PROTECTED]> writes:

> [...]
>> Using signed chars for strings is wrong in most countries on earth. It was
>> wrong when the first IBM PC came out in 1981, and creating a compiler in
>> 1987 defaulting to signed char is a sure sign of originating from an
>> isolated country and knowing nothing about this planet.
> 
> Maybe I'm wrong, but wasn't it the case that C had no "signed char" type
> at that time, so "char" had to be abused for the "tiny signed integer"?

Mentioning the availability of unsigned char was a change in K&R's 
book "The C Programming Language" from 1987 to 1988:

"For some time, type unsigned char has been available. The standard 
introduces the signed keyword to make signedness explicit for char and 
other integral objects."

So if there was no signed char before, the blame passes on to those who
didn't introduce it alongside unsigned char.

>> Using signed chars in comparisons is especially wrong, and casting
>> each char to unsigned before comparing them is likely to be
>> forgotten.
> 
> If we start talking about the C language, my opinion is that it's a
> problem of C that it allows numeric comparison of "char" variables at
> all. If one actually needs to compare alphabetic characters numerically,
> he should first cast to the required integer type.

Char values are indexes in a character set. Taking the difference of
indexes is legal. Other languages have ord('x'), but it's only an
expensive typecast (byte('x') does exactly the same thing).

>> Unsigned character strings are useless because there is no such thing
>> as char(-23), and if these strings weren't casted to signed inside all
>> IO functions, they wouldn't work correctly.
> 
> Didn't you swap "signed" and "unsigned" by mistake in this phrase? Or
> do you indeed think that using "unsigned char*" for strings is useless?

ACK

>> Only because many programmers don't compare chars, most programs will
>> work outside the USA.
> 
> Comparison of characters being numeric is not a very good property of
> the C language.

NACK, as above

>> I repeat: Thanks to using signed chars, the programs only
>> work /by/ /accident/! Promoting the use of signed char strings is promoting
>> bugs and endangering the stability of all our systems. You should stop this
>> bullshit now, instead of increasing the pile.
> 
> Where did you see I promoted using of "signed char strings"?!

"char" strings are often signed char strings; only using unsigned char 
prevents this in a portable way. If you care about the ordinal values of
your strings, use unsigned.

> If you
> don't like the fact that in C language characters are "char", strings
> are "char*", and the sign of char is implementation-defined, please
> argue with the C committee, not with me.
> 
> Or use -funsigned-char to get a dialect of C that fits your requirements
> better.

If you restrict yourself to a specific C compiler, you are doomed, and 
that's not a good start.
-- 
What about those red balls they have on car aerials so you can spot your car
in a park. I think all cars should have them!
-- Homer Simpson


Re: somebody dropped a (warning) bomb

2007-02-16 Thread Bodo Eggert
On Fri, 16 Feb 2007, Sergei Organov wrote:
> Bodo Eggert <[EMAIL PROTECTED]> writes:
>> Sergei Organov <[EMAIL PROTECTED]> wrote:
>>> Linus Torvalds <[EMAIL PROTECTED]> writes:

>> If you don't code for a specific compiler with specific settings, there is
>> no implementation defining the signedness of char, and each part of the code
>> using char* will be wrong unless it handles both cases correctly.
> 
> The problem here is that due to historical reasons, there could be code
> out there that abuses "char" for "signed char" (not sure about "unsigned
> char"). Old code and old habits are rather persistent.

There could be code using trigraphs ... and gcc has an option for that.

If this code uses signed chars, using it on unsigned-char archs is broken
and should be warned about, but the compiler will not warn about this
because this code will not use "signed char" and therefore it's bug-to-bug
syntax compatible, waiting for a semantic breakdown.

I'll say it again: Either the code using unspecified chars is correct, or 
it isn't. If it's correct, neither using it with signed nor with unsigned 
chars is a bug and you should not warn at all, and if it's not correct, 
you should always warn. Instead, gcc warns depending on which $arch the 
code is compiled for.

>> Therefore it's either always wrong to call your char* function with char*,
>> unsigned char* _and_ signed char* unless you can guarantee not to overflow any
>> of them, or it's always correct to call char* functions with any kind
>> of these.
> 
> How are you sure those who wrote foo(char*) agree with your opinion or
> even understand all the involved issues?

Let's assume we have this piece of buggy code. We compile it on an unsigned
char architecture. No warning. *BOOM*


Let's assume there is correct code, and we use it as designed:
Warning: Wrong arch
Warning: Wrong arch
Warning: Wrong arch
Warning: real issue 
Warning: Wrong arch
Warning: Wrong arch
Warning: Wrong arch
Warning: Wrong arch
/scroll off screen/
Warning: Wrong arch
Warning: Wrong arch
Warning: Wrong arch
Warning: Wrong arch
Warning: Wrong arch

You don't see real issue. *BOOM*


What can you do about this warning? Let's assume we cast everywhere:

struct foo *p;
printf(strlen((char *)p)); *BOOM*


Let's assume we disable this warning:

int f(unsigned short x)
{
	if (!x)
		return 0;
	return (int)x + f(x - 1);
}

f(-1); *BOOM*


Therefore unless you program for one arch with one set of compiler flags, 
this warning is useless, and I did not see much code explicitly designed 
to be non-portable.

Warning on wrong signedness is good, but if you can't enable it on 
portable code, it's useless.

-- 
Funny quotes:
39. Ever wonder about those people who spend $2.00 apiece on those little
bottles of Evian water? Try spelling Evian backwards: NAIVE


Re: somebody dropped a (warning) bomb

2007-02-15 Thread Rene Herman

On 02/15/2007 08:02 PM, Linus Torvalds wrote:


Think of it this way: in science, a theory is proven to be bad by a
single undeniable fact just showing that it's wrong.

The same is largely true of a warning. If the warning sometimes
happens for code that is perfectly fine, the warning is bad.


Slight difference; if a compulsory warning sometimes happens for code 
that is perfectly fine, the warning is bad. I do want to be _able_ to 
get as many warnings as a compiler can muster though.


Given char's special nature, shouldn't the conclusion of this thread 
have long been simply that gcc needs -Wno-char-pointer-sign? (with 
whatever default, as far as I'm concerned).


Rene.


Re: somebody dropped a (warning) bomb

2007-02-15 Thread Lennart Sorensen
On Thu, Feb 15, 2007 at 07:57:05AM -0800, Linus Torvalds wrote:
> I agree that it's "unnecessary code", and in many ways exactly the same 
> thing. I just happen to believe that casts tend to be a lot more dangerous 
> than extraneous initializations. Casts have this nasty tendency of hiding 
> *other* problems too (ie you later change the thing you cast completely, 
> and now the compiler doesn't warn about a *real* bug, because the cast 
> will just silently force it to a wrong type).

Well, one way you could cast it and still have the compiler tell you if
the type ever changes is doing something stupid like:

char *unsigned_char_star_to_char_star(unsigned char *in)
{
	return (char *)in;
}

Then call your strlen with:

strlen(unsigned_char_star_to_char_star(my_unfortunate_unsigned_char_string));

If you ever changed the type, the compiler would warn you again, since the
conversion function doesn't accept anything other than unsigned char *
being passed in for "conversion".  It seems safer than a direct cast, since
with a direct cast you get no type checking at all anymore.

I hope it doesn't come to this though.  Hopefully gcc will change it's
behaviour back or give an option of --dontcomplainaboutcharstar :)

--
Len Sorensen


Re: somebody dropped a (warning) bomb

2007-02-15 Thread Bodo Eggert
Sergei Organov <[EMAIL PROTECTED]> wrote:
> Linus Torvalds <[EMAIL PROTECTED]> writes:

>> Exactly because "char" *by*definition* is "indeterminate sign" as far as
>> something like "strlen()" is concerned.
> 
> Thanks, I now understand that you either don't see the difference
> between "indeterminate" and "implementation-defined" in this context or
> consider it non-essential, so I think I've got your point.

If you don't code for a specific compiler with specific settings, there is
no implementation defining the signedness of char, and each part of the code
using char* will be wrong unless it handles both cases correctly.

Therefore it's either always wrong to call your char* function with char*,
unsigned char* _and_ signed char unless you can guarantee not to overflow any
of them, or it's always correct to call char* functions with any kind of these.

Of course, if you happen to code for specific compiler settings, one signedness
of char will become real and one warning will be legit. And if pigs fly, they
should wear goggles to protect their eyes ...

>> THE FACT IS, THAT "strlen()" IS DEFINED UNIVERSALLY AS TAKING "char *".
> 
> So just don't pass it "unsigned char*". End of story.

Using signed chars for strings is wrong in most countries on earth. It was
wrong when the first IBM PC came out in 1981, and creating a compiler in
1987 defaulting to signed char is a sure sign of originating from an
isolated country and knowing nothing about this planet. Using signed chars
in comparisons is especially wrong, and casting each char to unsigned before
comparing them is likely to be forgotten. Unsigned character strings are
useless because there is no such thing as char(-23), and if these strings
weren't casted to signed inside all IO functions, they wouldn't work correctly.
Only because many programmers don't compare chars, most programs will work
outside the USA. I repeat: Thanks to using signed chars, the programs only
work /by/ /accident/! Promoting the use of signed char strings is promoting
bugs and endangering the stability of all our systems. You should stop this
bullshit now, instead of increasing the pile.

>> That BY DEFINITION means that "strlen()" cannot care about the sign,
>> because the sign IS NOT DEFINED UNIVERSALLY!
>>
>> And if you cannot accept that fact, it's your problem. Not mine.
> 
> I never had problems either with strlen() or with this warning, so was
> curious why does the warning is such a problem for the kernel.

The warning is a problem because either it clutters your build log hiding
real bugs, you typecast hiding errors whenever a char* function is called,
or you disable it globally hiding real signedness bugs. Either way you take
it, it's wrong.
-- 
Funny quotes:
12. He who laughs last thinks slowest.



Re: somebody dropped a (warning) bomb

2007-02-15 Thread Sergei Organov
Olivier Galibert <[EMAIL PROTECTED]> writes:

> On Tue, Feb 13, 2007 at 09:06:24PM +0300, Sergei Organov wrote:
>> I agree that making strxxx() family special is not a good idea. So what
>> do we do for a random foo(char*) called with an 'unsigned char*'
>> argument? Silence? Hmmm... It's not immediately obvious that it's indeed
>> harmless. Yet another -Wxxx option to GCC to silence this particular
>> case?
>
> Silence would be good.  "char *" has a special status in C, it can be:
> - pointer to a char/to an array of chars (standard interpretation)
> - pointer to a string
> - generic pointer to memory you can read(/write)
>
> Check the aliasing rules if you don't believe me on the third one.

A generic pointer is still spelled "void*", isn't it? If you don't believe
me, check memcpy() and friends. From the POV of aliasing rules, "char*",
"unsigned char*", and "signed char*" are all the same, as the aliasing
rules mention "character type", which the standard defines like this:

  "The three types char, signed char, and unsigned char are collectively
   called the character types."

> And it's *way* more often cases 2 and 3 than 1 for the simple reason
> that the signedness of char is unpredictable.

Are the first two different other than by 0-termination? If not, then
differentiating 1 and 2 by signedness doesn't make sense. If we throw
away the signedness argument, then yes, 2 is more common than 1, for the
obvious reason that it's much more convenient to use 0-terminated arrays
of characters in C than plain arrays of characters.

For 3, I'd use "unsigned char*" (or, sometimes, maybe, "signed char*") to
do the actual access (due to aliasing rules), but "void*" as the argument
type, as an arbitrary pointer is spelled "void*" in C.

So, it's 2 that I'd consider the most common usage for "char*". Strings
of characters are idiomatically "char*" in C.

> As a result, a signedness warning between char * and (un)signed char *
> is 99.99% of the time stupid.

As a result of what? Of the fact that strings in C are "char*"? Of the
fact that one can use "char*" to access arbitrary memory? If the latter,
then "char*", "signed char*", and "unsigned char*" are all equal, and
you may well argue that there should be no warning between "signed
char*" and "unsigned char*" as well, as either of them could be used to
access memory.

Anyway, I've already stated that I don't think it's signedness that matters
here. The standard explicitly says "char", "signed char", and "unsigned
char" are all distinct types, so the warning should be about
incompatible pointers instead. And I don't even dispute that 99.99%
number; I just want somebody to explain to me how dangerous the remaining
0.01% are, and isn't it exactly 0%?

[...]
>> I'm afraid I don't follow. Do we have a way to say "I want an int of
>> indeterminate sign" in C?
>
> Almost completely.  The rules on aliasing say you can convert pointer
> between signed and unsigned variants and the accesses will be
> unsurprising.

Only from the point of view of aliasing. Aliasing rules don't disallow
other surprises.

>
> The only problem is that the implicit conversion of incompatible
> pointer parameters to a function looks impossible in the draft I have.

Why do you think it's impossible according to the draft?

> Probably has been corrected in the final version.

I doubt it's impossible either in the draft or in the final version.

> In any case, having for instance unsigned int * in a prototype really
> means in the language "I want a pointer to integers, and I'm probably
> going to use them as unsigned, so beware".

> For the special case of char, since the beware version would require a
> signed or unsigned tag, it really means indeterminate.

IMHO, it would read "I want a pointer to alphabetic characters, and I'm
probably going to use them as alphabetic characters, so beware". No
signedness involved.

For example, suppose we have 3 function that compare two
zero-terminated arrays:

int compare1(char *c1, char *c2);
int compare2(unsigned char *c1, unsigned char *c2);
int compare3(signed char *c1, signed char *c2);

the first one might compare arguments alphabetically, while the second
and the third compare them numerically. All of the 3 are different, so
passing wrong type of argument to either of them might be dangerous.

> C is sometimes called a high-level assembler for a reason :-)

I'm afraid that those times when it actually was, are far in the past,
for better or worse.

>> The same way there doesn't seem to be a way to say "I want a char of
>> indeterminate sign". :( So no, strlen() doesn't actually say that, no
>> matter if we like it or not. It actually says "I want a char with
>> implementation-defined sign".
>
> In this day and age it means "I want a 0-terminated string".

IMHO, you've forgotten to say "of alphabetic characters".

> Everything else is explicitely signed char * or unsigned char *, often
> through typedefs in the signed case.

Those are for signed and unsigned tiny 

Re: somebody dropped a (warning) bomb

2007-02-15 Thread Linus Torvalds


On Thu, 15 Feb 2007, Sergei Organov wrote:
> 
> I agree that if the warning has no true positives, it sucks. The problem
> is that somehow I doubt it has none. And the reasons for the doubt are:

Why do you harp on "no true positives"?

That's a pointless thing. You can make *any* warning have "true 
positives". My personal favorite is the unconditional warning:

warning: user is an idiot

and I _guarantee_ you that it has a lot of true positives.

It's the "no false negatives" angle you should look at.

THAT is what matters. The reason we don't see a lot of warnings about 
idiot users is not that people don't do stupid things, but that 
*sometimes* they actually don't do something stupid.

Yeah, I know, it's far-fetched, but still.

In other words, you're barking up *exactly* the wrong tree. You're looking 
at it the wrong way around.

Think of it this way: in science, a theory is proven to be bad by a single 
undeniable fact just showing that it's wrong.

The same is largely true of a warning. If the warning sometimes happens 
for code that is perfectly fine, the warning is bad.

Linus


Re: somebody dropped a (warning) bomb

2007-02-15 Thread Sergei Organov
Linus Torvalds <[EMAIL PROTECTED]> writes:
> On Thu, 15 Feb 2007, Sergei Organov wrote:

[...Skip things I agree with...]

>> > But if you have 
>> >
>> >unsigned char *mystring;
>> >
>> >len = strlen(mystring);
>> >
>> > then please tell me how to fix that warning without making the code 
>> > *worse* from a type-safety standpoint? You CANNOT. You'd have to cast it 
>> > explicitly (or assing it through a "void *" variable), both of which 
>> > actually MAKE THE TYPE-CHECKING PROBLEM WORSE!
>> 
>> Because instead of trying to fix the code, you insist on hiding the
>> warning.
>
> No. I'm saying that I have *reasons* that I need my string to be unsigned. 
> That makes your first "fix" totally bogus:

Somehow I still suppose that most of those warnings could be fixed by
using "char*" instead. Yes, I believe you, you have reasons. But still I
only heard about isxxx() that appeared to be not a reason in practice
(in the context of the kernel tree). As you didn't tell about other
reasons, my expectation is that they are not in fact very common. I'm
curious what the actual reasons are besides the fact that quite a lot of
code is apparently written this way in the kernel. The latter is indeed
a very sound reason for the warning to suck for kernel developers,
but it might not be a reason for it to suck in general.

>> There are at least two actual fixes:
>> 
>> 1. Do 'char *mystring' instead, as otherwise compiler can't figure out
>>that you are going to use 'mystring' to point to zero-terminated
>>array of characters.
>
> And the second fix is a fix, but it is, again, worse than the warning:
>
>> 2. Use "len = ustrlen(mystring)" if you indeed need something like C
>>strings
>
> Sure, I can do that, but the upside is what?

1. You make it explicit, in a type-safe manner, that you are sure it's
   safe to call "strlen()" with an "unsigned char*" argument. We all agree
   about it. Let's write it down in the program. And we can already add
   "strcmp()" to the list. And, say, ustrdup() and ustrchr() returning
   "unsigned char *" would be more type-safe, too.

2. You make it more explicit that you indeed mean to use your own
   representation of strings that is different than what C idiomatically
   uses for strings, along with explicitly defined allowed set of
   operations on this representation.

The actual divergence of opinion seems to be that you are sure it's safe
to call any foo(char*) with "unsigned char*" argument, and I'm not, due
to the reasons described below. If you are right, then there is indeed
little or no point in doing this work.

> Getting rid of a warning that was bogus to begin with?

I agree that if the warning has no true positives, it sucks. The problem
is that somehow I doubt it has none. And the reasons for the doubt are:

1. C standard does explicitly say that "char" is different type from
   either "signed char" or "unsigned char". There should be some reason
   for that, otherwise they'd simply write that "char" is a synonym for
   one of "signed char" or "unsigned char", depending on implementation.

2. If "char" in fact matches one of the other two types for given
   architecture, it is definitely different from another one, so
   substituting "char" for the different one might be unsafe. As a
   consequence, for a portable program it might be unsafe to substitute
   "char" for any of signed or unsigned, as "char" may happen to
   differ from any of them.

While the doubts aren't resolved, I prefer to at least be careful with
things that I don't entirely understand, so I wish compiler give me a
warning about subtle semantic differences (if any).

-- Sergei.


Re: somebody dropped a (warning) bomb

2007-02-15 Thread Linus Torvalds


On Thu, 15 Feb 2007, Sergei Organov wrote:
> 
> Yes, there is. There are at least 2 drawbacks. Minor one is
> initialization eating cycles. Major one is that if later I change the
> code and there will appear a pass that by mistake uses those fake value,
> I won't get the warning anymore. The latter drawback is somewhat similar
> to the drawbacks of explicit type casts.

I actually agree - it would clearly be *nicer* if gcc was accurate with 
the warning. I'm just saying that usually the patch to shut gcc up is 
quite benign.

But yes, it basically disabled the warning, and just *maybe* the warning 
was valid, and it disabled it by initializing something to the wrong 
value. 

That's why warnings with false positives are often much worse than not 
having the warning at all: you start dismissing them, and not even 
bothering to fix them "properly". That's especially true if there _isn't_ 
a proper fix.

I personally much prefer a compiler that doesn't warn a lot, but *when* it 
warns, it's 100% on the money. I think that compilers that warn about 
things that "may" be broken are bogus. It needs to be iron-clad, with no 
question about whether the warning is appropriate!

Which is, of course, why we then shut up some gcc warnings, and which gets 
us back to the start of the discussion.. ;)

> [I'd, personally, try to do code reorganization instead so that it
> becomes obvious for the compiler that the variable can't be used
> uninitialized. Quite often it makes the code better, at least it was my
> experience so far.]

It's true that it sometimes works that way. Not always, though.

The gcc "uninitialized variable" thing is *usually* a good sign that the 
code wasn't straightforward for the compiler to follow, and if the 
compiler cannot follow it, then probably neither can most programmers. So 
together with the fact that it's at least not _syntactically_ ugly to shut 
up, it's not a warning I personally object to most of the time (unlike, 
say, the pointer-sign one ;)

> [Did I already say that I think it's wrong warning to be given in this
> particular case, as the problem with the code is not in signedness?]
> 
> 1. Don't try to hide the warning, at least not immediately, -- consider
>fixing the code first. Ah, sorry, you've already decided for yourself
>that the warning is idiotic, so there is no reason to try to...

This is actually something we've tried. 

The problem with that approach is that once you have tons of warnings that 
are "expected", suddenly *all* warnings just become noise. Not just the 
bogus one. It really *is* a matter of: warnings should either be 100% 
solid, or they should not be there at all.

And this is not just about compiler warnings. We've added some warnings of 
our own (well, with compiler help, of course): things like the 
"deprecated" warnings etc. I have often ended up shutting them up, and one 
reason for that is that yes, I have literally added bugs to the kernel 
that the compiler really *did* warn about, but because there were so many 
other pointless warnings, I didn't notice!

I didn't make that up. I forget what the bug was, but it literally wasn't 
more than a couple of months ago that I passed in the wrong type to a 
function, and the compiler warned about it in big letters. It was just 
totally hidden by all the other warning crap.

> 2. If you add a cast, it's not in contrast, it's rather similar to fake
>initialization above as they have somewhat similar drawback.

I agree that it's "unnecessary code", and in many ways exactly the same 
thing. I just happen to believe that casts tend to be a lot more dangerous 
than extraneous initializations. Casts have this nasty tendency of hiding 
*other* problems too (ie you later change the thing you cast completely, 
and now the compiler doesn't warn about a *real* bug, because the cast 
will just silently force it to a wrong type).

And yeah, that may well be a personal hangup of mine. A lot of people 
think casts in C are normal. Me, I actually consider C to be a type-safe 
language (!) but it's only type-safe if you are careful, and avoiding 
casts is one of the rules.

Others claim that C isn't type-safe at all, and I think it comes from 
them having been involved in projects where people didn't have the same 
"casts are good, but only when you use them really really carefully" attitude.

> > But if you have 
> >
> > unsigned char *mystring;
> >
> > len = strlen(mystring);
> >
> > then please tell me how to fix that warning without making the code 
> > *worse* from a type-safety standpoint? You CANNOT. You'd have to cast it 
> > explicitly (or assing it through a "void *" variable), both of which 
> > actually MAKE THE TYPE-CHECKING PROBLEM WORSE!
> 
> Because instead of trying to fix the code, you insist on hiding the
> warning.

No. I'm saying that I have *reasons* that I need my string to be unsigned. 
That makes your first "fix" totally bogus:

> There are at least two actual fixes:
> 
> 1. Do 

Re: somebody dropped a (warning) bomb

2007-02-15 Thread Sergei Organov
Linus Torvalds <[EMAIL PROTECTED]> writes:
> On Tue, 13 Feb 2007, Sergei Organov wrote:
[...]
> BUT (and this is a big but) within the discussion of "strlen()", that is 
> no longer true. "strlen()" exists _outside_ of a single particular 
> implementation. As such, "implementation-defined" is no longer something 
> that "strlen()" can depend on.
>
> As an example of this argument, try this:
>
>   #include <string.h>
>   #include <stdio.h>
>
>   int main(int argc, char **argv)
>   {
>   char a1[] = { -1, 0 }, a2[] = { 1, 0 };
>
>   printf("%d %d\n", a1[0] < a2[0], strcmp(a1, a2) < 0);
>   return 0;
>   }
>
> and *before* you compile it, try to guess what the output is.

Well, I'll try to play fair, so I haven't compiled it yet. Now, strcmp()
is defined in the C standard so that its behavior doesn't depend on the
sign of char:

   "The sign of a nonzero value returned by the comparison functions
    memcmp, strcmp, and strncmp is determined by the sign of the
    difference between the values of the first pair of characters (both
    interpreted as unsigned char) that differ in the objects being
    compared."

[Therefore, at least I don't need to build GCC multilib'ed on
-fsigned/unsigned-char to get consistent results even if strcmp() in
fact lives in a library, that was my first thought before I referred to
the standard ;)]

Suppose the char is signed. Then a1[0] < a2[0] (= -1 < 1) should be
true. On a 2's-complement implementation with 8-bit char, -1 converted by
strcmp() to unsigned char should be 0xFF, and 1 converted should be
1. So strcmp() should be equivalent to (0xFF < 1), which is false. So
I'd expect

 1 0

result on implementation with signed char.

Now suppose the char is unsigned. Then on a 2's-complement implementation
with an 8-bit-byte CPU, a1[0] should be 0xFF, and a2[0] should be 1. The
result from strcmp() won't change. So I'd expect

 0 0

result on implementation with unsigned char.

Now I'm going to compile it (I must admit I'm slightly afraid of getting
surprising results, so I've re-read my reasoning above before
compiling):

[EMAIL PROTECTED] tmp$ cat strcmp.c
#include <stdio.h>
#include <string.h>
int main()
{
  char a1[] = { -1, 0 }, a2[] = { 1, 0 };

  printf("%d %d\n", a1[0] < a2[0], strcmp(a1, a2) < 0);
  return 0;
}
[EMAIL PROTECTED] tmp$ gcc -v
Using built-in specs.
Target: i486-linux-gnu
...
gcc version 4.1.2 20061028 (prerelease) (Debian 4.1.1-19)
[EMAIL PROTECTED] tmp$ gcc -O2 strcmp.c -o strcmp && ./strcmp
1 0

> And when that confuses you,

It didn't, or did I miss something? Is char unsigned by default?

> try to compile it using gcc with the 
> "-funsigned-char" flag (or "-fsigned-char" if you started out on an 
> architecture where char was unsigned by default)

[EMAIL PROTECTED] tmp$ gcc -O2 -fsigned-char strcmp.c -o strcmp && ./strcmp
1 0
[EMAIL PROTECTED] tmp$ gcc -O2 -funsigned-char strcmp.c -o strcmp && ./strcmp
0 0
[EMAIL PROTECTED] tmp$

Judging from the above, char is indeed signed by default here, so what?

> And when you really *really* think about it afterwards, I think you'll go 
> "Ahh.. Linus is right". It's more than "implementation-defined": it really 
> is totally indeterminate for code like this.

The fact is that strcmp() is explicitly defined in the C standard so
that it will yield the same result no matter what the sign of "char"
is. Therefore, it obviously can't be used to determine the sign of
"char", so from the POV of usage of strcmp(), the sign of its argument
is indeed "indeterminate". What I fail to see is how this fact could
help in proving that the sign of char is indeterminate for *any*
function taking a "char*" argument.

Anyway, it seems that you still miss (or ignore) my point that it's not
(only) the sign of "char" that makes it suspect to call functions
expecting a "char*" argument with an "unsigned char*" value.

-- Sergei.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: somebody dropped a (warning) bomb

2007-02-15 Thread Sergei Organov
Linus Torvalds <[EMAIL PROTECTED]> writes:
> On Tue, 13 Feb 2007, Sergei Organov wrote:
>> 
>> Sorry, what do you do with "variable 'xxx' might be used uninitialized"
>> warning when it's false? Turn it off? Annotate the source? Assign fake
>> initialization value? Change the compiler so that it does "the effort"
>> for you? Never encountered false positive from this warning?
>
> The thing is, that warning is at least easy to shut up. 
>
> You just add a fake initialization. There isn't any real downside.

Yes, there is. There are at least two drawbacks. The minor one is the
initialization eating cycles. The major one is that if I later change the
code and a path appears that by mistake uses that fake value, I won't
get the warning anymore. The latter drawback is somewhat similar to the
drawbacks of explicit type casts.

[I'd, personally, try to do code reorganization instead so that it
becomes obvious for the compiler that the variable can't be used
uninitialized. Quite often it makes the code better, at least it was my
experience so far.]

> In contrast, to shut up the idiotic pointer-sign warning, you have to add 
> a cast.

[Did I already say that I think it's the wrong warning to give in this
particular case, as the problem with the code is not signedness?]

1. Don't try to hide the warning, at least not immediately, -- consider
   fixing the code first. Ah, sorry, you've already decided for yourself
   that the warning is idiotic, so there is no reason to try to...

2. If you add a cast, it's not in contrast, it's rather similar to fake
   initialization above as they have somewhat similar drawback.

> Now, some people seem to think that adding casts is a way of life, and 
> think that C programs always have a lot of casts.

Hopefully I'm not one of those.

[...]
> But if you have 
>
>   unsigned char *mystring;
>
>   len = strlen(mystring);
>
> then please tell me how to fix that warning without making the code 
> *worse* from a type-safety standpoint? You CANNOT. You'd have to cast it 
> explicitly (or assign it through a "void *" variable), both of which 
> actually MAKE THE TYPE-CHECKING PROBLEM WORSE!

Because instead of trying to fix the code, you insist on hiding the
warning. There are at least two actual fixes:

1. Do 'char *mystring' instead, as otherwise the compiler can't figure
   out that you are going to use 'mystring' to point to a zero-terminated
   array of characters.

2. Use "len = ustrlen(mystring)" if you indeed need something like C
   strings, but built from "unsigned char" elements. Similar to (1), this
   will explain your intent to that stupid compiler. Should I also
   mention that due to the definition of strlen(), the implementation of
   ustrlen() is a one-liner (thnx Andrew):

static inline size_t ustrlen(const unsigned char *s)
{
	return strlen((const char *)s);
}

Overall, describe your intent to the compiler more clearly, as reading
the programmer's mind is not the kind of business compilers are
strong at.

> See? The warning made no sense to begin with, and it warns about something 
> where the alternatives are worse than what it warned about.
>
> Ergo. It's a crap warning.

No, I'm still not convinced. Well, I'll try to explain my point
differently. What would you think about the following line, should you
encounter it in real code:

  len = strlen(my_tiny_integers);

I'd think that as 'my_tiny_integers' is used as an argument to strlen(),
it might be a zero-terminated array of characters. But why does it have
such a name then?!  Maybe it in fact holds an array of integers where 0
is not necessarily terminating, and the intent of this code is just to
find the first occurrence of 0? It's confusing, and I'd probably go check
the declaration. The declaration happens to be "unsigned char
*my_tiny_integers", which at least is rather consistent with the variable
name, but doesn't resolve my initial doubts. So I need to go check the
other usages of 'my_tiny_integers' to figure out what's actually going on
here. Does that sound like good programming practice? I don't think so.

Now, you probably realize that in the example you gave above you've
effectively declared "mystring" to be a pointer to
"tiny_unsigned_integer". As the compiler (fortunately) doesn't guess
intent from variable names, it can't warn at the point of declaration,
but I still think we must be thankful that it gave us a warning (even
though gcc gives the wrong kind of warning) at the point of dubious use.

-- Sergei.



Re: somebody dropped a (warning) bomb

2007-02-15 Thread Linus Torvalds


On Thu, 15 Feb 2007, Sergei Organov wrote:
> 
> Yes, there is. There are at least two drawbacks. The minor one is the
> initialization eating cycles. The major one is that if I later change the
> code and a path appears that by mistake uses that fake value, I won't
> get the warning anymore. The latter drawback is somewhat similar to the
> drawbacks of explicit type casts.

I actually agree - it would clearly be *nicer* if gcc was accurate with 
the warning. I'm just saying that usually the patch to shut gcc up is 
quite benign.

But yes, it basically disabled the warning, and just *maybe* the warning 
was valid, and it disabled it by initializing something to the wrong 
value. 

That's why warnings with false positives are often much worse than not 
having the warning at all: you start dismissing them, and not even 
bothering to fix them properly. That's especially true if there _isn't_ 
a proper fix.

I personally much prefer a compiler that doesn't warn a lot, but *when* it 
warns, it's 100% on the money. I think that compilers that warn about 
things that "may" be broken are bogus. It needs to be iron-clad, with no 
question about whether the warning is appropriate!

Which is, of course, why we then shut up some gcc warnings, and which gets 
us back to the start of the discussion.. ;)

> [I'd, personally, try to do code reorganization instead so that it
> becomes obvious for the compiler that the variable can't be used
> uninitialized. Quite often it makes the code better, at least it was my
> experience so far.]

It's true that it sometimes works that way. Not always, though.

The gcc uninitialized variable thing is *usually* a good sign that the 
code wasn't straightforward for the compiler to follow, and if the 
compiler cannot follow it, then probably neither can most programmers. So 
together with the fact that it's at least not _syntactically_ ugly to shut 
up, it's not a warning I personally object to most of the time (unlike, 
say, the pointer-sign one ;)

> [Did I already say that I think it's the wrong warning to give in this
> particular case, as the problem with the code is not signedness?]
> 
> 1. Don't try to hide the warning, at least not immediately, -- consider
>    fixing the code first. Ah, sorry, you've already decided for yourself
>    that the warning is idiotic, so there is no reason to try to...

This is actually something we've tried. 

The problem with that approach is that once you have tons of warnings that 
are "expected", suddenly *all* warnings just become noise. Not just the 
bogus one. It really *is* a matter of: warnings should either be 100% 
solid, or they should not be there at all.

And this is not just about compiler warnings. We've added some warnings of 
our own (well, with compiler help, of course): things like the 
"deprecated" warnings etc. I have often ended up shutting them up, and one 
reason for that is that yes, I have literally added bugs to the kernel 
that the compiler really *did* warn about, but because there were so many 
other pointless warnings, I didn't notice!

I didn't make that up. I forget what the bug was, but it literally wasn't 
more than a couple of months ago that I passed in the wrong type to a 
function, and the compiler warned about it in big letters. It was just 
totally hidden by all the other warning crap.

> 2. If you add a cast, it's not in contrast, it's rather similar to fake
>    initialization above as they have somewhat similar drawback.

I agree that it's "unnecessary code", and in many ways exactly the same 
thing. I just happen to believe that casts tend to be a lot more dangerous 
than extraneous initializations. Casts have this nasty tendency of hiding 
*other* problems too (ie you later change the thing you cast completely, 
and now the compiler doesn't warn about a *real* bug, because the cast 
will just silently force it to a wrong type).

And yeah, that may well be a personal hangup of mine. A lot of people 
think casts in C are normal. Me, I actually consider C to be a type-safe 
language (!) but it's only type-safe if you are careful, and avoiding 
casts is one of the rules.

Others claim that C isn't type-safe at all, and I think it comes from 
them having been involved in projects where people didn't have the same 
"casts are good, but only when you use them really really carefully" 
attitude.

> > But if you have 
> >
> >	unsigned char *mystring;
> >
> >	len = strlen(mystring);
> >
> > then please tell me how to fix that warning without making the code 
> > *worse* from a type-safety standpoint? You CANNOT. You'd have to cast it 
> > explicitly (or assign it through a "void *" variable), both of which 
> > actually MAKE THE TYPE-CHECKING PROBLEM WORSE!
> 
> Because instead of trying to fix the code, you insist on hiding the
> warning.

No. I'm saying that I have *reasons* that I need my string to be unsigned. 
That makes your first "fix" totally bogus:

> There are at least two actual fixes:
> 
> 1. Do 'char *mystring' instead, as otherwise the compiler can't figure out

Re: somebody dropped a (warning) bomb

2007-02-15 Thread Sergei Organov
Linus Torvalds <[EMAIL PROTECTED]> writes:
> On Thu, 15 Feb 2007, Sergei Organov wrote:

[...Skip things I agree with...]

> > > But if you have 
> > >
> > >	unsigned char *mystring;
> > >
> > >	len = strlen(mystring);
> > >
> > > then please tell me how to fix that warning without making the code 
> > > *worse* from a type-safety standpoint? You CANNOT. You'd have to cast it 
> > > explicitly (or assign it through a "void *" variable), both of which 
> > > actually MAKE THE TYPE-CHECKING PROBLEM WORSE!
> > 
> > Because instead of trying to fix the code, you insist on hiding the
> > warning.
>
> No. I'm saying that I have *reasons* that I need my string to be unsigned. 
> That makes your first "fix" totally bogus:

Somehow I still suppose that most of those warnings could be fixed by
using "char*" instead. Yes, I believe you, you have reasons. But so far
I have only heard about isxxx(), which turned out not to be a reason in
practice (in the context of the kernel tree). As you didn't tell about
other reasons, my expectation is that they are not in fact very common.
I'm curious what the actual reasons are besides the fact that quite a
lot of code is apparently written this way in the kernel. The latter is
indeed a very sound reason for the warning to suck for kernel
developers, but it might not be a reason for it to suck in general.

> > There are at least two actual fixes:
> > 
> > 1. Do 'char *mystring' instead, as otherwise the compiler can't figure
> >    out that you are going to use 'mystring' to point to a zero-terminated
> >    array of characters.
>
> And the second "fix" is a fix, but it is, again, worse than the warning:
>
> > 2. Use "len = ustrlen(mystring)" if you indeed need something like C
> >    strings
>
> Sure, I can do that, but the upside is what?

1. You make it explicit, in a type-safe manner, that you are sure it's
   safe to call strlen() with an "unsigned char*" argument. We all agree
   about it. Let's write it down in the program. And we can already add
   strcmp() to the list. And, say, ustrdup() and ustrchr() returning
   "unsigned char *" would be more type-safe as well.

2. You make it more explicit that you indeed mean to use your own
   representation of strings that is different from what C idiomatically
   uses for strings, along with an explicitly defined allowed set of
   operations on this representation.

The actual divergence of opinion seems to be that you are sure it's safe
to call any foo(char*) with an "unsigned char*" argument, and I'm not,
due to the reasons described below. If you are right, then there is
indeed little or no point in doing this work.

> Getting rid of a warning that was bogus to begin with?

I agree that if the warning has no true positives, it sucks. The problem
is that somehow I doubt it has none. And the reasons for the doubt are:

1. The C standard does explicitly say that "char" is a distinct type from
   either "signed char" or "unsigned char". There should be some reason
   for that, otherwise they'd simply write that "char" is a synonym for
   one of "signed char" or "unsigned char", depending on implementation.

2. If "char" in fact matches one of the other two types on a given
   architecture, it is definitely different from the other one, so
   substituting "char" for that different one might be unsafe. As a
   consequence, for a portable program it might be unsafe to substitute
   "char" for either "signed char" or "unsigned char", as "char" may
   happen to differ from either of them.

While the doubts aren't resolved, I prefer to at least be careful with
things that I don't entirely understand, so I wish the compiler gave me
a warning about subtle semantic differences (if any).

-- Sergei.
Please read the FAQ at  http://www.tux.org/lkml/


Re: somebody dropped a (warning) bomb

2007-02-15 Thread Linus Torvalds


On Thu, 15 Feb 2007, Sergei Organov wrote:
> 
> I agree that if the warning has no true positives, it sucks. The problem
> is that somehow I doubt it has none. And the reasons for the doubt are:

Why do you harp on "no true positives"?

That's a pointless thing. You can make *any* warning have "true 
positives". My personal favorite is the unconditional warning:

	warning: user is an idiot

and I _guarantee_ you that it has a lot of true positives.

It's the "no false negatives" angle you should look at.

THAT is what matters. The reason we don't see a lot of warnings about 
idiot users is not that people don't do stupid things, but that 
*sometimes* they actually don't do something stupid.

Yeah, I know, it's far-fetched, but still.

In other words, you're barking up *exactly* the wrong tree. You're looking 
at it the wrong way around.

Think of it this way: in science, a theory is proven to be bad by a single 
undeniable fact just showing that it's wrong.

The same is largely true of a warning. If the warning sometimes happens 
for code that is perfectly fine, the warning is bad.

Linus
Please read the FAQ at  http://www.tux.org/lkml/


Re: somebody dropped a (warning) bomb

2007-02-15 Thread Sergei Organov
Olivier Galibert <[EMAIL PROTECTED]> writes:

> On Tue, Feb 13, 2007 at 09:06:24PM +0300, Sergei Organov wrote:
> > I agree that making strxxx() family special is not a good idea. So what
> > do we do for a random foo(char*) called with an 'unsigned char*'
> > argument? Silence? Hmmm... It's not immediately obvious that it's indeed
> > harmless. Yet another -Wxxx option to GCC to silence this particular
> > case?
>
> Silence would be good.  char * has a special status in C, it can be:
> - pointer to a char/to an array of chars (standard interpretation)
> - pointer to a string
> - generic pointer to memory you can read(/write)
>
> Check the aliasing rules if you don't believe me on the third one.

A generic pointer still reads 'void*', doesn't it? If you don't believe
me, check memcpy() and friends. From the POV of aliasing rules, "char*",
"unsigned char*", and "signed char*" are all the same, as the aliasing
rules mention "character type", which according to the standard is
defined like this:

   "The three types char, signed char, and unsigned char are collectively
    called the character types."

> And it's *way* more often cases 2 and 3 than 1 for the simple reason
> that the signedness of char is unpredictable.

Are the first two different other than by 0-termination? If not, then
differentiating 1 and 2 by signedness doesn't make sense. If we throw
away the signedness argument, then yes, 2 is more common than 1, for the
obvious reason that it's much more convenient to use 0-terminated arrays
of characters in C than plain arrays of characters.

For 3, I'd use "unsigned char*" (or, sometimes, maybe, "signed char*")
to do the actual access (due to aliasing rules), but "void*" as the
argument type, as an arbitrary pointer is spelled "void*" in C.

So, it's 2 that I'd consider the most common usage for "char*". Strings
of characters are idiomatically "char*" in C.

> As a result, a signedness warning between char * and (un)signed char *
> is 99.99% of the time stupid.

As a result of what? Of the fact that strings in C are "char*"? Of the
fact that one can use "char*" to access arbitrary memory? If the latter,
then "char*", "signed char*", and "unsigned char*" are all equal, and
you may well argue that there should be no warning between "signed
char*" and "unsigned char*" as well, as either of them could be used to
access memory.

Anyway, I've already stated that I don't think it's signedness that
matters here. The standard explicitly says "char", "signed char", and
"unsigned char" are all distinct types, so the warning should be about
an "incompatible pointer" instead. And I don't even dispute that 99.99%
number, I just want somebody to explain to me how dangerous the
remaining 0.01% are, and isn't it exactly 0%?

[...]
> > I'm afraid I don't follow. Do we have a way to say "I want an int of
> > indeterminate sign" in C?
>
> Almost completely.  The rules on aliasing say you can convert pointers
> between signed and unsigned variants and the accesses will be
> unsurprising.

Only from the point of view of aliasing. Aliasing rules don't disallow
other surprises.


> The only problem is that the implicit conversion of incompatible
> pointer parameters to a function looks impossible in the draft I have.

Why do you think it's impossible according to the draft?

> Probably has been corrected in the final version.

I doubt it's impossible either in the draft or in the final version.

> In any case, having for instance unsigned int * in a prototype really
> means in the language "I want a pointer to integers, and I'm probably
> going to use them as unsigned, so beware".
>
> For the special case of char, since the beware version would require a
> signed or unsigned tag, it really means indeterminate.

IMHO, it would read "I want a pointer to alphabetic characters, and I'm
probably going to use them as alphabetic characters, so beware". No
signedness involved.

For example, suppose we have 3 functions that compare two
zero-terminated arrays:

int compare1(char *c1, char *c2);
int compare2(unsigned char *c1, unsigned char *c2);
int compare3(signed char *c1, signed char *c2);

the first one might compare its arguments alphabetically, while the
second and the third compare them numerically. All 3 are different, so
passing the wrong type of argument to any of them might be dangerous.

> C is sometimes called a high-level assembler for a reason :-)

I'm afraid that those times when it actually was are far in the past,
for better or worse.

>> The same way there doesn't seem to be a way to say "I want a char of
>> indeterminate sign". :( So no, strlen() doesn't actually say that, no
>> matter if we like it or not. It actually says "I want a char with
>> implementation-defined sign".
>
> In this day and age it means "I want a 0-terminated string".

IMHO, you've forgotten to say "of alphabetic characters".

> Everything else is explicitly signed char * or unsigned char *, often
> through typedefs in the signed case.

Those are for signed and unsigned tiny integers, with confusing names
that are just historical baggage.

>> In fact it's implementation-defined, and 

Re: somebody dropped a (warning) bomb

2007-02-15 Thread Bodo Eggert
Sergei Organov <[EMAIL PROTECTED]> wrote:
> Linus Torvalds <[EMAIL PROTECTED]> writes:
>
>> Exactly because "char" *by*definition* is "indeterminate sign" as far as
>> something like "strlen()" is concerned.
>
> Thanks, I now understand that you either don't see the difference
> between "indeterminate" and "implementation-defined" in this context or
> consider it non-essential, so I think I've got your point.

If you don't code for a specific compiler with specific settings, there is
no implementation defining the signedness of char, and each part of the code
using char* will be wrong unless it handles both cases correctly.

Therefore it's either always wrong to call your char* function with char*,
unsigned char* _and_ signed char*, unless you can guarantee not to overflow
any of them, or it's always correct to call char* functions with any of
these kinds.

Of course, if you happen to code for specific compiler settings, one
signedness of char will become real and one warning will be legit. And if
pigs fly, they should wear goggles to protect their eyes ...

>> THE FACT IS, THAT "strlen()" IS DEFINED UNIVERSALLY AS TAKING "char *".
>
> So just don't pass it "unsigned char*". End of story.

Using signed chars for strings is wrong in most countries on earth. It was
wrong when the first IBM PC came out in 1981, and creating a compiler in
1987 defaulting to signed char is a sure sign of originating from an
isolated country and knowing nothing about this planet. Using signed chars
in comparisons is especially wrong, and casting each char to unsigned before
comparing them is likely to be forgotten. Unsigned character strings are
useless because there is no such thing as char(-23), and if these strings
weren't cast to signed inside all IO functions, they wouldn't work correctly.
It is only because many programmers don't compare chars that most programs
work outside the USA. I repeat: thanks to using signed chars, the programs
only work /by/ /accident/! Promoting the use of signed char strings is
promoting bugs and endangering the stability of all our systems. You should
stop this bullshit now, instead of increasing the pile.

>> That BY DEFINITION means that "strlen()" cannot care about the sign,
>> because the sign IS NOT DEFINED UNIVERSALLY!
>>
>> And if you cannot accept that fact, it's your problem. Not mine.
>
> I never had problems either with strlen() or with this warning, so I was
> curious why the warning is such a problem for the kernel.

The warning is a problem because either it clutters your build log, hiding
real bugs; or you add typecasts that hide errors whenever a char* function
is called; or you disable it globally, hiding real signedness bugs.
Whichever way you take, it's wrong.
-- 
Funny quotes:
12. He who laughs last thinks slowest.

Friß, Spammer: [EMAIL PROTECTED] [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: somebody dropped a (warning) bomb

2007-02-15 Thread Lennart Sorensen
On Thu, Feb 15, 2007 at 07:57:05AM -0800, Linus Torvalds wrote:
 I agree that it's unnecessary code, and in many ways exactly the same 
 thing. I just happen to believe that casts tend to be a lot more dangerous 
 than extraneous initializations. Casts have this nasty tendency of hiding 
 *other* problems too (ie you later change the thing you cast completely, 
 and now the compiler doesn't warn about a *real* bug, because the cast 
 will just silently force it to a wrong type).

Well one way you could cast it and still have the compiler tell you if
the type ever changes is doing something stupid like:

char* unsigned_char_star_to_char_star(unsigned char* in) {
return (char*)in;
}

Then call your strlen with:
strlen(unsigned_char_star_to_char_star(my_unfortunate_unsigned_char_string));

If you ever changed the type, the compiler would warn you again, since the
conversion function doesn't accept anything other than unsigned char *
being passed in for conversion.  That seems safer than a direct cast, which
gives you no type checking at all.

I hope it doesn't come to this though.  Hopefully gcc will change its
behaviour back or give an option of --dontcomplainaboutcharstar :)

--
Len Sorensen


Re: somebody dropped a (warning) bomb

2007-02-15 Thread Rene Herman

On 02/15/2007 08:02 PM, Linus Torvalds wrote:


> Think of it this way: in science, a theory is proven to be bad by a
> single undeniable fact just showing that it's wrong.
>
> The same is largely true of a warning. If the warning sometimes
> happens for code that is perfectly fine, the warning is bad.


Slight difference; if a compulsory warning sometimes happens for code 
that is perfectly fine, the warning is bad. I do want to be _able_ to 
get as many warnings as a compiler can muster though.


Given char's special nature, shouldn't the conclusion of this thread 
have long been simply that gcc needs -Wno-char-pointer-sign? (with 
whatever default, as far as I'm concerned).


Rene.


Re: somebody dropped a (warning) bomb

2007-02-14 Thread Sergei Organov
Olivier Galibert <[EMAIL PROTECTED]> writes:
> On Tue, Feb 13, 2007 at 09:06:24PM +0300, Sergei Organov wrote:
[...]
>> May I suggest another definition for a warning that entirely sucks?
>> "The warning entirely sucks if and only if it never has true
>> positives." In all other cases it only more or less sucks, IMHO.
>
> That means a warning that triggers on every line saying "there may be
> a bug there" does not entirely suck?

You got me. That warning would indeed entirely suck. My definition is
poor.

-- Sergei.



Re: somebody dropped a (warning) bomb

2007-02-13 Thread Linus Torvalds


On Tue, 13 Feb 2007, Sergei Organov wrote:
> 
> Sorry, what do you do with "variable 'xxx' might be used uninitialized"
> warning when it's false? Turn it off? Annotate the source? Assign fake
> initialization value? Change the compiler so that it does "the effort"
> for you? Never encountered false positive from this warning?

The thing is, that warning is at least easy to shut up. 

You just add a fake initialization. There isn't any real downside.

In contrast, to shut up the idiotic pointer-sign warning, you have to add 
a cast.

Now, some people seem to think that adding casts is a way of life, and 
think that C programs always have a lot of casts. That's NOT TRUE. It's 
actually possible to avoid casts, and good C practice will do that to 
quite a high degree, because casts in C are *dangerous*. 

A language that doesn't allow arbitrary type-casting is a horrible 
language (because it doesn't allow you to "get out" of the language type 
system), and typecasts in C are really important. But they are an 
important feature that should be used with caution, and as little as 
possible, because they (by design, of course) break the type rules.

Now, since the _only_ reason for the -Wpointer-sign warning in the first 
place is to warn about breaking type rules, if the way to shut it up is to 
break them EVEN MORE, then the warning is obviously totally broken. IT 
CAUSES PEOPLE TO WRITE WORSE CODE! Which was against the whole point of 
having the warning in the first place.

This is why certain warnings are fine. For example, the warning about

if (a=b)
..

is obviously warning about totally valid C code, but it's _still_ a fine 
warning, because it's actually very easy to make that warning go away AND 
IMPROVE THE CODE at the same time. Even if you actually meant to write the 
assignment, you can write it as

if ((a = b) != 0)
..

or 

a = b;
if (a)
..

both of which are actually more readable.

But if you have 

unsigned char *mystring;

len = strlen(mystring);

then please tell me how to fix that warning without making the code 
*worse* from a type-safety standpoint? You CANNOT. You'd have to cast it 
explicitly (or assign it through a "void *" variable), both of which 
actually MAKE THE TYPE-CHECKING PROBLEM WORSE!

See? The warning made no sense to begin with, and it warns about something 
where the alternatives are worse than what it warned about.

Ergo. It's a crap warning.

Linus


Re: somebody dropped a (warning) bomb

2007-02-13 Thread Olivier Galibert
On Tue, Feb 13, 2007 at 09:06:24PM +0300, Sergei Organov wrote:
> I agree that making strxxx() family special is not a good idea. So what
> do we do for a random foo(char*) called with an 'unsigned char*'
> argument? Silence? Hmmm... It's not immediately obvious that it's indeed
> harmless. Yet another -Wxxx option to GCC to silence this particular
> case?

Silence would be good.  "char *" has a special status in C, it can be:
- pointer to a char/to an array of chars (standard interpretation)
- pointer to a string
- generic pointer to memory you can read(/write)

Check the aliasing rules if you don't believe me on the third one.
And it's *way* more often cases 2 and 3 than 1 for the simple reason
that the signedness of char is unpredictable.  As a result, a
signedness warning between char * and (un)signed char * is 99.99% of
the time stupid.


> May I suggest another definition for a warning that entirely sucks?
> "The warning entirely sucks if and only if it never has true
> positives." In all other cases it only more or less sucks, IMHO.

That means a warning that triggers on every line saying "there may be
a bug there" does not entirely suck?


> I'm afraid I don't follow. Do we have a way to say "I want an int of
> indeterminate sign" in C?

Almost completely.  The rules on aliasing say you can convert pointers
between signed and unsigned variants and the accesses will be
unsurprising.  The only problem is that the implicit conversion of
incompatible pointer parameters to a function looks impossible in the
draft I have.  Probably has been corrected in the final version.

In any case, having for instance unsigned int * in a prototype really
means in the language "I want a pointer to integers, and I'm probably
going to use them as unsigned, so beware".  For the special case of
char, since the beware version would require a signed or unsigned tag,
it really means indeterminate.

C is sometimes called a high-level assembler for a reason :-)


> The same way there doesn't seem to be a way
> to say "I want a char of indeterminate sign". :( So no, strlen() doesn't
> actually say that, no matter if we like it or not. It actually says "I
> want a char with implementation-defined sign".

In this day and age it means "I want a 0-terminated string".
Everything else is explicitly signed char * or unsigned char *, often
through typedefs in the signed case.


> In fact it's implementation-defined, and this may make a difference
> here. strlen(), being part of C library, could be specifically
> implemented for given architecture, and as architecture is free to
> define the sign of "char", strlen() could in theory rely on particular
> sign of "char" as defined for given architecture. [Not that I think that
> any strlen() implementation actually depends on sign.]

That would require pointers tagged in one way or another; you can't
otherwise distinguish between pointers to differently-signed versions of
the same integer type (they're required to have the same size and
alignment).  You don't have that on modern architectures.


> Can we assure that no function taking 'char*' ever cares about the sign?
> I'm not sure, and I'm not a language lawyer, but if it's indeed the
> case, I'd probably agree that it might be a good idea for GCC to extend
> the C language so that function argument declared "char*" means either
> of "char*", "signed char*", or "unsigned char*" even though there is no
> precedent in the language.

It's a warning you're talking about.  That means it is _legal_ in the
language (even if maybe implementation defined, but legal still).
Otherwise it would be an error.


> BTW, the next logical step would be for "int*" argument to stop meaning
> "signed int*" and become any of "int*", "signed int*" or "unsigned
> int*". Isn't it cool to be able to declare a function that won't produce
> warning no matter what int is passed to it? ;)

No, it wouldn't be logical, because char is *special*.


> Yes, indeed. So the real problem of the C language is inconsistency
> between strxxx() and isxxx() families of functions? If so, what is 
> wrong with actually fixing the problem, say, by using wrappers over
> isxxx()? Checking... The kernel already uses isxxx() that are macros
> that do conversion to "unsigned char" themselves, and a few invocations
> of isspace() I've checked pass "char" as argument. So that's not a real
> problem for the kernel, right?

Because a cast to silence a warning silences every possible warning,
even if the then-pointer turns for instance into an integer through an
unrelated change.  Think for instance about an error_t going from
const char * (error string) to int (error code) through a patch, which
happened to be passed to a utf8_to_whatever conversion function that
takes a const unsigned char * as a parameter.  Casting would hide the
impact of changing the type.


> As the isxxx() family does not seem to be a real problem, at least in
> the context of the kernel source base, I'd like to 

Re: somebody dropped a (warning) bomb

2007-02-13 Thread Jeff Garzik
On Tue, Feb 13, 2007 at 11:29:44PM +0300, Sergei Organov wrote:
> Sorry, what do you do with "variable 'xxx' might be used uninitialized"
> warning when it's false? Turn it off? Annotate the source? Assign fake
> initialization value? Change the compiler so that it does "the effort"
> for you? Never encountered false positive from this warning?

You pull git://git.kernel.org/...jgarzik/misc-2.6.git#gccbug

Jeff





Re: somebody dropped a (warning) bomb

2007-02-13 Thread Rob Landley
On Tuesday 13 February 2007 2:25 pm, Linus Torvalds wrote:
> THE FACT IS, THAT "strlen()" IS DEFINED UNIVERSALLY AS TAKING "char *".
> 
> That BY DEFINITION means that "strlen()" cannot care about the sign, 
> because the sign IS NOT DEFINED UNIVERSALLY!
> 
> And if you cannot accept that fact, it's your problem. Not mine.
> 
> The warning is CRAP. End of story.

In busybox we fed the compiler -funsigned-char to make it shut up.  (And so we 
had consistent bugs between arm and x86.)

Building tcc I had to feed it -fsigned-char because -funsigned-char broke 
stuff. :)

Rob
-- 
"Perfection is reached, not when there is no longer anything to add, but
when there is no longer anything to take away." - Antoine de Saint-Exupery


Re: somebody dropped a (warning) bomb

2007-02-13 Thread Sergei Organov
"Pekka Enberg" <[EMAIL PROTECTED]> writes:
> On 2/13/07, Sergei Organov <[EMAIL PROTECTED]> wrote:
>> With almost any warning out there one makes more or less efforts to
>> suppress the warning where it gives false positives, isn't it?
>
> Yes, as long it's the _compiler_ that's doing the effort.
> You shouldn't need to annotate the source just to make the compiler
> shut up.

Sorry, what do you do with "variable 'xxx' might be used uninitialized"
warning when it's false? Turn it off? Annotate the source? Assign fake
initialization value? Change the compiler so that it does "the effort"
for you? Never encountered a false positive from this warning?

> Once the compiler starts issuing enough false positives, it's
> time to turn off that warning completely.

Yes, I don't argue with that. I said "otherwise the warning more or less
sucks", and then it's up to programmers to decide whether it sucks enough
to be turned off. The decision then depends on the importance of its true
positives. Only if a warning never has true positives is it
unconditional, total, unhelpful crap -- that was my point.

> Therefore, the only sane strategy for a warning is to aim for zero
> false positives.

Sure. But unfortunately this is an unreachable aim in most cases.

-- Sergei.


Re: somebody dropped a (warning) bomb

2007-02-13 Thread Linus Torvalds


On Tue, 13 Feb 2007, Sergei Organov wrote:

> > Exactly because "char" *by*definition* is "indeterminate sign" as far as 
> > something like "strlen()" is concerned.
> 
> Thanks, I now understand that you either don't see the difference
> between "indeterminate" and "implementation-defined" in this context or
> consider it non-essential, so I think I've got your point.

The thing is, "implementation-defined" does actually matter when you can 
_depend_ on it.

For example, there's a *huge* difference between "undefined" and 
"implementation-defined". A program can actually depend on something like

char c = 0x80;

if (c < 0)
..

always having the *same* behaviour for a particular compiler (with a 
particular set of compile flags - some compilers have flags to change the 
sign behaviour).

So yes, there "implementation defined" actually has a real and worthwhile 
meaning. It is guaranteed to have _some_ semantics within the confines of 
that program, and they are defined to be consistent (again, within the 
confines of that particular program).

So I agree that "implementation defined" doesn't always mean 
"meaningless".

BUT (and this is a big but) within the discussion of "strlen()", that is 
no longer true. "strlen()" exists _outside_ of a single particular 
implementation. As such, "implementation-defined" is no longer something 
that "strlen()" can depend on.

As an example of this argument, try this:

#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
char a1[] = { -1, 0 }, a2[] = { 1, 0 };

printf("%d %d\n", a1[0] < a2[0], strcmp(a1, a2) < 0);
return 0;
}

and *before* you compile it, try to guess what the output is.

And when that confuses you, try to compile it using gcc with the 
"-funsigned-char" flag (or "-fsigned-char" if you started out on an 
architecture where char was unsigned by default)

And when you really *really* think about it afterwards, I think you'll go 
"Ahh.. Linus is right". It's more than "implementation-defined": it really 
is totally indeterminate for code like this.

(Yeah, the above isn't "strlen()", but it's an even subtler issue with 
EXACTLY THE SAME PROBLEM! And one where you can actually see the _effects_ 
of it)

Linus


Re: somebody dropped a (warning) bomb

2007-02-13 Thread Sergei Organov
Linus Torvalds <[EMAIL PROTECTED]> writes:
> On Tue, 13 Feb 2007, Sergei Organov wrote:
>> >
>> >"I want a char of indeterminate sign"!
>> 
>> I'm afraid I don't follow. Do we have a way to say "I want an int of
>> indeterminate sign" in C? The same way there doesn't seem to be a way
>> to say "I want a char of indeterminate sign".
>
> You're wrong.

Sure, I knew it from the very beginning ;)

>
> Exactly because "char" *by*definition* is "indeterminate sign" as far as 
> something like "strlen()" is concerned.

Thanks, I now understand that you either don't see the difference
between "indeterminate" and "implementation-defined" in this context or
consider it non-essential, so I think I've got your point.

> "char" is _special_. Char is _different_. Char is *not* "int".

Sure.

>
>> So no, strlen() doesn't actually say that, no matter if we like it or 
>> not. It actually says "I want a char with implementation-defined sign".
>
> You're arguing some strange semantic difference in the English
> language.

Didn't I further explain what I meant exactly (that you've skipped)?
OK, never mind.

> I'm not really interested.

OK, no problem.

>
> THE FACT IS, THAT "strlen()" IS DEFINED UNIVERSALLY AS TAKING "char *".

So just don't pass it "unsigned char*". End of story.

>
> That BY DEFINITION means that "strlen()" cannot care about the sign, 
> because the sign IS NOT DEFINED UNIVERSALLY!
>
> And if you cannot accept that fact, it's your problem. Not mine.

I never had problems either with strlen() or with this warning, so I was
curious why the warning is such a problem for the kernel. I'm sorry
I took your time and haven't got an answer that I entirely agree
with. Thank you very much for your explanations anyway.

> The warning is CRAP. End of story.

Amen!

-- Sergei.


Re: somebody dropped a (warning) bomb

2007-02-13 Thread Pekka Enberg

On 2/13/07, Sergei Organov <[EMAIL PROTECTED]> wrote:

> With almost any warning out there one makes more or less efforts to
> suppress the warning where it gives false positives, isn't it?


Yes, as long as it's the _compiler_ that's doing the effort. You
shouldn't need to annotate the source just to make the compiler shut
up. Once the compiler starts issuing enough false positives, it's time
to turn off that warning completely. Therefore, the only sane strategy
for a warning is to aim for zero false positives.

Pekka


Re: somebody dropped a (warning) bomb

2007-02-13 Thread Linus Torvalds


On Tue, 13 Feb 2007, Sergei Organov wrote:
> >
> > "I want a char of indeterminate sign"!
> 
> I'm afraid I don't follow. Do we have a way to say "I want an int of
> indeterminate sign" in C? The same way there doesn't seem to be a way
> to say "I want a char of indeterminate sign".

You're wrong.

Exactly because "char" *by*definition* is "indeterminate sign" as far as 
something like "strlen()" is concerned.

"char" is _special_. Char is _different_. Char is *not* "int".

> So no, strlen() doesn't actually say that, no matter if we like it or 
> not. It actually says "I want a char with implementation-defined sign".

You're arguing some strange semantic difference in the English language.

I'm not really interested.

THE FACT IS, THAT "strlen()" IS DEFINED UNIVERSALLY AS TAKING "char *".

That BY DEFINITION means that "strlen()" cannot care about the sign, 
because the sign IS NOT DEFINED UNIVERSALLY!

And if you cannot accept that fact, it's your problem. Not mine.

The warning is CRAP. End of story.

Linus


Re: somebody dropped a (warning) bomb

2007-02-13 Thread Sergei Organov
"Pekka Enberg" <[EMAIL PROTECTED]> writes:

> On 2/13/07, Sergei Organov <[EMAIL PROTECTED]> wrote:
>> May I suggest another definition for a warning that entirely sucks?
>> "The warning entirely sucks if and only if it never has true
>> positives." In all other cases it only more or less sucks, IMHO.
>
> You're totally missing the point.

No, I'm not.

> False positives are not a minor annoyance, they're actively harmful as
> they hide other _useful_ warnings.

Every warning I'm aware of does have false positives. They are indeed
harmful, so one takes steps to get rid of them. If a warning had no
false positives, it should be an error, not a warning, in the first
place.

> So, you really want warnings to be about things that can and
> should be fixed.

That's the definition of a "perfect" warning, which is actually called
an "error".

> So you really should aim for _zero false positives_ even if you risk
> not detecting some real positives.

With almost any warning out there one makes more or less efforts to
suppress the warning where it gives false positives, isn't it? At least
that's my experience so far.

-- Sergei.


Re: somebody dropped a (warning) bomb

2007-02-13 Thread Pekka Enberg

On 2/13/07, Sergei Organov <[EMAIL PROTECTED]> wrote:

> May I suggest another definition for a warning that entirely sucks?
> "The warning entirely sucks if and only if it never has true
> positives." In all other cases it only more or less sucks, IMHO.


You're totally missing the point. False positives are not a minor
annoyance, they're actively harmful as they hide other _useful_
warnings. So, you really want warnings to be about things that can and
should be fixed. So you really should aim for _zero false positives_
even if you risk not detecting some real positives.


Re: somebody dropped a (warning) bomb

2007-02-13 Thread Sergei Organov
Linus Torvalds <[EMAIL PROTECTED]> writes:
> On Mon, 12 Feb 2007, Sergei Organov wrote:
>> 
>> Why strlen() should be allowed to be called with an incompatible pointer
>> type? My point is that gcc should issue *different warning*, -- the same
>> warning it issues here:
>
> I agree that "strlen()" per se isn't different.

I agree that making strxxx() family special is not a good idea. So what
do we do for a random foo(char*) called with an 'unsigned char*'
argument? Silence? Hmmm... It's not immediately obvious that it's indeed
harmless. Yet another -Wxxx option to GCC to silence this particular
case?

> A warning is only as good as
>  (a) the thing it warns about
>  (b) the thing you can do about it
> 
> And THAT is the fundamental problem with that *idiotic* warning. Yes, it's 
> technically correct. Yes, it's "proper C grammar". But if you can't get 
> over the hump of realizing that there is a difference between "grammar" 
> and "intelligent speech", you shouldn't be doing compilers.
> 
> So the warning sucks because:

May I suggest another definition for a warning that entirely sucks?
"The warning entirely sucks if and only if it never has true
positives." In all other cases it only more or less sucks, IMHO.

>  - the thing it warns about (passing "unsigned char" to something that 
>doesn't specify a sign at all!) is not something that sounds wrong in 
>the first place. Yes, it's unsigned. But no, the thing it is passed to 
>didn't specify that it wanted a "signed" thing in the first place. The 
>"strlen()" function literally says
>
>   "I want a char of indeterminate sign"!

I'm afraid I don't follow. Do we have a way to say "I want an int of
indeterminate sign" in C? The same way there doesn't seem to be a way
to say "I want a char of indeterminate sign". :( So no, strlen() doesn't
actually say that, no matter if we like it or not. It actually says "I
want a char with implementation-defined sign".

>which implies that strlen really doesn't care about the sign. The same 
>is true of *any* function that takes a "char *". Such a function 
>doesn't care, and fundamentally CANNOT care about the sign, since it's 
>not even defined!

In fact it's implementation-defined, and this may make a difference
here. strlen(), being part of C library, could be specifically
implemented for given architecture, and as architecture is free to
define the sign of "char", strlen() could in theory rely on particular
sign of "char" as defined for given architecture. [Not that I think that
any strlen() implementation actually depends on sign.]

>So the warning fails the (a) criterion. The warning isn't valid, 
>because the thing it warns about isn't a valid problem!

Can we assure that no function taking 'char*' ever cares about the sign?
I'm not sure, and I'm not a language lawyer, but if it's indeed the
case, I'd probably agree that it might be a good idea for GCC to extend
the C language so that function argument declared "char*" means either
of "char*", "signed char*", or "unsigned char*" even though there is no
precedent in the language.

BTW, the next logical step would be for "int*" argument to stop meaning
"signed int*" and become any of "int*", "signed int*" or "unsigned
int*". Isn't it cool to be able to declare a function that won't produce
warning no matter what int is passed to it? ;)

>  - it _also_ fails the (b) criterion, because quite often there is nothing 
>you can do about it. Yes, you can add a cast, but adding a cast 
>actually causes _worse_ code (but the warning is certainly gone). But 
>that makes the _other_ argument for the warning totally point-less: if 
>the reason for the warning was "bad code", then having the warning is 
>actively BAD, because the end result is actually "worse code".
>
> See? The second point is why it's important to also realize that there is 
> a lot of real and valid code that actually _does_ pass "strlen()" an 
> unsigned string.

And here we finally get to the practice...

> There are tons of reasons for that to happen: the part of the program
> that _does_ care wants to use a "unsigned char" array, because it ends
> up doing things like "isspace(array[x])", and that is not well-defined
> if you use a "char *" array.

Yes, indeed. So the real problem of the C language is inconsistency
between strxxx() and isxxx() families of functions? If so, what is 
wrong with actually fixing the problem, say, by using wrappers over
isxxx()? Checking... The kernel already uses isxxx() that are macros
that do conversion to "unsigned char" themselves, and a few invocations
of isspace() I've checked pass "char" as argument. So that's not a real
problem for the kernel, right?

> So there are lots of reasons to use "unsigned char" arrays for strings. 
> Look it up. Look up any half-way reasonable man-page for the "isspace()" 
> kind of functions, and if they don't actually explicitly say that you 
>    should use unsigned characters for it, those man-pages are crap.

Well, even C99 itself does say it. But the kernel already handles this
issue.

Re: somebody dropped a (warning) bomb

2007-02-13 Thread Dick Streefland
Jan Engelhardt <[EMAIL PROTECTED]> wrote:
| Further, giving again answer to the question whether they generate signed or
| unsigned comparisons: Have you ever seen a computer which addresses memory 
with
| negative numbers? Since the answer is most likely no, signed comparisons would
| not make sense for me.

The Transputer has (had?) a signed address space:

http://maven.smith.edu/~thiebaut/transputer/chapter2/chap2-3.html

-- 
Dick Streefland    Altium BV
[EMAIL PROTECTED]   (@ @)  http://www.altium.com
oOO--(_)--OOo---

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: somebody dropped a (warning) bomb

2007-02-13 Thread Pekka Enberg

On 2/13/07, Sergei Organov <[EMAIL PROTECTED]> wrote:
> May I suggest another definition for a warning being entirely sucks?
> "The warning is entirely sucks if and only if it never has true
> positives." In all other cases it's only more or less sucks, IMHO.

You're totally missing the point. False positives are not a minor
annoyance, they're actively harmful as they hide other _useful_
warnings. So, you really want warnings to be about things that can and
should be fixed. So you really should aim for _zero false positives_
even if you risk not detecting some real positives.


Re: somebody dropped a (warning) bomb

2007-02-13 Thread Sergei Organov
Pekka Enberg <[EMAIL PROTECTED]> writes:

> On 2/13/07, Sergei Organov <[EMAIL PROTECTED]> wrote:
>> May I suggest another definition for a warning being entirely sucks?
>> "The warning is entirely sucks if and only if it never has true
>> positives." In all other cases it's only more or less sucks, IMHO.
>
> You're totally missing the point.

No, I don't. 

> False positives are not a minor annoyance, they're actively harmful as
> they hide other _useful_ warnings.

Every warning I'm aware of does have false positives. They are indeed
harmful, so one takes steps to get rid of them. If a warning had no
false positives, it should be an error, not a warning in the first
place.

> So, you really want warnings to be about things that can and
> should be fixed.

That's the definition of a perfect warning, that is actually called an
error.

> So you really should aim for _zero false positives_ even if you risk
> not detecting some real positives.

With almost any warning out there one makes more or less efforts to
suppress the warning where it gives false positives, isn't it? At least
that's my experience so far.

-- Sergei.


Re: somebody dropped a (warning) bomb

2007-02-13 Thread Linus Torvalds


On Tue, 13 Feb 2007, Sergei Organov wrote:
> 
> >   "I want a char of indeterminate sign"!
> 
> I'm afraid I don't follow. Do we have a way to say "I want an int of
> indeterminate sign" in C? The same way there doesn't seem to be a way
> to say "I want a char of indeterminate sign".

You're wrong.

Exactly because "char" *by*definition* is indeterminate sign as far as 
something like strlen() is concerned.

"char" is _special_. Char is _different_. Char is *not* int.

> So no, strlen() doesn't actually say that, no matter if we like it or 
> not. It actually says "I want a char with implementation-defined sign".

You're arguing some strange semantic difference in the English language.

I'm not really interested.

THE FACT IS, THAT strlen() IS DEFINED UNIVERSALLY AS TAKING "char *".

That BY DEFINITION means that strlen() cannot care about the sign, 
because the sign IS NOT DEFINED UNIVERSALLY!

And if you cannot accept that fact, it's your problem. Not mine.

The warning is CRAP. End of story.

Linus


Re: somebody dropped a (warning) bomb

2007-02-13 Thread Pekka Enberg

On 2/13/07, Sergei Organov <[EMAIL PROTECTED]> wrote:
> With almost any warning out there one makes more or less efforts to
> suppress the warning where it gives false positives, isn't it?

Yes, as long it's the _compiler_ that's doing the effort. You
shouldn't need to annotate the source just to make the compiler shut
up. Once the compiler starts issuing enough false positives, it's time
to turn off that warning completely. Therefore, the only sane strategy
for a warning is to aim for zero false positives.

Pekka


Re: somebody dropped a (warning) bomb

2007-02-13 Thread Sergei Organov
Linus Torvalds <[EMAIL PROTECTED]> writes:
> On Tue, 13 Feb 2007, Sergei Organov wrote:
> 
> > >   "I want a char of indeterminate sign"!
> > 
> > I'm afraid I don't follow. Do we have a way to say "I want an int of
> > indeterminate sign" in C? The same way there doesn't seem to be a way
> > to say "I want a char of indeterminate sign".
>
> You're wrong.

Sure, I knew it from the very beginning ;)

> Exactly because "char" *by*definition* is indeterminate sign as far as 
> something like strlen() is concerned.

Thanks, I now understand that you either don't see the difference
between "indeterminate" and "implementation-defined" in this context or
consider it non-essential, so I think I've got your point.

> "char" is _special_. Char is _different_. Char is *not* int.

Sure.

> > So no, strlen() doesn't actually say that, no matter if we like it or 
> > not. It actually says "I want a char with implementation-defined sign".
>
> You're arguing some strange semantic difference in the English
> language.

Didn't I further explain what I meant exactly (which you've skipped)?
OK, never mind.

> I'm not really interested.

OK, no problem.

> THE FACT IS, THAT strlen() IS DEFINED UNIVERSALLY AS TAKING "char *".

So just don't pass it "unsigned char*". End of story.

> That BY DEFINITION means that strlen() cannot care about the sign, 
> because the sign IS NOT DEFINED UNIVERSALLY!
>
> And if you cannot accept that fact, it's your problem. Not mine.

I never had problems either with strlen() or with this warning, so I was
curious why the warning is such a problem for the kernel. I'm sorry
I took your time and haven't got an answer that I entirely agree
with. Thank you very much for your explanations anyway.

> The warning is CRAP. End of story.

Amen!

-- Sergei.


Re: somebody dropped a (warning) bomb

2007-02-13 Thread Linus Torvalds


On Tue, 13 Feb 2007, Sergei Organov wrote:
>
> > Exactly because "char" *by*definition* is indeterminate sign as far as 
> > something like strlen() is concerned.
> 
> Thanks, I now understand that you either don't see the difference
> between "indeterminate" and "implementation-defined" in this context or
> consider it non-essential, so I think I've got your point.

The thing is, "implementation-defined" does actually matter when you can 
_depend_ on it.

For example, there's a *huge* difference between "undefined" and 
"implementation-defined". A program can actually depend on something like

	char c = 0x80;

	if (c < 0)
		..

always having the *same* behaviour for a particular compiler (with a 
particular set of compile flags - some compilers have flags to change the 
sign behaviour).

So yes, there "implementation defined" actually has a real and worthwhile 
meaning. It is guaranteed to have _some_ semantics within the confines of 
that program, and they are defined to be consistent (again, within the 
confines of that particular program).

So I agree that "implementation defined" doesn't always mean 
"meaningless".

BUT (and this is a big but) within the discussion of strlen(), that is 
no longer true. strlen() exists _outside_ of a single particular 
implementation. As such, "implementation-defined" is no longer something 
that strlen() can depend on.

As an example of this argument, try this:

	#include <string.h>
	#include <stdio.h>

	int main(int argc, char **argv)
	{
		char a1[] = { -1, 0 }, a2[] = { 1, 0 };

		printf("%d %d\n", a1[0] < a2[0], strcmp(a1, a2) < 0);
		return 0;
	}

and *before* you compile it, try to guess what the output is.

And when that confuses you, try to compile it using gcc with the 
"-funsigned-char" flag (or "-fsigned-char" if you started out on an 
architecture where char was unsigned by default)

And when you really *really* think about it afterwards, I think you'll go 
"Ahh.. Linus is right. It's more than implementation-defined: it really 
is totally indeterminate for code like this".

(Yeah, the above isn't strlen(), but it's an even subtler issue with 
EXACTLY THE SAME PROBLEM! And one where you can actually see the _effects_ 
of it)

Linus


Re: somebody dropped a (warning) bomb

2007-02-13 Thread Sergei Organov
Pekka Enberg <[EMAIL PROTECTED]> writes:
> On 2/13/07, Sergei Organov <[EMAIL PROTECTED]> wrote:
>> With almost any warning out there one makes more or less efforts to
>> suppress the warning where it gives false positives, isn't it?
>
> Yes, as long it's the _compiler_ that's doing the effort.
> You shouldn't need to annotate the source just to make the compiler
> shut up.

Sorry, what do you do with "variable 'xxx' might be used uninitialized"
warning when it's false? Turn it off? Annotate the source? Assign fake
initialization value? Change the compiler so that it does the effort
for you? Never encountered false positive from this warning?

> Once the compiler starts issuing enough false positives, it's
> time to turn off that warning completely.

Yes, I don't argue that. I said otherwise the warning is more or less
sucks, and then it's up to programmers to decide if it sucks enough
to be turned off. The decision depends on the importance of its true
positives then. Only if a warning never has true positives is it
unconditional, total, unhelpful crap, -- that was my point.

> Therefore, the only sane strategy for a warning is to aim for zero
> false positives.

Sure. But unfortunately this is an unreachable aim in most cases.

-- Sergei.


Re: somebody dropped a (warning) bomb

2007-02-13 Thread Rob Landley
On Tuesday 13 February 2007 2:25 pm, Linus Torvalds wrote:
> THE FACT IS, THAT strlen() IS DEFINED UNIVERSALLY AS TAKING char *.
> 
> That BY DEFINITION means that strlen() cannot care about the sign, 
> because the sign IS NOT DEFINED UNIVERSALLY!
> 
> And if you cannot accept that fact, it's your problem. Not mine.
> 
> The warning is CRAP. End of story.

In busybox we fed the compiler -funsigned-char to make it shut up.  (And so we 
had consistent bugs between arm and x86.)

Building tcc I had to feed it -fsigned-char because -funsigned-char broke 
stuff. :)

Rob
-- 
Perfection is reached, not when there is no longer anything to add, but
when there is no longer anything to take away. - Antoine de Saint-Exupery


Re: somebody dropped a (warning) bomb

2007-02-13 Thread Jeff Garzik
On Tue, Feb 13, 2007 at 11:29:44PM +0300, Sergei Organov wrote:
> Sorry, what do you do with "variable 'xxx' might be used uninitialized"
> warning when it's false? Turn it off? Annotate the source? Assign fake
> initialization value? Change the compiler so that it does the effort
> for you? Never encountered false positive from this warning?

You pull git://git.kernel.org/...jgarzik/misc-2.6.git#gccbug

Jeff





Re: somebody dropped a (warning) bomb

2007-02-13 Thread Olivier Galibert
On Tue, Feb 13, 2007 at 09:06:24PM +0300, Sergei Organov wrote:
> I agree that making strxxx() family special is not a good idea. So what
> do we do for a random foo(char*) called with an 'unsigned char*'
> argument? Silence? Hmmm... It's not immediately obvious that it's indeed
> harmless. Yet another -Wxxx option to GCC to silence this particular
> case?

Silence would be good.  "char *" has a special status in C, it can be:
- pointer to a char/to an array of chars (standard interpretation)
- pointer to a string
- generic pointer to memory you can read(/write)

Check the aliasing rules if you don't believe me on the third one.
And it's *way* more often cases 2 and 3 than 1 for the simple reason
that the signedness of char is unpredictable.  As a result, a
signedness warning between char * and (un)signed char * is 99.99% of
the time stupid.

> May I suggest another definition for a warning being entirely sucks?
> "The warning is entirely sucks if and only if it never has true
> positives." In all other cases it's only more or less sucks, IMHO.

That means a warning that triggers on every line saying "there may be
a bug there" does not entirely suck?

> I'm afraid I don't follow. Do we have a way to say "I want an int of
> indeterminate sign" in C?

Almost completely.  The rules on aliasing say you can convert pointers
between signed and unsigned variants and the accesses will be
unsurprising.  The only problem is that the implicit conversion of
incompatible pointer parameters to a function looks impossible in the
draft I have.  Probably has been corrected in the final version.

In any case, having for instance "unsigned int *" in a prototype really
means in the language "I want a pointer to integers, and I'm probably
going to use them as unsigned, so beware".  For the special case of
char, since the "beware" version would require a signed or unsigned tag,
it really means "indeterminate".

C is sometimes called a high-level assembler for a reason :-)

> The same way there doesn't seem to be a way
> to say "I want a char of indeterminate sign". :( So no, strlen() doesn't
> actually say that, no matter if we like it or not. It actually says "I
> want a char with implementation-defined sign".

In this day and age it means "I want a 0-terminated string".
Everything else is explicitly "signed char *" or "unsigned char *", often
through typedefs in the signed case.

> In fact it's implementation-defined, and this may make a difference
> here. strlen(), being part of C library, could be specifically
> implemented for given architecture, and as architecture is free to
> define the sign of char, strlen() could in theory rely on particular
> sign of char as defined for given architecture. [Not that I think that
> any strlen() implementation actually depends on sign.]

That would require pointers tagged in a way or another; you can't
distinguish between pointers to differently-signed versions of the
same integer type otherwise (they're required to have the same size and
alignment).  You don't have that on modern architectures.

> Can we assure that no function taking 'char*' ever cares about the sign?
> I'm not sure, and I'm not a language lawyer, but if it's indeed the
> case, I'd probably agree that it might be a good idea for GCC to extend
> the C language so that function argument declared char* means either
> of char*, signed char*, or unsigned char* even though there is no
> precedent in the language.

It's a warning you're talking about.  That means it is _legal_ in the
language (even if maybe implementation defined, but legal still).
Otherwise it would be an error.

> BTW, the next logical step would be for int* argument to stop meaning
> signed int* and become any of int*, signed int* or unsigned
> int*. Isn't it cool to be able to declare a function that won't produce
> warning no matter what int is passed to it? ;)

No, it wouldn't be logical, because char is *special*.

> Yes, indeed. So the real problem of the C language is inconsistency
> between strxxx() and isxxx() families of functions? If so, what is 
> wrong with actually fixing the problem, say, by using wrappers over
> isxxx()? Checking... The kernel already uses isxxx() that are macros
> that do conversion to unsigned char themselves, and a few invocations
> of isspace() I've checked pass char as argument. So that's not a real
> problem for the kernel, right?

Because a cast to silence a warning silences every possible warning,
even if the then-pointer turns for instance into an integer through an
unrelated change.  Think for instance about an error_t going from
const char * (error string) to int (error code) through a patch, which
happened to be passed to a utf8_to_whatever conversion function that
takes a const unsigned char * as a parameter.  Casting would hide the
impact of changing the type.

> As the isxxx() family does not seem to be a real problem, at least in
> the context of the kernel source base, I'd like to learn other reasons
> to use unsigned char for doing strings either in general or
 

Re: somebody dropped a (warning) bomb

2007-02-13 Thread Linus Torvalds


On Tue, 13 Feb 2007, Sergei Organov wrote:
> 
> Sorry, what do you do with "variable 'xxx' might be used uninitialized"
> warning when it's false? Turn it off? Annotate the source? Assign fake
> initialization value? Change the compiler so that it does the effort
> for you? Never encountered false positive from this warning?

The thing is, that warning is at least easy to shut up. 

You just add a fake initialization. There isn't any real downside.

In contrast, to shut up the idiotic pointer-sign warning, you have to add 
a cast.

Now, some people seem to think that adding casts is a way of life, and 
think that C programs always have a lot of casts. That's NOT TRUE. It's 
actually possible to avoid casts, and good C practice will do that to 
quite a high degree, because casts in C are *dangerous*. 

A language that doesn't allow arbitrary type-casting is a horrible 
language (because it doesn't allow you to get out of the language type 
system), and typecasts in C are really important. But they are an 
important feature that should be used with caution, and as little as 
possible, because they (by design, of course) break the type rules.

Now, since the _only_ reason for the -Wpointer-sign warning in the first 
place is to warn about breaking type rules, if the way to shut it up is to 
break them EVEN MORE, then the warning is obviously totally broken. IT 
CAUSES PEOPLE TO WRITE WORSE CODE! Which was against the whole point of 
having the warning in the first place.

This is why certain warnings are fine. For example, the warning about

if (a=b)
..

is obviously warning about totally valid C code, but it's _still_ a fine 
warning, because it's actually very easy to make that warning go away AND 
IMPROVE THE CODE at the same time. Even if you actually meant to write the 
assignment, you can write it as

if ((a = b) != 0)
..

or 

a = b;
if (a)
..

both of which are actually more readable.

But if you have 

unsigned char *mystring;

len = strlen(mystring);

then please tell me how to fix that warning without making the code 
*worse* from a type-safety standpoint? You CANNOT. You'd have to cast it 
explicitly (or assign it through a void * variable), both of which 
actually MAKE THE TYPE-CHECKING PROBLEM WORSE!

See? The warning made no sense to begin with, and it warns about something 
where the alternatives are worse than what it warned about.

Ergo. It's a crap warning.

Linus


Re: somebody dropped a (warning) bomb

2007-02-12 Thread linux-os (Dick Johnson)

On Fri, 9 Feb 2007, Jan Engelhardt wrote:

>
> On Feb 9 2007 15:29, linux-os (Dick Johnson) wrote:
>>>
>>> I was referring to "absolute memory", not the offset magic that assembler
>>> allows. After all, (reg+relativeOffset) will yield an absolute address.
>>> What I was out at: for machines that have more than 2 GB of memory, you
>>> don't call the address that is given by 0x8000U actually "byte
>>> -2147483648", but "byte 2147483648".
>>
>> Don't make any large bets on that!
>>
>> char foo()
>> {
>>volatile char *p = (char *)0x8000;
>>return *p;
>> }
>> Optimized
>>  .file   "zzz.c"
>>  .text
>>  .p2align 2,,3
>> .globl foo
>>  .type   foo, @function
>> foo:
>>  pushl   %ebp
>>  movb    -2147483648, %al
>>  movl    %esp, %ebp
>>  movsbl  %al,%eax
>>  leave
>>  ret
>>  .size   foo, .-foo
>>  .section.note.GNU-stack,"",@progbits
>>  .ident  "GCC: (GNU) 3.3.3 20040412 (Red Hat Linux 3.3.3-7)"
>
> <foo>:
>   0:   55  push   %ebp
>   1:   0f b6 05 00 00 00 80movzbl 0x8000,%eax
>   8:   89 e5   mov%esp,%ebp
>   a:   5d  pop%ebp
>   b:   0f be c0movsbl %al,%eax
>   e:   c3  ret
>
> You do know that there is a bijection between the set of signed [32bit]
> integers and unsigned [32bit] integers, don't you?
> For the CPU, it's just bits. Being signed or unsigned is not important
> when just accessing memory.

I would have normally just let that comment go, but it is symptomatic
of a complete misunderstanding of how many (most) processors address
memory. This may come about because modern compilers tend to isolate
programmers from the underlying mechanisms. There isn't a 1:1
correspondence between a signed displacement and an unsigned one.
Signed displacements are mandatory with processors that provide
protection. Here is the reason: assume that you could execute code
anywhere (a flat module) and you didn't have 'C' complaining about
NULL pointers. If you were to execute code at memory location 0,
that told the processor to jump to location -1 in absolute memory, you
need to have the processor generate a bounds error trap, not to wrap
to offset 0x. The same thing is true if code is executing
at or near offset 0x. If it jumps to a location that would wrap
past 0x, one needs to generate a trap as well. Also, except for
the "FAR" jumps and FAR calls, where the segment register is part of
the operand, all programmed jumps or calls are relative, i.e., based
upon the current program-counter's value. This means that the program-
counter will have a value added to, or subtracted from, its current
value to control program flow. For such addition to work either as
an addition of signed numbers or as a relative displacement, the
math must use signed (two's complement) arithmetic.

If you look at the opcode output from the compiler, when performing
conditional jumps, you will note that the jump usually doesn't
include a 32-bit address. Instead, these are mostly short jumps, using
an 8-bit signed displacement (-128 to +127 bytes). Other conditional
jumps use 32-bit signed displacements.

The exact same addressing mechanism exists for memory read and write
accesses. They are all signed displacements.

> It will, when a comparison is involved, but
> that was not the point here. void* comparisons are unsigned. Period.
> Because a compiler doing signed comparisons will "map" the memory [from
> 2 GB to 4 GB] as part of the signed comparison before the memory [from 0
> GB to 2 GB], which collides with - let's call it - "the world view".
>
> Jan
> -- 
> ft: http://freshmeat.net/p/chaostables/
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.24 on an i686 machine (5592.61 BogoMips).
New book: http://www.AbominableFirebug.com/
_



The information transmitted in this message is confidential and may be 
privileged.  Any review, retransmission, dissemination, or other use of this 
information by persons or entities other than the intended recipient is 
prohibited.  If you are not the intended recipient, please notify Analogic 
Corporation immediately - by replying to this message or by sending an email to 
[EMAIL PROTECTED] - and destroy all copies of this information, including any 
attachments, without reading or disclosing them.

Thank you.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: somebody dropped a (warning) bomb

2007-02-12 Thread Linus Torvalds


On Mon, 12 Feb 2007, Sergei Organov wrote:
> 
> Why should strlen() be allowed to be called with an incompatible pointer
> type? My point is that gcc should issue a *different warning* -- the same
> warning it issues here:

I agree that "strlen()" per se isn't different.

The issue is not that the warning isn't "technically correct". IT IS. 

Nobody should ever argue that the warning isn't "correct". I hope people 
didn't think I argued that.

I've argued that the warning is STUPID. That's a totally different thing.

I can say totally idiotic things in perfectly reasonable English grammar 
and spelling. Does that make the things I say "good"? No.

The same is true of this gcc warning. It's technically perfectly 
reasonable both in English grammar and spelling (well, as far as 
any compiler warning ever is) _and_ in "C grammar and spelling" too.

But being grammatically correct does not make it "smart". IT IS STILL 
STUPID.

Can people not see the difference between "grammatically correct" and 
"intelligent"? People on the internet seem to have often acquired the 
understanding that "bad grammar and spelling" => "stupid", and yes, there 
is definitely some kind of correlation there. But as any logician and 
mathematician hopefully knows, "a => b" does NOT imply "!a => !b".

Some people think that "warnings are always good". HELL NO!

A warning is only as good as
 (a) the thing it warns about
 (b) the thing you can do about it

And THAT is the fundamental problem with that *idiotic* warning. Yes, it's 
technically correct. Yes, it's "proper C grammar". But if you can't get 
over the hump of realizing that there is a difference between "grammar" 
and "intelligent speech", you shouldn't be doing compilers.

So the warning sucks because:

 - the thing it warns about (passing "unsigned char" to something that 
   doesn't specify a sign at all!) is not something that sounds wrong in 
   the first place. Yes, it's unsigned. But no, the thing it is passed to 
   didn't specify that it wanted a "signed" thing in the first place. The 
   "strlen()" function literally says

"I want a char of indeterminate sign"!

   which implies that strlen really doesn't care about the sign. The same 
   is true of *any* function that takes a "char *". Such a function 
   doesn't care, and fundamentally CANNOT care about the sign, since it's 
   not even defined!

   So the warning fails the (a) criterion. The warning isn't valid, 
   because the thing it warns about isn't a valid problem!

 - it _also_ fails the (b) criterion, because quite often there is nothing 
   you can do about it. Yes, you can add a cast, but adding a cast 
   actually causes _worse_ code (but the warning is certainly gone). But 
   that makes the _other_ argument for the warning totally pointless: if 
   the reason for the warning was "bad code", then having the warning is 
   actively BAD, because the end result is actually "worse code".

See? The second point is why it's important to also realize that there is 
a lot of real and valid code that actually _does_ pass "strlen()" an 
unsigned string. There are tons of reasons for that to happen: the part of 
the program that _does_ care wants to use a "unsigned char" array, because 
it ends up doing things like "isspace(array[x])", and that is not 
well-defined if you use a "char *" array.

So there are lots of reasons to use "unsigned char" arrays for strings. 
Look it up. Look up any half-way reasonable man-page for the "isspace()" 
kind of functions, and if they don't actually explicitly say that you 
should use unsigned characters for it, those man-pages are crap. Because 
those functions really *are* defined in "int", but it's the same kind of 
namespace that "getchar()" works in (ie "unsigned char" + EOF, where EOF 
_usually_ is -1, although other values are certainly technically legal 
too).

So:

 - in practice, a lot of "good programming" uses "unsigned char" pointers 
   for doing strings. There are LOTS of reasons for that, but "isspace()" 
   and friends is the most obvious one.

 - if you can't call "strlen()" on your strings without the compiler 
   warning, there's two choices: the compiler warning is CRAP, or your 
   program is bad. But as I just showed you, "unsigned char *" is actually 
   often the *right* thing to use for string work, so it clearly wasn't 
   the program that was bad.

So *please* understand:

 - yes, the warning is "correct" from a C grammatical standpoint

 - the warning is STILL CRAP, because grammar isn't the only thing about a 
   computer language. Sane usage is MUCH MORE important than any grammar. 

Thus ends the sacred teachings of Linus "always right" Torvalds. Go and 
ponder these words, and please send me all your money (certified checks 
only, please - sending small unmarked bills is against USPS rules) to show 
your support of the holy church of good taste.

Linus

Re: somebody dropped a (warning) bomb

2007-02-12 Thread Sergei Organov
Linus Torvalds <[EMAIL PROTECTED]> writes:
> On Fri, 9 Feb 2007, Sergei Organov wrote:
>> 
>> As far as I can read the C99 standard, the "char", "signed char", and
>> "unsigned char", are all different types:
>
> Indeed. Search for "pseudo-unsigned", and you'll see more. There are 
> actually cases where "char" can act differently from _both_ "unsigned 
> char" and "signed char".
>
>> If so, then the code above is broken no matter what representation of
>> "char" is chosen for given arch, as strcmp() takes "char*" arguments,
>> that are not compatible with either "signed char*" or "unsigned char*".
>
> ..and my argument is that a warning which doesn't allow you to call 
> "strlen()" on a "unsigned char" array without triggering is a bogus 
> warning, and must be removed.

Why should strlen() be allowed to be called with an incompatible pointer
type? My point is that gcc should issue a *different warning* -- the same
warning it issues here:

$ cat incompat.c
void foo(int *p);
void boo(long *p) { foo(p); }
$ gcc -W -Wall -c incompat.c
incompat.c:2: warning: passing argument 1 of 'foo' from incompatible pointer 
type

Calling strlen(char*) with "unsigned char*" argument does pass argument
of incompatible pointer type due to the fact that in C "char" and
"unsigned char" are two incompatible types, and it has nothing to do
with signedness.

[...]

-- Sergei.


Re: somebody dropped a (warning) bomb

2007-02-12 Thread Sergei Organov
Linus Torvalds [EMAIL PROTECTED] writes:
 On Fri, 9 Feb 2007, Sergei Organov wrote:
 
 As far as I can read the C99 standard, the char, signed char, and
 unsigned char, are all different types:

 Indeed. Search for pseudo-unsigned, and you'll see more. There are 
 actually cases where char can act differently from _both_ unsigned 
 char and signed char.

 If so, then the code above is broken no matter what representation of
 char is chosen for given arch, as strcmp() takes char* arguments,
 that are not compatible with either signed char* or unsigned char*.

 ..and my argument is that a warning which doesn't allow you to call 
 strlen() on a unsigned char array without triggering is a bogus 
 warning, and must be removed.

Why strlen() should be allowed to be called with an incompatible pointer
type? My point is that gcc should issue *different warning*, -- the same
warning it issues here:

$ cat incompat.c
void foo(int *p);
void boo(long *p) { foo(p); }
$ gcc -W -Wall -c incompat.c
incompat.c:2: warning: passing argument 1 of 'foo' from incompatible pointer 
type

Calling strlen(char*) with unsigned char* argument does pass argument
of incompatible pointer type due to the fact that in C char and
unsigned char are two incompatible types, and it has nothing to do
with signedness.

[...]

-- Sergei.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: somebody dropped a (warning) bomb

2007-02-09 Thread Martin Mares
Hello!

> void* comparisons are unsigned. Period.

As far as the C standard is concerned, there is no relationship between
comparison of pointers and comparison of their values cast to uintptr_t.
The address space needn't be linear, and on some machines it isn't. So
speaking about the signedness of pointer comparisons doesn't make sense,
except for concrete implementations.

Have a nice fortnight
-- 
Martin `MJ' Mares  <[EMAIL PROTECTED]>   
http://mj.ucw.cz/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
Top ten reasons to procrastinate: 1.


Re: somebody dropped a (warning) bomb

2007-02-09 Thread Jan Engelhardt

On Feb 9 2007 15:29, linux-os (Dick Johnson) wrote:
>>
>> I was referring to "absolute memory", not the offset magic that assembler
>> allows. After all, (reg+relativeOffset) will yield an absolute address.
>> What I was getting at: for machines that have more than 2 GB of memory, you
>> don't call the address that is given by 0x80000000U actually "byte
>> -2147483648", but "byte 2147483648".
>
>Don't make any large bets on that!
>
>char foo()
>{
>volatile char *p = (char *)0x80000000;
>return *p;
>}
>Optimized
>   .file   "zzz.c"
>   .text
>   .p2align 2,,3
>.globl foo
>   .type   foo, @function
>foo:
>   pushl   %ebp
>   movb    -2147483648, %al
>   movl    %esp, %ebp
>   movsbl  %al,%eax
>   leave
>   ret
>   .size   foo, .-foo
>   .section        .note.GNU-stack,"",@progbits
>   .ident  "GCC: (GNU) 3.3.3 20040412 (Red Hat Linux 3.3.3-7)"

 <foo>:
   0:   55  push   %ebp
   1:   0f b6 05 00 00 00 80    movzbl 0x80000000,%eax
   8:   89 e5   mov%esp,%ebp
   a:   5d  pop%ebp
   b:   0f be c0movsbl %al,%eax
   e:   c3  ret

You do know that there is a bijection between the set of signed [32bit] 
integers and unsigned [32bit] integers, don't you?
For the CPU, it's just bits. Being signed or unsigned is not important 
when just accessing memory. It will, when a comparison is involved, but 
that was not the point here. void* comparisons are unsigned. Period.
Because a compiler doing signed comparisons will "map" the memory [from 
2 GB to 4 GB] as part of the signed comparison before the memory [from 0 
GB to 2 GB], which collides with - let's call it - "the world view".


Jan
-- 
ft: http://freshmeat.net/p/chaostables/


Re: somebody dropped a (warning) bomb

2007-02-09 Thread linux-os \(Dick Johnson\)

On Fri, 9 Feb 2007, Jan Engelhardt wrote:

>
> On Feb 9 2007 08:16, linux-os (Dick Johnson) wrote:
>> On Fri, 9 Feb 2007, Jan Engelhardt wrote:
>>> On Feb 8 2007 16:42, Linus Torvalds wrote:

>>> Further, giving again answer to the question whether they generate
>>> signed or unsigned comparisons: Have you ever seen a computer which
>>> addresses memory with negative numbers? Since the answer is most likely
>>> no, signed comparisons would
>>
>> Yes, most all do. Indexed memory operands are signed displacements. See
>> page 2-16, Intel486 Microprocessor Programmer's Reference Manual. This
>
> I was referring to "absolute memory", not the offset magic that assembler
> allows. After all, (reg+relativeOffset) will yield an absolute address.
> What I was getting at: for machines that have more than 2 GB of memory, you
> don't call the address that is given by 0x80000000U actually "byte
> -2147483648", but "byte 2147483648".

Don't make any large bets on that!

char foo()
{
volatile char *p = (char *)0x80000000;
return *p;
}
Optimized
.file   "zzz.c"
.text
.p2align 2,,3
.globl foo
.type   foo, @function
foo:
pushl   %ebp
movb    -2147483648, %al
movl    %esp, %ebp
movsbl  %al,%eax
leave
ret
.size   foo, .-foo
.section        .note.GNU-stack,"",@progbits
.ident  "GCC: (GNU) 3.3.3 20040412 (Red Hat Linux 3.3.3-7)"

Not optimized...

.file   "zzz.c"
.text
.globl foo
.type   foo, @function
foo:
pushl   %ebp
movl    %esp, %ebp
subl    $4, %esp
movl    $-2147483648, -4(%ebp)
movl    -4(%ebp), %eax
movb    (%eax), %al
movsbl  %al,%eax
leave
ret
.size   foo, .-foo
.section        .note.GNU-stack,"",@progbits
.ident  "GCC: (GNU) 3.3.3 20040412 (Red Hat Linux 3.3.3-7)"

... Still negative numbers!

>
>> gets hidden by higher-level languages such as 'C'. It is also non-obvious
>> when using the AT (GNU) assembly language. However, the Intel documentation
>> shows it clearly:
>>  MOV AL,  [EBX+0xFFFFFFFF]   ; -1
>
> It can be debated what exactly this is... negative offset or "using
> overflow to get where we want".
>

No debate at all, and no overflow. The construction is even used in
'C' to optimize memory access while unrolling loops:


/*-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
/*
  *   This converts an array of integers to an array of double floats.
  */
void int2float(double *fp, int *ip, size_t len)
{
 while(len >= 0x10)
 {
 fp  += 0x10;
 ip  += 0x10;
 len -= 0x10;
 fp[-0x10] = (double) ip[-0x10];
 fp[-0x0f] = (double) ip[-0x0f];
 fp[-0x0e] = (double) ip[-0x0e];
 fp[-0x0d] = (double) ip[-0x0d];
 fp[-0x0c] = (double) ip[-0x0c];
 fp[-0x0b] = (double) ip[-0x0b];
 fp[-0x0a] = (double) ip[-0x0a];
 fp[-0x09] = (double) ip[-0x09];
 fp[-0x08] = (double) ip[-0x08];
 fp[-0x07] = (double) ip[-0x07];
 fp[-0x06] = (double) ip[-0x06];
 fp[-0x05] = (double) ip[-0x05];
 fp[-0x04] = (double) ip[-0x04];
 fp[-0x03] = (double) ip[-0x03];
 fp[-0x02] = (double) ip[-0x02];
 fp[-0x01] = (double) ip[-0x01];
 }
 while(len--)
*fp++ = (double) *ip++;
}

This makes certain that each memory access uses the same type
and the address-calculation for the next memory access can be done
in parallel with the current fetch. This is one of the methods
used to save a few machine cycles. Doesn't mean much until you
end up with large arrays such as used in image processing. Then,
it may be the difference that makes or breaks the project!

>
> Jan
> -- 
> ft: http://freshmeat.net/p/chaostables/
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.24 on an i686 machine (5592.61 BogoMips).
New book: http://www.AbominableFirebug.com/
_





Re: somebody dropped a (warning) bomb

2007-02-09 Thread Pete Zaitcev
On Thu, 8 Feb 2007 22:27:43 -0500, "D. Hazelton" <[EMAIL PROTECTED]> wrote:

> So almost all the rules around the signs of types are because of a single, 
> historical machine. Hence the rules about "char" being unsigned by default 
> and "int" being signed by default are because of the nature of the PDP-11. 
>[...]
> Now: There is no reason for the behavior that came from the nature of the 
> PDP11 to have survived, but because it was in "The White Book" it made it 
> through the ANSI standardization process.
> 
> Now that this history lesson is over...

... we can remember that char is signed on PDP-11. Mvu hah hah hah ha,
only serious. Check how movb works.

-- Pete


Re: somebody dropped a (warning) bomb

2007-02-09 Thread Jan Engelhardt

On Feb 9 2007 08:16, linux-os (Dick Johnson) wrote:
>On Fri, 9 Feb 2007, Jan Engelhardt wrote:
>> On Feb 8 2007 16:42, Linus Torvalds wrote:
>>>
>> Further, giving again answer to the question whether they generate
>> signed or unsigned comparisons: Have you ever seen a computer which
>> addresses memory with negative numbers? Since the answer is most likely
>> no, signed comparisons would
>
>Yes, most all do. Indexed memory operands are signed displacements. See
>page 2-16, Intel486 Microprocessor Programmer's Reference Manual. This

I was referring to "absolute memory", not the offset magic that assembler
allows. After all, (reg+relativeOffset) will yield an absolute address.
What I was getting at: for machines that have more than 2 GB of memory, you
don't call the address that is given by 0x80000000U actually "byte
-2147483648", but "byte 2147483648".

>gets hidden by higher-level languages such as 'C'. It is also non-obvious
>when using the AT (GNU) assembly language. However, the Intel documentation
>shows it clearly:
>   MOV AL,  [EBX+0xFFFFFFFF]   ; -1

It can be debated what exactly this is... negative offset or "using
overflow to get where we want".



Jan
-- 
ft: http://freshmeat.net/p/chaostables/


Re: somebody dropped a (warning) bomb

2007-02-09 Thread Linus Torvalds


On Fri, 9 Feb 2007, Sergei Organov wrote:
> 
> As far as I can read the C99 standard, the "char", "signed char", and
> "unsigned char", are all different types:

Indeed. Search for "pseudo-unsigned", and you'll see more. There are 
actually cases where "char" can act differently from _both_ "unsigned 
char" and "signed char".

> If so, then the code above is broken no matter what representation of
> "char" is chosen for given arch, as strcmp() takes "char*" arguments,
> that are not compatible with either "signed char*" or "unsigned char*".

..and my argument is that a warning which doesn't allow you to call 
"strlen()" on a "unsigned char" array without triggering is a bogus 
warning, and must be removed.

Which is all I've ever said from the beginning.

I've never disputed that "char" and "unsigned char" are different types. 
They *clearly* are different types. 

But that doesn't make the warning valid. There is a real need for NOT 
warning about sign differences.

But this argument has apparently again broken down and become a "language 
lawyer" argument instead of being about what a reasonable compiler 
actually can and should do.

As long as gcc warns about idiotic things, the kernel will keep shutting 
that warning off. I'd _like_ to use the warning, but if the gcc 
implementation is on crack, we clearly cannot.

There's simply an issue of "quality of implementation". In the case of 
this warning, gcc is just not very high quality.

Linus


Re: somebody dropped a (warning) bomb

2007-02-09 Thread Linus Torvalds


On Thu, 8 Feb 2007, David Rientjes wrote:
> 
> Yes, I read the 4.1.1 docs:
> 
>   By default, such a bit-field is signed, because this is
>   consistent: the basic integer types such as int are signed
>   types.

Ok, so the gcc people have added some language. So what? The fact is, that 
language has absolutely nothing to do with the C language, and doesn't 
change anything at all.

> That is the whole basis for my argument, when you declare something "int," 
> most programmers would consider that to be SIGNED regardless of whether it 
> is 32 bits, 13 bits, or 1 bit.

And MY argument is that YOUR argument is CRAP.

The fact is, signs of bitfields, chars and enums aren't well-defined. 
That's a FACT.

Your argument makes no sense. It's like arguing against "gravity", or like 
arguing against the fact that the earth circles around the sun. It is 
STUPID to argue against facts.

Your argument is exactly the same argument that people used to say that 
the earth is the center of the universe: "because it makes sense that 
we're the center". 

The fact that "most programmers" or "it makes sense" doesn't make anything 
at all true. Only verifiable FACTS make something true.

There's a great saying: "Reality is that which, when you stop believing in 
it, doesn't go away." - Philip K Dick.

So wake up and smell the coffee: reality is that bitfields don't have a 
specified sign. It's just a FACT. Whether you _like_ it or not is simply 
not relevant.

The same is true of "char". You can argue that it should be signed by 
default, since "short", "int" and "long" are signed. But THAT IS NOT TRUE.

Arguing against reality really isn't very smart.

Linus


Re: somebody dropped a (warning) bomb

2007-02-09 Thread Sergei Organov
Jan Engelhardt <[EMAIL PROTECTED]> writes:
> On Feb 8 2007 08:33, Linus Torvalds wrote:
>>
[...]
> What C needs is a distinction between char and int8_t, rendering "char" 
> an unsigned at all times basically and making "unsigned char" and 
> "signed char" illegal types in turn.

AFAIK, C already has 3 *distinct* types, "char", "signed char", and
"unsigned char". So your int8_t reads "signed char" in C, uint8_t reads
"unsigned char" in C, while "char" is yet another type that is not
compatible with those two. Yes, the names for these types are somewhat
confusing, but that's due to historical reasons, I think.

Overall,

signed char   == tiny signed int   == tiny int
unsigned char == tiny unsigned int
char != signed char
char != unsigned char

where

tiny == short short ;)

Rules of thumb: don't use bare char to store integers; don't use
signed/unsigned char to store characters.

[strxxx() routines take char* arguments as they work on '\0'-terminated
character arrays (strings). Using them on arrays of tiny integers is
unsafe no matter how char behaves on given platform (like signed or like
unsigned type).]

-- Sergei.


Re: somebody dropped a (warning) bomb

2007-02-09 Thread linux-os \(Dick Johnson\)

On Fri, 9 Feb 2007, Jan Engelhardt wrote:

>
> On Feb 8 2007 16:42, Linus Torvalds wrote:
>>
>> Most C types don't, and some you can't even tell (do pointers generate
>> "signed" or "unsigned" comparisons?
>
> I'd say "neither", because both
>
>signed void *ptr; and
>unsigned void *xyz;
>
> are impossible (and dull at that). That's why you explicitly will
> have to cast one operand in a comparison to make it evident
> what sort of comparison is intended, i.e.
>
>if((unsigned long)ptr < PAGE_OFFSET)
>
> Further, giving again answer to the question whether they generate signed or
> unsigned comparisons: Have you ever seen a computer which addresses memory with
> negative numbers? Since the answer is most likely no, signed comparisons would

Yes, most all do. Indexed memory operands are signed displacements. See
page 2-16, Intel486 Microprocessor Programmer's Reference Manual. This
gets hidden by higher-level languages such as 'C'. It is also non-obvious
when using the AT&T (GNU) assembly language. However, the Intel documentation
shows it clearly:
MOV AL,  [EBX+0xFFFFFFFF]   ; -1

...will put the byte existing at the location just BEFORE the offset in
EBX, into register AL, while:

MOV AL,  [EBX+0x0001]  ; +1

...will put the byte existing at the location just AFTER the offset in
EBX, into register AL.

Also, the documentation specifically states that sign extension is
provided for all effective address calculations.

> not make sense for me.
>
>> I'll argue that a compiler that
>> generates signed comparisons for them is broken, but it tends to be
>> something you can only see with a standards-conforming program if you
>> can allocate memory across the sign boundary, which may or may not be
>> true..)
>
> Jan
> -- 
> ft: http://freshmeat.net/p/chaostables/
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.24 on an i686 machine (5592.61 BogoMips).
New book: http://www.AbominableFirebug.com/
_



The information transmitted in this message is confidential and may be 
privileged.  Any review, retransmission, dissemination, or other use of this 
information by persons or entities other than the intended recipient is 
prohibited.  If you are not the intended recipient, please notify Analogic 
Corporation immediately - by replying to this message or by sending an email to 
[EMAIL PROTECTED] - and destroy all copies of this information, including any 
attachments, without reading or disclosing them.

Thank you.


Re: somebody dropped a (warning) bomb

2007-02-09 Thread Sergei Organov
Linus Torvalds <[EMAIL PROTECTED]> writes:
> On Thu, 8 Feb 2007, Linus Torvalds wrote:
>> 
>> But THE CALLER CANNOT AND MUST NOT CARE! Because the sign of "char" is 
>> implementation-defined, so if you call "strcmp()", you are already 
>> basically saying: I don't care (and I _cannot_ care) what sign you are 
>> using.
>
> Let me explain it another way.
>
> Say you use
>
>   signed char *myname, *yourname;
>
>   if (strcmp(myname,yourname) < 0)
>   printf("Ha, I win!\n")
>
> and you compile this on an architecture where "char" is signed even 
> without the explicit "signed".

As far as I can read the C99 standard, the "char", "signed char", and
"unsigned char", are all different types:

 "The three types char, signed char, and unsigned char are collectively called
  the character types. The implementation shall define char to have the same
  range, representation, and behavior as either signed char or unsigned
  char.(35)
  
  (35) CHAR_MIN, defined in <limits.h>, will have one of the values 0 or
       SCHAR_MIN, and this can be used to distinguish the two
       options. Irrespective of the choice made, char is a separate type
       from the other two and is not compatible with either.
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
 "

If so, then the code above is broken no matter what representation of
"char" is chosen for given arch, as strcmp() takes "char*" arguments,
that are not compatible with either "signed char*" or "unsigned char*".

-- Sergei.


Re: somebody dropped a (warning) bomb

2007-02-09 Thread Jan Engelhardt

On Feb 8 2007 16:42, Linus Torvalds wrote:
>
>Most C types don't, and some you can't even tell (do pointers generate 
>"signed" or "unsigned" comparisons?

I'd say "neither", because both

signed void *ptr; and
unsigned void *xyz;

are impossible (and dull at that). That's why you explicitly will
have to cast one operand in a comparison to make it evident
what sort of comparison is intended, i.e.

if((unsigned long)ptr < PAGE_OFFSET)

Further, giving again answer to the question whether they generate signed or
unsigned comparisons: Have you ever seen a computer which addresses memory with
negative numbers? Since the answer is most likely no, signed comparisons would
not make sense for me.

> I'll argue that a compiler that 
>generates signed comparisons for them is broken, but it tends to be 
>something you can only see with a standards-conforming program if you
>can allocate memory across the sign boundary, which may or may not be 
>true..)

Jan
-- 
ft: http://freshmeat.net/p/chaostables/


Re: somebody dropped a (warning) bomb

2007-02-09 Thread Linus Torvalds


On Fri, 9 Feb 2007, Sergei Organov wrote:
> 
> As far as I can read the C99 standard, the "char", "signed char", and
> "unsigned char", are all different types:

Indeed. Search for "pseudo-unsigned", and you'll see more. There are 
actually cases where "char" can act differently from _both_ "unsigned 
char" and "signed char".

> If so, then the code above is broken no matter what representation of
> "char" is chosen for given arch, as strcmp() takes "char*" arguments,
> that are not compatible with either "signed char*" or "unsigned char*".

..and my argument is that a warning which doesn't allow you to call 
strlen() on an "unsigned char" array without triggering is a bogus 
warning, and must be removed.

Which is all I've ever said from the beginning.

I've never disputed that "char" and "unsigned char" are different types. 
They *clearly* are different types. 

But that doesn't make the warning valid. There is a real need for NOT 
warning about sign differences.

But this argument has apparently again broken down and become a language 
lawyer argument instead of being about what a reasonable compiler 
actually can and should do.

As long as gcc warns about idiotic things, the kernel will keep shutting 
that warning off. I'd _like_ to use the warning, but if the gcc 
implementation is on crack, we clearly cannot.

There's simply an issue of "quality of implementation". In the case of 
this warning, gcc is just not very high quality.

Linus


Re: somebody dropped a (warning) bomb

2007-02-09 Thread Jan Engelhardt

On Feb 9 2007 08:16, linux-os (Dick Johnson) wrote:
> On Fri, 9 Feb 2007, Jan Engelhardt wrote:
>> On Feb 8 2007 16:42, Linus Torvalds wrote:
>>
>> Further, giving again answer to the question whether they generate
>> signed or unsigned comparisons: Have you ever seen a computer which
>> addresses memory with negative numbers? Since the answer is most likely
>> no, signed comparisons would
>
> Yes, most all do. Indexed memory operands are signed displacements. See
> page 2-16, Intel486 Microprocessor Programmer's Reference Manual. This

I was referring to absolute memory, not the offset magic that assembler
allows. After all, (reg+relativeOffset) will yield an absolute address.
What I was out at: for machines that have more than 2 GB of memory, you
don't call the address that is given by 0x80000000U actually byte
-2147483648, but byte 2147483648.

> gets hidden by higher-level languages such as 'C'. It is also non-obvious
> when using the AT&T (GNU) assembly language. However, the Intel documentation
> shows it clearly:
>    MOV AL,  [EBX+0xFFFFFFFF]   ; -1

It can be debated what exactly this is... negative offset or using
overflow to get where we want.



Jan
-- 
ft: http://freshmeat.net/p/chaostables/


Re: somebody dropped a (warning) bomb

2007-02-09 Thread Pete Zaitcev
On Thu, 8 Feb 2007 22:27:43 -0500, D. Hazelton [EMAIL PROTECTED] wrote:

> So almost all the rules around the signs of types are because of a single, 
> historical machine. Hence the rules about char being unsigned by default 
> and int being signed by default are because of the nature of the PDP-11. 
[...]
> Now: There is no reason for the behavior that came from the nature of the 
> PDP11 to have survived, but because it was in The White Book it made it 
> through the ANSI standardization process.
> 
> Now that this history lesson is over...

... we can remember that char is signed on PDP-11. Mvu hah hah hah ha,
only serious. Check how movb works.

-- Pete


Re: somebody dropped a (warning) bomb

2007-02-09 Thread linux-os \(Dick Johnson\)

On Fri, 9 Feb 2007, Jan Engelhardt wrote:


> On Feb 9 2007 08:16, linux-os (Dick Johnson) wrote:
>> On Fri, 9 Feb 2007, Jan Engelhardt wrote:
>>> On Feb 8 2007 16:42, Linus Torvalds wrote:
>>
>>> Further, giving again answer to the question whether they generate
>>> signed or unsigned comparisons: Have you ever seen a computer which
>>> addresses memory with negative numbers? Since the answer is most likely
>>> no, signed comparisons would
>>
>> Yes, most all do. Indexed memory operands are signed displacements. See
>> page 2-16, Intel486 Microprocessor Programmer's Reference Manual. This
>
> I was referring to absolute memory, not the offset magic that assembler
> allows. After all, (reg+relativeOffset) will yield an absolute address.
> What I was out at: for machines that have more than 2 GB of memory, you
> don't call the address that is given by 0x80000000U actually byte
> -2147483648, but byte 2147483648.

Don't make any large bets on that!

char foo()
{
    volatile char *p = (char *)0x80000000;
    return *p;
}
Optimized
        .file   "zzz.c"
        .text
        .p2align 2,,3
.globl foo
        .type   foo, @function
foo:
        pushl   %ebp
        movb    -2147483648, %al
        movl    %esp, %ebp
        movsbl  %al, %eax
        leave
        ret
        .size   foo, .-foo
        .section        .note.GNU-stack,,@progbits
        .ident  "GCC: (GNU) 3.3.3 20040412 (Red Hat Linux 3.3.3-7)"

Not optimized...

        .file   "zzz.c"
        .text
.globl foo
        .type   foo, @function
foo:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $4, %esp
        movl    $-2147483648, -4(%ebp)
        movl    -4(%ebp), %eax
        movb    (%eax), %al
        movsbl  %al, %eax
        leave
        ret
        .size   foo, .-foo
        .section        .note.GNU-stack,,@progbits
        .ident  "GCC: (GNU) 3.3.3 20040412 (Red Hat Linux 3.3.3-7)"

... Still negative numbers!


>> gets hidden by higher-level languages such as 'C'. It is also non-obvious
>> when using the AT&T (GNU) assembly language. However, the Intel documentation
>> shows it clearly:
>>   MOV AL,  [EBX+0xFFFFFFFF]   ; -1
>
> It can be debated what exactly this is... negative offset or using
> overflow to get where we want.


No debate at all, and no overflow. The construction is even used in
'C' to optimize memory access while unrolling loops:


/*-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
/*
  *   This converts an array of integers to an array of double floats.
  */
void int2float(double *fp, int *ip, size_t len)
{
 while(len >= 0x10)
 {
 fp  += 0x10;
 ip  += 0x10;
 len -= 0x10;
 fp[-0x10] = (double) ip[-0x10];
 fp[-0x0f] = (double) ip[-0x0f];
 fp[-0x0e] = (double) ip[-0x0e];
 fp[-0x0d] = (double) ip[-0x0d];
 fp[-0x0c] = (double) ip[-0x0c];
 fp[-0x0b] = (double) ip[-0x0b];
 fp[-0x0a] = (double) ip[-0x0a];
 fp[-0x09] = (double) ip[-0x09];
 fp[-0x08] = (double) ip[-0x08];
 fp[-0x07] = (double) ip[-0x07];
 fp[-0x06] = (double) ip[-0x06];
 fp[-0x05] = (double) ip[-0x05];
 fp[-0x04] = (double) ip[-0x04];
 fp[-0x03] = (double) ip[-0x03];
 fp[-0x02] = (double) ip[-0x02];
 fp[-0x01] = (double) ip[-0x01];
 }
 while(len--)
*fp++ = (double) *ip++;
}

This makes certain that each memory access uses the same type
and the address-calculation for the next memory access can be done
in parallel with the current fetch. This is one of the methods
used to save a few machine cycles. Doesn't mean much until you
end up with large arrays such as used in image processing. Then,
it may be the difference that makes or breaks the project!


> Jan
> -- 
> ft: http://freshmeat.net/p/chaostables/


Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.24 on an i686 machine (5592.61 BogoMips).
New book: http://www.AbominableFirebug.com/


Re: somebody dropped a (warning) bomb

2007-02-09 Thread Jan Engelhardt

On Feb 9 2007 15:29, linux-os (Dick Johnson) wrote:

>> I was referring to absolute memory, not the offset magic that assembler
>> allows. After all, (reg+relativeOffset) will yield an absolute address.
>> What I was out at: for machines that have more than 2 GB of memory, you
>> don't call the address that is given by 0x80000000U actually byte
>> -2147483648, but byte 2147483648.

> Don't make any large bets on that!
>
> char foo()
> {
>     volatile char *p = (char *)0x80000000;
>     return *p;
> }
> Optimized
>        .file   "zzz.c"
>        .text
>        .p2align 2,,3
> .globl foo
>        .type   foo, @function
> foo:
>        pushl   %ebp
>        movb    -2147483648, %al
>        movl    %esp, %ebp
>        movsbl  %al, %eax
>        leave
>        ret
>        .size   foo, .-foo
>        .section        .note.GNU-stack,,@progbits
>        .ident  "GCC: (GNU) 3.3.3 20040412 (Red Hat Linux 3.3.3-7)"

 foo:
   0:   55                      push   %ebp
   1:   0f b6 05 00 00 00 80    movzbl 0x80000000,%eax
   8:   89 e5                   mov    %esp,%ebp
   a:   5d                      pop    %ebp
   b:   0f be c0                movsbl %al,%eax
   e:   c3                      ret

You do know that there is a bijection between the set of signed [32bit] 
integers and unsigned [32bit] integers, don't you?
For the CPU, it's just bits. Being signed or unsigned is not important 
when just accessing memory. It will, when a comparison is involved, but 
that was not the point here. void* comparisons are unsigned. Period.
Because a compiler doing signed comparisons will map the memory [from 
2 GB to 4 GB] as part of the signed comparison before the memory [from 0 
GB to 2 GB], which collides with - let's call it - the world view.


Jan
-- 
ft: http://freshmeat.net/p/chaostables/


Re: somebody dropped a (warning) bomb

2007-02-09 Thread Martin Mares
Hello!

> void* comparisons are unsigned. Period.

As far as the C standard is concerned, there is no relationship between
comparison on pointers and comparison of their values cast to uintptr_t.
The address space needn't be linear and on some machines it isn't. So
speaking about signedness of pointer comparisons doesn't make sense,
except for concrete implementations.

Have a nice fortnight
-- 
Martin `MJ' Mares  [EMAIL PROTECTED]   
http://mj.ucw.cz/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
Top ten reasons to procrastinate: 1.


Re: somebody dropped a (warning) bomb

2007-02-08 Thread D. Hazelton
On Thursday 08 February 2007 19:42, Linus Torvalds wrote:

> Most C types don't, and some you can't even tell (do pointers generate
> "signed" or "unsigned" comparisons? I'll argue that a compiler that
> generates signed comparisons for them is broken, but it tends to be
> something you can only see with a standards-conforming program if you
> can allocate memory across the sign boundary, which may or may not be
> true..)

And this is, because as Dennis Ritchie says in "The Development of the C 
Language" (http://cm.bell-labs.com/cm/cs/who/dmr/chist.html) C evolved from a 
typeless language, B - which, in turn, had originated as a stripped down 
version of BCPL. B and BCPL  used a "cell addressing" scheme, but because of 
the move off the PDP-7, and the difficulties run into by Ken Thompson in 
rewriting the code for Unix in B (the original Unix was written in assembler) 
types were added - 'char' and 'int'. These represented the byte and word 
addressing modes of the PDP-11, and were needed because Ritchie was in the 
process of having the compiler generate assembler instead of "threaded code". 
This "NB" language, after also having data structures added and the rules for 
defining pointers finalized, was renamed "C" by Ritchie.

So almost all the rules around the signs of types are because of a single, 
historical machine. Hence the rules about "char" being unsigned by default 
and "int" being signed by default are because of the nature of the PDP-11. 
The implementation defined nature of bitfields is because Dennis Ritchie had 
freedom there - the PDP-11 didn't have anything like them in the hardware and 
they were being stuffed in to make Thompson's life easier.

Now: There is no reason for the behavior that came from the nature of the 
PDP11 to have survived, but because it was in "The White Book" it made it 
through the ANSI standardization process.

Now that this history lesson is over...

DRH


Re: somebody dropped a (warning) bomb

2007-02-08 Thread David Rientjes
On Thu, 8 Feb 2007, Linus Torvalds wrote:

> Even gcc DOES DIFFERENT THINGS! Have you even read the docs?
> 
>   By default it is treated as signed int but this may be changed by 
>   the -funsigned-bitfields option.
> 

Yes, I read the 4.1.1 docs:

By default, such a bit-field is signed, because this is
consistent: the basic integer types such as int are signed
types.

That is the whole basis for my argument, when you declare something "int," 
most programmers would consider that to be SIGNED regardless of whether it 
is 32 bits, 13 bits, or 1 bit.

No doubt it is configurable because of the existence of brain-dead 
compilers that treat them as unsigned.


Re: somebody dropped a (warning) bomb

2007-02-08 Thread Linus Torvalds


On Thu, 8 Feb 2007, David Rientjes wrote:
>
> Maybe you should read my first post, we're talking about gcc's behavior 
> here, not the C standard.

Give it up, David.

Even gcc DOES DIFFERENT THINGS! Have you even read the docs?

By default it is treated as signed int but this may be changed by 
the -funsigned-bitfields option.

So even for gcc, it's just a default. I think you can even change the 
default in the spec file.

Linus


Re: somebody dropped a (warning) bomb

2007-02-08 Thread David Rientjes
On Thu, 8 Feb 2007, Linus Torvalds wrote:

> No it's not.
> 
> You just don't understand the C language. 
> 
> And if you don't understand the C language, you can't say "that's what the 
> int says". It says no such thing.
> 
> The C language clearly says that bitfields have implementation-defined 
> types. So when you see
> 
>   struct dummy {
>   int flag:1;
>   } a_variable;
> 
> if you don't read that as "oh, the sign of 'flag' is implementation- 
> defined", then you simply aren't reading it right.
> 

Maybe you should read my first post, we're talking about gcc's behavior 
here, not the C standard.  My criticism was that any compiler that makes 
a_variable.flag unsigned is brain-dead and I was arguing in favor of gcc 
treating plain int bitfields as signed ints (6.7.2, 6.7.2.1).  This has 
_nothing_ to do with the fact that the standard leaves it implementation 
defined.

Naturally you should define its signedness explicitly in your code since it 
is implementation defined.  That's not the point.

Just because a compiler CAN consider a_variable.flag as unsigned doesn't 
mean it makes sense.  It makes no sense, and thus is brain-dead.


Re: somebody dropped a (warning) bomb

2007-02-08 Thread Linus Torvalds


On Thu, 8 Feb 2007, Linus Torvalds wrote:
> 
> The C language clearly says that bitfields have implementation-defined 
> types.

Btw, this isn't even some "theoretical discussion". LOTS of compilers do 
unsigned bitfields. It may even be the majority. There may (or may not) be 
some correlation with what the default for "char" is, I haven't looked 
into it.

Linus


Re: somebody dropped a (warning) bomb

2007-02-08 Thread Linus Torvalds


On Thu, 8 Feb 2007, David Rientjes wrote:
> 
> Your struct:
> 
>   struct dummy {
>   int flag:1;
>   } a_variable;
> 
> should expect a_variable.flag to be signed, that's what the int says.  

No it's not.

You just don't understand the C language. 

And if you don't understand the C language, you can't say "that's what the 
int says". It says no such thing.

The C language clearly says that bitfields have implementation-defined 
types. So when you see

struct dummy {
int flag:1;
} a_variable;

if you don't read that as "oh, the sign of 'flag' is implementation- 
defined", then you simply aren't reading it right.

Is it "intuitive"? Nobody ever really called C an _intuitive_ language. C 
has a lot of rules that you simply have to know. The bitfield sign rule is 
just one such rule.

> There is no special case here with regard to type.

Sure there is. Read the spec.

I don't understand why you are arguing. YOU ARE WRONG.

Bitfields simply have implementation-defined signedness. As do enums. As 
does "char". It really is that simple. 

The *real* special case is actually "int" and "long". In many ways, the 
fact that those *do* have a well-specified signedness is actually the 
exception rather than the rule.

Most C types don't, and some you can't even tell (do pointers generate 
"signed" or "unsigned" comparisons? I'll argue that a compiler that 
generates signed comparisons for them is broken, but it tends to be 
something you can only see with a standards-conforming program if you 
can allocate memory across the sign boundary, which may or may not be 
true..)

Linus


Re: somebody dropped a (warning) bomb

2007-02-08 Thread David Rientjes
On Thu, 8 Feb 2007, Linus Torvalds wrote:

> No, making bitfields unsigned is actually usually a good idea. It allows 
> you to often generate better code, and it actually tends to be what 
> programmers _expect_. A lot of people seem to be surprised to hear that a 
> one-bit bitfield actually often encodes -1/0, and not 0/1.
> 

Your struct:

struct dummy {
int flag:1;
} a_variable;

should expect a_variable.flag to be signed, that's what the int says.  
There is no special case here with regard to type.  It's traditional K&R 
that writing signed flag:1 here is redundant (K&R pg. 211).  Now whether 
that's what you wanted or not is irrelevant.  As typical with C, it gives 
you exactly what you asked for.

You're arguing for inconsistency in how bitfields are qualified based on 
their size.  If you declare a bitfield as int byte:8, then you're going 
to expect this to behave exactly like a signed char, which it does.

struct {
int byte:8;
} __attribute__((packed)) b_variable;

b_variable.byte is identical to a signed char.

Just because a_variable.flag happens to be one bit, you cannot say that 
programmers _expect_ it to be unsigned, or you would also expect 
b_variable.byte to act as an unsigned char.  signed is the default 
behavior for all types besides char, so the behavior is appropriate based 
on what most programmers would expect.

> So unsigned bitfields are not only traditional K&R, they are also usually 
> _faster_ (which is probably why they are traditional K&R - along with 
> allowing "char" to be unsigned by default). Don't knock them.  It's much 
> better to just remember that bitfields simply don't _have_ any standard 
> sign unless you specify it explicitly, than saying "it should be signed 
> because 'int' is signed".
> 

Of course they're faster, it doesn't require the sign extension.  But 
you're adding additional semantics to the language by your interpretation 
of what is expected by the programmer in the bitfield case as opposed to a 
normal variable definition.

> I will actually argue that having signed bit-fields is almost always a 
> bug, and that as a result you should _never_ use "int" at all. Especially 
> as you might as well just write it as
> 
>   signed a:1;
> 
> if you really want a signed bitfield.
> 

How can signed bitfields almost always be a bug?  I'm the programmer and I 
want to store my data in a variable that I define, so I must follow the 
twos-complement rule and allot a sign bit if I declare it as a signed 
value.  In C, you do this with "int"; otherwise, you use "unsigned".

> So I would really recommend that you never use "int a:" AT ALL, 
> because there really is never any good reason to do so. Do it as
> 
>   unsigned a:3;
>   signed b:2;
> 
> but never 
> 
>   int c:4;
> 
> because the latter really isn't sensible.
> 

That _is_ sensible because anything you declare "int" is going to be 
signed as far as gcc is concerned.  "c" just happens to be 4 bits wide 
instead of 32.  But for everything else, it's the same as an int.  I know 
exactly what I'm getting by writing "int c:4", as would most programmers 
who use bitfields to begin with.

David


Re: somebody dropped a (warning) bomb

2007-02-08 Thread Linus Torvalds


On Fri, 9 Feb 2007, J.A. Magallón wrote:
> 
> That's the point. Mmmm, I think I see it the other way around. I defined
> a variable as 'signed' or 'unsigned', because the sign info matters for me.
> And gcc warns about using a function on it that will _ignore_ or even
> misinterpret that info. Could it be a BUG ? Yes.

Sure. The other way of seeing it is that *anything*  could be a bug.

Could adding 1 to "a" be a bug? Yes. "a" might overflow. So maybe the 
compiler should warn about that too?

So do you think a compiler should warn when you do

int a = i + 1;

and say "warning: Expression on line x might overflow"?

Could it be a BUG? Hell yeah.

Is warning for things that _could_ be bugs sane? Hell NO.

> Linux/x86, gcc 4.1.2-0.20070115:
> werewolf:~> gcc -Wpointer-sign -c t.c
> t.c: In function ‘f’:
> t.c:10: warning: pointer targets in passing argument 1 of ‘strlen’ differ 
> in signedness
> t.c:11: warning: pointer targets in passing argument 1 of ‘strlen’ differ 
> in signedness

Yeah, and that's what I think is crazy.

Is it consistent? Yes. Does it help people? No.

A warning that is consistent is not necessarily a good warning. It needs 
to MAKE SENSE too. And this one doesn't.  I'm sorry if you can't see that.

Linus

Re: somebody dropped a (warning) bomb

2007-02-08 Thread J.A. Magallón
On Thu, 8 Feb 2007 14:03:06 -0800 (PST), Linus Torvalds <[EMAIL PROTECTED]> 
wrote:

> 
> 
> On Thu, 8 Feb 2007, Linus Torvalds wrote:
> > 
> > But THE CALLER CANNOT AND MUST NOT CARE! Because the sign of "char" is 
> > implementation-defined, so if you call "strcmp()", you are already 
> > basically saying: I don't care (and I _cannot_ care) what sign you are 
> > using.
> 
> Let me explain it another way.
> 
> Say you use
> 
>   signed char *myname, *yourname;
> 
>   if (strcmp(myname,yourname) < 0)
>   printf("Ha, I win!\n")
> 
> and you compile this on an architecture where "char" is signed even 
> without the explicit "signed".
> 
> What should happen?
> 
> Should you get a warning? The types really *are* the same, so getting a 
> warning sounds obviously insane. But on the other hand, if you really care 
> about the sign that strcmp() uses internally, the code is wrong *anyway*, 

That's the point. Mmmm, I think I see it the other way around. I defined
a variable as 'signed' or 'unsigned', because the sign info matters for me.
And gcc warns about using a function on it that will _ignore_ or even
misinterpret that info. Could it be a BUG ? Yes.

> because with another compiler, or with the *same* compiler on another 
> architecture or some other compiler flags, the very same code is buggy.
> 
> In other words, either you should get a warning *regardless* of whether 
> the sign actually matches or not, or you shouldn't get a warning at all 
> for the above code. Either it's buggy code, or it isn't.
> 

In fact, I tried to look for an arch where this gives only _one_ error:

#include <string.h>

void f()
{
const unsigned char *a;
const signed char *b;

int la,lb;

la = strlen(a);
lb = strlen(b);
}

and guess what, all give _both_ errors.

Linux/x86, gcc 4.1.2-0.20070115:
werewolf:~> gcc -Wpointer-sign -c t.c
t.c: In function ‘f’:
t.c:10: warning: pointer targets in passing argument 1 of ‘strlen’ differ in 
signedness
t.c:11: warning: pointer targets in passing argument 1 of ‘strlen’ differ in 
signedness

OSX ppc, gcc-4.0.1
belly:~> gcc -Wpointer-sign -c t.c
t.c: In function ‘f’:
t.c:10: warning: pointer targets in passing argument 1 of ‘strlen’ differ in 
signedness
t.c:11: warning: pointer targets in passing argument 1 of ‘strlen’ differ in 
signedness

Solaris5.7 Ultra, cc WorkShop Compilers 5.0
den:~> cc -Xa -c t.c   
ucbcc: Warning: "-Xa" redefines compatibility mode from "SunC transition" to 
"ANSI"
"t.c", line 10: warning: argument #1 is incompatible with prototype:
prototype: pointer to const char : "/usr/include/string.h", line 71
argument : pointer to const uchar
"t.c", line 11: warning: argument #1 is incompatible with prototype:
prototype: pointer to const char : "/usr/include/string.h", line 71
argument : pointer to const schar

This makes sense.
Can someone try on something psychedelic, like ARM ;) ?

--
J.A. Magallon  \   Software is like sex:
 \ It's better when it's free
Mandriva Linux release 2007.1 (Cooker) for i586
Linux 2.6.19-jam06 (gcc 4.1.2 20070115 (prerelease) 
(4.1.2-0.20070115.1mdv2007.1)) #1 SMP PREEMPT


Re: somebody dropped a (warning) bomb

2007-02-08 Thread Linus Torvalds


On Thu, 8 Feb 2007, David Rientjes wrote:
> 
> And a compiler that makes a_variable.flag unsigned would be brain-dead 
> because "int" is always signed.

No, making bitfields unsigned is actually usually a good idea. It allows 
you to often generate better code, and it actually tends to be what 
programmers _expect_. A lot of people seem to be surprised to hear that a 
one-bit bitfield actually often encodes -1/0, and not 0/1.

So unsigned bitfields are not only traditional K&R, they are also usually 
_faster_ (which is probably why they are traditional K&R - along with 
allowing "char" to be unsigned by default). Don't knock them.  It's much 
better to just remember that bitfields simply don't _have_ any standard 
sign unless you specify it explicitly, than saying "it should be signed 
because 'int' is signed".

I will actually argue that having signed bit-fields is almost always a 
bug, and that as a result you should _never_ use "int" at all. Especially 
as you might as well just write it as

signed a:1;

if you really want a signed bitfield.

So I would really recommend that you never use "int a:" AT ALL, 
because there really is never any good reason to do so. Do it as

unsigned a:3;
signed b:2;

but never 

int c:4;

because the latter really isn't sensible.

"sparse" will actually complain about single-bit signed bitfields, and it 
found a number of cases where people used that "int x:1" kind of syntax. 

Just don't do it.

Linus

