Re: kernel + gcc 4.1 = several problems

2007-01-26 Thread Michael K. Edwards

ALSA + GCC 4.1.1 + -Os is known to be a bad combination on some
arches; see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27363 .  (I
tripped over it on an ARM target, but my limited understanding of GCC
internals does not allow me to conclude that it is ARM-specific.)  A
patch claiming to fix the bug was integrated into the 4.1 branch, but
my tests with a recent (20070115) gcc-4.1 snapshot indicate that it
has regressed again.

You might also check /proc/cpu/alignment; we have seen the alignment
fixup code trigger for alignment errors in both kernel and userspace.
The default appears to be to IGNORE alignment traps from userspace,
which results in bogus data and potentially a wacky series of system
calls, which could conceivably trigger an oops.  I am told that
"echo 2 > /proc/cpu/alignment" activates the kernel alignment fixup code,
and that 3 turns on some sort of logging in addition to the fixup (haven't
pursued this myself).  No idea whether this is relevant to your CPU.
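
For readers who have not met this knob: as far as I understand the ARM
mem_alignment notes in the kernel documentation, the number written to
/proc/cpu/alignment is a bit mask, which is why 2 and 3 behave as described
above. A minimal sketch of that decoding follows. The enum, struct and
function names are invented for illustration and are not kernel identifiers,
and the bit meanings are an assumption to check against your kernel version.

```c
#include <stdbool.h>

/* Hypothetical names. Bit meanings as described in the ARM
 * mem_alignment documentation (assumed, not verified per kernel). */
enum {
    ALIGN_MODE_WARN   = 1 << 0,  /* log userspace alignment traps */
    ALIGN_MODE_FIXUP  = 1 << 1,  /* emulate the access in the kernel */
    ALIGN_MODE_SIGNAL = 1 << 2   /* deliver SIGBUS to the process */
};

struct align_mode {
    bool warn;
    bool fixup;
    bool sigbus;
};

/* Decode the number written to /proc/cpu/alignment. */
static struct align_mode decode_align_mode(unsigned int value)
{
    struct align_mode m;

    m.warn   = (value & ALIGN_MODE_WARN) != 0;
    m.fixup  = (value & ALIGN_MODE_FIXUP) != 0;
    m.sigbus = (value & ALIGN_MODE_SIGNAL) != 0;
    return m;
}
```

Under that reading, 0 is the "ignore" default mentioned above, 2 is fixup
only, and 3 is fixup plus warnings.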

Cheers,
- Michael
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel + gcc 4.1 = several problems

2007-01-07 Thread Segher Boessenkool

> I want this:
>
> char v[4];
> ...
> memcmp(v, "abcd", 4) == 0
>
> compile to single cmpl on i386. This (gcc 4.1.1) is ridiculous:

>	call	memcmp

i686-linux-gcc (GCC) 4.2.0 20060410 (experimental)

	movl	$4, %ecx	#, tmp65
	cld
	movl	$v, %esi	#, tmp63
	movl	$.LC0, %edi	#, tmp64
	repz
	cmpsb
	sete	%al	#, tmp68

Still not perfect, but better already.  If you have any
specific examples that you'd like to have compiled to
better code, please report them in GCC bugzilla (with a
self-contained testcase, please).


Segher



Re: kernel + gcc 4.1 = several problems

2007-01-06 Thread Jeff Garzik

Linus Torvalds wrote:
> (That said, I think __builtin_memcpy() does a reasonable job these days 
> with gcc, and we might drop the crap one day when we can trust the 
> compiler to do ok. It didn't use to, and we continued using our 
> ridiculous macro/__builtin_constant_p misuses just because it works with 
> _all_ relevant gcc versions).



Yep, a ton of work by Roger Sayle, among others, really matured the gcc 
str*/mem* builtins in the 4.x series.  They are definitely worth another 
look.


Jeff




Re: kernel + gcc 4.1 = several problems

2007-01-06 Thread Linus Torvalds


On Sun, 7 Jan 2007, Denis Vlasenko wrote:
> 
> I'd say "care about obvious, safe optimizations which we still do not do".
> I want this:
> 
> char v[4];
> ...
>   memcmp(v, "abcd", 4) == 0
> 
> compile to single cmpl on i386.

Yeah. For a more relevant case, look at the hoops we used to jump through 
to get "memcpy()" to generate ok code for trivial fixed-sized cases.

(That said, I think __builtin_memcpy() does a reasonable job these days 
with gcc, and we might drop the crap one day when we can trust the 
compiler to do ok. It didn't use to, and we continued using our 
ridiculous macro/__builtin_constant_p misuses just because it works with 
_all_ relevant gcc versions).
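
For readers who have not seen the trick being referred to, here is a minimal
sketch of the general shape of a __builtin_constant_p dispatch. It is not the
kernel's actual macro: sketch_memcpy, copy_small_const and the 16-byte cutoff
are made up for illustration. Only __builtin_constant_p itself (a GCC
extension) and memcpy are real.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for small sizes known at compile time. */
static inline void *copy_small_const(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i;

    for (i = 0; i < n; i++)   /* easy to unroll when n is a constant */
        d[i] = s[i];
    return dst;
}

/* GCC extension: __builtin_constant_p(n) is nonzero when the compiler
 * can prove n is a compile-time constant.  Note that n is evaluated
 * more than once, so it must not have side effects. */
#define sketch_memcpy(dst, src, n)                      \
    ((__builtin_constant_p(n) && (n) <= 16)             \
        ? copy_small_const((dst), (src), (n))           \
        : memcpy((dst), (src), (n)))
```

A call with a literal or sizeof length takes the first arm, anything else
falls through to the library routine. The point in the message above is that a
compiler with good builtins makes this kind of hand dispatch unnecessary.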

Linus


Re: kernel + gcc 4.1 = several problems

2007-01-06 Thread Denis Vlasenko
On Thursday 04 January 2007 18:37, Linus Torvalds wrote:
> With 7+ million lines of C code and headers, I'm not interested in 
> compilers that read the letter of the law. We don't want some really 
> clever code generation that gets us .5% on some unrealistic load. We want 
> good _solid_ code generation that does the obvious thing.
> 
> Compiler writers seem to seldom even realize this. A lot of commercial 
> code gets shipped with basically no optimizations at all (or with specific 
> optimizations turned off), because people want to ship what they debug and 
> work with.

I'd say "care about obvious, safe optimizations which we still do not do".
I want this:

char v[4];
...
memcmp(v, "abcd", 4) == 0

compile to single cmpl on i386. This (gcc 4.1.1) is ridiculous:

.LC0:
	.string	"abcd"
	.text
...
	pushl	$4
	pushl	$.LC0
	pushl	$v
	call	memcmp
	addl	$12, %esp
	testl	%eax, %eax

There are tons of examples where you can improve code generation.
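
To make the "single cmpl" request concrete, here is a sketch of the two forms
side by side. The function names are invented for illustration. The second
function is what one would write by hand to get a single 32-bit compare, and
it is the transformation the compiler is being asked to do on its own. Whether
a given GCC version actually emits one cmpl for either form is not something
this sketch claims.

```c
#include <stdint.h>
#include <string.h>

/* What the source says. */
static int is_abcd_memcmp(const char v[4])
{
    return memcmp(v, "abcd", 4) == 0;
}

/* Hand-written equivalent: load both 4-byte objects into 32-bit
 * integers and compare once.  memcpy is used for the loads so the
 * code has no alignment or aliasing problems; only equality is
 * tested, so byte order does not matter. */
static int is_abcd_word(const char v[4])
{
    uint32_t a, b;

    memcpy(&a, v, sizeof a);
    memcpy(&b, "abcd", sizeof b);
    return a == b;
}
```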
--
vda


Re: kernel + gcc 4.1 = several problems

2007-01-06 Thread Alistair John Strachan
On Sunday 07 January 2007 00:36, Pavel Machek wrote:
[snip]
> > However, this patch is mostly useless if you have a separate stack for
> > IRQ's (since if that happens, any interrupt will be taken on a different
> > stack which we don't see any more), so you should NOT enable the 4KSTACKS
> > config option if you try this out.
>
> stupid idea... perhaps gcc-4.1 generates bigger stackframe somewhere,
> and stack overflows?

The primary reason it's not 4KSTACKS already is that I run multiple XFS 
partitions on top of an md RAID 1. LVM isn't involved, however, and I'm not 
using any other filesystem overlays like dm.

I'm fairly sceptical that it's a stack overflow, but I'll be sure to enable 
the debugging option on the next try.

> that hw monitoring thingie... I'd turn it off. Its interactions with
> acpi are non-trivial and dangerous.

Well, GCC 3.4 kernels seem to run fine with it, but as I said to Linus I'll be 
sure to turn this and the sound drivers off in the next build.

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.


Re: kernel + gcc 4.1 = several problems

2007-01-06 Thread Pavel Machek
Hi!

> > (I realise with problems like these it's almost always some sort of
> > obscure hardware problem, but I find that very difficult to believe when
> > I can toggle from 3 years of stability to 6-18 hours crashing by
> > switching compiler. I've also run extensive stability test programs on
> > the hardware with absolutely no negative results.)
> 
> The thing is, I agree with you - it does seem to be compiler-related. But 
> at the same time, I'm almost positive that it's not in "pipe_poll()" 
> itself, because that function is just too simple, and looking at the 
> assembly code, I don't see how what you describe could happen in THAT 
> function.
> 
> HOWEVER.
> 
> I can easily see an NMI coming in, or another interrupt, or something, and 
> that one corrupting the stack under it because of a compiler bug (or a 
> kernel bug that just needs a specific compiler to trigger). For example, 
> we've had problems before with the compiler thinking it owns the stack 
> frame for an "asmlinkage" function, and us having no way to tell the 
> compiler to keep its hands off - so the compiler ended up touching 
> registers that were actually in the "save area" of the interrupt or system 
> call, and then returning with corrupted state.
> 
> Here's a stupid patch. It just adds more debugging to the oops message, 
> and shows all the code pointers it can find on the WHOLE stack.
> 
> It also makes the raw stack dumping print out as much of the stack 
> contents _under_ the stack pointer as it does above it too.
> 
> However, this patch is mostly useless if you have a separate stack for 
> IRQ's (since if that happens, any interrupt will be taken on a different 
> stack which we don't see any more), so you should NOT enable the 4KSTACKS 
> config option if you try this out.

stupid idea... perhaps gcc-4.1 generates bigger stackframe somewhere,
and stack overflows?

that hw monitoring thingie... I'd turn it off. Its interactions with
acpi are non-trivial and dangerous.
Pavel
-- 
Thanks for all the (sleeping) penguins.


Re: kernel + gcc 4.1 = several problems

2007-01-06 Thread Segher Boessenkool

> > For a different mailing list indeed; let me just point out
> > that for certain important quite common cases it's an ~50%
> > overall speedup.
>
> Hmm, what code was that? 'signed int does not wrap around' does not
> seem to provide _that_ much info...


One of the recent huge threads on the GCC dev list has a
post that says *some other* compiler gets a result like
this from this optimisation (I don't have a link to the
exact post and I don't remember the details; perhaps it
was XLC?)

Sorry if I wasn't clear enough and you understood I meant
that GCC exploits this optimisation opportunity well
enough for such nice results already.

 - - -

So I searched for it anyway:

http://gcc.gnu.org/ml/gcc/2006-12/msg00768.html

It looks like the result for *integer* code wasn't *all*
that dramatic a difference.  Anyway, it's obvious that
the optimisation can certainly give nice results and it
wouldn't be a good idea for the Linux kernel to dismiss
it without really evaluating the impact first; and anyway,
this is for some future date, GCC-4.2 isn't here yet.
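
For readers wondering what "signed int does not wrap around" gives the
compiler to work with: signed overflow is undefined in C, so the optimiser may
assume it never happens. The sketch below shows the textbook case and an
overflow check that does not depend on wraparound. The function names are made
up, and nothing here is taken from the GCC thread linked above.

```c
#include <limits.h>

/* A compiler that assumes signed overflow never happens may fold
 * this to "return 1", because x + 1 overflowing is undefined. */
static int next_is_larger(int x)
{
    return x + 1 > x;
}

/* Overflow check that does not rely on wraparound: compare against
 * the limits before doing the addition. */
static int add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    return a < INT_MIN - b;
}
```

Code that tests for overflow after the fact (for example "if (a + b < a)")
is the kind that such an optimisation can silently break, which is why the
impact on the kernel would need evaluating first.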


Segher



Re: kernel + gcc 4.1 = several problems

2007-01-05 Thread Pavel Machek
Hi!

> >IMHO you should play such games with "g++ -O9", but that's
> >a discussion for a different mailing list.
> 
> For a different mailing list indeed; let me just point out
> that for certain important quite common cases it's an ~50%
> overall speedup.

Hmm, what code was that? 'signed int does not wrap around' does not
seem to provide _that_ much info...
Pavel
-- 
Thanks for all the (sleeping) penguins.


Re: kernel + gcc 4.1 = several problems

2007-01-05 Thread Linus Torvalds


On Fri, 5 Jan 2007, Alistair John Strachan wrote:
> 
> (I realise with problems like these it's almost always some sort of obscure 
> hardware problem, but I find that very difficult to believe when I can toggle 
> from 3 years of stability to 6-18 hours crashing by switching compiler. I've 
> also run extensive stability test programs on the hardware with absolutely no 
> negative results.)

The thing is, I agree with you - it does seem to be compiler-related. But 
at the same time, I'm almost positive that it's not in "pipe_poll()" 
itself, because that function is just too simple, and looking at the 
assembly code, I don't see how what you describe could happen in THAT 
function.

HOWEVER.

I can easily see an NMI coming in, or another interrupt, or something, and 
that one corrupting the stack under it because of a compiler bug (or a 
kernel bug that just needs a specific compiler to trigger). For example, 
we've had problems before with the compiler thinking it owns the stack 
frame for an "asmlinkage" function, and us having no way to tell the 
compiler to keep its hands off - so the compiler ended up touching 
registers that were actually in the "save area" of the interrupt or system 
call, and then returning with corrupted state.

Here's a stupid patch. It just adds more debugging to the oops message, 
and shows all the code pointers it can find on the WHOLE stack.

It also makes the raw stack dumping print out as much of the stack 
contents _under_ the stack pointer as it does above it too.

However, this patch is mostly useless if you have a separate stack for 
IRQ's (since if that happens, any interrupt will be taken on a different 
stack which we don't see any more), so you should NOT enable the 4KSTACKS 
config option if you try this out.

I'm not sure how enlightening any of the output might be, but it is 
probably worth trying.

Linus

---
diff --git a/arch/i386/kernel/traps.c b/arch/i386/kernel/traps.c
index 0efad8a..2359eed 100644
--- a/arch/i386/kernel/traps.c
+++ b/arch/i386/kernel/traps.c
@@ -243,6 +243,20 @@ void show_trace(struct task_struct *task, struct pt_regs *regs,
 	show_trace_log_lvl(task, regs, stack, "");
 }
 
+static void show_all_stack_addresses(unsigned long *esp)
+{
+	struct thread_info *tinfo = (void *) ((unsigned long)esp & (~(THREAD_SIZE - 1)));
+	unsigned long *stack = (unsigned long *)(tinfo+1);
+
+	printk("All stack code pointers:\n");
+	while (valid_stack_ptr(tinfo, stack)) {
+		unsigned long addr = *stack++;
+		if (__kernel_text_address(addr))
+			print_symbol(" %s", addr);
+	}
+	printk("\n");
+}
+
 static void show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			       unsigned long *esp, char *log_lvl)
 {
@@ -256,8 +270,10 @@ static void show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			esp = (unsigned long *)&esp;
 	}
 
+	show_all_stack_addresses(esp);
 	stack = esp;
-	for(i = 0; i < kstack_depth_to_print; i++) {
+	stack -= kstack_depth_to_print;
+	for(i = 0; i < 2*kstack_depth_to_print; i++) {
 		if (kstack_end(stack))
 			break;
 		if (i && ((i % 8) == 0))
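
The one non-obvious step in the patch above is how it finds the start of the
stack: the kernel stack is a THREAD_SIZE-aligned block with thread_info at its
base, so masking the low bits of any stack address gives that base. A
stand-alone sketch of the arithmetic follows. SKETCH_THREAD_SIZE and the
function names are invented for illustration, and the 8 KiB value assumes i386
without the 4KSTACKS option.

```c
#include <stdint.h>

/* Assumed for illustration: 8 KiB kernel stacks (i386, no 4KSTACKS). */
#define SKETCH_THREAD_SIZE 8192u

/* Round a stack address down to the start of its aligned block,
 * which is where thread_info sits in the patch above. */
static uintptr_t stack_block_base(uintptr_t sp)
{
    return sp & ~((uintptr_t)SKETCH_THREAD_SIZE - 1);
}

/* Bytes between sp and the top of the block, i.e. how much stack
 * the scan loop can walk through above sp. */
static uintptr_t bytes_above(uintptr_t sp)
{
    return stack_block_base(sp) + SKETCH_THREAD_SIZE - sp;
}
```

This is also why the patch is of little use with separate IRQ stacks: an
address on the IRQ stack masks down to a different block, so the interrupted
task's stack is never scanned.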


Re: kernel + gcc 4.1 = several problems

2007-01-05 Thread Alistair John Strachan
On Friday 05 January 2007 16:02, Linus Torvalds wrote:
> On Fri, 5 Jan 2007, Alistair John Strachan wrote:
> > This didn't help. After about 14 hours, the machine crashed again.
> >
> > cmov is not the culprit.
>
> Ok. Have you ever tried to limit the drivers you have loaded? I notice you
> had the prism54 wireless thing in your modules list and the vt1211 hw
> monitoring thing. I'm wondering about the vt1211 thing - it probably isn't
> too common.

Sure, and it only got added to 2.6.19 anyway (however GCC 3.4.6 really does 
seem to have no problem with it).

> But if you can use that machine without the wireless too, it
> might be good to try without either.

Required, plus I've been running prism54 on three different machines with a 
huge number of compilers since the early 2.6 days with no problems.

> (The rest of your module list looked bog-standard, so if it's not
> hardware-specific, I don't think it's there)

Agreed, the config is already _very_ minimal for this machine.

> Turning off the VIA sound driver just in case would be good too.

I'm not even really sure why that's enabled. I can do that.

> The reason I mention vt1211 in particular is that it does things like
> regulate fan activity etc. Is the problem perhaps heat-related?

It definitely isn't heat related. This CPU puts out 7-10W, has a ridiculous 
5000 RPM fan on it (that works) and the temp never exceeds 40C. If anything, 
the -O2, 3.4.6 kernel with CMOV ran the chip a little hotter.

As far as I can see, all the other components are either cool to touch or have 
stupidly big heatsinks on them.

(I realise with problems like these it's almost always some sort of obscure 
hardware problem, but I find that very difficult to believe when I can toggle 
from 3 years of stability to 6-18 hours crashing by switching compiler. I've 
also run extensive stability test programs on the hardware with absolutely no 
negative results.)

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.


Re: kernel + gcc 4.1 = several problems

2007-01-05 Thread Linus Torvalds


On Fri, 5 Jan 2007, Alistair John Strachan wrote:
> 
> This didn't help. After about 14 hours, the machine crashed again.
> 
> cmov is not the culprit.

Ok. Have you ever tried to limit the drivers you have loaded? I notice you 
had the prism54 wireless thing in your modules list and the vt1211 hw 
monitoring thing. I'm wondering about the vt1211 thing - it probably isn't 
too common. But if you can use that machine without the wireless too, it 
might be good to try without either.

(The rest of your module list looked bog-standard, so if it's not 
hardware-specific, I don't think it's there)

Turning off the VIA sound driver just in case would be good too.

The reason I mention vt1211 in particular is that it does things like 
regulate fan activity etc. Is the problem perhaps heat-related? 

Linus


Re: kernel + gcc 4.1 = several problems

2007-01-05 Thread Alistair John Strachan
On Wednesday 03 January 2007 02:20, Alistair John Strachan wrote:
> On Wednesday 03 January 2007 02:12, Mikael Pettersson wrote:
> > On Tue, 2 Jan 2007 17:43:00 -0800 (PST), Linus Torvalds wrote:
> > > > The suggestions I've had so far which I have not yet tried:
> > > >
> > > > -   Select a different x86 CPU in the config.
> > > > -   Unfortunately the C3-2 flags seem to simply tell GCC
> > > >     to schedule for ppro (like i686) and enabled MMX and SSE
> > > > -   Probably useless
> > >
> > > Actually, try this one. Try using something that doesn't like "cmov".
> > > Maybe the C3-2 simply has some internal cmov bugginess.
> >
> > That's a good suggestion. Earlier C3s didn't have cmov so it's
> > not entirely unlikely that cmov in C3-2 is broken in some cases.
> > Configuring for P5MMX or 486 should be good safe alternatives.
>
> Or just C3 (not C3-2), which is what I've done.
>
> I'll report back whether it crashes or not.

This didn't help. After about 14 hours, the machine crashed again.

cmov is not the culprit.

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.


Re: kernel + gcc 4.1 = several problems

2007-01-05 Thread Alistair John Strachan
On Wednesday 03 January 2007 02:20, Alistair John Strachan wrote:
 On Wednesday 03 January 2007 02:12, Mikael Pettersson wrote:
  On Tue, 2 Jan 2007 17:43:00 -0800 (PST), Linus Torvalds wrote:
The suggestions I've had so far which I have not yet tried:
   
-   Select a different x86 CPU in the config.
-   Unfortunately the C3-2 flags seem to simply 
tell GCC
to schedule for ppro (like i686) and enabled 
MMX and SSE
-   Probably useless
  
   Actually, try this one. Try using something that doesn't like cmov.
   Maybe the C3-2 simply has some internal cmov bugginess.
 
  That's a good suggestion. Earlier C3s didn't have cmov so it's
  not entirely unlikely that cmov in C3-2 is broken in some cases.
  Configuring for P5MMX or 486 should be good safe alternatives.

 Or just C3 (not C3-2), which is what I've done.

 I'll report back whether it crashes or not.

This didn't help. After about 14 hours, the machine crashed again.

cmov is not the culprit.

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel + gcc 4.1 = several problems

2007-01-05 Thread Linus Torvalds


On Fri, 5 Jan 2007, Alistair John Strachan wrote:
 
 This didn't help. After about 14 hours, the machine crashed again.
 
 cmov is not the culprit.

Ok. Have you ever tried to limit the drivers you have loaded? I notice you 
had the prism54 wireless thing in your modules list and the vt1211 hw 
monitoring thing. I'm wondering about the vt1211 thing - it probably isn't 
too common. But if you can use that machine without the wireless too, it 
might be good to try without either.

(The rest of your module list looked bog-standard, so if it's not 
hardware-specific, I don't think it's there)

Turning of the VIA sound driver just in case would be good too.

The reason I mention vt1211 in particular is that it does things like 
regulate fan activity etc. Is the problem perhaps heat-related? 

Linus


Re: kernel + gcc 4.1 = several problems

2007-01-05 Thread Alistair John Strachan
On Friday 05 January 2007 16:02, Linus Torvalds wrote:
> On Fri, 5 Jan 2007, Alistair John Strachan wrote:
> > This didn't help. After about 14 hours, the machine crashed again.
> >
> > cmov is not the culprit.
>
> Ok. Have you ever tried to limit the drivers you have loaded? I notice you
> had the prism54 wireless thing in your modules list and the vt1211 hw
> monitoring thing. I'm wondering about the vt1211 thing - it probably isn't
> too common.

Sure, and it only got added to 2.6.19 anyway (however GCC 3.4.6 really does 
seem to have no problem with it).

> But if you can use that machine without the wireless too, it
> might be good to try without either.

Required, plus I've been running prism54 on three different machines with a 
huge number of compilers since the early 2.6 days with no problems.

> (The rest of your module list looked bog-standard, so if it's not
> hardware-specific, I don't think it's there)

Agreed, the config is already _very_ minimal for this machine.

> Turning off the VIA sound driver just in case would be good too.

I'm not even really sure why that's enabled. I can do that.

> The reason I mention vt1211 in particular is that it does things like
> regulate fan activity etc. Is the problem perhaps heat-related?

It definitely isn't heat related. This CPU puts out 7-10W, has a ridiculous 
5000 RPM fan on it (that works) and the temp never exceeds 40C. If anything, 
the -O2, 3.4.6 kernel with CMOV ran the chip a little hotter.

As far as I can see, all the other components are either cool to touch or have 
stupidly big heatsinks on them.

(I realise with problems like these it's almost always some sort of obscure 
hardware problem, but I find that very difficult to believe when I can toggle 
from 3 years of stability to 6-18 hours crashing by switching compiler. I've 
also run extensive stability test programs on the hardware with absolutely no 
negative results.)

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.


Re: kernel + gcc 4.1 = several problems

2007-01-05 Thread Linus Torvalds


On Fri, 5 Jan 2007, Alistair John Strachan wrote:
> 
> (I realise with problems like these it's almost always some sort of obscure 
> hardware problem, but I find that very difficult to believe when I can toggle 
> from 3 years of stability to 6-18 hours crashing by switching compiler. I've 
> also run extensive stability test programs on the hardware with absolutely no 
> negative results.)

The thing is, I agree with you - it does seem to be compiler-related. But 
at the same time, I'm almost positive that it's not in pipe_poll() 
itself, because that function is just too simple, and looking at the 
assembly code, I don't see how what you describe could happen in THAT 
function.

HOWEVER.

I can easily see an NMI coming in, or another interrupt, or something, and 
that one corrupting the stack under it because of a compiler bug (or a 
kernel bug that just needs a specific compiler to trigger). For example, 
we've had problems before with the compiler thinking it owns the stack 
frame for an asmlinkage function, and us having no way to tell the 
compiler to keep its hands off - so the compiler ended up touching 
registers that were actually in the save area of the interrupt or system 
call, and then returning with corrupted state.

Here's a stupid patch. It just adds more debugging to the oops message, 
and shows all the code pointers it can find on the WHOLE stack.

It also makes the raw stack dumping print out as much of the stack 
contents _under_ the stack pointer as it does above it too.

However, this patch is mostly useless if you have a separate stack for 
IRQ's (since if that happens, any interrupt will be taken on a different 
stack which we don't see any more), so you should NOT enable the 4KSTACKS 
config option if you try this out.

I'm not sure how enlightening any of the output might be, but it is 
probably worth trying.

Linus

---
diff --git a/arch/i386/kernel/traps.c b/arch/i386/kernel/traps.c
index 0efad8a..2359eed 100644
--- a/arch/i386/kernel/traps.c
+++ b/arch/i386/kernel/traps.c
@@ -243,6 +243,20 @@ void show_trace(struct task_struct *task, struct pt_regs *regs,
 	show_trace_log_lvl(task, regs, stack, "");
 }
 
+static void show_all_stack_addresses(unsigned long *esp)
+{
+	struct thread_info *tinfo = (void *) ((unsigned long)esp & (~(THREAD_SIZE - 1)));
+	unsigned long *stack = (unsigned long *)(tinfo+1);
+
+	printk("All stack code pointers:\n");
+	while (valid_stack_ptr(tinfo, stack)) {
+		unsigned long addr = *stack++;
+		if (__kernel_text_address(addr))
+			print_symbol(" %s", addr);
+	}
+	printk("\n");
+}
+
 static void show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			       unsigned long *esp, char *log_lvl)
 {
@@ -256,8 +270,10 @@ static void show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			esp = (unsigned long *)&esp;
 	}
 
+	show_all_stack_addresses(esp);
 	stack = esp;
-	for(i = 0; i < kstack_depth_to_print; i++) {
+	stack -= kstack_depth_to_print;
+	for(i = 0; i < 2*kstack_depth_to_print; i++) {
 		if (kstack_end(stack))
 			break;
 		if (i && ((i % 8) == 0))


Re: kernel + gcc 4.1 = several problems

2007-01-05 Thread Pavel Machek
Hi!

> > IMHO you should play such games with "g++ -O9", but that's
> > a discussion for a different mailing list.
> 
> For a different mailing list indeed; let me just point out
> that for certain important quite common cases it's an ~50%
> overall speedup.

Hmm, what code was that? 'signed int does not wrap around' does not
seem to provide _that_ much info...
Pavel
-- 
Thanks for all the (sleeping) penguins.


Re: kernel + gcc 4.1 = several problems

2007-01-04 Thread Geert Bosch


On Jan 4, 2007, at 13:34, Segher Boessenkool wrote:


> The "signed wrap is undefined" thing doesn't fit in this category
> though:
>
> -- It is an important optimisation for loops with a signed
>    induction variable;


It certainly isn't that important. Even SpecINT compiled with
-O3 and top-of-tree GCC *improves* 1% by adding -fwrapv.
If the compiler itself can rely on wrap-around semantics and
doesn't have to worry about introducing overflows between
optimization passes, it can reorder simple chains of additions.
This is more important for many real-world applications than
being able to perform some complex loop-interchange.
Compiler developers always make the mistake of overrating
their optimizations.

If GCC does really poorly on a few important loops that matter,
that issue is easily addressed. If GCC generates unreliable
code for millions of boring lines of important real-world C,
the compiler is worthless.

  -Geert


Re: kernel + gcc 4.1 = several problems

2007-01-04 Thread Al Viro
On Thu, Jan 04, 2007 at 09:47:01AM -0800, Linus Torvalds wrote:
> NOBODY will guarantee you that they follow all standards to the letter. 
> Some use compiler extensions knowingly, but pretty much _everybody_ ends 
> up depending on subtle issues without even realizing it. It's almost 
> impossible to write a real program that has no bugs, and if they don't 
> show up in testing (because the compiler didn't generate buggy assembly 
> code from source code that had the _potential_ for bugs), they often won't 
> get fixed.
> 
> The kernel does things like compare pointers across objects, and the 
> kernel EXPECTS it to work. I seriously doubt that the kernel is even 
> unusual in this. The common way to avoid AB-BA deadlocks in any threaded 
> code (whether kernel or user space) is to just take two locks in a 
> specific order, and the common way to do that for locks of the same type 
> is simply to compare the addresses.
> 
> The fact that this is "undefined" behaviour matters not a _whit_. Not for 
> the kernel, and I bet not for a lot of other applications either.

True, but we'd better understand what assumptions we are making.  I have
seen patches seriously attempting to _subtract_ unrelated pointers.  And
that simply doesn't work for obvious reasons...


Re: kernel + gcc 4.1 = several problems

2007-01-04 Thread Segher Boessenkool

> > > (in which case, nearly all real-world code is broken)
> >
> > Not "nearly all" -- but lots of code, yes.
>
> I wouldn't say "lots of code". I would say "all real projects".


All projects that tell the compiler they're written in ISO C,
while they're not, can easily break, sure.  You can't say this
is GCC's fault; sure in some cases decisions were made that
resulted in more of those programs breaking than was really
necessary, but it's obviously *impossible* to prevent all
from breaking.

And yes it's true: most people do not program in ISO C at all,
_even if they think they do_, simply because they are not aware
of all the rules.  For some of the areas where most of the
mistakes are made, for example aliasing rules and signed overflow,
GCC provides helpful options to switch behaviour to something
that makes those people's programs work.  You can also use those
options if you have made a conscious decision that you want to
write your code in one of the resulting dialects of C.


Segher

p.s.  If it's decided to not use -fwrapv, a debug option that
sets -ftrapv can be introduced -- it will make it a BUG() if
any (accidental) signed overflow happens after all.



Re: kernel + gcc 4.1 = several problems

2007-01-04 Thread Segher Boessenkool

> I'll happily turn off compiler features that are "clever optimizations
> that never actually matter in practice, but are just likely to possibly
> cause problems".


The "signed wrap is undefined" thing doesn't fit in this category
though:

-- It is an important optimisation for loops with a signed
   induction variable;
-- "Random code" where it causes problems is typically buggy
   already (i.e., code that doesn't take overflow into account
   at all won't expect wraparound either);
-- Code that explicitly depends on signed overflow two's complement
   wraparound can be trivially converted to use unsigned arithmetic
   (and in almost all cases it really should have used that already).

If GCC can generate warnings for things in the second bullet point
(and it probably will, but nothing is finalised yet), I don't see
a reason for the kernel to turn off the optimisation.  Why not try
it out and only _if_ it causes troubles (after the compiler version
is stable) turn it off.

> > > to take is not to add the compiler flag, but to fix the code.
> >
> > Nope, unless we decide that the performance advantages of
> > a language change are worth the risk and pain.


But it's not a language change -- GCC has worked like this
for a _long_ time already, since May 2003 if I read the
ChangeLog correctly -- it's just that it starts to optimise
some things more aggressively now.

> With integer overflow optimizations, the same situation may be true. The
> kernel has never been "strict ANSI C". We've always used C extensions. The
> extension of "signed integer arithmetic follows 2's-complement-arithmetic"
> is a perfectly sane extension to the language, and quite possibly worth
> it.


Could be.  Who knows, without testing.  I'm just saying to
not add -fwrapv purely as a knee-jerk reaction.


> And the fact that it's not "strict ANSI C" has absolutely _zero_
> relevance.


I certainly never claimed so, that's all in Albert's mind it seems :-)


Segher



Re: kernel + gcc 4.1 = several problems

2007-01-04 Thread Andreas Schwab
"Albert Cahalan" <[EMAIL PROTECTED]> writes:

> FYI, the kernel also assumes that a "char" is 8 bits.
> Maybe you should run away screaming.

You are confusing "undefined" with "implementation defined".  Those are
two quite different concepts.

Andreas.

-- 
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: kernel + gcc 4.1 = several problems

2007-01-04 Thread Linus Torvalds


On Thu, 4 Jan 2007, Segher Boessenkool wrote:
> 
> > (in which case, nearly all real-world code is broken)
> 
> Not "nearly all" -- but lots of code, yes.

I wouldn't say "lots of code". I would say "all real projects".

NOBODY will guarantee you that they follow all standards to the letter. 
Some use compiler extensions knowingly, but pretty much _everybody_ ends 
up depending on subtle issues without even realizing it. It's almost 
impossible to write a real program that has no bugs, and if they don't 
show up in testing (because the compiler didn't generate buggy assembly 
code from source code that had the _potential_ for bugs), they often won't 
get fixed.

The kernel does things like compare pointers across objects, and the 
kernel EXPECTS it to work. I seriously doubt that the kernel is even 
unusual in this. The common way to avoid AB-BA deadlocks in any threaded 
code (whether kernel or user space) is to just take two locks in a 
specific order, and the common way to do that for locks of the same type 
is simply to compare the addresses.

The fact that this is "undefined" behaviour matters not a _whit_. Not for 
the kernel, and I bet not for a lot of other applications either.

So "nearly all" is probably _understating_ things rather than overstating 
it as you claim. Anybody who thinks that they have proven the correctness 
of their program is likely lying. It's a good thing if they have _tested_ 
all the code-paths, but they've invariably been tested with a compiler 
that doesn't go out of its way to try to generate "legal but idiotic" 
code. So the testing won't generally find cases where the compiler may 
have been _allowed_ to do something else.

The end result: any nontrivial project always has dodgy code. Because 
people simply don't write perfect code.

Compiler people who don't realize this aren't compiler people. They're 
academics involved with mental masturbation.

Linus


Re: kernel + gcc 4.1 = several problems

2007-01-04 Thread Linus Torvalds


On Thu, 4 Jan 2007, Albert Cahalan wrote:

> On 1/4/07, Segher Boessenkool <[EMAIL PROTECTED]> wrote:
> >
> > Lack of the flag does not break any valid C code, only code
> > making unwarranted assumptions (i.e., buggy code).
> 
> Right, if "C" means "strictly conforming ISO C" to you.
> (in which case, nearly all real-world code is broken)

Indeed. The gcc people seem to often think that "language lawyering" is a 
good idea, and totally overrides "real world". The whole flap about the 
completely idiotic things they do (or at least did) for alias analysis on 
the grounds that "they can" is an example of this.

> FYI, the kernel also assumes that a "char" is 8 bits.
> Maybe you should run away screaming.

Gcc people are quick to condemn others for assumptions that break 
standards, but gcc has tons of assumptions very deeply embedded itself. I 
don't think it could realistically work very well on setups where pointers 
aren't the same size as long, and it has various deep assumptions itself 
about what is "realistic".

The kernel does the same. Some of it intentional and by design, much of it 
probably totally unintentional, but the result of "it worked, and nobody 
even thought about anything else". 

With 7+ million lines of C code and headers, I'm not interested in 
compilers that read the letter of the law. We don't want some really 
clever code generation that gets us .5% on some unrealistic load. We want 
good _solid_ code generation that does the obvious thing.

Compiler writers seem to seldom even realize this. A lot of commercial 
code gets shipped with basically no optimizations at all (or with specific 
optimizations turned off), because people want to ship what they debug and 
work with.

I'll happily turn off compiler features that are "clever optimizations 
that never actually matter in practice, but are just likely to possibly 
cause problems".

The sad part is that "straightforward optimizations" (as opposed to 
"really clever ones") often work better in practice too. At least with 
kernel code, which is not that high-level to begin with. 

> > to take is not to add the compiler flag, but to fix the code.
> 
> Nope, unless we decide that the performance advantages of
> a language change are worth the risk and pain.

Indeed. We'd happily fix the code if:
 (a) it's reasonably easy to find places that are buggy.
 (b) there are syntactically sane ways to fix it
 (c) the optimization actually makes sense and is worthwhile

An example of where _none_ of these things were true was the old gcc alias 
analysis. I think gcc eventually added a sane way to mark pointers as 
being possible aliases (ie case (b): give a syntactically acceptable way 
for code maintainability to actually fix things), but since neither (a) 
nor (c) are there, the _correct_ solution was just to tell the compiler to 
stop doing that.

With integer overflow optimizations, the same situation may be true. The 
kernel has never been "strict ANSI C". We've always used C extensions. The 
extension of "signed integer arithmetic follows 2's-complement-arithmetic" 
is a perfectly sane extension to the language, and quite possibly worth 
it.

And the fact that it's not "strict ANSI C" has absolutely _zero_ 
relevance.

Linus


Re: kernel + gcc 4.1 = several problems

2007-01-04 Thread Segher Boessenkool

> > Lack of the flag does not break any valid C code, only code
> > making unwarranted assumptions (i.e., buggy code).
>
> Right, if "C" means "strictly conforming ISO C" to you.

Without any further qualification, it of course does, yes.

> (in which case, nearly all real-world code is broken)

Not "nearly all" -- but lots of code, yes.

> FYI, the kernel also assumes that a "char" is 8 bits.
> Maybe you should run away screaming.

No, that's fine with me.  It's fine with GCC as well
of course.

> > Anyway, with 4.1 you shouldn't see frequent problems due to
>
> Right, it gets much worse with the current gcc snapshots.

Yes.  And that problem will be fixed some way pretty soon --
simply because it _has_ to be fixed.

> IMHO you should play such games with "g++ -O9", but that's
> a discussion for a different mailing list.

For a different mailing list indeed; let me just point out
that for certain important quite common cases it's an ~50%
overall speedup.

> > "not using -fwrapv while my code is broken WRT signed overflow"
> > yet; and if/when problems start to happen, the "correct" action
> > to take is not to add the compiler flag, but to fix the code.
>
> Nope, unless we decide that the performance advantages of
> a language change are worth the risk and pain.


If the kernel breaks all over the place, of course you should add
the flag.  But it won't, it would break *all* programs all over
the place then, and that wouldn't be acceptable to GCC.  If instead
only a few kernel code bugs pop up, it's easy to fix.

Anyway -- my only real point was to point out that there's
no doomsday scenario here, yes current GCC TOT seems to regress
here (for some definition of that word), but GCC development
is in stage 1, that sort of thing happens.  It'll stabilise
again.

In the meantime, building git HEAD kernels with GCC 4.1 and
4.2 will probably rattle out quite a few bugs still, both
in the kernel and in GCC -- neither is used all that often
it seems?


Segher



Re: kernel + gcc 4.1 = several problems

2007-01-04 Thread Albert Cahalan

On 1/4/07, Segher Boessenkool <[EMAIL PROTECTED]> wrote:

> > Adjusting gcc flags to eliminate optimizations is another way to go.
> > Adding -fwrapv would be an excellent start. Lack of this flag breaks
> > most code which checks for integer wrap-around.
>
> Lack of the flag does not break any valid C code, only code
> making unwarranted assumptions (i.e., buggy code).

Right, if "C" means "strictly conforming ISO C" to you.
(in which case, nearly all real-world code is broken)

FYI, the kernel also assumes that a "char" is 8 bits.
Maybe you should run away screaming.

> > The compiler "knows"
> > that signed integers don't ever wrap, and thus eliminates any code
> > which checks for values going negative after a wrap-around.
>
> You cannot assume it eliminates such code; the compiler is free
> to do whatever it wants in such a case.
>
> You should typically write such a computation using unsigned
> types, FWIW.
>
> Anyway, with 4.1 you shouldn't see frequent problems due to

Right, it gets much worse with the current gcc snapshots.

IMHO you should play such games with "g++ -O9", but that's
a discussion for a different mailing list.

> "not using -fwrapv while my code is broken WRT signed overflow"
> yet; and if/when problems start to happen, the "correct" action
> to take is not to add the compiler flag, but to fix the code.

Nope, unless we decide that the performance advantages of
a language change are worth the risk and pain.


Re: kernel + gcc 4.1 = several problems

2007-01-04 Thread Segher Boessenkool

> Adjusting gcc flags to eliminate optimizations is another way to go.
> Adding -fwrapv would be an excellent start. Lack of this flag breaks
> most code which checks for integer wrap-around.


Lack of the flag does not break any valid C code, only code
making unwarranted assumptions (i.e., buggy code).


> The compiler "knows"
> that signed integers don't ever wrap, and thus eliminates any code
> which checks for values going negative after a wrap-around.


You cannot assume it eliminates such code; the compiler is free
to do whatever it wants in such a case.

You should typically write such a computation using unsigned
types, FWIW.

Anyway, with 4.1 you shouldn't see frequent problems due to
"not using -fwrapv while my code is broken WRT signed overflow"
yet; and if/when problems start to happen, the "correct" action
to take is not to add the compiler flag, but to fix the code.


Segher



RE: kernel + gcc 4.1 = several problems

2007-01-04 Thread Linus Torvalds


On Thu, 4 Jan 2007, Zou, Nanhai wrote:
> 
> cmov will stall on eflags in your test program.

And that is EXACTLY my point.

CMOV is a piece of CRAP for most things, exactly because it serializes 
three streams of data: the two inputs, and the conditional.

My test-case was actually _good_ for cmov, because there was just the one 
conditional (which was 100% ALU) thing that was serialized. In real life, 
the two data sources also come from memory, and _any_ of them being 
delayed ends up delaying the cmov, and screwing up your out-of-order 
pipeline because you now introduced a serialization point that was very 
possibly not necessary at all.

In contrast, a conditional branch-around serializes absolutely NOTHING, 
because branches get predicted.

> I think you will see benefit of cmov if you can manage to put some 
> instructions which does NOT modify eflags between testl and cmov.

A lot of the time, the conditional _is_ the critical path.

The whole point of this discussion was that cmov isn't really all that 
great. It has fundamental problems that a conditional branch that gets 
predicted simply does not have.

That's quite apart from the fact that cmov has rather limited semantics, 
and that in 99% of all cases you have to use a conditional branch anyway.

Linus


RE: kernel + gcc 4.1 = several problems

2007-01-04 Thread Linus Torvalds


On Thu, 4 Jan 2007, Zou, Nanhai wrote:
 
 cmov will stall on eflags in your test program.

And that is EXACTLY my point.

CMOV is a piece of CRAP for most things, exactly because it serializes 
three streams of data: the two inputs, and the conditional.

My test-case was actually _good_ for cmov, because there was just the one 
conditional (which was 100% ALU) thing that was serialized. In real life, 
the two data sources also come from memory, and _any_ of them being 
delayed ends up delaying the cmov, and screwing up your out-of-order 
pipeline because you now introduced a serialization point that was very 
possibly not necessary at all.

In contrast, a conditional branch-around serializes absolutely NOTHING, 
because branches get predicted.

 I think you will see benefit of cmov if you can manage to put some 
 instructions which does NOT modify eflags between testl and cmov.

A lot of the time, the conditional _is_ the critical path.

The whole point of this discussion was that cmov isn't really all that 
great. It has fundamental problems that a conditional branch that gets 
predicted simply does not have.

That's qiute apart from the fact that cmov has rather limited semantics, 
and that in 99% of all cases you have to use a conditional branch anyway.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel + gcc 4.1 = several problems

2007-01-04 Thread Segher Boessenkool

Adjusting gcc flags to eliminate optimizations is another way to go.
Adding -fwrapv would be an excellent start. Lack of this flag breaks
most code which checks for integer wrap-around.


Lack of the flag does not break any valid C code, only code
making unwarranted assumptions (i.e., buggy code).


The compiler knows
that signed integers don't ever wrap, and thus eliminates any code
which checks for values going negative after a wrap-around.


You cannot assume it eliminates such code; the compiler is free
to do whatever it wants in such a case.

You should typically write such a computation using unsigned
types, FWIW.

Anyway, with 4.1 you shouldn't see frequent problems due to
not using -fwrapv while my code is broken WRT signed overflow
yet; and if/when problems start to happen, to correct action
to take is not to add the compiler flag, but to fix the code.


Segher

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel + gcc 4.1 = several problems

2007-01-04 Thread Albert Cahalan

On 1/4/07, Segher Boessenkool [EMAIL PROTECTED] wrote:

 Adjusting gcc flags to eliminate optimizations is another way to go.
 Adding -fwrapv would be an excellent start. Lack of this flag breaks
 most code which checks for integer wrap-around.

Lack of the flag does not break any valid C code, only code
making unwarranted assumptions (i.e., buggy code).


Right, if C means strictly conforming ISO C to you.
(in which case, nearly all real-world code is broken)

FYI, the kernel also assumes that a char is 8 bits.
Maybe you should run away screaming.


 The compiler knows
 that signed integers don't ever wrap, and thus eliminates any code
 which checks for values going negative after a wrap-around.

You cannot assume it eliminates such code; the compiler is free
to do whatever it wants in such a case.

You should typically write such a computation using unsigned
types, FWIW.

Anyway, with 4.1 you shouldn't see frequent problems due to


Right, it gets much worse with the current gcc snapshots.

IMHO you should play such games with g++ -O9, but that's
a discussion for a different mailing list.


not using -fwrapv while my code is broken WRT signed overflow
yet; and if/when problems start to happen, to correct action
to take is not to add the compiler flag, but to fix the code.


Nope, unless we decide that the performance advantages of
a language change are worth the risk and pain.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel + gcc 4.1 = several problems

2007-01-04 Thread Segher Boessenkool

Lack of the flag does not break any valid C code, only code
making unwarranted assumptions (i.e., buggy code).


Right, if C means strictly conforming ISO C to you.


Without any further qualification, it of course does, yes.


(in which case, nearly all real-world code is broken)


Not nearly all -- but lots of code, yes.


FYI, the kernel also assumes that a char is 8 bits.
Maybe you should run away screaming.


No, that's fine with me.  It's fine with GCC as well
of course.


Anyway, with 4.1 you shouldn't see frequent problems due to


Right, it gets much worse with the current gcc snapshots.


Yes.  And that problem will be fixed some way pretty soon --
simply because it _has_ to be fixed.


IMHO you should play such games with g++ -O9, but that's
a discussion for a different mailing list.


For a different mailing list indeed; let me just point out
that for certain important quite common cases it's an ~50%
overall speedup.


not using -fwrapv while my code is broken WRT signed overflow
yet; and if/when problems start to happen, the correct action
to take is not to add the compiler flag, but to fix the code.


Nope, unless we decide that the performance advantages of
a language change are worth the risk and pain.


If the kernel breaks all over the place, of course you should add
the flag.  But it won't, it would break *all* programs all over
the place then, and that wouldn't be acceptable to GCC.  If instead
only a few kernel code bugs pop up, it's easy to fix.

Anyway -- my only real point was to point out that there's
no doomsday scenario here, yes current GCC TOT seems to regress
here (for some definition of that word), but GCC development
is in stage 1, that sort of thing happens.  It'll stabilise
again.

In the meantime, building git HEAD kernels with GCC 4.1 and
4.2 will probably rattle out quite a few bugs still, both
in the kernel and in GCC -- neither is used all that often
it seems?


Segher



Re: kernel + gcc 4.1 = several problems

2007-01-04 Thread Linus Torvalds


On Thu, 4 Jan 2007, Albert Cahalan wrote:

 On 1/4/07, Segher Boessenkool [EMAIL PROTECTED] wrote:
 
  Lack of the flag does not break any valid C code, only code
  making unwarranted assumptions (i.e., buggy code).
 
 Right, if C means strictly conforming ISO C to you.
 (in which case, nearly all real-world code is broken)

Indeed. The gcc people seem to often think that language lawyering is a 
good idea, and totally overrides real world. The whole flap about the 
completely idiotic things they do (or at least did) for alias analysis on 
the grounds that they can is an example of this.

 FYI, the kernel also assumes that a char is 8 bits.
 Maybe you should run away screaming.

Gcc people are quick to condemn others for assumptions that break 
standards, but gcc has tons of assumptions very deeply embedded itself. I 
don't think it could realistically work very well on setups where pointers 
aren't the same size as long, and it has various deep assumptions itself 
about what is realistic.

The kernel does the same. Some of it intentional and by design, much of it 
probably totally unintentional, but the result of "it worked, and nobody 
even thought about anything else".

With 7+ million lines of C code and headers, I'm not interested in 
compilers that read the letter of the law. We don't want some really 
clever code generation that gets us .5% on some unrealistic load. We want 
good _solid_ code generation that does the obvious thing.

Compiler writers seem to seldom even realize this. A lot of commercial 
code gets shipped with basically no optimizations at all (or with specific 
optimizations turned off), because people want to ship what they debug and 
work with.

I'll happily turn off compiler features that are clever optimizations 
that never actually matter in practice, but are just likely to possibly 
cause problems.

The sad part is that straightforward optimizations (as opposed to 
really clever ones) often work better in practice too. At least with 
kernel code, which is not that high-level to begin with. 

  to take is not to add the compiler flag, but to fix the code.
 
 Nope, unless we decide that the performance advantages of
 a language change are worth the risk and pain.

Indeed. We'd happily fix the code if:
 (a) it's reasonably easy to find places that are buggy.
 (b) there are syntactically sane ways to fix it
 (c) the optimization actually makes sense and is worthwhile

An example of where _none_ of these things were true was the old gcc alias 
analysis. I think gcc eventually added a sane way to mark pointers as 
being possible aliases (ie case (b): give a syntactically acceptable way 
for code maintainability to actually fix things), but since neither (a) 
nor (c) are there, the _correct_ solution was just to tell the compiler to 
stop doing that.
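
For readers who want to see what "marking a pointer as a possible alias" can look like: GCC has a `may_alias` type attribute, which may be what is meant here, and `-fno-strict-aliasing` is the flag that tells the compiler to stop doing type-based alias analysis altogether. A minimal sketch, assuming a 32-bit IEEE-754 `float`; the names `u32_alias` and `float_bits` are invented for illustration and are not from the kernel:

```c
#include <stdint.h>

/* GCC extension: accesses through this type are allowed to alias
 * objects of any other type, much like accesses through char. */
typedef uint32_t __attribute__((__may_alias__)) u32_alias;

/* Read the bit pattern of a float through the alias-safe type.
 * Without may_alias (or -fno-strict-aliasing), the cast-and-load
 * below would violate the ISO C aliasing rules. */
static uint32_t float_bits(const float *f)
{
    return *(const u32_alias *)f;
}
```

The attribute fixes one access at a time, which only helps if the offending places can be found in the first place; that is point (a) above.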

With integer overflow optimizations, the same situation may be true. The 
kernel has never been strict ANSI C. We've always used C extensions. The 
extension of "signed integer arithmetic follows 2's-complement-arithmetic" 
is a perfectly sane extension to the language, and quite possibly worth 
it.

And the fact that it's not strict ANSI C has absolutely _zero_ 
relevance.

Linus


Re: kernel + gcc 4.1 = several problems

2007-01-04 Thread Linus Torvalds


On Thu, 4 Jan 2007, Segher Boessenkool wrote:
 
  (in which case, nearly all real-world code is broken)
 
 Not nearly all -- but lots of code, yes.

I wouldn't say lots of code. I would say all real projects.

NOBODY will guarantee you that they follow all standards to the letter. 
Some use compiler extensions knowingly, but pretty much _everybody_ ends 
up depending on subtle issues without even realizing it. It's almost 
impossible to write a real program that has no bugs, and if they don't 
show up in testing (because the compiler didn't generate buggy assembly 
code from source code that had the _potential_ for bugs), they often won't 
get fixed.

The kernel does things like compare pointers across objects, and the 
kernel EXPECTS it to work. I seriously doubt that the kernel is even 
unusual in this. The common way to avoid AB-BA deadlocks in any threaded 
code (whether kernel or user space) is to just take two locks in a 
specific order, and the common way to do that for locks of the same type 
is simply to compare the addresses.
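
A minimal sketch of the address-ordering idea, assuming a hypothetical `toy_lock` type in place of a real spinlock or mutex. The comparison goes through `uintptr_t`, which turns the relational comparison of unrelated pointers (undefined in ISO C) into an integer comparison with an implementation-defined mapping; kernel code typically just compares the pointers directly:

```c
#include <stdint.h>

/* Hypothetical stand-in for a real lock type. */
struct toy_lock {
    int held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
    l->held = 1;
}

/* Return whichever lock has the lower address. */
static struct toy_lock *first_by_address(struct toy_lock *a,
                                         struct toy_lock *b)
{
    return (uintptr_t)a < (uintptr_t)b ? a : b;
}

/* Take both locks in address order, so that two threads locking the
 * same pair always agree on the order and cannot deadlock AB-BA. */
static void lock_pair(struct toy_lock *a, struct toy_lock *b)
{
    struct toy_lock *first = first_by_address(a, b);
    struct toy_lock *second = (first == a) ? b : a;

    toy_lock_acquire(first);
    if (second != first)
        toy_lock_acquire(second);
}
```

Calling `lock_pair(&x, &y)` and `lock_pair(&y, &x)` acquires the locks in the same order either way.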

The fact that this is undefined behaviour matters not a _whit_. Not for 
the kernel, and I bet not for a lot of other applications either.

So "nearly all" is probably _understating_ things rather than overstating 
it as you claim. Anybody who thinks that they have proven the correctness 
of their program is likely lying. It's a good thing if they have _tested_ 
all the code-paths, but they've invariably been tested with a compiler 
that doesn't go out of its way to try to generate legal but idiotic 
code. So the testing won't generally find cases where the compiler may 
have been _allowed_ to do something else.

The end result: any nontrivial project always has dodgy code. Because 
people simply don't write perfect code.

Compiler people who don't realize this aren't compiler people. They're 
academics involved with mental masturbation.

Linus


Re: kernel + gcc 4.1 = several problems

2007-01-04 Thread Andreas Schwab
Albert Cahalan [EMAIL PROTECTED] writes:

 FYI, the kernel also assumes that a char is 8 bits.
 Maybe you should run away screaming.

You are confusing undefined with implementation defined.  Those are
two quite different concepts.
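
To make the distinction concrete, a small sketch: an implementation-defined property such as the width of `char` is documented by the compiler and can be checked at build time, whereas undefined behaviour such as signed overflow has no value that could be checked. The helper name `int_width_bits` is invented for illustration:

```c
#include <limits.h>

/* Implementation-defined: the compiler documents CHAR_BIT, so code
 * that assumes 8-bit chars can state the assumption and fail the
 * build if it does not hold. */
#if CHAR_BIT != 8
#error "this code assumes 8-bit char"
#endif

/* Also implementation-defined: the storage size of int in bits. */
static int int_width_bits(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}

/* Undefined, by contrast: evaluating INT_MAX + 1. There is no macro
 * to test and no documented result to rely on under ISO C. */
```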

Andreas.

-- 
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.


Re: kernel + gcc 4.1 = several problems

2007-01-04 Thread Segher Boessenkool

I'll happily turn off compiler features that are clever optimizations
that never actually matter in practice, but are just likely to possibly
cause problems.


The "signed wrap is undefined" thing doesn't fit in this category
though:

-- It is an important optimisation for loops with a signed
   induction variable;
-- Random code where it causes problems is typically buggy
   already (i.e., code that doesn't take overflow into account
   at all won't expect wraparound either);
-- Code that explicitly depends on signed overflow two's complement
   wraparound can be trivially converted to use unsigned arithmetic
   (and in almost all cases it really should have used that already).
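
As an illustration of the third point, here is a sketch of a wraparound-dependent comparison written with unsigned arithmetic, where modular behaviour is guaranteed by the language. The function name `seq_before` is invented for this example; it treats the counter as a sequence number that may wrap:

```c
#include <limits.h>

/* True if sequence number a comes before b on a counter that wraps.
 * Unsigned subtraction is defined to wrap modulo UINT_MAX + 1, so
 * no compiler flag is needed for this to work. */
static int seq_before(unsigned int a, unsigned int b)
{
    /* b is "ahead" of a if the forward distance from a to b is
     * non-zero and at most half the counter range. */
    return a != b && (b - a) <= UINT_MAX / 2;
}
```

For example, `seq_before(UINT_MAX, 0)` is true, because 0 is one step ahead of `UINT_MAX` once the counter wraps.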

If GCC can generate warnings for things in the second bullet point
(and it probably will, but nothing is finalised yet), I don't see
a reason for the kernel to turn off the optimisation.  Why not try
it out and only _if_ it causes troubles (after the compiler version
is stable) turn it off.

to take is not to add the compiler flag, but to fix the code.


Nope, unless we decide that the performance advantages of
a language change are worth the risk and pain.


But it's not a language change -- GCC has worked like this
for a _long_ time already, since May 2003 if I read the
ChangeLog correctly -- it's just that it starts to optimise
some things more aggressively now.

With integer overflow optimizations, the same situation may be true. The
kernel has never been strict ANSI C. We've always used C extensions. The
extension of signed integer arithmetic follows 2's-complement-arithmetic
is a perfectly sane extension to the language, and quite possibly worth
it.


Could be.  Who knows, without testing.  I'm just saying to
not add -fwrapv purely as a knee-jerk reaction.


And the fact that it's not strict ANSI C has absolutely _zero_
relevance.


I certainly never claimed so, that's all in Albert's mind it seems :-)


Segher



Re: kernel + gcc 4.1 = several problems

2007-01-04 Thread Segher Boessenkool

(in which case, nearly all real-world code is broken)


Not nearly all -- but lots of code, yes.


I wouldn't say lots of code. I would say all real projects.


All projects that tell the compiler they're written in ISO C,
while they're not, can easily break, sure.  You can't say this
is GCC's fault; sure in some cases decisions were made that
resulted in more of those programs breaking than was really
necessary, but it's obviously *impossible* to prevent all
from breaking.

And yes it's true: most people do not program in ISO C at all,
_even if they think they do_, simply because they are not aware
of all the rules.  For some of the areas where most of the
mistakes are made, for example aliasing rules and signed overflow,
GCC provides helpful options to switch behaviour to something
that makes those people's programs work.  You can also use those
options if you have made a conscious decision that you want to
write your code in one of the resulting dialects of C.


Segher

p.s.  If it's decided to not use -fwrapv, a debug option that
sets -ftrapv can be introduced -- it will make it a BUG() if
any (accidental) signed overflow happens after all.
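
A per-call-site analogue of that idea can be sketched with `__builtin_add_overflow`, a GCC and Clang builtin that postdates this thread (it appeared in GCC 5). The wrapper name `add_checked` is invented here; a debug build could turn a `false` return into a BUG():

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true and stores a + b in *sum when the addition does not
 * overflow; returns false when it would. The builtin itself returns
 * true on overflow, hence the negation. */
static bool add_checked(int a, int b, int *sum)
{
    return !__builtin_add_overflow(a, b, sum);
}
```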



Re: kernel + gcc 4.1 = several problems

2007-01-04 Thread Al Viro
On Thu, Jan 04, 2007 at 09:47:01AM -0800, Linus Torvalds wrote:
 NOBODY will guarantee you that they follow all standards to the letter. 
 Some use compiler extensions knowingly, but pretty much _everybody_ ends 
 up depending on subtle issues without even realizing it. It's almost 
 impossible to write a real program that has no bugs, and if they don't 
 show up in testing (because the compiler didn't generate buggy assembly 
 code from source code that had the _potential_ for bugs), they often won't 
 get fixed.
 
 The kernel does things like compare pointers across objects, and the 
 kernel EXPECTS it to work. I seriously doubt that the kernel is even 
 unusual in this. The common way to avoid AB-BA deadlocks in any threaded 
 code (whether kernel or user space) is to just take two locks in a 
 specific order, and the common way to do that for locks of the same type 
 is simply to compare the addresses).
 
 The fact that this is undefined behaviour matters not a _whit_. Not for 
 the kernel, and I bet not for a lot of other applications either.

True, but we'd better understand what assumptions we are making.  I have
seen patches seriously attempting to _subtract_ unrelated pointers.  And
that simply doesn't work for obvious reasons...
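
One likely reason, sketched below: pointer subtraction yields a count of elements, that is, the byte distance divided by the element size, which only makes sense when both pointers point into the same array. The struct `item` and both function names are invented for illustration, and plain integers stand in for addresses in the second function:

```c
#include <stddef.h>

struct item {
    char payload[12];
};

/* Fine: both pointers point into the same array, so the result is
 * simply the element index. */
static ptrdiff_t index_of(const struct item *base, const struct item *elem)
{
    return elem - base;
}

/* For two unrelated objects the byte distance need not be a multiple
 * of sizeof(struct item), so "distance in elements" is meaningless
 * even before the standard calls the subtraction undefined. */
static int distance_is_whole_elements(unsigned long addr_a,
                                      unsigned long addr_b)
{
    unsigned long bytes = addr_a > addr_b ? addr_a - addr_b
                                          : addr_b - addr_a;
    return bytes % sizeof(struct item) == 0;
}
```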


Re: kernel + gcc 4.1 = several problems

2007-01-04 Thread Geert Bosch


On Jan 4, 2007, at 13:34, Segher Boessenkool wrote:


The "signed wrap is undefined" thing doesn't fit in this category
though:

-- It is an important optimisation for loops with a signed
   induction variable;


It certainly isn't that important. Even SpecINT compiled with
-O3 and top-of-tree GCC *improves* 1% by adding -fwrapv.
If the compiler itself can rely on wrap-around semantics and
doesn't have to worry about introducing overflows between
optimization passes, it can reorder simple chains of additions.
This is more important for many real-world applications than
being able to perform some complex loop-interchange.
Compiler developers always make the mistake of overrating
their optimizations.
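
A small sketch of the reordering point. With wrapping semantics, addition is associative even when an intermediate sum wraps, so both groupings below give the same answer. With overflow undefined, the compiler may not turn INT_MAX + (1 + -1) into (INT_MAX + 1) + -1, because the second form overflows in the middle. Unsigned types are used here so the example itself is well defined; the function names are invented:

```c
#include <limits.h>

/* (a + b) + c, where a + b may wrap. */
static unsigned int sum_left(unsigned int a, unsigned int b, unsigned int c)
{
    return (a + b) + c;
}

/* a + (b + c), where b + c may wrap. Under modular arithmetic this is
 * always equal to sum_left(), so a compiler is free to pick either. */
static unsigned int sum_right(unsigned int a, unsigned int b, unsigned int c)
{
    return a + (b + c);
}
```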

If GCC does really poorly on a few important loops that matter,
that issue is easily addressed. If GCC generates unreliable
code for millions of boring lines of important real-world C,
the compiler is worthless.

  -Geert


Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Albert Cahalan

Linus Torvalds writes:

[probably Mikael Pettersson] writes:



The suggestions I've had so far which I have not yet tried:

- Select a different x86 CPU in the config.
  - Unfortunately the C3-2 flags seem to simply tell GCC to
    schedule for ppro (like i686) and enable MMX and SSE
  - Probably useless


Actually, try this one. Try using something that doesn't like "cmov".
Maybe the C3-2 simply has some internal cmov bugginess.


Of course that changes register usage, register spilling,
and thus ultimately even the stack layout. :-(

Adjusting gcc flags to eliminate optimizations is another way to go.
Adding -fwrapv would be an excellent start. Lack of this flag breaks
most code which checks for integer wrap-around. The compiler "knows"
that signed integers don't ever wrap, and thus eliminates any code
which checks for values going negative after a wrap-around. I could
imagine this affecting a switch() or other jump table.
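
To illustrate the kind of check being described: the fragile pattern adds first and tests for a wrapped result afterwards, which relies on behaviour ISO C leaves undefined. A sketch of a check that does not depend on -fwrapv tests the operands before adding; the name `add_would_overflow` is invented for this example:

```c
#include <limits.h>

/* Fragile pattern, shown only as a comment because it relies on
 * signed wraparound:
 *
 *     if (b > 0 && a + b < a)
 *         handle_overflow();
 *
 * Without -fwrapv the compiler may assume a + b >= a whenever b > 0
 * and delete the test.
 */

/* Robust alternative: decide from the operands alone, so no
 * overflowing addition is ever evaluated. */
static int add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    return a < INT_MIN - b;
}
```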


RE: kernel + gcc 4.1 = several problems

2007-01-03 Thread Zou, Nanhai
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Linus Torvalds
> Sent: January 4, 2007 0:04
> To: Grzegorz Kulewski
> Cc: Alan; Mikael Pettersson; [EMAIL PROTECTED];
> [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED];
> linux-kernel@vger.kernel.org; [EMAIL PROTECTED]
> Subject: Re: kernel + gcc 4.1 = several problems
> 
> 
> 
> On Wed, 3 Jan 2007, Grzegorz Kulewski wrote:
> >
> > Could you explain why CMOV is pointless now? Are there any benchmarks 
> > proving
> > that?
> 
> CMOV (and, more generically, any "predicated instruction") tends to be
> generally a bad idea on an aggressively out-of-order CPU. It doesn't
> always have to be horrible, but in practice it is seldom very nice, and
> (as usual) on the P4 it can be really quite bad.
> 
> On a P4, I think a cmov basically takes 10 cycles.
> 
> But even ignoring the usual P4 "I suck at things that aren't totally
> normal", cmov is actually not a great idea. You can always replace it by
> 
>   j forward
>   mov ..., %reg
>   forward:
> 
> and assuming the branch is AT ALL predictable (and 95+% of all branches
> are), the branch-over will actually be a LOT better for a CPU.
> 
> Why? Because branches can be predicted, and when they are predicted they
> basically go away. They go away on many levels, too. Not just the branch
> itself, but the _conditional_ for the branch goes away as far as the
> critical path of code is concerned: the CPU still has to calculate it and
> check it, but from a performance angle it "doesn't exist any more",
> because it's not holding anything else up (well, you want to do it in
> _some_ reasonable time, but the point stands..)
> 
> Similarly, whichever side of the branch wasn't taken goes away. Again, in
> an out-of-order machine with register renaming, this means that even if
> the branch isn't taken above, and you end up executing all the non-branch
> instructions, because you now UNCONDITIONALLY over-write the register, the
> old data in the register is now DEAD, so now all the OTHER writes to that
> register are off the critical path too!
> 
> So the end result is that with a conditional branch, on a good CPU, the
> _only_ part of the code that is actually performance-sensitive is the
> actual calculation of the value that gets used!
> 
> In contrast, if you use a predicated instruction, ALL of it is on the
> critical path. Calculating the conditional is on the critical path.
> Calculating the value that gets used is obviously ALSO on the critical
> path, but so is the calculation for the value that DOESN'T get used too.
> So the cmov - rather than speeding things up - actually slows things down,
> because it makes more code be dependent on each other.
> 
> So here's the basic rule:
> 
>  - cmov is sometimes nice for code density. It's not a big win, but it
>    certainly can be a win.
> 
>  - if you KNOW the branch is totally unpredictable, cmov is often good for
>    performance. But a compiler almost never knows that, and even if you
>    train it with input data and profiling, remember that not very many
>    branches _are_ totally unpredictable, so even if you were to know that
>    something is unpredictable, it's going to be very rare.
> 
>  - on a P4, branch mispredictions are expensive, but so is cmov, so all
>    the above is to some degree exaggerated. On nicer microarchitectures
>    (the Intel Core 2 in particular is something I have to say is very nice
>    indeed), the difference will be a lot less noticeable. The loss from
>    cmov isn't very big (it's not as sucky as P4), but neither is the win
>    (branch misprediction isn't that expensive either).
> 
> Here's an example program that you can test and time yourself.
> 
> On my Core 2, I get
> 
>   [EMAIL PROTECTED] ~]$ gcc -DCMOV -Wall -O2 t.c
>   [EMAIL PROTECTED] ~]$ time ./a.out
>   6
> 
>   real    0m0.194s
>   user    0m0.192s
>   sys     0m0.000s
> 
>   [EMAIL PROTECTED] ~]$ gcc -Wall -O2 t.c
>   [EMAIL PROTECTED] ~]$ time ./a.out
>   6
> 
>   real    0m0.167s
>   user    0m0.168s
>   sys     0m0.000s
> 
> ie the cmov is quite a bit slower. Maybe I did something wrong. But note
> how cmov not only is slower, it's fundamentally more limited too (ie the
> branch-over can actually do a lot of things cmov simply cannot do).


Hi,
cmov will stall on eflags in your test program.
I think you will see the benefit of cmov if you can manage to put some
instructions that do NOT modify eflags between testl and cmov.

Thanks
Zou Nan hai


Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Thomas Sailer
On Wed, 2007-01-03 at 08:03 -0800, Linus Torvalds wrote:

> and assuming the branch is AT ALL predictable (and 95+% of all branches 
> are), the branch-over will actually be a LOT better for a CPU.

IF... Counterexample: Add-Compare-Select in a Viterbi Decoder. If the
compare can be predicted, you botched the compression of the data (if
you can predict the data, you could have compressed it better), or your
noise is not white, i.e. you f*** up the whitening filter. So in any
practical viterbi decoder, the compares cannot be predicted. I remember
cmov made a big difference in Viterbi Decoder performance on a Cyrix
6x86. But granted, nowadays these things are usually done with SIMD and
masks.
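
A scalar sketch of the mask idea behind such SIMD code, assuming 32-bit path metrics; the function name `acs` and its parameter names are invented for illustration. The select is done with bitwise operations rather than a branch or a cmov, although the compiler remains free to emit either for the comparison:

```c
#include <stdint.h>

/* Add-compare-select: extend two candidate paths by their branch
 * metrics and keep the smaller total, without a conditional jump
 * in the source. */
static uint32_t acs(uint32_t pm0, uint32_t bm0, uint32_t pm1, uint32_t bm1)
{
    uint32_t c0 = pm0 + bm0;                      /* add */
    uint32_t c1 = pm1 + bm1;
    uint32_t mask = (uint32_t)0 - (uint32_t)(c0 < c1); /* compare: all ones if c0 < c1 */

    return (c0 & mask) | (c1 & ~mask);            /* select */
}
```

SIMD instruction sets apply the same compare-to-mask and blend steps to several metrics at once.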

Tom



Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Linus Torvalds


On Wed, 3 Jan 2007, Denis Vlasenko wrote:
> 
> IOW: yet another slot in instruction opcode matrix and thousands of
> transistors in instruction decoders are wasted because of this
> "clever invention", eh?

Well, in all fairness, it can probably help more on certain 
microarchitectures. Intel is fairly aggressively OoO, especially in Core 
2, and predicted branches are not only free, they allow OoO to do a great 
job around them. But an in-order architecture doesn't have that, and cmov 
might show more of an advantage there.

Linus


Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Linus Torvalds


On Wed, 3 Jan 2007, Thomas Sailer wrote:
> 
> IF... Counterexample: Add-Compare-Select in a Viterbi Decoder.

Yes. [De]compression stuff tends to be (a) totally unpredictable and (b) a 
situation where people care about performance. It's fairly rare in many 
other situations.

That said, any real performance these days is about avoiding cache misses. 
There cmov really can help more, if it results in denser code (fairly big 
if, though).

Linus


Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Denis Vlasenko
On Wednesday 03 January 2007 21:38, Linus Torvalds wrote:
> On Wed, 3 Jan 2007, Denis Vlasenko wrote:
> > 
> > Why CPU people do not internally convert cmov into jmp,mov pair?
> 
...
> It really all boils down to: there's simply no real reason to use cmov. 
> It's not horrible either, so go ahead and use it if you want to, but don't 
> expect your code to really magically run any faster.

IOW: yet another slot in instruction opcode matrix and thousands of
transistors in instruction decoders are wasted because of this
"clever invention", eh?
--
vda


Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Linus Torvalds


On Wed, 3 Jan 2007, Denis Vlasenko wrote:
> 
> Why CPU people do not internally convert cmov into jmp,mov pair?

Probably because

 - it's not worth it. cmov's certainly _can_ be faster for unpredictable 
   input. So especially if you teach your compiler (by using profiling) to 
   use cmov's mainly for unpredictable cases, turning it into a 
   conditional jump internally would likely be a bad idea.

 - the biggest reason to do it would likely be microarchitectural: if you 
   have an ALU or a bypass network that just isn't suitable for bypassing 
   the flags that way (because you designed your pipeline for a 
   conditional branch), you might decide that it just simplifies things to 
   turn the cmov internally into a branch+mov uop pair. 

 - cmov's simply aren't common enough to be worth worrying about, 
   especially as it's not likely that the difference is all that big in 
   the end. The limitations on cmov's means that the compiler can only use 
   them under certain fairly limited circumstances anyway, so it's not 
   like you'll make a huge difference by doing anything clever.  So see 
   above - it's simply a wash, and likely ends up just depending on other 
   issues.

And don't get me wrong. cmov's can make a difference. You can use them to 
avoid polluting your branch prediction tables, you can use them to make 
code smaller, and you can use them when they simply just fit the problem 
really well. It's just _not_ the case that they are "obviously better". 
They simply aren't. Conditional branches aren't "evil". There are many 
MUCH worse things you can do, and other things you should avoid.

It really all boils down to: there's simply no real reason to use cmov. 
It's not horrible either, so go ahead and use it if you want to, but don't 
expect your code to really magically run any faster.

Linus


Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Linus Torvalds


On Wed, 3 Jan 2007, Tim Schmielau wrote:
>
> Well, on a P4 (which is supposed to be soo bad) I get:

Interesting. My P4 gets basically exactly the same timings for the cmov 
and branch cases.  And my Core 2 is consistently faster (something like 
15%) for the branch version.

Btw, the test-case should be the best possible one for cmov, since there 
are no data-dependencies except for ALU operations, and everything is 
totally independent (the actual values have no data dependencies at all, 
since they are constants). So the critical path issue never shows up.
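
The test program itself is not reproduced in this part of the thread. The following is only a sketch of what such a micro-benchmark could look like, under the assumption (taken from the replies mentioning testl, cmovne and je-mov) that -DCMOV selects a testl/cmovne sequence and the default build uses a branch. The function names `choose` and `run_selects` are invented, and the inline assembly is x86-only:

```c
/* Select a if cond is non-zero, otherwise b. */
static inline int choose(int cond, int a, int b)
{
#if defined(CMOV) && (defined(__i386__) || defined(__x86_64__))
    int r = b;
    /* testl sets ZF when cond == 0; cmovne copies a into r otherwise. */
    __asm__("testl %1, %1\n\t"
            "cmovne %2, %0"
            : "+r"(r)
            : "r"(cond), "r"(a)
            : "cc");
    return r;
#else
    /* Branch-over version. The compiler may still pick cmov for this,
     * so check the generated code with gcc -S before trusting timings. */
    if (cond)
        return a;
    return b;
#endif
}

/* Run the select in a loop with constant values, so the only
 * dependency chain is the accumulation into sum. */
static unsigned long run_selects(unsigned long iterations)
{
    unsigned long sum = 0;
    unsigned long i;

    for (i = 0; i < iterations; i++)
        sum += (unsigned long)choose((int)(i & 1), 1, 2);
    return sum;
}
```

Timing `run_selects()` with a large iteration count, once built with -DCMOV and once without, gives a comparison in the spirit of the numbers posted in this thread, though not the same program.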

Linus


Re: "kernel + gcc 4.1 = several problems" / "Oops in 2.6.19.1"

2007-01-03 Thread Adrian Bunk
On Wed, Jan 03, 2007 at 04:25:09PM +0100, Udo van den Heuvel wrote:
> Hello,
> 
> I just read about the subjects.
> I have a firewall which has some issues.
> First it was a VIA CL6000 (c3).
> Now it is a EK8000 (c3-2) with different power supply, RAM and board of
> course. Still I see strange things sometimes. Crashes, hangs, etc. Now
> and then. Not too often.
> 
> I have in .config:
> CONFIG_CC_OPTIMIZE_FOR_SIZE=y
> CONFIG_MVIAC3_2=y
> 
> Does this mean the issue applies to my own kernels?

It could be.
Or it could be something completely different.

If the same kernel compiled with gcc 3.4.6 works fine, you might run 
into one of the mysterious problems with gcc 4.1.

It could also be hardware problems (e.g. try running memtest86 for a 
longer time).

Does the machine hang completely, or is any useful information like e.g. 
an oops available?

> Udo

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Denis Vlasenko
On Wednesday 03 January 2007 17:03, Linus Torvalds wrote:
> On Wed, 3 Jan 2007, Grzegorz Kulewski wrote:
> > Could you explain why CMOV is pointless now? Are there any benchmarks 
> > proving
> > that?
> 
> CMOV (and, more generically, any "predicated instruction") tends to be 
> generally a bad idea on an aggressively out-of-order CPU. It doesn't 
> always have to be horrible, but in practice it is seldom very nice, and 
> (as usual) on the P4 it can be really quite bad.
> 
> On a P4, I think a cmov basically takes 10 cycles.
> 
> But even ignoring the usual P4 "I suck at things that aren't totally 
> normal", cmov is actually not a great idea. You can always replace it by
> 
>   j forward
>   mov ..., %reg
>   forward:
...
...
> In contrast, if you use a predicated instruction, ALL of it is on the 
> critical path. Calculating the conditional is on the critical path. 
> Calculating the value that gets used is obviously ALSO on the critical 
> path, but so is the calculation for the value that DOESN'T get used too. 
> So the cmov - rather than speeding things up - actually slows things down, 
> because it makes more code be dependent on each other.

Why don't CPU people internally convert cmov into a jmp,mov pair?
--
vda


Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Mariusz Kozlowski
Hello, 

> Here's an example program that you can test and time yourself. 
> 
> On my Core 2, I get
> 
>   [EMAIL PROTECTED] ~]$ gcc -DCMOV -Wall -O2 t.c
>   [EMAIL PROTECTED] ~]$ time ./a.out
>   6
> 
>   real    0m0.194s
>   user    0m0.192s
>   sys     0m0.000s
> 
>   [EMAIL PROTECTED] ~]$ gcc -Wall -O2 t.c
>   [EMAIL PROTECTED] ~]$ time ./a.out
>   6
>   
>   real    0m0.167s
>   user    0m0.168s
>   sys     0m0.000s

Test was done on my laptop with gcc 4.1.1 and CPU:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Pentium(R) 4 CPU 2.40GHz
stepping        : 9
cpu MHz         : 2392.349
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid xtpr
bogomips        : 4786.36
clflush size    : 64

I wrote a simple script that ran each version of your code 100
times measuring the execution time. Then some simple gnuplot
magic was applied. The result is attached (png file).

- cmovne was faster with almost stable execution time (~171ms)
- je-mov was slower and execution time varies

Interpretation is up to you ;-)

-- 
Regards,

Mariusz Kozlowski


benchmark.png
Description: PNG image


Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Tim Schmielau
Well, on a P4 (which is supposed to be soo bad) I get:

> gcc -O2 t.c -o t
> foreach x ( 1 2 3 4 5 )
>> time ./t > /dev/null
>> end
0.196u 0.004s 0:00.19 100.0% 0+0k 0+0io 0pf+0w
0.168u 0.004s 0:00.16 100.0% 0+0k 0+0io 0pf+0w
0.168u 0.000s 0:00.16 100.0% 0+0k 0+0io 0pf+0w
0.160u 0.000s 0:00.15 106.6% 0+0k 0+0io 0pf+0w
0.180u 0.000s 0:00.18 100.0% 0+0k 0+0io 0pf+0w
> gcc -DCMOV -O2 t.c -o t
> foreach x ( 1 2 3 4 5 )
>> time ./t > /dev/null
>> end
0.168u 0.000s 0:00.17 94.1% 0+0k 0+0io 0pf+0w
0.152u 0.000s 0:00.15 100.0% 0+0k 0+0io 0pf+0w
0.136u 0.004s 0:00.13 100.0% 0+0k 0+0io 0pf+0w
0.168u 0.000s 0:00.16 100.0% 0+0k 0+0io 0pf+0w
0.172u 0.000s 0:00.17 100.0% 0+0k 0+0io 0pf+0w

see?


Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread l . genoni


Just curious why on a dual-core Opteron at 2600 MHz I get:

phoenix:{root}:/tmp> gcc -DCMOV -Wall -O2 t.c
phoenix:{root}:/tmp> time ./a.out
6

real    0m0.117s
user    0m0.120s
sys     0m0.000s
phoenix:{root}:/tmp> gcc -Wall -O2 t.c
phoenix:{root}:/tmp> time ./a.out
6

real    0m0.136s
user    0m0.130s
sys     0m0.010s

Regards

(I understand it is very different from P4)

Luigi Genoni


Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread l . genoni


Just to make clearer why I am so curious, this is from an X86_64 X2 3800+:

DarkStar:{venom}:/tmp> gcc -DCMOV -Wall -O2 t.c
DarkStar:{venom}:/tmp> time ./a.out
6

real    0m0.151s
user    0m0.150s
sys     0m0.000s
DarkStar:{venom}:/tmp> gcc -Wall -O2 t.c
DarkStar:{venom}:/tmp> time ./a.out
6

real    0m0.176s
user    0m0.180s
sys     0m0.000s
DarkStar:{venom}:/tmp> gcc -m32 -DCMOV -Wall -O2 t.c
DarkStar:{venom}:/tmp> time ./a.out
6

real    0m0.152s
user    0m0.160s
sys     0m0.000s
DarkStar:{venom}:/tmp> gcc -m32 -Wall -O2 t.c
DarkStar:{venom}:/tmp> time ./a.out
6

real    0m0.200s
user    0m0.200s
sys     0m0.000s



Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Linus Torvalds


On Wed, 3 Jan 2007, Alan wrote:
>
> > cmov is effectively the same cost as a compare and jump, in both cases
> > the cpu needs to do a prediction, and on a mispredict, restart.
> 
> On a P4 it appears to be slower than compare/jump in most cases

On just about EVERYTHING it's slower than compare/jump. See my other post 
on why, together with a (largely untested) test app.

Linus


Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Linus Torvalds


On Wed, 3 Jan 2007, Grzegorz Kulewski wrote:
> 
> Could you explain why CMOV is pointless now? Are there any benchmarks proving
> that?

CMOV (and, more generically, any "predicated instruction") tends to
generally be a bad idea on an aggressively out-of-order CPU. It doesn't
always have to be horrible, but in practice it is seldom very nice, and 
(as usual) on the P4 it can be really quite bad.

On a P4, I think a cmov basically takes 10 cycles.

But even ignoring the usual P4 "I suck at things that aren't totally 
normal", cmov is actually not a great idea. You can always replace it by

j<negated condition> forward
mov ..., %reg
forward:

and assuming the branch is AT ALL predictable (and 95+% of all branches 
are), the branch-over will actually be a LOT better for a CPU.

Why? Because branches can be predicted, and when they are predicted they
basically go away. They go away on many levels, too. Not just the branch 
itself, but the _conditional_ for the branch goes away as far as the 
critical path of code is concerned: the CPU still has to calculate it and 
check it, but from a performance angle it "doesn't exist any more", 
because it's not holding anything else up (well, you want to do it in 
_some_ reasonable time, but the point stands..)

Similarly, whichever side of the branch wasn't taken goes away. Again, in 
an out-of-order machine with register renaming, this means that even if 
the branch isn't taken above, and you end up executing all the non-branch 
instructions, because you now UNCONDITIONALLY over-write the register, the 
old data in the register is now DEAD, so now all the OTHER writes to that 
register are off the critical path too!

So the end result is that with a conditional branch, on a good CPU, the
_only_ part of the code that is actually performance-sensitive is the 
actual calculation of the value that gets used!

In contrast, if you use a predicated instruction, ALL of it is on the 
critical path. Calculating the conditional is on the critical path. 
Calculating the value that gets used is obviously ALSO on the critical 
path, but so is the calculation for the value that DOESN'T get used too. 
So the cmov - rather than speeding things up - actually slows things down, 
because it makes more code be dependent on each other.

So here's the basic rule:

 - cmov is sometimes nice for code density. It's not a big win, but it 
   certainly can be a win.

 - if you KNOW the branch is totally unpredictable, cmov is often good for 
   performance. But a compiler almost never knows that, and even if you 
   train it with input data and profiling, remember that not very many 
   branches _are_ totally unpredictable, so even if you were to know that 
   something is unpredictable, it's going to be very rare.

 - on a P4, branch mispredictions are expensive, but so is cmov, so all 
   the above is to some degree exaggerated. On nicer microarchitectures 
   (the Intel Core 2 in particular is something I have to say is very nice 
   indeed), the difference will be a lot less noticeable. The loss from 
   cmov isn't very big (it's not as sucky as P4), but neither is the win 
   (branch misprediction isn't that expensive either).

Here's an example program that you can test and time yourself. 

On my Core 2, I get

[EMAIL PROTECTED] ~]$ gcc -DCMOV -Wall -O2 t.c
[EMAIL PROTECTED] ~]$ time ./a.out
6

real    0m0.194s
user    0m0.192s
sys     0m0.000s

[EMAIL PROTECTED] ~]$ gcc -Wall -O2 t.c
[EMAIL PROTECTED] ~]$ time ./a.out
6

real    0m0.167s
user    0m0.168s
sys     0m0.000s

ie the cmov is quite a bit slower. Maybe I did something wrong. But note 
how cmov not only is slower, it's fundamentally more limited too (ie the
branch-over can actually do a lot of things cmov simply cannot do).

So don't use cmov. Except for non-performance-critical code, or if you 
really care about code-size, and it helps (which is actually fairly rare: 
quite often cmov isn't even smaller than a conditional jump and a regular 
move, partly because a regular move can take arguments that a cmov cannot: 
move to memory, move from an immediate etc etc, so depending on what 
you're moving, cmov simply isn't good even if it's _just_ a move).

(For me, the "cmov" version of the function ends up being three bytes 
shorter. So it's actually a good example of everything above)

Linus

(*) x86 only has "move to register" as a predicated instruction, but some 
other architectures have lots of them, potentially all instructions. I 
don't count conditional branches as "predicated", although some crazy 
people do. ARM has predicated instructions (but they are gone in Thumb, I 
think), and ia64 obviously has predicated instructions (but it will be 
gone in a few years ;)

#include <stdio.h>

/* How many iterations? */
#define ITERATIONS (1)

/* Which bit of the counter 

"kernel + gcc 4.1 = several problems" / "Oops in 2.6.19.1"

2007-01-03 Thread Udo van den Heuvel
Hello,

I just read about the subjects.
I have a firewall which has some issues.
First it was a VIA CL6000 (c3).
Now it is an EK8000 (c3-2) with different power supply, RAM and board of
course. Still I see strange things sometimes. Crashes, hangs, etc. Now
and then. Not too often.

I have in .config:
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_MVIAC3_2=y

Does this mean the issue applies to my own kernels?

Udo


Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Alan
> cmov is effectively the same cost as a compare and jump, in both cases
> the cpu needs to do a prediction, and on a mispredict, restart.

On a P4 it appears to be slower than compare/jump in most cases



Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Jakub Jelinek
On Wed, Jan 03, 2007 at 05:32:16AM -0800, Arjan van de Ven wrote:
> On Wed, 2007-01-03 at 12:44 +, Alan wrote:
> > > > fixed. At that point an i686 kernel would contain i686 instructions and
> > > > actually run on all i686 processors ending all the i586 pain for most
> > > > users and distributions.
> > > 
> > > Could you explain why CMOV is pointless now? Are there any benchmarks 
> > > proving that?
> > 
> > Take a look at the recent ffmpeg bits on the mplayer list for one example
> > I have to hand - P4 cmov is pretty slow. The crypto folks find the same
> > things.
> 
> cmov is effectively the same cost as a compare and jump, in both cases
> the cpu needs to do a prediction, and on a mispredict, restart.
> 
> the reason cmov can make sense is because it's smaller code...

BTW, from GCC POV availability of CMOV is the only difference between
-march=i586 -mtune=something and -march=i686 -mtune=something.  So this is
just a naming thing, it could be called -march=i686cmov to make it more
obvious but it is too late (and too unimportant) to change it now.
Perhaps adding a note to info gcc/man gcc ought to be enough?
If you don't want CMOV being emitted, compile with -march=i586 -mtune=generic
(or whatever other tuning you pick up), with -march=i686 -mtune=generic
you tell GCC you have CMOV.  Whether CMOV is actually used in generated
code is another matter, which should be decided based on the selected
-mtune.  For -Os CMOV should be used whenever available, as that means
usually smaller code, otherwise if on some particular chip CMOV is actually
slower than compare, jump and assignment, then CMOV should not be selected
for that particular tuning (say if Pentium4 has slower CMOV than
compare+jump+assignment, -mtune=pentium4 should not emit CMOV, at least not
often), if you have examples of that, please file a bug to
http://gcc.gnu.org/bugzilla/.  -mtune=generic should emit resp. not emit
CMOV depending on whether it is a win on the currently common CPUs.

Jakub


Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Arjan van de Ven
On Wed, 2007-01-03 at 12:44 +, Alan wrote:
> > > fixed. At that point an i686 kernel would contain i686 instructions and
> > > actually run on all i686 processors ending all the i586 pain for most
> > > users and distributions.
> > 
> > Could you explain why CMOV is pointless now? Are there any benchmarks 
> > proving that?
> 
> Take a look at the recent ffmpeg bits on the mplayer list for one example
> I have to hand - P4 cmov is pretty slow. The crypto folks find the same
> things.

cmov is effectively the same cost as a compare and jump, in both cases
the cpu needs to do a prediction, and on a mispredict, restart.

the reason cmov can make sense is because it's smaller code...

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org



Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Alan
> > fixed. At that point an i686 kernel would contain i686 instructions and
> > actually run on all i686 processors ending all the i586 pain for most
> > users and distributions.
> 
> Could you explain why CMOV is pointless now? Are there any benchmarks 
> proving that?

Take a look at the recent ffmpeg bits on the mplayer list for one example
I have to hand - P4 cmov is pretty slow. The crypto folks find the same
things.

Alan



Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Jeff Garzik

Grzegorz Kulewski wrote:
> On Wed, 3 Jan 2007, Alan wrote:
> > The proper fix for all of this mess is to fix the gcc compiler suite to
> > actually generate i686 code when told to use i686. CMOV is an optional
> > i686 extension which gcc uses without checking. In early PIV days it made
> > sense but on modern processors CMOV is so pointless the bug should be
> > fixed. At that point an i686 kernel would contain i686 instructions and
> > actually run on all i686 processors ending all the i586 pain for most
> > users and distributions.
>
> Could you explain why CMOV is pointless now? Are there any benchmarks
> proving that?

In theory modern processors should have no trouble converting a
test/move sequence into the same uops generated by a cmov instruction,
for one.


Jeff





Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Grzegorz Kulewski

On Wed, 3 Jan 2007, Alan wrote:
> The proper fix for all of this mess is to fix the gcc compiler suite to
> actually generate i686 code when told to use i686. CMOV is an optional
> i686 extension which gcc uses without checking. In early PIV days it made
> sense but on modern processors CMOV is so pointless the bug should be
> fixed. At that point an i686 kernel would contain i686 instructions and
> actually run on all i686 processors ending all the i586 pain for most
> users and distributions.

Could you explain why CMOV is pointless now? Are there any benchmarks
proving that?



Thanks,

Grzegorz Kulewski



Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Alan
> That's a good suggestion. Earlier C3s didn't have cmov so it's 
> not entirely unlikely that cmov in C3-2 is broken in some cases.
> Configuring for P5MMX or 486 should be good safe alternatives.

The proper fix for all of this mess is to fix the gcc compiler suite to
actually generate i686 code when told to use i686. CMOV is an optional
i686 extension which gcc uses without checking. In early PIV days it made
sense but on modern processors CMOV is so pointless the bug should be
fixed. At that point an i686 kernel would contain i686 instructions and
actually run on all i686 processors ending all the i586 pain for most
users and distributions.

Unfortunately the compiler people don't appear to care about their years
old bug.

Alan


Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Linus Torvalds


On Wed, 3 Jan 2007, Grzegorz Kulewski wrote:
 
 Could you explain why CMOV is pointless now? Are there any benchmarks proving
 that?

CMOV (and, more generically, any predicated instruction) tends to 
generally a bad idea on an aggressively out-of-order CPU. It doesn't 
always have to be horrible, but in practice it is seldom very nice, and 
(as usual) on the P4 it can be really quite bad.

On a P4, I think a cmov basically takes 10 cycles.

But even ignoring the usual P4 I suck at things that aren't totally 
normal, cmov is actually not a great idea. You can always replace it by

jnegated condition forward
mov ..., %reg
forward:

and assuming the branch is AT ALL predictable (and 95+% of all branches 
are), the branch-over will actually be a LOT better for a CPU.

Why? Becuase branches can be predicted, and when they are predicted they 
basically go away. They go away on many levels, too. Not just the branch 
itself, but the _conditional_ for the branch goes away as far as the 
critical path of code is concerned: the CPU still has to calculate it and 
check it, but from a performance angle it doesn't exist any more, 
because it's not holding anything else up (well, you want to do it in 
_some_ reasonable time, but the point stands..)

Similarly, whichever side of the branch wasn't taken goes away. Again, in 
an out-of-order machine with register renaming, this means that even if 
the branch isn't taken above, and you end up executing all the non-branch 
instructions, because you now UNCONDITIONALLY over-write the register, the 
old data in the register is now DEAD, so now all the OTHER writes to that 
register are off the critical path too!

So the end result is that with a conditional branch, ona good CPU, the 
_only_ part of the code that is actually performance-sensitive is the 
actual calculation of the value that gets used!

In contrast, if you use a predicated instruction, ALL of it is on the 
critical path. Calculating the conditional is on the critical path. 
Calculating the value that gets used is obviously ALSO on the critical 
path, but so is the calculation for the value that DOESN'T get used too. 
So the cmov - rather than speeding things up - actually slows things down, 
because it makes more code be dependent on each other.

So here's the basic rule:

 - cmov is sometimes nice for code density. It's not a big win, but it 
   certainly can be a win.

 - if you KNOW the branch is totally unpredictable, cmov is often good for 
   performance. But a compiler almost never knows that, and even if you 
   train it with input data and profiling, remember that not very many 
   branches _are_ totally unpredictable, so even if you were to know that 
   something is unpredictable, it's going to be very rare.

 - on a P4, branch mispredictions are expensive, but so is cmov, so all 
   the above is to some degree exaggerated. On nicer microarchitectures 
   (the Intel Core 2 in particular is something I have to say is very nice 
   indeed), the difference will be a lot less noticeable. The loss from 
   cmov isn't very big (it's not as sucky as P4), but neither is the win 
   (branch misprediction isn't that expensive either).

Here's an example program that you can test and time yourself. 

On my Core 2, I get

[EMAIL PROTECTED] ~]$ gcc -DCMOV -Wall -O2 t.c
[EMAIL PROTECTED] ~]$ time ./a.out
6

real0m0.194s
user0m0.192s
sys 0m0.000s

[EMAIL PROTECTED] ~]$ gcc -Wall -O2 t.c
[EMAIL PROTECTED] ~]$ time ./a.out
6

real0m0.167s
user0m0.168s
sys 0m0.000s

ie the cmov is quite a bit slower. Maybe I did something wrong. But note 
how cmov not only is slower, it's fundamentally more limited too (ie the
branch-over can actually do a lot of things cmov simply cannot do).

So don't use cmov. Except for non-performance-critical code, or if you 
really care about code-size, and it helps (which is actually fairly rare: 
quite often cmov isn't even smaller than a conditional jump and a regular 
move, partly because a regular move can take arguments that a cmov cannot: 
move to memory, move from an immediate etc etc, so depending on what 
you're moving, cmov simply isn't good even if it's _just_ a move).

(For me, the cmov version of the function ends up being three bytes 
shorter. So it's actually a good example of everything above)

Linus

(*) x86 only has move to register as a predicated instruction, but some 
other architectures have lots of them, potentially all instructions. I 
don't count conditional branches as predicated, although some crazy 
people do. ARM has predicated instructions (but they are gone in Thumb, I 
think), and ia64 obviously has predicated instructions (but it will be 
gone in a few years ;)

#include <stdio.h>

/* How many iterations? */
#define ITERATIONS (1)

/* Which bit of 

Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Linus Torvalds


On Wed, 3 Jan 2007, Alan wrote:

> > cmov is effectively the same cost as a compare and jump, in both cases
> > the cpu needs to do a prediction, and on a mispredict, restart.
> 
> On a P4 it appears to be slower than compare/jump in most cases

On just about EVERYTHING it's slower than compare/jump. See my other post 
on why, together with a (largely untested) test app.

Linus


Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread l . genoni


Just to make clearer why I am so curious, this from X86_64 X2 3800+:

DarkStar:{venom}:/tmp gcc -DCMOV -Wall -O2 t.c
DarkStar:{venom}:/tmp time ./a.out
6

real    0m0.151s
user    0m0.150s
sys     0m0.000s
DarkStar:{venom}:/tmp gcc -Wall -O2 t.c
DarkStar:{venom}:/tmp time ./a.out
6

real    0m0.176s
user    0m0.180s
sys     0m0.000s
DarkStar:{venom}:/tmp gcc -m32 -DCMOV -Wall -O2 t.c
DarkStar:{venom}:/tmp time ./a.out
6

real    0m0.152s
user    0m0.160s
sys     0m0.000s
DarkStar:{venom}:/tmp gcc -m32 -Wall -O2 t.c
DarkStar:{venom}:/tmp time ./a.out
6

real    0m0.200s
user    0m0.200s
sys     0m0.000s



Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread l . genoni


Just curious why on Opteron dual core 2600MHZ I get:

phoenix:{root}:/tmp gcc -DCMOV -Wall -O2 t.c
phoenix:{root}:/tmp time ./a.out
6

real    0m0.117s
user    0m0.120s
sys     0m0.000s
phoenix:{root}:/tmp gcc -Wall -O2 t.c
phoenix:{root}:/tmp time ./a.out
6

real    0m0.136s
user    0m0.130s
sys     0m0.010s

Regards

(I understand it is very different from P4)

Luigi Genoni


Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Tim Schmielau
Well, on a P4 (which is supposed to be soo bad) I get:

 gcc -O2 t.c -o t
 foreach x ( 1 2 3 4 5 )
 time ./t > /dev/null
 end
0.196u 0.004s 0:00.19 100.0% 0+0k 0+0io 0pf+0w
0.168u 0.004s 0:00.16 100.0% 0+0k 0+0io 0pf+0w
0.168u 0.000s 0:00.16 100.0% 0+0k 0+0io 0pf+0w
0.160u 0.000s 0:00.15 106.6% 0+0k 0+0io 0pf+0w
0.180u 0.000s 0:00.18 100.0% 0+0k 0+0io 0pf+0w
 gcc -DCMOV -O2 t.c -o t
 foreach x ( 1 2 3 4 5 )
 time ./t > /dev/null
 end
0.168u 0.000s 0:00.17 94.1%  0+0k 0+0io 0pf+0w
0.152u 0.000s 0:00.15 100.0% 0+0k 0+0io 0pf+0w
0.136u 0.004s 0:00.13 100.0% 0+0k 0+0io 0pf+0w
0.168u 0.000s 0:00.16 100.0% 0+0k 0+0io 0pf+0w
0.172u 0.000s 0:00.17 100.0% 0+0k 0+0io 0pf+0w

see?


Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Mariusz Kozlowski
Hello, 

> Here's an example program that you can test and time yourself.
> 
> On my Core 2, I get
> 
>   [EMAIL PROTECTED] ~]$ gcc -DCMOV -Wall -O2 t.c
>   [EMAIL PROTECTED] ~]$ time ./a.out
>   6
> 
>   real    0m0.194s
>   user    0m0.192s
>   sys     0m0.000s
> 
>   [EMAIL PROTECTED] ~]$ gcc -Wall -O2 t.c
>   [EMAIL PROTECTED] ~]$ time ./a.out
>   6
> 
>   real    0m0.167s
>   user    0m0.168s
>   sys     0m0.000s

Test was done on my laptop with gcc 4.1.1 and CPU:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Pentium(R) 4 CPU 2.40GHz
stepping        : 9
cpu MHz         : 2392.349
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid xtpr
bogomips        : 4786.36
clflush size    : 64

I wrote a simple script that ran each version of your code 100
times, measuring the execution time. Then some simple gnuplot
magic was applied. The result is attached (png file).

- cmovne was faster with almost stable execution time (~171ms)
- je-mov was slower and execution time varies
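
For anyone who wants to repeat this kind of measurement without a shell script, here is a rough in-process sketch in C. It is not the script used above (that one timed whole program runs); the names time_runs, run_stats and sample_workload are made up for illustration, and clock() reports CPU time rather than wall-clock time.

```c
#include <time.h>

/* Summary of repeated timings, kept in raw clock() ticks. */
struct run_stats {
    int runs;
    long long min_ticks;
    long long max_ticks;
    long long total_ticks;
};

/* Placeholder workload; replace with the code under test. */
static void sample_workload(void)
{
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < 100000; i++)
        sink += i;
}

/* Call fn() 'runs' times and record min, max and total CPU ticks. */
static struct run_stats time_runs(void (*fn)(void), int runs)
{
    struct run_stats s = { runs, 0, 0, 0 };

    for (int i = 0; i < runs; i++) {
        clock_t start = clock();
        fn();
        long long ticks = (long long)(clock() - start);

        if (i == 0 || ticks < s.min_ticks)
            s.min_ticks = ticks;
        if (i == 0 || ticks > s.max_ticks)
            s.max_ticks = ticks;
        s.total_ticks += ticks;
    }
    return s;
}

/* Convert clock() ticks to milliseconds. */
static double ticks_to_ms(long long ticks)
{
    return 1000.0 * (double)ticks / (double)CLOCKS_PER_SEC;
}
```

Looking at the spread between min_ticks and max_ticks, and not just the mean, is what shows whether one variant is "almost stable" and the other "varies".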

Interpretation is up to you ;-)

-- 
Regards,

Mariusz Kozlowski


benchmark.png
Description: PNG image


Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Denis Vlasenko
On Wednesday 03 January 2007 17:03, Linus Torvalds wrote:
> On Wed, 3 Jan 2007, Grzegorz Kulewski wrote:
> > Could you explain why CMOV is pointless now? Are there any benchmarks
> > proving that?
> 
> CMOV (and, more generically, any predicated instruction) tends to be
> generally a bad idea on an aggressively out-of-order CPU. It doesn't
> always have to be horrible, but in practice it is seldom very nice, and
> (as usual) on the P4 it can be really quite bad.
> 
> On a P4, I think a cmov basically takes 10 cycles.
> 
> But even ignoring the usual P4 "I suck at things that aren't totally
> normal", cmov is actually not a great idea. You can always replace it by
> 
>   j<negated condition> forward
>   mov ..., %reg
>   forward:
...
...
> In contrast, if you use a predicated instruction, ALL of it is on the
> critical path. Calculating the conditional is on the critical path.
> Calculating the value that gets used is obviously ALSO on the critical
> path, but so is the calculation for the value that DOESN'T get used too.
> So the cmov - rather than speeding things up - actually slows things down,
> because it makes more code be dependent on each other.

Why CPU people do not internally convert cmov into jmp,mov pair?
--
vda


Re: kernel + gcc 4.1 = several problems / Oops in 2.6.19.1

2007-01-03 Thread Adrian Bunk
On Wed, Jan 03, 2007 at 04:25:09PM +0100, Udo van den Heuvel wrote:
> Hello,
> 
> I just read about the subjects.
> I have a firewall which has some issues.
> First it was a VIA CL6000 (c3).
> Now it is a EK8000 (c3-2) with different power supply, RAM and board of
> course. Still I see strange things sometimes. Crashes, hangs, etc. Now
> and then. Not too often.
> 
> I have in .config:
> CONFIG_CC_OPTIMIZE_FOR_SIZE=y
> CONFIG_MVIAC3_2=y
> 
> Does this mean the issue applies to my own kernels?

It could be.
Or it could be something completely different.

If the same kernel compiled with gcc 3.4.6 works fine, you might run 
into one of the mysterious problems with gcc 4.1.

It could also be hardware problems (e.g. try running memtest86 for a 
longer time).

Does the machine hang completely, or is any useful information like e.g. 
an oops available?

> Udo

cu
Adrian

-- 

   Is there not promise of rain? Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   Only a promise, Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Linus Torvalds


On Wed, 3 Jan 2007, Tim Schmielau wrote:

> Well, on a P4 (which is supposed to be soo bad) I get:

Interesting. My P4 gets basically exactly the same timings for the cmov 
and branch cases.  And my Core 2 is consistently faster (something like 
15%) for the branch version.

Btw, the test-case should be the best possible one for cmov, since there 
are no data-dependencies except for ALU operations, and everything is 
totally independent (the actual values have no data dependencies at all, 
since they are constants). So the critical path issue never shows up.

Linus


Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Linus Torvalds


On Wed, 3 Jan 2007, Denis Vlasenko wrote:
 
> Why CPU people do not internally convert cmov into jmp,mov pair?

Probably because

 - it's not worth it. cmov's certainly _can_ be faster for unpredictable 
   input. So especially if you teach your compiler (by using profiling) to
   use cmov's mainly for unpredictable cases, turning it into a 
   conditional jump internally would likely be a bad idea.

 - the biggest reason to do it would likely be microarchitectural: if you 
   have an ALU or a bypass network that just isn't suitable for bypassing 
   the flags that way (because you designed your pipeline for a 
   conditional branch), you might decide that it just simplifies things to 
   turn the cmov internally into a branch+mov uop pair. 

 - cmov's simply aren't common enough to be worth worrying about, 
   especially as it's not likely that the difference is all that big in 
   the end. The limitations on cmov's means that the compiler can only use 
   them under certain fairly limited circumstances anyway, so it's not 
   like you'll make a huge difference by doing anything clever.  So see 
   above - it's simply a wash, and likely ends up just depending on other 
   issues.

And don't get me wrong. cmov's can make a difference. You can use them to 
avoid polluting your branch prediction tables, you can use them to make 
code smaller, and you can use them when they simply just fit the problem 
really well. It's just _not_ the case that they are obviously better. 
They simply aren't. Conditional branches aren't evil. There are many 
MUCH worse things you can do, and other things you should avoid.

It really all boils down to: there's simply no real reason to use cmov. 
It's not horrible either, so go ahead and use it if you want to, but don't 
expect your code to really magically run any faster.

Linus


Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Denis Vlasenko
On Wednesday 03 January 2007 21:38, Linus Torvalds wrote:
> On Wed, 3 Jan 2007, Denis Vlasenko wrote:
> > 
> > Why CPU people do not internally convert cmov into jmp,mov pair?
> 
...
> It really all boils down to: there's simply no real reason to use cmov.
> It's not horrible either, so go ahead and use it if you want to, but don't
> expect your code to really magically run any faster.

IOW: yet another slot in instruction opcode matrix and thousands of
transistors in instruction decoders are wasted because of this
clever invention, eh?
--
vda


Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Linus Torvalds


On Wed, 3 Jan 2007, Thomas Sailer wrote:
 
> IF... Counterexample: Add-Compare-Select in a Viterbi Decoder.

Yes. [De]compression stuff tends to be (a) totally unpredictable and (b) a 
situation where people care about performance. It's fairly rare in many 
other situations.

That said, any real performance these days is about avoiding cache misses. 
There cmov really can help more, if it results in denser code (fairly big 
if, though).

Linus


Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Linus Torvalds


On Wed, 3 Jan 2007, Denis Vlasenko wrote:
 
> IOW: yet another slot in instruction opcode matrix and thousands of
> transistors in instruction decoders are wasted because of this
> clever invention, eh?

Well, in all fairness, it can probably help more on certain 
microarchitectures. Intel is fairly aggressively OoO, especially in Core 
2, and predicted branches are not only free, they allow OoO to do a great 
job around them. But an in-order architecture doesn't have that, and cmov 
might show more of an advantage there.

Linus


Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Thomas Sailer
On Wed, 2007-01-03 at 08:03 -0800, Linus Torvalds wrote:

> and assuming the branch is AT ALL predictable (and 95+% of all branches
> are), the branch-over will actually be a LOT better for a CPU.

IF... Counterexample: Add-Compare-Select in a Viterbi Decoder. If the
compare can be predicted, you botched the compression of the data (if
you can predict the data, you could have compressed it better), or your
noise is not white, i.e. you f*** up the whitening filter. So in any
practical viterbi decoder, the compares cannot be predicted. I remember
cmov made a big difference in Viterbi Decoder performance on a Cyrix
6x86. But granted, nowadays these things are usually done with SIMD and
masks.
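
For readers who have not met add-compare-select before, here is a minimal C sketch of one ACS step for a single trellis state, written in the branch-free style that a cmov (or a SIMD mask) would implement. The function name, the 32-bit metrics and the "smaller metric wins" convention are assumptions for illustration, not taken from any particular decoder.

```c
#include <stdint.h>

/* One add-compare-select step for a single trellis state.
 * pm0/pm1: path metrics of the two predecessor states.
 * bm0/bm1: branch metrics of the two incoming transitions.
 * The smaller candidate survives (distance-style metric assumed);
 * on a tie predecessor 0 is kept.
 * *decision receives 0 or 1, the index of the surviving predecessor. */
static uint32_t acs_step(uint32_t pm0, uint32_t bm0,
                         uint32_t pm1, uint32_t bm1,
                         unsigned *decision)
{
    uint32_t cand0 = pm0 + bm0;                       /* add */
    uint32_t cand1 = pm1 + bm1;
    unsigned take1 = cand1 < cand0;                   /* compare */
    uint32_t mask = (uint32_t)0 - (uint32_t)take1;    /* all ones or 0 */

    *decision = take1;
    return (cand0 & ~mask) | (cand1 & mask);          /* select */
}
```

The compare here depends on noisy input data, which is why it is the textbook case of a branch that cannot be predicted.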

Tom



RE: kernel + gcc 4.1 = several problems

2007-01-03 Thread Zou, Nanhai
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Linus Torvalds
> Sent: January 4, 2007 0:04
 
> Here's an example program that you can test and time yourself.
> 
> On my Core 2, I get
> 
>   [EMAIL PROTECTED] ~]$ gcc -DCMOV -Wall -O2 t.c
>   [EMAIL PROTECTED] ~]$ time ./a.out
>   6
> 
>   real    0m0.194s
>   user    0m0.192s
>   sys     0m0.000s
> 
>   [EMAIL PROTECTED] ~]$ gcc -Wall -O2 t.c
>   [EMAIL PROTECTED] ~]$ time ./a.out
>   6
> 
>   real    0m0.167s
>   user    0m0.168s
>   sys     0m0.000s
> 
> ie the cmov is quite a bit slower. Maybe I did something wrong. But note
> how cmov not only is slower, it's fundamentally more limited too (ie the
> branch-over can actually do a lot of things cmov simply cannot do).


Hi,
cmov will stall on eflags in your test program.
I think you will see the benefit of cmov if you can manage to put some
instructions which do NOT modify eflags between testl and cmov.

Thanks
Zou Nan hai


Re: kernel + gcc 4.1 = several problems

2007-01-02 Thread Willy Tarreau
On Wed, Jan 03, 2007 at 03:12:13AM +0100, Mikael Pettersson wrote:
> On Tue, 2 Jan 2007 17:43:00 -0800 (PST), Linus Torvalds wrote:
> > > The suggestions I've had so far which I have not yet tried:
> > > 
> > > - Select a different x86 CPU in the config.
> > >   -   Unfortunately the C3-2 flags seem to simply tell GCC
> > >   to schedule for ppro (like i686) and enabled MMX and SSE
> > >   -   Probably useless
> > 
> > Actually, try this one. Try using something that doesn't like "cmov". 
> > Maybe the C3-2 simply has some internal cmov bugginess. 
> 
> That's a good suggestion. Earlier C3s didn't have cmov so it's 
> not entirely unlikely that cmov in C3-2 is broken in some cases.

Agreed! When I developed the cmov emulator, I used an early C3 for the
tests (well, a "Samuel2" to be precise), because it did not report "cmov"
in its flags. I first thought "wow, my emulator is amazingly fast!" because
it took something like 50 cycles to do cmovne %eax,%ebx.

Then I realized that this processor performed cmov itself between
registers, and only triggered the invalid opcode when one of the operands
was a memory reference. And this time, for a hard-coded instruction, it
was really slow...

For this reason, I would not be surprised at all that there would be some
buggy behaviour in the cmov right there. Maybe a bug in the decoder unit
making it skip a byte when the next instruction in the prefetch queue is
a cmov affecting same registers... When vendors can do dirty things such
as executing unsupported instructions, we can expect anything from them.
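
As a rough idea of what the core of such an emulator has to decide, here is a small C sketch that evaluates an x86 condition code against a saved EFLAGS value and applies it to a register-to-register cmov. This is not the emulator mentioned above; the function names are made up, and operand decoding, memory operands and the invalid-opcode trap plumbing are all left out.

```c
#include <stdint.h>

/* EFLAGS bits that the x86 condition codes look at. */
#define FLAG_CF (1u << 0)
#define FLAG_PF (1u << 2)
#define FLAG_ZF (1u << 6)
#define FLAG_SF (1u << 7)
#define FLAG_OF (1u << 11)

/* Evaluate a 4-bit x86 condition code (the low nibble of the second
 * opcode byte of "0f 4x" cmovcc) against the given EFLAGS value. */
static int x86_cond_true(unsigned cc, uint32_t eflags)
{
    int cf = (eflags & FLAG_CF) != 0;
    int pf = (eflags & FLAG_PF) != 0;
    int zf = (eflags & FLAG_ZF) != 0;
    int sf = (eflags & FLAG_SF) != 0;
    int of = (eflags & FLAG_OF) != 0;
    int r;

    switch ((cc >> 1) & 7) {
    case 0:  r = of;               break;  /* O  / NO */
    case 1:  r = cf;               break;  /* B  / AE */
    case 2:  r = zf;               break;  /* E  / NE */
    case 3:  r = cf | zf;          break;  /* BE / A  */
    case 4:  r = sf;               break;  /* S  / NS */
    case 5:  r = pf;               break;  /* P  / NP */
    case 6:  r = (sf != of);       break;  /* L  / GE */
    default: r = zf | (sf != of);  break;  /* LE / G  */
    }
    return (cc & 1) ? !r : r;              /* odd codes negate */
}

/* Result of a register-to-register "cmovcc src, dst": the new value
 * of dst. opcode2 is the second opcode byte (0x40..0x4f). */
static uint32_t emulate_cmov(uint8_t opcode2, uint32_t eflags,
                             uint32_t src, uint32_t dst)
{
    return x86_cond_true(opcode2 & 0x0f, eflags) ? src : dst;
}
```

With this encoding, cmovne is second opcode byte 0x45, cmove is 0x44 and cmovg is 0x4f.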

> Configuring for P5MMX or 486 should be good safe alternatives.

I generally use the P5MMX target for such processors.

> /Mikael

Regards,
Willy



Re: kernel + gcc 4.1 = several problems

2007-01-02 Thread Mikael Pettersson
On Tue, 2 Jan 2007 17:43:00 -0800 (PST), Linus Torvalds wrote:
> > The suggestions I've had so far which I have not yet tried:
> > 
> > -   Select a different x86 CPU in the config.
> > -   Unfortunately the C3-2 flags seem to simply tell GCC
> > to schedule for ppro (like i686) and enabled MMX and SSE
> > -   Probably useless
> 
> Actually, try this one. Try using something that doesn't like "cmov". 
> Maybe the C3-2 simply has some internal cmov bugginess. 

That's a good suggestion. Earlier C3s didn't have cmov so it's 
not entirely unlikely that cmov in C3-2 is broken in some cases.
Configuring for P5MMX or 486 should be good safe alternatives.

/Mikael


Re: kernel + gcc 4.1 = several problems

2007-01-02 Thread Alistair John Strachan
On Wednesday 03 January 2007 02:12, Mikael Pettersson wrote:
> On Tue, 2 Jan 2007 17:43:00 -0800 (PST), Linus Torvalds wrote:
> > > The suggestions I've had so far which I have not yet tried:
> > >
> > > - Select a different x86 CPU in the config.
> > >   -   Unfortunately the C3-2 flags seem to simply tell GCC
> > >   to schedule for ppro (like i686) and enabled MMX and SSE
> > >   -   Probably useless
> >
> > Actually, try this one. Try using something that doesn't like "cmov".
> > Maybe the C3-2 simply has some internal cmov bugginess.
>
> That's a good suggestion. Earlier C3s didn't have cmov so it's
> not entirely unlikely that cmov in C3-2 is broken in some cases.
> Configuring for P5MMX or 486 should be good safe alternatives.

Or just C3 (not C3-2), which is what I've done.

I'll report back whether it crashes or not.

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.


Re: kernel + gcc 4.1 = several problems

2007-01-02 Thread Horst H. von Brand
D. Hazelton <[EMAIL PROTECTED]> wrote:

[...]

> None. I didn't file a report on this because I didn't find the bug, just
> noted a problem that appears to occur. In this case the calls generated
> seem to wrap loops - something I've never heard of anyone doing.

Example code showing this weirdness?

>  These
> *might* be causing the off-by-one that is causing the function to
> re-enter in the middle of an instruction.

If something like this happened, programs would be crashing left and right.

> Seeing this I'd guess that this follows for all system-level code
> generated by 4.1.1

Define "system-level code". What makes it different from, say,
bog-of-the-mill compiler code (yes, gcc compiles itself as part of its
sanity checking)?

>and this is exactly what I was reporting. If you'd
> like I'll go dig up the dumps he posted and post the two related segments
> side-by-side to give you a better example what I'm referring to.

If the related segments show code that is somehow wrong, by all means
report it /with your detailed analysis/ to the compiler people. Just a
warning, gcc is pretty smart in what it does, its code is often surprising
to the unwashed. Also, the C standard is subtle, the error might be in an
unwarranted assumption in the source code.


Re: kernel + gcc 4.1 = several problems

2007-01-02 Thread Linus Torvalds


On Tue, 2 Jan 2007, Alistair John Strachan wrote:
>
> eax: 0008   ebx:    ecx: 0008   edx: 
> esi: f70f3e9c   edi: f7017c00   ebp: f70f3c1c   esp: f70f3c0c
>
> Code: 58 01 00 00 0f 4f c2 09 c1 89 c8 83 c8 08 85 db 0f 44 c8 8b 5d f4 89 c8 
> 8b 75 f8 8b 7d fc 89 ec 5d c3 89 ca 8b 46 6c 83 ca 10 3b <87> 68 01 00 00 0f 
> 45 ca eb b6 8d b6 00 00 00 00 55 b8 01 00 00
> EIP: [] pipe_poll+0xa0/0xb0 SS:ESP 0068:f70f3c0c
> 
> Chuck observed that the kernel tries to reenter pipe_poll half way through an 
> instruction (c0156f5f->c0156f60); it's not a single-bit error but an 
> off-by-one.

It's not an off-by-one either (eg say we're taking an exception and
screwing up %eip by one somehow).

The code sequence in question is

	mov    %ecx,%edx
	mov    0x6c(%esi),%eax
	or     $0x10,%edx
	cmp    0x168(%edi),%eax <--
	cmovne %edx,%ecx
	jmp    ...

and it's in the second byte of the "cmp".

And yes, it definitely entered there, because trying other random 
entry-points will have either invalid instructions or instructions that 
would fault due to NULL pointers. HOWEVER, it's also not as simple as 
"took an interrupt, and returned with %eip incremented by one", because
your %edx is zero, so it won't have done that "or $0x10,%edx" and then some
interrupt happened and screwed up just %eip.

So it's literally a random %eip, but since you say it's consistently in 
that function, it's not truly "random". There's something that triggers it 
just _there_.

However, that's a damn simple function. There's _nothing_ there. The 
particular code that is involved right there is literally

	if (!pipe->writers && filp->f_version != pipe->w_counter)
		mask |= POLLHUP;

and that's it.  There's not even anything half-way interesting around it, 
except for the "poll_wait()" call, but even that is about as common as
you can humanly get..
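
To see how that two-line condition lines up with the mov/or/cmp/cmovne sequence quoted earlier, here is a small standalone C sketch. The structs are stand-ins holding only the three fields the condition reads, not the real kernel definitions, and the split of the "!pipe->writers" test from the quoted instructions is an assumption about how the compiler arranged the code.

```c
/* POLLHUP is 0x10 on Linux, matching the "or $0x10,%edx" above. */
#define SKETCH_POLLHUP 0x10u

/* Stand-in structs (hypothetical), just the fields the condition reads. */
struct pipe_sketch { unsigned writers; unsigned long w_counter; };
struct file_sketch { unsigned long f_version; };

/* The condition as written in the C source. */
static unsigned hup_branch(unsigned mask, const struct pipe_sketch *pipe,
                           const struct file_sketch *filp)
{
    if (!pipe->writers && filp->f_version != pipe->w_counter)
        mask |= SKETCH_POLLHUP;
    return mask;
}

/* The same logic in the shape of the quoted assembly: build the
 * "mask with POLLHUP" candidate unconditionally, then select it
 * only if the two counters differ. */
static unsigned hup_select(unsigned mask, const struct pipe_sketch *pipe,
                           const struct file_sketch *filp)
{
    unsigned with_hup = mask | SKETCH_POLLHUP;  /* mov %ecx,%edx; or $0x10,%edx */

    if (pipe->writers)          /* assumed to be tested before the quoted code */
        return mask;
    /* cmp 0x168(%edi),%eax; cmovne %edx,%ecx */
    return (filp->f_version != pipe->w_counter) ? with_hup : mask;
}
```

Note that in the select form the "or" always executes before the compare, which is why a zero %edx at the faulting address rules out a simple "resumed one byte late" story.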

Looking at the register set and the stack, I see:

Stack:  
  <- saved %ebx (dunno, seems dead in caller)
f70f3e9c  <- saved %esi (== pollfd in do_pollfd)
f6e111c0  <- saved %edi (== filp)
f70f3fa4  <- outer EBP (looks reasonable) 
c015d7f3  <- return address (do_sys_poll+0x253/0x480)

and the strange thing is that when the oops happens, it really looks like 
%esi _still_ contains the value it had originally (and that is saved on 
the stack). But afaik, from your disassembly, it should have been 
overwritten by the initial %eax, which should have had the same value as 
%edi on entry...

IOW, none of it really makes any sense. The stack frames look fine, so we 
_did_ enter at the beginning of the function (and it wasn't the *poll fn
pointer that was corrupt).

> The suggestions I've had so far which I have not yet tried:
> 
> - Select a different x86 CPU in the config.
>   -   Unfortunately the C3-2 flags seem to simply tell GCC
>   to schedule for ppro (like i686) and enabled MMX and SSE
>   -   Probably useless

Actually, try this one. Try using something that doesn't like "cmov". 
Maybe the C3-2 simply has some internal cmov bugginess. 

Linus

