Re: [patch 06/10] Immediate Value - i386 Optimization

2007-07-03 Thread Jeremy Fitzhardinge

Chuck Ebbert wrote:
> On 07/03/2007 04:18 PM, H. Peter Anvin wrote:
>> One could, though, use an indirect jump to achieve, if not as good, at
>> least most of the effect:
>>
>>  movl$,
>>  jmp *
>>
>
> Yeah, but there's this GCC bug:
>
>   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22448
>
> You can't even dereference labels in an ASM statement.


I was told in absolute terms that any use of &&label other than to pass 
it to goto was not supported, and would not be supported.


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29305

Seems that passing to an asm() falls into the same class of problem I 
had.  I think the underlying problem is that if the code containing the 
label is in an inlined function or unrolled loop, the reference can't be 
resolved properly anyway.
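
For reference, a minimal sketch of the one use of &&label that is described
above as supported: taking a label's address and passing it to goto inside
the same function. It relies on the GNU C labels-as-values extension, and
dispatch() is a made-up name for illustration, not something from the patch.

```c
/*
 * GNU C "labels as values": &&label yields a void * that is only
 * meaningful as the operand of a computed goto in the same function.
 * dispatch() is a hypothetical name used for this sketch.
 */
int dispatch(int active)
{
    /* Table of label addresses, local to this function. */
    static void *targets[2] = { &&inactive, &&active_path };

    goto *targets[active ? 1 : 0];

inactive:
    return 0;
active_path:
    return 1;
}
```

The label addresses never leave dispatch(), which is what keeps this within
the supported case; handing them to an asm() or to another function is where
the problems described above begin.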


   J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 06/10] Immediate Value - i386 Optimization

2007-07-03 Thread H. Peter Anvin
Chuck Ebbert wrote:
> On 07/03/2007 04:18 PM, H. Peter Anvin wrote:
>> One could, though, use an indirect jump to achieve, if not as good, at
>> least most of the effect:
>>
>>  movl$,
>>  jmp *
>>
> 
> Yeah, but there's this GCC bug:
> 
>   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22448
> 
> You can't even dereference labels in an ASM statement.

I wouldn't do that, though, for the existing compiler.  Instead, I
would do:

void (*func)(void); /* or what's appropriate */
asm( : "=rm" (func));
func();
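
A compilable sketch of the same shape, under stated assumptions: the asm
template is left empty and the pointer is simply passed through a tied
operand, where a real implementation would presumably emit a patchable
load. The names site_target, path_inactive, path_active and run_site are
hypothetical.

```c
/* Hypothetical handlers standing in for the two branch targets. */
int hits_inactive, hits_active;
void path_inactive(void) { hits_inactive++; }
void path_active(void)   { hits_active++; }

/* Stand-in for the value a patched instruction would load. */
void (*site_target)(void) = path_inactive;

void run_site(void)
{
    void (*func)(void);

    /*
     * Empty asm with a tied operand ("0" shares the location of
     * output %0). The compiler must assume the asm produced func, so
     * the call below stays an indirect call. This sketch only copies
     * site_target through the asm; it does not patch any code.
     */
    __asm__ volatile("" : "=r"(func) : "0"(site_target));
    func();
}
```

Because the compiler only ever sees an opaque function pointer, no label
address has to cross the asm boundary.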

-hpa



Re: [patch 06/10] Immediate Value - i386 Optimization

2007-07-03 Thread H. Peter Anvin
Mathieu Desnoyers wrote:
> 
> If we can change the compiler, here is what we could do:
> 
> Tell GCC to put NOPs that could be altered by a branch alternative to
> some specified code. We should be able to get the instruction pointers
> (think of inlines) to these nop/branch instructions so we can change
> them dynamically.
> 

Changing the compiler should be perfectly feasible, *BUT* I think we
need a transitional solution that works on existing compilers.

> I suspect this would be inherently tricky. If someone is ready to do
> this and tells me "yes, it will be there in 1 month", I am more than
> ready to switch my markers to this and help, but since the core of my
> work is kernel tracing, I have neither the time nor the resources to
> tackle this problem.
> 
> In the event that someone answers "we'll do this in the following 3
> years", I might consider changing the if (immediate(var)) into an
> immediate_if (var) so we can later proceed to the change with simple
> ifdefs without rewriting all the kernel code that would use it.

This is much more of "we'll do that in the following 1-2 years", since
we have to deal with a full gcc development cycle.  However, I really
want to see this being implemented in a way that would let us DTRT in
the long run.

-hpa


Re: [patch 06/10] Immediate Value - i386 Optimization

2007-07-03 Thread Mathieu Desnoyers
* H. Peter Anvin ([EMAIL PROTECTED]) wrote:
> Mathieu Desnoyers wrote:
> > 
> > Hi Peter,
> > 
> > I understand your concern. If you find a way to have the code compiled
> > by gcc, placed at the end of the function (never being a branch target),
> > and then to dynamically get the address of the branch instruction and
> > patch it, all in cooperation with gcc, I would be glad to hear about
> > it. What I found is that gcc lets us do anything that touches
> > variables/registers in inline assembly, but does not permit us to place
> > branch instructions ourselves; it does not expect the execution flow to
> > be changed in inline asm.
> > 
> 
> I believe this is correct.  It probably would require requesting a gcc
> builtin, which might be worthwhile to do if we
> 
> > 
> >   77:   b8 00 00 00 00          mov    $0x0,%eax
> >   7c:   85 c0                   test   %eax,%eax
> >   7e:   0f 85 16 03 00 00       jne    39a
> > here, we just loaded 0 in eax (movl used to make sure we populate the
> > whole register so we do not stall the pipeline)
> > When we activate the site,
> > line 77 becomes: b8 01 00 00 00    mov    $0x1,%eax
> > 
> 
> One could, though, use an indirect jump to achieve, if not as good, at
> least most of the effect:
> 
>   movl$,
>   jmp *
> 

Using a jmp * will prevent gcc from inlining inline functions and will
restrict loop unrolling (though the latter is not used in the Linux
kernel). We would have to compute a different $ for every site
generated by putting an immediate in an inline function.

> Some x86 cores will be able to detect the movl...jmp forwarding, and
> collapse it into a known branch target; however, on the ones that can't,
> it might be worse, since one would have to rely on the indirect branch
> predictor.
> 
> This would, however, provide infrastructure that could be combined with
> a future gcc builtin.
> 

If we can change the compiler, here is what we could do:

Tell GCC to put NOPs that could be altered by a branch alternative to
some specified code. We should be able to get the instruction pointers
(think of inlines) to these nop/branch instructions so we can change
them dynamically.

Something like:

immediate_t myfunc_cond;

static inline void myfunction(void) {
  static void *insn;   /* pointer to nops/branch instruction */
  static void *target_inactive, *target_active;

  /* __builtin_polymorphic_if is the proposed builtin, not an existing one */
  __builtin_polymorphic_if(&insn, &myfunc_cond) {
    /* Do something */
  } else {
    ...
  }
}

I could then save all the insns into my immediate value section and
later activate them by looking up all of those that refer to myfunc_cond.

The default behavior would be to branch to the target_inactive, and we
could change insn to jump to target_active dynamically.
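
The lookup step can be sketched in plain C. This is only an illustration
under assumptions: struct imm_site and imm_set_active() are hypothetical
names, and the "active" field stands in for rewriting the jump target of a
real instruction.

```c
#include <stddef.h>

/* Hypothetical record for one patch site; field names are illustrative. */
struct imm_site {
    const void *var;   /* which immediate variable the site refers to */
    int active;        /* stand-in for the patched jump target */
};

/* Flip every site that refers to var; returns how many were changed. */
size_t imm_set_active(struct imm_site *sites, size_t n,
                      const void *var, int active)
{
    size_t i, changed = 0;

    for (i = 0; i < n; i++) {
        if (sites[i].var == var) {
            sites[i].active = active;
            changed++;
        }
    }
    return changed;
}
```

An inlined function would contribute one record per inlined copy, which is
why the activation has to walk the whole table instead of stopping at the
first match.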

Note that we should align the jump instruction so the address could be
changed atomically in the general case (on x86 and x86_64, we have to
use an int3 bypass anyway, so we don't really care).

Also, we should find a way to let gcc tell us what type of jump it had
to use depending on how far the target of the branch is.

I suspect this would be inherently tricky. If someone is ready to do
this and tells me "yes, it will be there in 1 month", I am more than
ready to switch my markers to this and help, but since the core of my
work is kernel tracing, I have neither the time nor the resources to
tackle this problem.

In the event that someone answers "we'll do this in the following 3
years", I might consider changing the if (immediate(var)) into an
immediate_if (var) so we can later proceed to the change with simple
ifdefs without rewriting all the kernel code that would use it.
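
A rough sketch of what such a wrapper could look like today. The fallback
definition of immediate() as a plain variable read is an assumption made so
the example compiles on its own, and trace_enabled and probe() are
hypothetical names.

```c
/*
 * Fallback for this sketch only: with no architecture support,
 * immediate(var) simply reads the variable.
 */
#ifndef immediate
#define immediate(var) (var)
#endif

/*
 * Callers write: immediate_if (var) { ... } else { ... }
 * If a compiler builtin appears later, only this macro would need a
 * new definition under an ifdef; the call sites stay unchanged.
 */
#define immediate_if(var) if (immediate(var))

int trace_enabled;   /* hypothetical flag */

int probe(void)
{
    immediate_if (trace_enabled) {
        return 1;    /* instrumented path */
    } else {
        return 0;    /* fast path */
    }
}
```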

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


Re: [patch 06/10] Immediate Value - i386 Optimization

2007-07-03 Thread Chuck Ebbert
On 07/03/2007 04:18 PM, H. Peter Anvin wrote:
> 
> One could, though, use an indirect jump to achieve, if not as good, at
> least most of the effect:
> 
>   movl$,
>   jmp *
> 

Yeah, but there's this GCC bug:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22448

You can't even dereference labels in an ASM statement.


Re: [patch 06/10] Immediate Value - i386 Optimization

2007-07-03 Thread H. Peter Anvin
Mathieu Desnoyers wrote:
> 
> Hi Peter,
> 
> I understand your concern. If you find a way to have the code compiled
> by gcc, placed at the end of the function (never being a branch target),
> and then to dynamically get the address of the branch instruction and
> patch it, all in cooperation with gcc, I would be glad to hear about
> it. What I found is that gcc lets us do anything that touches
> variables/registers in inline assembly, but does not permit us to place
> branch instructions ourselves; it does not expect the execution flow to
> be changed in inline asm.
> 

I believe this is correct.  It probably would require requesting a gcc
builtin, which might be worthwhile to do if we

> 
>   77:   b8 00 00 00 00          mov    $0x0,%eax
>   7c:   85 c0                   test   %eax,%eax
>   7e:   0f 85 16 03 00 00       jne    39a
> here, we just loaded 0 in eax (movl used to make sure we populate the
> whole register so we do not stall the pipeline)
> When we activate the site,
> line 77 becomes: b8 01 00 00 00    mov    $0x1,%eax
> 

One could, though, use an indirect jump to achieve, if not as good, at
least most of the effect:

movl$,
jmp *

Some x86 cores will be able to detect the movl...jmp forwarding, and
collapse it into a known branch target; however, on the ones that can't,
it might be worse, since one would have to rely on the indirect branch
predictor.

This would, however, provide infrastructure that could be combined with
a future gcc builtin.

-hpa



Re: [patch 06/10] Immediate Value - i386 Optimization

2007-07-03 Thread Mathieu Desnoyers
* H. Peter Anvin ([EMAIL PROTECTED]) wrote:
> What is not clear to me is the exact code that is generated by these
> macros.  Nor can I find it anywhere in the documentation.
> 
> Could you please describe this in some detail?  In particular, it seems
> that the uses of these are largely as branch targets, where the extra
> indirection over modifying the jump target directly seems wasted.
> 

Hi Peter,

I understand your concern. If you find a way to have the code compiled
by gcc, placed at the end of the function (never being a branch target),
and then to dynamically get the address of the branch instruction and
patch it, all in cooperation with gcc, I would be glad to hear about
it. What I found is that gcc lets us do anything that touches
variables/registers in inline assembly, but does not permit us to place
branch instructions ourselves; it does not expect the execution flow to
be changed in inline asm.

Here is an objdump of the interesting bits of an immediate value placed
in schedule (inline schedule_debug).


 :
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   57                      push   %edi
   4:   56                      push   %esi
   5:   53                      push   %ebx
   6:   83 ec 40                sub    $0x40,%esp
   9:   b8 01 00 00 00          mov    $0x1,%eax
   e:   e8 fc ff ff ff          call   f
  13:   e8 fc ff ff ff          call   14
  18:   89 45 dc                mov    %eax,0xffffffdc(%ebp)
  1b:   b8 00 00 00 00          mov    $0x0,%eax
  20:   8b 4d dc                mov    0xffffffdc(%ebp),%ecx
  23:   8b 14 8d 00 00 00 00    mov    0x0(,%ecx,4),%edx
  2a:   01 d0                   add    %edx,%eax
  2c:   89 45 d0                mov    %eax,0xffffffd0(%ebp)
  2f:   b8 00 00 00 00          mov    $0x0,%eax
  34:   c7 44 02 04 01 00 00    movl   $0x1,0x4(%edx,%eax,1)
  3b:   00
  3c:   8b 5d d0                mov    0xffffffd0(%ebp),%ebx
  3f:   8b 9b f0 03 00 00       mov    0x3f0(%ebx),%ebx
  45:   89 5d c8                mov    %ebx,0xffffffc8(%ebp)
  48:   81 c3 94 01 00 00       add    $0x194,%ebx
  4e:   89 5d cc                mov    %ebx,0xffffffcc(%ebp)
  51:   8b 45 c8                mov    0xffffffc8(%ebp),%eax
  54:   8b 40 14                mov    0x14(%eax),%eax
  57:   85 c0                   test   %eax,%eax
  59:   0f 89 30 03 00 00       jns    38f
  5f:   89 e0                   mov    %esp,%eax
  61:   25 00 e0 ff ff          and    $0xffffe000,%eax
  66:   8b 40 14                mov    0x14(%eax),%eax
  69:   25 ff ff ff ef          and    $0xefffffff,%eax
  6e:   83 e8 01                sub    $0x1,%eax
  71:   0f 85 fb 02 00 00       jne    372

  77:   b8 00 00 00 00          mov    $0x0,%eax
  7c:   85 c0                   test   %eax,%eax
  7e:   0f 85 16 03 00 00       jne    39a
here, we just loaded 0 in eax (movl used to make sure we populate the
whole register so we do not stall the pipeline)
When we activate the site,
line 77 becomes: b8 01 00 00 00    mov    $0x1,%eax


  84:   8b 45 d0                mov    0xffffffd0(%ebp),%eax
  87:   e8 fc ff ff ff          call   88
  8c:   8b 4d c8                mov    0xffffffc8(%ebp),%ecx
  8f:   8b 41 04                mov    0x4(%ecx),%eax
  92:   f0 0f ba 70 08 02       lock btrl $0x2,0x8(%eax)
...


 39a:   8b 55 04                mov    0x4(%ebp),%edx
 39d:   b9 01 00 00 00          mov    $0x1,%ecx
 3a2:   b8 02 00 00 00          mov    $0x2,%eax
 3a7:   e8 fc ff ff ff          call   3a8
 3ac:   e9 d3 fc ff ff          jmp    84
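
The activation shown at line 77 amounts to rewriting one byte of the
instruction. A small sketch of that byte arithmetic, working on an ordinary
buffer and not on live kernel text: site_set_enable() and site_value() are
hypothetical helpers, and the locking and breakpoint handling that patching
running code requires are deliberately left out.

```c
#include <stdint.h>

/* In the encoding "b8 imm32" (mov $imm32,%eax) the imm32 starts at byte 1. */
#define IMM_OFFSET 1

/* Write only the low byte of the immediate: a single-byte store. */
void site_set_enable(uint8_t *insn, uint8_t enable)
{
    insn[IMM_OFFSET] = enable;
}

/* Decode the little-endian imm32 that the mov would load into eax. */
uint32_t site_value(const uint8_t *insn)
{
    return (uint32_t)insn[IMM_OFFSET] |
           (uint32_t)insn[IMM_OFFSET + 1] << 8 |
           (uint32_t)insn[IMM_OFFSET + 2] << 16 |
           (uint32_t)insn[IMM_OFFSET + 3] << 24;
}
```

Starting from the bytes b8 00 00 00 00, calling site_set_enable(insn, 1)
leaves b8 01 00 00 00, which matches the activated form of line 77 above.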

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


Re: [patch 06/10] Immediate Value - i386 Optimization

2007-07-03 Thread H. Peter Anvin
What is not clear to me is the exact code that is generated by these
macros.  Nor can I find it anywhere in the documentation.

Could you please describe this in some detail?  In particular, it seems
that the uses of these are largely as branch targets, where the extra
indirection over modifying the jump target directly seems wasted.

-hpa


[patch 06/10] Immediate Value - i386 Optimization

2007-07-03 Thread Mathieu Desnoyers
i386 optimization of the immediate values which uses a movl with code patching
to set/unset the value used to populate the register used for the branch test.

Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]>
---
 arch/i386/kernel/Makefile|1 
 arch/i386/kernel/immediate.c |  171 +++
 arch/i386/kernel/traps.c |8 +-
 include/asm-i386/immediate.h |   72 +-
 4 files changed, 247 insertions(+), 5 deletions(-)

Index: linux-2.6-lttng/include/asm-i386/immediate.h
===
--- linux-2.6-lttng.orig/include/asm-i386/immediate.h   2007-06-19 17:02:14.0 -0400
+++ linux-2.6-lttng/include/asm-i386/immediate.h        2007-06-19 17:02:15.0 -0400
@@ -1 +1,71 @@
-#include 
+#ifndef _ASM_I386_IMMEDIATE_H
+#define _ASM_I386_IMMEDIATE_H
+
+/*
+ * Immediate values. i386 architecture optimizations.
+ *
+ * (C) Copyright 2006 Mathieu Desnoyers <[EMAIL PROTECTED]>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#define IF_DEFAULT (IF_OPTIMIZED | IF_LOCKDEP)
+
+/*
+ * Optimized version of the immediate. Passing the flags as a pointer to
+ * the inline assembly to trick it into putting the flags value as third
+ * parameter in the structure.
+ */
+#define immediate_optimized(flags, var)                \
+   ({  \
+   int condition;  \
+   asm (   ".section __immediate, \"a\", @progbits;\n\t"   \
+   ".long %1, 0f, %2;\n\t" \
+   ".previous;\n\t"\
+   "0:\n\t"\
+   "movl %3,%0;\n\t"   \
+   : "=r" (condition)  \
+   : "m" (var),\
+ "m" (*(char*)flags),  \
+ "i" (0)); \
+   condition;  \
+   })
+
+/*
+ * immediate macro selecting the generic or optimized version of immediate,
+ * depending on the flags specified. It is a macro because we need to pass the
+ * name to immediate_optimized() and immediate_generic() so they can declare a
+ * static variable with it.
+ */
+#define _immediate(flags, var) \
+({ \
+   (((flags) & IF_LOCKDEP) && ((flags) & IF_OPTIMIZED)) ?  \
+   immediate_optimized(flags, var) :   \
+   immediate_generic(flags, var);  \
+})
+
+/* immediate with default behavior */
+#define immediate(var) _immediate(IF_DEFAULT, var)
+
+/*
+ * Architecture dependant immediate information, used internally for immediate
+ * activation.
+ */
+
+/*
+ * Offset of the immediate value from the start of the movl instruction, in
+ * bytes. We point to the first lower byte of the 4 bytes immediate value. Only
+ * changing one byte makes sure we do an atomic memory write, independently of
+ * the alignment of the 4 bytes in the load immediate instruction.
+ */
+#define IMMEDIATE_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET 1
+#define IMMEDIATE_OPTIMIZED_ENABLE_TYPE unsigned char
+/* Dereference enable as lvalue from a pointer to its instruction */
+#define IMMEDIATE_OPTIMIZED_ENABLE(a)  \
+   (*(IMMEDIATE_OPTIMIZED_ENABLE_TYPE*)\
+   ((char*)(a)+IMMEDIATE_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET))
+
+extern int immediate_optimized_set_enable(void *address, char enable);
+
+#endif /* _ASM_I386_IMMEDIATE_H */
Index: linux-2.6-lttng/arch/i386/kernel/Makefile
===
--- linux-2.6-lttng.orig/arch/i386/kernel/Makefile  2007-06-19 17:00:55.0 -0400
+++ linux-2.6-lttng/arch/i386/kernel/Makefile   2007-06-19 17:02:15.0 -0400
@@ -35,6 +35,7 @@
 obj-y  += sysenter.o vsyscall.o
 obj-$(CONFIG_ACPI_SRAT)+= srat.o
 obj-$(CONFIG_EFI)  += efi.o efi_stub.o
+obj-$(CONFIG_IMMEDIATE)+= immediate.o
 obj-$(CONFIG_DOUBLEFAULT)  += doublefault.o
 obj-$(CONFIG_SERIAL_8250)  += legacy_serial.o
 obj-$(CONFIG_VM86) += vm86.o
Index: linux-2.6-lttng/arch/i386/kernel/immediate.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6-lttng/arch/i386/kernel/immediate.c    2007-06-19 17:02:43.0 -0400
@@ -0,0 +1,171 @@
+/*
+ * Immediate Value - i386 architecture specifi