Re: [patch 06/10] Immediate Value - i386 Optimization
Chuck Ebbert wrote:
> On 07/03/2007 04:18 PM, H. Peter Anvin wrote:
>> One could, though, use an indirect jump to achieve, if not as good, at
>> least most of the effect:
>>
>> movl$,
>> jmp *
>
> Yeah, but there's this GCC bug:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22448
>
> You can't even dereference labels in an ASM statement.

I was told in absolute terms that any use of &&label other than to pass
it to goto was not supported, and would not be supported.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29305

Seems that passing to an asm() falls into the same class of problem I had.

I think the underlying problem is that if the code containing the label
is in an inlined function or unrolled loop, the reference can't be
resolved properly anyway.

J

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
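For readers unfamiliar with the extension being discussed: the one use of &&label that the message above describes as supported is handing it to a computed goto inside the same function. The following is a minimal sketch of that pattern, assuming GCC or a compatible compiler such as Clang. The function name dispatch() and its labels are hypothetical and do not come from the patch.

```c
/*
 * Illustration only: dispatch() is a hypothetical function, not kernel code.
 * It relies on the GNU C "labels as values" extension (&&label, goto *ptr),
 * so it needs GCC or a compatible compiler.
 */
static int dispatch(int active)
{
	/* The label addresses are taken and consumed in the same function. */
	void *target = active ? &&site_active : &&site_inactive;

	goto *target;

site_inactive:
	return 0;	/* site disabled */
site_active:
	return 1;	/* site enabled */
}
```

Calling dispatch(0) takes the inactive label and dispatch(1) the active one. Passing such an address to an asm() statement, or using it from another function, is the kind of use the bug reports above are about.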
Re: [patch 06/10] Immediate Value - i386 Optimization
Chuck Ebbert wrote:
> On 07/03/2007 04:18 PM, H. Peter Anvin wrote:
>> One could, though, use an indirect jump to achieve, if not as good, at
>> least most of the effect:
>>
>> movl$,
>> jmp *
>>
>
> Yeah, but there's this GCC bug:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22448
>
> You can't even dereference labels in an ASM statement.

I wouldn't do that, though, for the existing compiler. Instead, I would do:

	void (*func)(void); /* or what's appropriate */

	asm( : "=rm" (func));
	func();

	-hpa
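The control flow of that suggestion can be sketched in portable C, with a plain variable standing in for the value the patched asm() would produce. This is only an illustration under that assumption: site_target, site_set_enable(), run_site() and the two handlers are hypothetical names, and no code patching happens here.

```c
/* Hypothetical handlers standing in for the two possible call targets. */
static int hits;

static void site_inactive(void)
{
}

static void site_active(void)
{
	hits++;
}

/*
 * Hypothetical patchable slot. In the suggestion above, the asm() would
 * load this value from an immediate operand that gets patched at run time.
 */
static void (*volatile site_target)(void) = site_inactive;

/* Stand-in for the patching step: select which target the site calls. */
static void site_set_enable(int enable)
{
	site_target = enable ? site_active : site_inactive;
}

static void run_site(void)
{
	void (*func)(void) = site_target;	/* corresponds to the asm() load */

	func();					/* indirect call */
}
```

With the site disabled, run_site() calls the empty handler. After site_set_enable(1), the same call site reaches site_active() without any change to the surrounding code.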
Re: [patch 06/10] Immediate Value - i386 Optimization
Mathieu Desnoyers wrote:
>
> If we can change the compiler, here is what we could do:
>
> Tell GCC to put NOPs that could be altered by a branch alternative to
> some specified code. We should be able to get the instruction pointers
> (think of inlines) to these nop/branch instructions so we can change
> them dynamically.
>

Changing the compiler should be perfectly feasible, *BUT* I think we
need a transitional solution that works on existing compilers.

> I suspect this would be inherently tricky. If someone is ready to do
> this and tells me "yes, it will be there in 1 month", I am more than
> ready to switch my markers to this and help, but since the core of my
> work is kernel tracing, I don't have the time nor the resources to
> tackle this problem.
>
> In the event that someone answers "we'll do this in the following 3
> years", I might consider to change the if (immediate(var)) into an
> immediate_if (var) so we can later proceed to the change with simple
> ifdefs without rewriting all the kernel code that would use it.

This is much more of "we'll do that in the following 1-2 years", since
we have to deal with a full gcc development cycle. However, I really
want to see this being implemented in a way that would let us DTRT in
the long run.

	-hpa
Re: [patch 06/10] Immediate Value - i386 Optimization
* H. Peter Anvin ([EMAIL PROTECTED]) wrote:
> Mathieu Desnoyers wrote:
> >
> > Hi Peter,
> >
> > I understand your concern. If you find a way to let the code be compiled
> > by gcc, put at the end of the functions (never being a branch target)
> > and then, dynamically, get the address of the branch instruction and
> > patch it, all that in cooperation with gcc, I would be glad to hear from
> > it. What I found is that gcc lets us do anything that touches
> > variables/registers in an inline assembly, but does not permit to place
> > branch instructions ourselves; it does not expect the execution flow to
> > be changed in inline asms.
> >
>
> I believe this is correct. It probably would require requesting a gcc
> builtin, which might be worthwhile to do if we
>
> >
> > 77: b8 00 00 00 00          mov    $0x0,%eax
> > 7c: 85 c0                   test   %eax,%eax
> > 7e: 0f 85 16 03 00 00       jne    39a
> > here, we just loaded 0 in eax (movl used to make sure we populate the
> > whole register so we do not stall the pipeline)
> > When we activate the site,
> > line 77 becomes: b8 01 00 00 00          mov    $0x1,%eax
> >
>
> One could, though, use an indirect jump to achieve, if not as good, at
> least most of the effect:
>
> movl$,
> jmp *
>

Using a jmp * will instruct gcc not to inline inline functions and
restrict loop unrolling (but the latter is not used in the linux
kernel). We would have to compute different $ for every site generated
by putting an immediate in an inline function.

> Some x86 cores will be able to detect the movl...jmp forwarding, and
> collapse it into a known branch target; however, on the ones that can't,
> it might be worse, since one would have to rely on the indirect branch
> predictor.
>
> This would, however, provide infrastructure that could be combined with
> a future gcc builtin.
>

If we can change the compiler, here is what we could do:

Tell GCC to put NOPs that could be altered by a branch alternative to
some specified code. We should be able to get the instruction pointers
(think of inlines) to these nop/branch instructions so we can change
them dynamically.

Something like:

immediate_t myfunc_cond;

inline myfunction(void) {
	static void *insn;	/* pointer to nops/branch instruction */
	static void *target_inactive, *target_active;

	__builtin_polymorphic_if(&insn, &myfunc_cond) {
		/* Do something */
	} else {
		...
	}
}

I could then save all the insns into my immediate value section and
later activate them by looking up all of those who refer to myfunc_cond.

The default behavior would be to branch to the target_inactive, and we
could change insn to jump to target_active dynamically.

Note that we should align the jump instruction so the address could be
changed atomically in the general case (on x86 and x86_64, we have to
use an int3 bypass anyway, so we don't really care).

Also, we should find a way to let gcc tell us what type of jump it had
to use depending on how far the target of the branch is.

I suspect this would be inherently tricky. If someone is ready to do
this and tells me "yes, it will be there in 1 month", I am more than
ready to switch my markers to this and help, but since the core of my
work is kernel tracing, I don't have the time nor the resources to
tackle this problem.

In the event that someone answers "we'll do this in the following 3
years", I might consider to change the if (immediate(var)) into an
immediate_if (var) so we can later proceed to the change with simple
ifdefs without rewriting all the kernel code that would use it.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
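The "save all the insns into a section, then activate the ones that refer to a given variable" step described above could look roughly like the sketch below. It is not the code from the patch: struct imm_site and imm_set_enable() are hypothetical names, the table is an ordinary array rather than a linker section, and the write is a plain store into a byte buffer, with none of the int3 bypass or atomicity handling mentioned in the message.

```c
#include <stddef.h>

/* Hypothetical per-site record: which variable, and where its instruction is. */
struct imm_site {
	const void *var;	/* variable the site refers to */
	unsigned char *insn;	/* start of the patchable instruction */
};

/*
 * Set the low immediate byte of every site that refers to var.
 * Returns the number of sites that were changed.
 */
static size_t imm_set_enable(struct imm_site *sites, size_t n,
			     const void *var, unsigned char enable)
{
	size_t i, patched = 0;

	for (i = 0; i < n; i++) {
		if (sites[i].var != var)
			continue;
		sites[i].insn[1] = enable;	/* byte 0 is the opcode */
		patched++;
	}
	return patched;
}
```

An inlined function produces one record per inlined copy, which is why the lookup is by variable address rather than by a single instruction pointer.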
Re: [patch 06/10] Immediate Value - i386 Optimization
On 07/03/2007 04:18 PM, H. Peter Anvin wrote:
>
> One could, though, use an indirect jump to achieve, if not as good, at
> least most of the effect:
>
> movl$,
> jmp *
>

Yeah, but there's this GCC bug:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22448

You can't even dereference labels in an ASM statement.
Re: [patch 06/10] Immediate Value - i386 Optimization
Mathieu Desnoyers wrote:
>
> Hi Peter,
>
> I understand your concern. If you find a way to let the code be compiled
> by gcc, put at the end of the functions (never being a branch target)
> and then, dynamically, get the address of the branch instruction and
> patch it, all that in cooperation with gcc, I would be glad to hear from
> it. What I found is that gcc lets us do anything that touches
> variables/registers in an inline assembly, but does not permit to place
> branch instructions ourselves; it does not expect the execution flow to
> be changed in inline asms.
>

I believe this is correct. It probably would require requesting a gcc
builtin, which might be worthwhile to do if we

>
> 77: b8 00 00 00 00          mov    $0x0,%eax
> 7c: 85 c0                   test   %eax,%eax
> 7e: 0f 85 16 03 00 00       jne    39a
> here, we just loaded 0 in eax (movl used to make sure we populate the
> whole register so we do not stall the pipeline)
> When we activate the site,
> line 77 becomes: b8 01 00 00 00          mov    $0x1,%eax
>

One could, though, use an indirect jump to achieve, if not as good, at
least most of the effect:

movl$,
jmp *

Some x86 cores will be able to detect the movl...jmp forwarding, and
collapse it into a known branch target; however, on the ones that can't,
it might be worse, since one would have to rely on the indirect branch
predictor.

This would, however, provide infrastructure that could be combined with
a future gcc builtin.

	-hpa
Re: [patch 06/10] Immediate Value - i386 Optimization
* H. Peter Anvin ([EMAIL PROTECTED]) wrote:
> What is not clear to me is the exact code that is generated by these
> macros. Nor can I find it anywhere in the documentation.
>
> Could you please describe this in some detail? In particular, it seems
> that the uses of these are largely as branch targets, where the extra
> indirection over modifying the jump target directly seems wasted.
>

Hi Peter,

I understand your concern. If you find a way to let the code be compiled
by gcc, put at the end of the functions (never being a branch target)
and then, dynamically, get the address of the branch instruction and
patch it, all that in cooperation with gcc, I would be glad to hear from
it. What I found is that gcc lets us do anything that touches
variables/registers in an inline assembly, but does not permit to place
branch instructions ourselves; it does not expect the execution flow to
be changed in inline asms.

Here is an objdump of the interesting bits on an immediate value placed
in schedule (inline schedule_debug).

:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   57                      push   %edi
   4:   56                      push   %esi
   5:   53                      push   %ebx
   6:   83 ec 40                sub    $0x40,%esp
   9:   b8 01 00 00 00          mov    $0x1,%eax
   e:   e8 fc ff ff ff          call   f
  13:   e8 fc ff ff ff          call   14
  18:   89 45 dc                mov    %eax,0xffffffdc(%ebp)
  1b:   b8 00 00 00 00          mov    $0x0,%eax
  20:   8b 4d dc                mov    0xffffffdc(%ebp),%ecx
  23:   8b 14 8d 00 00 00 00    mov    0x0(,%ecx,4),%edx
  2a:   01 d0                   add    %edx,%eax
  2c:   89 45 d0                mov    %eax,0xffffffd0(%ebp)
  2f:   b8 00 00 00 00          mov    $0x0,%eax
  34:   c7 44 02 04 01 00 00    movl   $0x1,0x4(%edx,%eax,1)
  3b:   00
  3c:   8b 5d d0                mov    0xffffffd0(%ebp),%ebx
  3f:   8b 9b f0 03 00 00       mov    0x3f0(%ebx),%ebx
  45:   89 5d c8                mov    %ebx,0xffffffc8(%ebp)
  48:   81 c3 94 01 00 00       add    $0x194,%ebx
  4e:   89 5d cc                mov    %ebx,0xffffffcc(%ebp)
  51:   8b 45 c8                mov    0xffffffc8(%ebp),%eax
  54:   8b 40 14                mov    0x14(%eax),%eax
  57:   85 c0                   test   %eax,%eax
  59:   0f 89 30 03 00 00       jns    38f
  5f:   89 e0                   mov    %esp,%eax
  61:   25 00 e0 ff ff          and    $0xffffe000,%eax
  66:   8b 40 14                mov    0x14(%eax),%eax
  69:   25 ff ff ff ef          and    $0xefffffff,%eax
  6e:   83 e8 01                sub    $0x1,%eax
  71:   0f 85 fb 02 00 00       jne    372
  77:   b8 00 00 00 00          mov    $0x0,%eax
  7c:   85 c0                   test   %eax,%eax
  7e:   0f 85 16 03 00 00       jne    39a

here, we just loaded 0 in eax (movl used to make sure we populate the
whole register so we do not stall the pipeline)
When we activate the site,
line 77 becomes: b8 01 00 00 00          mov    $0x1,%eax

  84:   8b 45 d0                mov    0xffffffd0(%ebp),%eax
  87:   e8 fc ff ff ff          call   88
  8c:   8b 4d c8                mov    0xffffffc8(%ebp),%ecx
  8f:   8b 41 04                mov    0x4(%ecx),%eax
  92:   f0 0f ba 70 08 02       lock btrl $0x2,0x8(%eax)
...
 39a:   8b 55 04                mov    0x4(%ebp),%edx
 39d:   b9 01 00 00 00          mov    $0x1,%ecx
 3a2:   b8 02 00 00 00          mov    $0x2,%eax
 3a7:   e8 fc ff ff ff          call   3a8
 3ac:   e9 d3 fc ff ff          jmp    84

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
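To make the activation step at offset 77 concrete, here is a small user-space model of it. It treats the five bytes of the instruction as a buffer, flips the byte after the 0xb8 opcode, and evaluates the "test %eax,%eax; jne" pair in C. The helper names mov_eax_imm(), branch_taken() and site_set_enable() are hypothetical; this is a sketch of the encoding only and does not patch live code.

```c
#include <stdint.h>

#define IMM_OFFSET 1	/* the immediate starts one byte after the 0xb8 opcode */

/* Decode the little-endian imm32 of a "mov $imm32,%eax" (b8 xx xx xx xx). */
static uint32_t mov_eax_imm(const unsigned char *insn)
{
	return (uint32_t)insn[1] |
	       (uint32_t)insn[2] << 8 |
	       (uint32_t)insn[3] << 16 |
	       (uint32_t)insn[4] << 24;
}

/* Models "test %eax,%eax; jne": nonzero means the branch is taken. */
static int branch_taken(const unsigned char *insn)
{
	return mov_eax_imm(insn) != 0;
}

/* Activation changes only the lowest byte of the immediate. */
static void site_set_enable(unsigned char *insn, unsigned char enable)
{
	insn[IMM_OFFSET] = enable;
}
```

Starting from b8 00 00 00 00, the branch to 39a is not taken. After site_set_enable(insn, 1) the buffer reads b8 01 00 00 00 and the same test takes the branch, which matches the before and after encodings shown in the dump.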
Re: [patch 06/10] Immediate Value - i386 Optimization
What is not clear to me is the exact code that is generated by these
macros. Nor can I find it anywhere in the documentation.

Could you please describe this in some detail? In particular, it seems
that the uses of these are largely as branch targets, where the extra
indirection over modifying the jump target directly seems wasted.

	-hpa
[patch 06/10] Immediate Value - i386 Optimization
i386 optimization of the immediate values which uses a movl with code patching
to set/unset the value used to populate the register used for the branch test.

Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]>
---
 arch/i386/kernel/Makefile    |    1
 arch/i386/kernel/immediate.c |  171 +++
 arch/i386/kernel/traps.c     |    8 +-
 include/asm-i386/immediate.h |   72 +-
 4 files changed, 247 insertions(+), 5 deletions(-)

Index: linux-2.6-lttng/include/asm-i386/immediate.h
===
--- linux-2.6-lttng.orig/include/asm-i386/immediate.h	2007-06-19 17:02:14.0 -0400
+++ linux-2.6-lttng/include/asm-i386/immediate.h	2007-06-19 17:02:15.0 -0400
@@ -1 +1,71 @@
-#include
+#ifndef _ASM_I386_IMMEDIATE_H
+#define _ASM_I386_IMMEDIATE_H
+
+/*
+ * Immediate values. i386 architecture optimizations.
+ *
+ * (C) Copyright 2006 Mathieu Desnoyers <[EMAIL PROTECTED]>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#define IF_DEFAULT (IF_OPTIMIZED | IF_LOCKDEP)
+
+/*
+ * Optimized version of the immediate. Passing the flags as a pointer to
+ * the inline assembly to trick it into putting the flags value as third
+ * parameter in the structure.
+ */
+#define immediate_optimized(flags, var)				\
+	({								\
+		int condition;						\
+		asm (	".section __immediate, \"a\", @progbits;\n\t"	\
+			".long %1, 0f, %2;\n\t"				\
+			".previous;\n\t"				\
+			"0:\n\t"					\
+			"movl %3,%0;\n\t"				\
+			: "=r" (condition)				\
+			: "m" (var),					\
+			  "m" (*(char*)flags),				\
+			  "i" (0));					\
+		condition;						\
+	})
+
+/*
+ * immediate macro selecting the generic or optimized version of immediate,
+ * depending on the flags specified. It is a macro because we need to pass the
+ * name to immediate_optimized() and immediate_generic() so they can declare a
+ * static variable with it.
+ */
+#define _immediate(flags, var)						\
+({									\
+	(((flags) & IF_LOCKDEP) && ((flags) & IF_OPTIMIZED)) ?		\
+		immediate_optimized(flags, var) :			\
+		immediate_generic(flags, var);				\
+})
+
+/* immediate with default behavior */
+#define immediate(var) _immediate(IF_DEFAULT, var)
+
+/*
+ * Architecture dependant immediate information, used internally for immediate
+ * activation.
+ */
+
+/*
+ * Offset of the immediate value from the start of the movl instruction, in
+ * bytes. We point to the first lower byte of the 4 bytes immediate value. Only
+ * changing one byte makes sure we do an atomic memory write, independently of
+ * the alignment of the 4 bytes in the load immediate instruction.
+ */
+#define IMMEDIATE_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET 1
+#define IMMEDIATE_OPTIMIZED_ENABLE_TYPE unsigned char
+/* Dereference enable as lvalue from a pointer to its instruction */
+#define IMMEDIATE_OPTIMIZED_ENABLE(a)					\
+	(*(IMMEDIATE_OPTIMIZED_ENABLE_TYPE*)				\
+		((char*)(a)+IMMEDIATE_OPTIMIZED_ENABLE_IMMEDIATE_OFFSET))
+
+extern int immediate_optimized_set_enable(void *address, char enable);
+
+#endif /* _ASM_I386_IMMEDIATE_H */
Index: linux-2.6-lttng/arch/i386/kernel/Makefile
===
--- linux-2.6-lttng.orig/arch/i386/kernel/Makefile	2007-06-19 17:00:55.0 -0400
+++ linux-2.6-lttng/arch/i386/kernel/Makefile	2007-06-19 17:02:15.0 -0400
@@ -35,6 +35,7 @@
 obj-y				+= sysenter.o vsyscall.o
 obj-$(CONFIG_ACPI_SRAT)	+= srat.o
 obj-$(CONFIG_EFI)		+= efi.o efi_stub.o
+obj-$(CONFIG_IMMEDIATE)	+= immediate.o
 obj-$(CONFIG_DOUBLEFAULT)	+= doublefault.o
 obj-$(CONFIG_SERIAL_8250)	+= legacy_serial.o
 obj-$(CONFIG_VM86)		+= vm86.o
Index: linux-2.6-lttng/arch/i386/kernel/immediate.c
===
--- /dev/null	1970-01-01 00:00:00.0 +
+++ linux-2.6-lttng/arch/i386/kernel/immediate.c	2007-06-19 17:02:43.0 -0400
@@ -0,0 +1,171 @@
+/*
+ * Immediate Value - i386 architecture specifi