Re: [GIT PULL] x86/alternatives padding

2015-03-04 Thread Borislav Petkov
On Wed, Mar 04, 2015 at 09:22:27PM +0100, Ingo Molnar wrote:
> So you could have a look at the detailed section dump itself via:
> 
>    objdump -h vmlinux
> 
> there .text will be the raw text and .alt* will be listed separately. 
> The 'size' tool will add up executable sections IIRC, mixing these 
> sections.

Right.

> .alt* is freed after init, so it's not really a kernel image size 
> increase, right?

Exactly:

void free_initmem(void)
{
	free_init_pages("unused kernel",
			(unsigned long)(&__init_begin),
			(unsigned long)(&__init_end));
}

which are:

 69708: 81ee9000 0 NOTYPE  GLOBAL DEFAULT   16 __init_begin
 72679: 81ff9000 0 NOTYPE  GLOBAL DEFAULT   25 __init_end

and there's a bunch of stuff between 81ee9000 and 81ff9000:

  [Nr] Name               Type      Address   Offset
       Size      EntSize  Flags  Link  Info  Align
  [17] .init.text         PROGBITS  81efe000  012fe000
       00066c98           AX     0     0     16
  [18] .init.data         PROGBITS  81f65000  01365000
       00086d18           WA     0     0     4096
  [19] .x86_cpu_dev.init  PROGBITS  81febd18  013ebd18
       0018               A      0     0     8
  [20] .altinstructions   PROGBITS  81febd30  013ebd30
       7e4b               A      0     0     1
  [21] .altinstr_replace  PROGBITS  81ff3b7b  013f3b7b
       2044               AX     0     0     1
  [22] .iommu_table       PROGBITS  81ff5bc0  013f5bc0
       00c8               A      0     0     8
  [23] .apicdrivers       PROGBITS  81ff5c88  013f5c88
       0010               WA     0     0     8
  [24] .exit.text         PROGBITS  81ff5c98  013f5c98
       2412               AX     0     0     1
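
As a quick cross-check of the listing above, here is a minimal sketch that
tests whether each section lies inside the range that free_initmem() hands
back. The addresses and sizes are copied from the listing exactly as printed
(the upper address bits are not shown there); the helper names are made up
for illustration.

```python
# Values copied from the symbol and section listings in this mail, as printed.
INIT_BEGIN = 0x81EE9000  # __init_begin
INIT_END = 0x81FF9000    # __init_end

# (name, address, size) for each section in the listing
SECTIONS = [
    (".init.text", 0x81EFE000, 0x66C98),
    (".init.data", 0x81F65000, 0x86D18),
    (".x86_cpu_dev.init", 0x81FEBD18, 0x18),
    (".altinstructions", 0x81FEBD30, 0x7E4B),
    (".altinstr_replace", 0x81FF3B7B, 0x2044),
    (".iommu_table", 0x81FF5BC0, 0xC8),
    (".apicdrivers", 0x81FF5C88, 0x10),
    (".exit.text", 0x81FF5C98, 0x2412),
]


def in_init_range(addr, size):
    """True if [addr, addr + size) lies within [__init_begin, __init_end)."""
    return INIT_BEGIN <= addr and addr + size <= INIT_END


def alt_bytes(sections):
    """Total size of the .alt* sections (hypothetical helper)."""
    return sum(size for name, _, size in sections if name.startswith(".alt"))


if __name__ == "__main__":
    for name, addr, size in SECTIONS:
        print(f"{name:20s} inside init range: {in_init_range(addr, size)}")
    print("init range size:", INIT_END - INIT_BEGIN, "bytes")
    print(".alt* total:", alt_bytes(SECTIONS), "bytes")
```

With these numbers every listed section, including both .alt* sections,
falls inside the range, so their memory is released once init is done.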

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] x86/alternatives padding

2015-03-04 Thread Ingo Molnar

* Borislav Petkov wrote:

> On Wed, Mar 04, 2015 at 12:22:06PM +0100, Borislav Petkov wrote:
> > Well, kernel image doesn't change while vmlinux shows only a very small
> > .text increase of about 2K. I'm not sure yet why that happens though
> > because it shouldn't be the padding. Because we will have to do it
> > anyway, this patchset makes it automatic instead of by-hand, so to
> > speak.
> > 
> > Let me bisect it and see which patch adds the increase.
> 
> Doh, of course. I've added u8 padlen to the alternative instruction
> entry struct. For 2Kish alt sites in total, this explains the almost
> exact same increase in text size:
> 
>     text    data     bss      dec    hex filename
> 12290539 1595264 1085440 14971243 e4716b vmlinux
> 
> 338ea55579d1... x86/lib/copy_user_64.S: Remove FIX_ALIGNMENT define
>     text    data     bss      dec    hex filename
> 12290539 1595264 1085440 14971243 e4716b vmlinux
> 
> db477a3386de... x86/alternatives: Cleanup DPRINTK macro
>     text    data     bss      dec    hex filename
> 12290539 1595264 1085440 14971243 e4716b vmlinux
> 
> 4332195c5615... x86/alternatives: Add instruction padding
>     text    data     bss      dec    hex filename
> 12293030 1595264 1085440 14973734 e47b26 vmlinux
>      ^^^

So you could have a look at the detailed section dump itself via:

   objdump -h vmlinux

there .text will be the raw text and .alt* will be listed separately. 
The 'size' tool will add up executable sections IIRC, mixing these 
sections.

.alt* is freed after init, so it's not really a kernel image size 
increase, right?
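
To illustrate the point about 'size' mixing sections: a rough sketch of how
the Berkeley-style columns are commonly described as being assigned, written
from memory of the binutils behaviour and therefore an assumption rather
than a quote of its source. Allocated sections that are executable or
read-only count as "text", writable ones with file contents as "data", and
the rest as "bss". The function name and the readelf-style flag strings are
illustrative only.

```python
def berkeley_bucket(flags, sec_type="PROGBITS"):
    """Classify a section by readelf-style flags ('A', 'W', 'X').

    Returns 'text', 'data', 'bss', or None if the section is not allocated.
    This mirrors the assumed rule, not an actual binutils API.
    """
    if "A" not in flags:
        return None      # not loaded at run time, so not counted
    if "X" in flags or "W" not in flags:
        return "text"    # code, or read-only data
    if sec_type != "NOBITS":
        return "data"    # writable and present in the file
    return "bss"         # writable, no file contents


if __name__ == "__main__":
    for name, flags in [(".altinstructions", "A"),
                        (".altinstr_replace", "AX"),
                        (".init.data", "WA")]:
        print(name, "->", berkeley_bucket(flags))
```

Under that rule both .altinstructions (A) and .altinstr_replace (AX) end up
in the text column, which would be why growth there shows up as a "text"
increase.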

Thanks,

Ingo


Re: [GIT PULL] x86/alternatives padding

2015-03-04 Thread Borislav Petkov
On Wed, Mar 04, 2015 at 12:22:06PM +0100, Borislav Petkov wrote:
> Well, kernel image doesn't change while vmlinux shows only a very small
> .text increase of about 2K. I'm not sure yet why that happens though
> because it shouldn't be the padding. Because we will have to do it
> anyway, this patchset makes it automatic instead of by-hand, so to
> speak.
> 
> Let me bisect it and see which patch adds the increase.

Doh, of course. I've added u8 padlen to the alternative instruction
entry struct. For 2Kish alt sites in total, this explains the almost
exact same increase in text size:

    text    data     bss      dec    hex filename
12290539 1595264 1085440 14971243 e4716b vmlinux

338ea55579d1... x86/lib/copy_user_64.S: Remove FIX_ALIGNMENT define
    text    data     bss      dec    hex filename
12290539 1595264 1085440 14971243 e4716b vmlinux

db477a3386de... x86/alternatives: Cleanup DPRINTK macro
    text    data     bss      dec    hex filename
12290539 1595264 1085440 14971243 e4716b vmlinux

4332195c5615... x86/alternatives: Add instruction padding
    text    data     bss      dec    hex filename
12293030 1595264 1085440 14973734 e47b26 vmlinux
     ^^^

Ok, that's sorted out now.
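
The arithmetic behind that, as a small sketch. The entry layout used here
(s32, s32, u16, u8, u8, plus the new u8 padlen, packed) is an assumption
about the struct, not something quoted in this thread; the site count is the
"Alternatives sites total" figure from the pull request and the text sizes
are the ones above.

```python
import struct

# Assumed packed layout of one .altinstructions entry (little-endian, no
# alignment padding): two s32 offsets, a u16 cpuid, two u8 lengths.
OLD_ENTRY = struct.calcsize("<iiHBB")    # without padlen
NEW_ENTRY = struct.calcsize("<iiHBBB")   # with the added u8 padlen

ALT_SITES = 2478          # from the pull request stats
TEXT_BEFORE = 12290539    # size(1) text column before "Add instruction padding"
TEXT_AFTER = 12293030     # ... and after

measured = TEXT_AFTER - TEXT_BEFORE
expected = ALT_SITES * (NEW_ENTRY - OLD_ENTRY)

if __name__ == "__main__":
    print("entry size:", OLD_ENTRY, "->", NEW_ENTRY, "bytes")
    print("measured text growth:", measured, "bytes")
    print("one extra byte per site:", expected, "bytes")
    # The .altinstructions size in the section dump (0x7e4b) is a whole
    # multiple of the assumed new entry size.
    print("0x7e4b % entry size:", 0x7E4B % NEW_ENTRY)
```

One extra byte per site lands within a few bytes of the measured growth,
which fits the "almost exact" match.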

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.


Re: [GIT PULL] x86/alternatives padding

2015-03-04 Thread Borislav Petkov
On Wed, Mar 04, 2015 at 08:32:21AM +0100, Ingo Molnar wrote:
> Just curious: did the kernel image size change before/after these 
> changes? I.e. was any of the existing alternative instructions using 
> sites coded sub-optimally, with a larger maximum instruction size 
> allocated than strictly needed?
> 
> At least some of your improvements made things more optimal - 
> wondering at the total win, beyond the significant maintainability win 
> that is.

Well, kernel image doesn't change while vmlinux shows only a very small
.text increase of about 2K. I'm not sure yet why that happens though
because it shouldn't be the padding. Because we will have to do it
anyway, this patchset makes it automatic instead of by-hand, so to
speak.

Let me bisect it and see which patch adds the increase.

4.0-rc1 with alternatives patchset:
===================================

Setup is 15644 bytes (padded to 15872 bytes).
System is 5855 kB
CRC f2669897
Kernel: arch/x86/boot/bzImage is ready  (#1)

    text    data     bss      dec    hex filename
12292971 1595264 1085440 14973675 e47aeb vmlinux

plain 4.0-rc1:
==============

Setup is 15644 bytes (padded to 15872 bytes).
System is 5855 kB
CRC 7200607a
Kernel: arch/x86/boot/bzImage is ready  (#1)

    text    data     bss      dec    hex filename
12290539 1595264 1085440 14971243 e4716b vmlinux
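
For anyone repeating the comparison, a minimal sketch that parses two
Berkeley-format 'size' lines and prints the per-column difference. The two
input lines are the vmlinux rows from this mail; the function names are
hypothetical.

```python
def parse_size_line(line):
    """Parse one data row of Berkeley-format size(1) output."""
    text, data, bss, dec, hexval, filename = line.split()
    row = {
        "text": int(text),
        "data": int(data),
        "bss": int(bss),
        "dec": int(dec),
        "hex": int(hexval, 16),
        "filename": filename,
    }
    # Sanity check: dec is the column sum and hex is the same value in base 16.
    if row["text"] + row["data"] + row["bss"] != row["dec"]:
        raise ValueError("columns do not add up: " + line)
    if row["hex"] != row["dec"]:
        raise ValueError("hex and dec disagree: " + line)
    return row


def size_delta(before, after):
    """Column-wise difference between two parsed rows."""
    return {k: after[k] - before[k] for k in ("text", "data", "bss", "dec")}


PATCHED = "12292971 1595264 1085440 14973675 e47aeb vmlinux"
PLAIN = "12290539 1595264 1085440 14971243 e4716b vmlinux"

if __name__ == "__main__":
    print(size_delta(parse_size_line(PLAIN), parse_size_line(PATCHED)))
```

Only the text column moves, by 2432 bytes; data and bss are identical in
both builds.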

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.


Re: [GIT PULL] x86/alternatives padding

2015-03-03 Thread Ingo Molnar

* Borislav Petkov wrote:

> Hi guys,
> 
> so this one has been long in the making and has been passing testing
> on a bunch of boxes and bitness here so maybe we should try to put it
> into the wider tip mix and see what happens. If all is well, great, if
> there's trouble which I haven't managed to trigger in my testing, we can
> remove it from tip/master until all issues are fixed.
> 
> Btw, the last three patches are adjusting and improving perf bench a
> little as it includes memcpy/memset_64.S directly and this patchset
> breaks it with the changes otherwise.
> 
> Please pull,
> thanks.
> 
> ---
> The following changes since commit c517d838eb7d07bbe9507871fab3931deccff539:
> 
>   Linux 4.0-rc1 (2015-02-22 18:21:14 -0800)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git tags/alternatives_padding
> 
> for you to fetch changes up to dfecb95cdfeaf7872d83a96bec3a606e9cd95c8d:
> 
>   perf/bench: Add -r all so that you can run all mem* routines (2015-03-03 18:01:58 +0100)
> 
> 
> A more involved rework of the alternatives framework to be able to
> pad instructions and thus make using the alternatives macros more
> straightforward and without having to figure out old and new instruction
> sizes but have the toolchain figure that out for us.
> 
> Furthermore, it optimizes JMPs used so that fetch and decode can be
> relieved with smaller versions of the JMPs, where possible.
> 
> Some stats:
> 
> x86_64 defconfig:
> 
> Alternatives sites total:   2478
> Total padding added (in Bytes): 6051

Just curious: did the kernel image size change before/after these 
changes? I.e. was any of the existing alternative instructions using 
sites coded sub-optimally, with a larger maximum instruction size 
allocated than strictly needed?

At least some of your improvements made things more optimal - 
wondering at the total win, beyond the significant maintainability win 
that is.
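
To make the padding idea concrete, a small sketch of the rule as the pull
request describes it: the original instruction is padded so the site is at
least as long as the longest replacement. This only illustrates the
arithmetic; it is not how the kernel macros compute it, and the function
name is made up. The totals are the x86_64 defconfig stats quoted above.

```python
def pad_len(orig_len, *repl_lens):
    """Bytes of padding needed so the longest replacement fits at the site."""
    longest = max(repl_lens) if repl_lens else 0
    return max(0, longest - orig_len)


ALT_SITES = 2478       # "Alternatives sites total"
TOTAL_PADDING = 6051   # "Total padding added (in Bytes)"

if __name__ == "__main__":
    print("3-byte original, 5-byte replacement:", pad_len(3, 5))
    print("5-byte original, 3-byte replacement:", pad_len(5, 3))
    print("2-byte original, two replacements (3, 7):", pad_len(2, 3, 7))
    print("average padding per site:", round(TOTAL_PADDING / ALT_SITES, 2))
```

On those figures the padding averages a little under two and a half bytes
per site.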

> The padding is currently done for:
> 
> X86_FEATURE_ALWAYS
> X86_FEATURE_ERMS
> X86_FEATURE_LFENCE_RDTSC
> X86_FEATURE_MFENCE_RDTSC
> X86_FEATURE_SMAP
> 
> This is with the latest version of the patchset. Of course, on each
> machine the alternatives sites actually being patched are a proper
> subset of the total number.
> 
> 
> Borislav Petkov (18):
>   x86/lib/copy_user_64.S: Remove FIX_ALIGNMENT define
>   x86/alternatives: Cleanup DPRINTK macro
>   x86/alternatives: Add instruction padding
>   x86/alternatives: Make JMPs more robust
>   x86/alternatives: Use optimized NOPs for padding
>   x86/lib/copy_page_64.S: Use generic ALTERNATIVE macro
>   x86/lib/copy_user_64.S: Convert to ALTERNATIVE_2
>   x86/smap: Use ALTERNATIVE macro
>   x86/entry_32: Convert X86_INVD_BUG to ALTERNATIVE macro
>   x86/lib/clear_page_64.S: Convert to ALTERNATIVE_2 macro
>   x86/asm: Use alternative_2() in rdtsc_barrier()
>   x86/asm: Cleanup prefetch primitives
>   x86/lib/memset_64.S: Convert to ALTERNATIVE_2 macro
>   x86/lib/memmove_64.S: Convert memmove() to ALTERNATIVE macro
>   x86/lib/memcpy_64.S: Convert memcpy to ALTERNATIVE_2 macro
>   perf/bench: Fix mem* routines usage after alternatives change
>   perf/bench: Carve out mem routine benchmarking
>   perf/bench: Add -r all so that you can run all mem* routines
> 
>  arch/x86/include/asm/alternative-asm.h        |  43 ++-
>  arch/x86/include/asm/alternative.h            |  65 +++
>  arch/x86/include/asm/apic.h                   |   2 +-
>  arch/x86/include/asm/barrier.h                |   6 +-
>  arch/x86/include/asm/cpufeature.h             |  30 ++---
>  arch/x86/include/asm/processor.h              |  16 ++-
>  arch/x86/include/asm/smap.h                   |  30 ++---
>  arch/x86/kernel/alternative.c                 | 158 ++
>  arch/x86/kernel/cpu/amd.c                     |   5 +
>  arch/x86/kernel/entry_32.S                    |  12 +-
>  arch/x86/lib/clear_page_64.S                  |  66 +--
>  arch/x86/lib/copy_page_64.S                   |  37 ++
>  arch/x86/lib/copy_user_64.S                   |  46 ++--
>  arch/x86/lib/memcpy_64.S                      |  68 ---
>  arch/x86/lib/memmove_64.S                     |  19 +---
>  arch/x86/lib/memset_64.S                      |  61 --
>  arch/x86/um/asm/barrier.h                     |   4 +-
>  tools/perf/bench/mem-memcpy-x86-64-asm-def.h  |   6 +-
>  tools/perf/bench/mem-memcpy-x86-64-asm.S      |   2 -
>  tools/perf/bench/mem-memcpy.c                 | 128 +++--
>  tools/perf/bench/mem-memset-x86-64-asm-def.h  |   6 +-
>  tools/perf/bench/mem-memset-x86-64-asm.S      |   2 -
>  tools/perf/util/include/asm/alternative-asm.h |   1 +
>  23 files changed, 433 insertions(+), 380 deletions(-)

Pulled into tip:x86/asm, thanks Boris!

(I made a few comments as replies to the