Re: Update x86-64 PLT for MPX

2014-02-19 Thread H.J. Lu
On Mon, Jan 27, 2014 at 1:50 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Jan 27, 2014 at 1:42 PM, H.J. Lu hjl.to...@gmail.com wrote:
 Hi,

 I am enclosing the updated x86-64 psABI with PLT change.
 I checked it onto hjl/mpx/master branch at

 https://github.com/hjl-tools/x86-64-psABI

 I will check in the binutils changes if there are no disagreements
 in 2 weeks.

 Thanks.


 My email was rejected due to the PDF attachment.  I am resubmitting it
 without the PDF file.

I pushed the MPX binutils change into master:

https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=0ff2b86e7c14177ec7f9e1257f8e697814794017


-- 
H.J.


Re: Update x86-64 PLT for MPX

2014-01-27 Thread H.J. Lu
On Mon, Jan 27, 2014 at 1:42 PM, H.J. Lu hjl.to...@gmail.com wrote:
 Hi,

 I am enclosing the updated x86-64 psABI with PLT change.
 I checked it onto hjl/mpx/master branch at

 https://github.com/hjl-tools/x86-64-psABI

 I will check in the binutils changes if there are no disagreements
 in 2 weeks.

 Thanks.


My email was rejected due to the PDF attachment.  I am resubmitting it
without the PDF file.


-- 
H.J.


Update x86-64 PLT for MPX

2014-01-18 Thread H.J. Lu
Hi,

Here is the proposal to update the x86-64 PLT for MPX.  The linker change
is implemented on the hjl/mpx/pltext8 branch.  Any comments or feedback?

Thanks.

-- 
H.J.
---
Intel MPX:

http://software.intel.com/en-us/file/319433-017pdf

introduces 4 bound registers, which will be used for parameter passing
in x86-64.  Bound registers are cleared by branch instructions without a
BND prefix; branch instructions with the BND prefix keep the bound register
contents.  This leads to 2 requirements for the 64-bit MPX run-time:

1. Dynamic linker (ld.so) should save and restore bound registers during
symbol lookup.
2. Change the current 16-byte PLT0:

  ff 35 08 00 00 00       pushq  GOT+8(%rip)
  ff 25 00 10 00 00       jmpq   *GOT+16(%rip)
  0f 1f 40 00             nopl   0x0(%rax)

and 16-byte PLT1:

  ff 25 00 00 00 00       jmpq   *name@GOTPCREL(%rip)
  68 00 00 00 00          pushq  $index
  e9 00 00 00 00          jmpq   PLT0

Both clear bound registers as they stand; the change is to make them
preserve bound registers.

We use 2 new relocations:

#define R_X86_64_PC32_BND  39 /* PC relative 32 bit signed with BND prefix */
#define R_X86_64_PLT32_BND 40 /* 32 bit PLT address with BND prefix */

to mark branch instructions with BND prefix.

When the linker sees any R_X86_64_PC32_BND or R_X86_64_PLT32_BND relocation,
it switches to a different PLT0:

  ff 35 08 00 00 00       pushq  GOT+8(%rip)
  f2 ff 25 00 10 00 00    bnd jmpq *GOT+16(%rip)
  0f 1f 00                nopl   (%rax)

to preserve bound registers for symbol lookup, and it also creates an
external PLT section, .plt.bnd.  The linker will create a BND PLT1 entry
in .plt:

  68 00 00 00 00          pushq  $index
  f2 e9 00 00 00 00       bnd jmpq PLT0
  0f 1f 44 00 00          nopl   0(%rax,%rax,1)

and an 8-byte BND PLT entry in .plt.bnd:

  f2 ff 25 00 00 00 00    bnd jmpq *name@GOTPCREL(%rip)
  90                      nop

Otherwise, the linker will create a legacy PLT entry in .plt:

  68 00 00 00 00          pushq  $index
  e9 00 00 00 00          jmpq   PLT0
  66 0f 1f 44 00 00       nopw   0(%rax,%rax,1)

and an 8-byte legacy PLT entry in .plt.bnd:

  ff 25 00 00 00 00       jmpq   *name@GOTPCREL(%rip)
  66 90                   xchg   %ax,%ax
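
As a rough illustration of the entry layouts above (not the binutils
implementation), the templates can be written out as byte strings and
checked for size.  All names below are hypothetical, the displacement and
immediate fields are zero placeholders, and choosing the entry flavor per
symbol from its relocation type is only an assumed reading of "Otherwise":

```python
# Relocation numbers taken from the proposal.
R_X86_64_PC32_BND = 39
R_X86_64_PLT32_BND = 40
BND_RELOCS = {R_X86_64_PC32_BND, R_X86_64_PLT32_BND}

# PLT0 used once any BND relocation has been seen.
BND_PLT0 = bytes.fromhex(
    "ff3500000000"      # pushq GOT+8(%rip)
    "f2ff2500000000"    # bnd jmpq *GOT+16(%rip)
    "0f1f00"            # nopl (%rax)
)

# Each pair is (entry in .plt, entry in .plt.bnd).
BND_ENTRY = (
    bytes.fromhex("6800000000" "f2e900000000" "0f1f440000"),
    bytes.fromhex("f2ff2500000000" "90"),
)
LEGACY_ENTRY = (
    bytes.fromhex("6800000000" "e900000000" "660f1f440000"),
    bytes.fromhex("ff2500000000" "6690"),
)


def entry_templates(reloc_type: int):
    """Return the (.plt, .plt.bnd) templates for a branch relocation type.

    Assumption: the flavor is selected per symbol by relocation type.
    """
    return BND_ENTRY if reloc_type in BND_RELOCS else LEGACY_ENTRY
```

Both flavors come to 16 bytes in .plt and 8 bytes in .plt.bnd, so entry
offsets do not depend on which flavor a symbol gets.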

The initial value of the GOT entry for name will be set to the address of
the pushq instruction in the corresponding entry in .plt.  The linker will
resolve references to symbol name to the entry in the second PLT,
.plt.bnd.

Prelink stores the offset of pushq of PLT1 (plt_base + 0x10) in GOT[1]
and GOT[1] is stored in GOT[3].  We can undo prelink in GOT by computing
the corresponding pushq offset with
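
A small sketch of the address arithmetic this implies, for illustration
only.  The function names are hypothetical, symbol indexes are taken to be
0-based, and .plt.bnd entries are assumed to be laid out in the same order
as the .plt entries they pair with:

```python
PLT_ENTRY_SIZE = 16      # PLT0 and every per-symbol entry in .plt
PLT_BND_ENTRY_SIZE = 8   # every entry in .plt.bnd (there is no PLT0 there)


def lazy_got_initial_value(plt_base: int, index: int) -> int:
    """Address of the pushq that starts the .plt entry for symbol `index`.

    PLT0 occupies the first 16 bytes, so symbol 0 starts at plt_base + 0x10.
    """
    return plt_base + PLT_ENTRY_SIZE * (index + 1)


def symbol_plt_address(plt_bnd_base: int, index: int) -> int:
    """Address in .plt.bnd that references to symbol `index` resolve to."""
    return plt_bnd_base + PLT_BND_ENTRY_SIZE * index
```

For the first symbol this gives plt_base + 0x10 for the initial GOT value,
which matches the PLT1 offset quoted for prelink.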

GOT[1] + (GOT offset - GOT[3]) * 2
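
The factor of 2 is the ratio of the 16-byte .plt entry size to the 8-byte
GOT slot size.  A minimal sketch of the computation, assuming that
"GOT offset" is the location of the slot being reset and that "GOT[3]" in
the formula stands for the location of the first PLT-related GOT slot (the
function and parameter names are hypothetical):

```python
GOT_SLOT_SIZE = 8
PLT_ENTRY_SIZE = 16


def undo_prelink_got_entry(got1: int, got_slot: int, first_plt_slot: int) -> int:
    """Recompute the lazy-binding value of one GOT slot.

    got1           -- value prelink stored in GOT[1]: the pushq of PLT1
    got_slot       -- location of the GOT slot being reset (assumed meaning
                      of "GOT offset")
    first_plt_slot -- location of GOT slot 3 (assumed meaning of "GOT[3]")
    """
    scale = PLT_ENTRY_SIZE // GOT_SLOT_SIZE  # the "* 2" in the formula
    return got1 + (got_slot - first_plt_slot) * scale
```

With GOT[1] = plt_base + 0x10, the slot two entries past the first one maps
to plt_base + 0x30, which is the pushq of the third .plt entry.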

Since for each entry in .plt except for PLT0 we create an 8-byte entry in
.plt.bnd, there is an extra 8 bytes per PLT symbol.
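
The size accounting can be spelled out as follows.  This is only a sketch
of the arithmetic with a hypothetical helper name; it compares the split
layout against the current single-section layout with one 16-byte entry
per symbol:

```python
def plt_sizes(num_symbols: int) -> dict:
    """Section sizes in bytes for the split layout, plus the overhead."""
    plt = 16 * (1 + num_symbols)     # PLT0 plus one 16-byte entry per symbol
    plt_bnd = 8 * num_symbols        # one 8-byte entry per symbol, no PLT0
    current = 16 * (1 + num_symbols) # today's single .plt section
    return {"plt": plt, "plt_bnd": plt_bnd, "extra": plt + plt_bnd - current}
```

For 10 PLT symbols this gives 176 bytes of .plt, 80 bytes of .plt.bnd and
80 bytes of overhead, i.e. 8 bytes per symbol.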

We also investigated a 16-byte entry for .plt.bnd.  We compared the
8-byte entry vs. the 16-byte entry for .plt.bnd on Sandy Bridge.
There are no performance differences in SPEC CPU 2000/2006 or in
micro benchmarks.

Pros:
No change to the undo-prelink code in the dynamic linker.
Only 8 bytes of memory overhead for each PLT symbol.
Cons:
Extra .plt.bnd section is needed.
Extra 8 bytes for legacy branches to PLT.
GDB is unaware of the new layout of .plt and .plt.bnd.


RFC: Update x86-64 PLT for MPX

2013-11-28 Thread H.J. Lu
Hi,

This is a proposal to update the x86-64 PLT for MPX.  We don't
need to change GCC or glibc to support it.  The binutils change
is implemented on the hjl/mpx/pltext8 branch.  GDB works except
that there are no synthetic symbols for the .plt section.  The prelink
change is very small.

Any comments?

Thanks.

-- 
H.J.
--
Intel MPX:

http://software.intel.com/sites/default/files/319433-015.pdf

introduces 4 bound registers, which will be used for parameter passing
in x86-64.  Bound registers are cleared by branch instructions without a
BND prefix; branch instructions with the BND prefix keep the bound register
contents.  This leads to 2 requirements for the 64-bit MPX run-time:

1. Dynamic linker (ld.so) should save and restore bound registers during
symbol lookup.
2. Change the current 16-byte PLT0:

  ff 35 08 00 00 00       pushq  GOT+8(%rip)
  ff 25 00 10 00 00       jmpq   *GOT+16(%rip)
  0f 1f 40 00             nopl   0x0(%rax)

and 16-byte PLT1:

  ff 25 00 00 00 00       jmpq   *name@GOTPCREL(%rip)
  68 00 00 00 00          pushq  $index
  e9 00 00 00 00          jmpq   PLT0

Both clear bound registers as they stand; the change is to make them
preserve bound registers.

We use 2 new relocations:

#define R_X86_64_PC32_BND  39 /* PC relative 32 bit signed with BND prefix */
#define R_X86_64_PLT32_BND 40 /* 32 bit PLT address with BND prefix */

to mark branch instructions with BND prefix.

When the linker sees any R_X86_64_PC32_BND or R_X86_64_PLT32_BND relocation,
it switches to a different PLT0:

  ff 35 08 00 00 00       pushq  GOT+8(%rip)
  f2 ff 25 00 10 00 00    bnd jmpq *GOT+16(%rip)
  0f 1f 00                nopl   (%rax)

to preserve bound registers for symbol lookup, and it also creates an
external PLT section, .plt.bnd.  The linker will create a BND PLT entry
in .plt:

  68 00 00 00 00          pushq  $index
  f2 e9 00 00 00 00       bnd jmpq PLT0
  0f 1f 44 00 00          nopl   0(%rax,%rax,1)

and an 8-byte BND PLT entry in .plt.bnd:

  f2 ff 25 00 00 00 00    bnd jmpq *name@GOTPCREL(%rip)
  90                      nop

Otherwise, the linker will create a legacy PLT entry in .plt:

  68 00 00 00 00          pushq  $index
  e9 00 00 00 00          jmpq   PLT0
  66 0f 1f 44 00 00       nopw   0(%rax,%rax,1)

and an 8-byte legacy PLT entry in .plt.bnd:

  ff 25 00 00 00 00       jmpq   *name@GOTPCREL(%rip)
  66 90                   xchg   %ax,%ax

The initial value of the GOT entry for name will be set to the address of
the pushq instruction in the corresponding entry in .plt.  The linker will
resolve references to symbol name to the entry in the second PLT,
.plt.bnd.

Prelink stores the offset of pushq of PLT1 (plt_base + 0x10) in GOT[1]
and GOT[1] is stored in GOT[3].  We can undo prelink in GOT by computing
the corresponding pushq offset with

GOT[1] + (GOT offset - GOT[3]) * 2

Since for each entry in .plt except for PLT0 we create an 8-byte entry in
.plt.bnd, there is an extra 8 bytes per PLT symbol.

We also investigated a 16-byte entry for .plt.bnd.  We compared the
8-byte entry vs. the 16-byte entry for .plt.bnd on Sandy Bridge.
There are no performance differences in SPEC CPU 2000/2006 or in
micro benchmarks.

Pros:
No change to the undo-prelink code in the dynamic linker.
Only 8 bytes of memory overhead for each PLT symbol.
Cons:
Extra .plt.bnd section is needed.
Extra 8 bytes for legacy branches to PLT.
GDB is unaware of .plt and .plt.bnd.