[PATCH v4.16-rc6 (1)] x86/vdso: VDSO should handle clock_gettime(CLOCK_MONOTONIC_RAW) without syscall

2018-03-21 Thread jason . vas . dias

  Resent to address reviewer comments, and allow builds with compilers
  that support -DRETPOLINE to succeed.
  
  Currently, the VDSO does not handle
 clock_gettime(CLOCK_MONOTONIC_RAW, )
  on Intel / AMD; it calls
 vdso_fallback_gettime()
  for this clock, which issues a syscall with an unacceptably high
  latency (the minimum measurable time, or time between measurements)
  of 300-700 ns on two 2.8-3.9 GHz Haswell x86_64 (Family_Model 06_3C)
  machines under various versions of Linux.
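
To make "latency" in the sense used above concrete, here is a minimal sketch of how the smallest gap between back-to-back calls could be measured from user space. It is not the timer_latency.c program mentioned in this thread; the helper name min_raw_delta_ns() is hypothetical and chosen only for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <time.h>

/*
 * Hypothetical helper (not from the patch): returns the smallest nonzero
 * gap, in nanoseconds, seen between consecutive
 * clock_gettime(CLOCK_MONOTONIC_RAW) calls.
 * Returns -1 if the clock cannot be read or no nonzero gap was observed.
 */
static long long min_raw_delta_ns(int iterations)
{
    struct timespec prev, cur;
    long long best = -1;

    assert(iterations > 0);

    if (clock_gettime(CLOCK_MONOTONIC_RAW, &prev) != 0)
        return -1;

    for (int i = 0; i < iterations; i++) {
        if (clock_gettime(CLOCK_MONOTONIC_RAW, &cur) != 0)
            return -1;

        /* Elapsed time between this reading and the previous one. */
        long long delta =
            (long long)(cur.tv_sec - prev.tv_sec) * 1000000000LL +
            (long long)(cur.tv_nsec - prev.tv_nsec);

        /* Keep the smallest gap that is actually measurable. */
        if (delta > 0 && (best < 0 || delta < best))
            best = delta;

        prev = cur;
    }
    return best;
}
```

Calling min_raw_delta_ns(1000000) from a small main() and printing the result gives one number per run; whether it lands in the hundreds of nanoseconds or the tens depends on whether the call goes through the syscall path or is served in the VDSO.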
  
  Sometimes, particularly when correlating elapsed time to performance
  counter values, user-space code needs to know elapsed time from the
  perspective of the CPU no matter how "hot" / fast or "cold" / slow it
  might be running wrt NTP / PTP "real" time; when code needs this,
  the latencies associated with a syscall are often unacceptably high.
  
  I reported this as Bug #198961:
'https://bugzilla.kernel.org/show_bug.cgi?id=198961'
  and in previous posts with subjects matching 'CLOCK_MONOTONIC_RAW'.
  
  This patch handles CLOCK_MONOTONIC_RAW clock_gettime() in the VDSO
  by exporting the raw clock calibration, last cycles, last xtime_nsec,
  and last raw_sec value in the vsyscall_gtod_data during update_vsyscall().
  The new do_monotonic_raw() function in the VDSO has a latency of about 20 ns
  on average, and the test program:
   tools/testing/selftests/timers/inconsistency-check.c
  succeeds with arguments: '-c 4 -t 120' or any arbitrary -t value.
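
The arithmetic that the exported values make possible can be sketched as follows. This is an illustration under stated assumptions, not the code from the patch: the struct and function names are hypothetical, the "calibration" is assumed to be the usual multiplier/shift pair, and the nanosecond field is assumed to be stored left-shifted by the shift. Taking a consistent snapshot of the fields (the sequence-count retry loop) and reading the TSC are left out.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC_SKETCH 1000000000ULL

/* Hypothetical snapshot of the values exported at timekeeping update. */
struct raw_gtod_snapshot {
    uint64_t cycle_last;    /* cycle counter value at the last update */
    uint32_t mult;          /* raw clock calibration: multiplier      */
    uint32_t shift;         /* raw clock calibration: shift           */
    uint64_t nsec_shifted;  /* nanoseconds at last update, << shift   */
    uint64_t sec;           /* raw seconds at the last update         */
};

struct raw_time {
    uint64_t sec;
    uint64_t nsec;
};

/*
 * Convert a cycle counter reading into raw monotonic time using only
 * the snapshot: scale the cycles elapsed since the last update, add the
 * stored (shifted) nanoseconds, shift down, then carry into seconds.
 */
static struct raw_time raw_time_from_snapshot(struct raw_gtod_snapshot s,
                                              uint64_t now_cycles)
{
    struct raw_time t;
    uint64_t ns;

    assert(s.shift < 64);

    ns = s.nsec_shifted + (now_cycles - s.cycle_last) * s.mult;
    ns >>= s.shift;

    t.sec = s.sec + ns / NSEC_PER_SEC_SKETCH;
    t.nsec = ns % NSEC_PER_SEC_SKETCH;
    return t;
}
```

With mult = 128 and shift = 8 (0.5 ns per cycle), a reading 40 cycles after the last update adds 20 ns to the stored time, carrying into the seconds field if the nanoseconds overflow. No syscall is needed because every input is either in the shared data page or read directly from the CPU.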
  
  The patch is against Linus' latest 4.16-rc6 tree,
  current HEAD of:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
  
  This patch affects only files:
   arch/x86/include/asm/vgtod.h
   arch/x86/entry/vdso/vclock_gettime.c
   arch/x86/entry/vsyscall/vsyscall_gtod.c
   
  Patches for kernels 3.10.0-21 and 4.9.65-rt23 (ARM) are attached to bug
  #198961,
  as is the test program, timer_latency.c, to demonstrate the problem.
  
  Before the patch, a latency of 200-1000 ns was measured for
clock_gettime(CLOCK_MONOTONIC_RAW,)
  calls; after the patch, the same call on the same machine
  has a latency of about 20 ns.
  
  Please consider applying something like this patch to a future Linux release.

  This patch is being resent because it slightly improves the
  attributes of the static functions in vclock_gettime.c
  relative to the previous version.

  It also supersedes all previous patches with subject matching
 '.*VDSO should handle.*clock_gettime.*MONOTONIC_RAW'
  that I have sent previously - sorry for the resends.

  Please apply this patch so that we stop getting emails from the
  Intel build bot trying to build the previous version, with
  subject:
'[PATCH v4.16-rc5 1/2] x86/vdso: VDSO should handle \
 clock_gettime(CLOCK_MONOTONIC_RAW) without syscall'
  That version fails to build only because its patch 2/2, which
  removed -DRETPOLINE from the VDSO build and is now the
  subject of https://bugzilla.kernel.org/show_bug.cgi?id=199129,
  raised by H.J. Lu, was not applied first. Sorry!

Thanks & Best Regards,
Jason Vas Dias


Re: [PATCH v4.16-rc6 (1)] x86/vdso: VDSO should handle clock_gettime(CLOCK_MONOTONIC_RAW) without syscall

2018-03-19 Thread Jason Vas Dias
Note there is a bug raised by H.J. Lu:
 Bug 199129: Don't build vDSO with $(RETPOLINE_CFLAGS) -DRETPOLINE
(https://bugzilla.kernel.org/show_bug.cgi?id=199129)

If you agree it is a bug, then use both patches from the post:
'[PATCH v4.16-rc5 (2)] x86/vdso: VDSO should handle \
 clock_gettime(CLOCK_MONOTONIC_RAW) without syscall'
Otherwise, use the single patch from $subject, which makes the
calls to the statics in vclock_gettime.c use
   indirect_branch("keep") / function_return("keep"),
to avoid generation of thunk relocations, which occur only
when compiled with
   -mindirect-branch=thunk-extern -mindirect-branch-register
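
For readers unfamiliar with these attributes, the following is a minimal sketch of how a function can opt out of thunk generation when the rest of the build uses -mindirect-branch=thunk-extern. It is not the code from the patch: the macro VDSO_NO_THUNK and the functions are hypothetical, and the attributes are only applied when the compiler reports support for them (GCC on x86); elsewhere the macro expands to nothing.

```c
#include <assert.h>

/* Hypothetical macro: use the GCC x86 attributes only when available. */
#if defined(__has_attribute)
# if __has_attribute(indirect_branch) && __has_attribute(function_return)
#  define VDSO_NO_THUNK \
        __attribute__((indirect_branch("keep"), function_return("keep")))
# endif
#endif
#ifndef VDSO_NO_THUNK
# define VDSO_NO_THUNK /* attributes unavailable: plain function */
#endif

typedef long (*clock_reader_fn)(void);

/* Stand-in for a clock read routine; returns a fixed value here. */
static long read_clock_stub(void)
{
    return 1;
}

/*
 * The indirect call and the return in this function are left as plain
 * call/ret instructions ("keep"), so no reference to an external thunk
 * symbol is emitted for it even under -mindirect-branch=thunk-extern.
 */
VDSO_NO_THUNK static long dispatch_sketch(clock_reader_fn fn)
{
    return fn();
}
```

Without any -mindirect-branch option the attributes change nothing, so the same source builds either way; that is the property the single-patch variant relies on.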

Thanks & Regards,
Jason


[PATCH v4.16-rc6 (1)] x86/vdso: VDSO should handle clock_gettime(CLOCK_MONOTONIC_RAW) without syscall

2018-03-19 Thread jason . vas . dias


Resent to address reviewer comments, and allow builds with compilers
  that support -DRETPOLINE to succeed.

  Currently, the VDSO does not handle
 clock_gettime(CLOCK_MONOTONIC_RAW, )
  on Intel / AMD; it calls
 vdso_fallback_gettime()
  for this clock, which issues a syscall with an unacceptably high
  latency (the minimum measurable time, or time between measurements)
  of 300-700 ns on two 2.8-3.9 GHz Haswell x86_64 (Family_Model 06_3C)
  machines under various versions of Linux.

  Sometimes, particularly when correlating elapsed time to performance
  counter values, user-space code needs to know elapsed time from the
  perspective of the CPU no matter how "hot" / fast or "cold" / slow it
  might be running wrt NTP / PTP "real" time; when code needs this,
  the latencies associated with a syscall are often unacceptably high.

  I reported this as Bug #198961:
'https://bugzilla.kernel.org/show_bug.cgi?id=198961'
  and in previous posts with subjects matching 'CLOCK_MONOTONIC_RAW'.

  This patch handles CLOCK_MONOTONIC_RAW clock_gettime() in the VDSO
  by exporting the raw clock calibration, last cycles, last xtime_nsec,
  and last raw_sec value in the vsyscall_gtod_data during update_vsyscall().

  The new do_monotonic_raw() function in the VDSO has a latency of about 20 ns
  on average, and the test program:
   tools/testing/selftests/timers/inconsistency-check.c
  succeeds with arguments: '-c 4 -t 120' or any arbitrary -t value.

  The patch is against Linus' latest 4.16-rc5 tree,
  current HEAD of:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

  This patch affects only files:

   arch/x86/include/asm/vgtod.h
   arch/x86/entry/vdso/vclock_gettime.c
   arch/x86/entry/vsyscall/vsyscall_gtod.c


  Patches for kernels 3.10.0-21 and 4.9.65-rt23 (ARM) are attached to bug
  #198961,
  as is the test program, timer_latency.c, to demonstrate the problem.

  Before the patch, a latency of 200-1000 ns was measured for
clock_gettime(CLOCK_MONOTONIC_RAW,)
  calls; after the patch, the same call on the same machine
  has a latency of about 20 ns.

  Please consider applying something like this patch to a future Linux release.

Thanks & Best Regards,
Jason Vas Dias

