On Sat, Jul 27, 2019 at 7:53 PM Andy Lutomirski <l...@kernel.org> wrote:
>
> On Fri, Jul 26, 2019 at 11:01 AM Sean Christopherson
> <sean.j.christopher...@intel.com> wrote:
> >
> > On Wed, Jul 24, 2019 at 01:56:34AM +0200, Thomas Gleixner wrote:
> > > On Tue, 23 Jul 2019, Kees Cook wrote:
> > >
> > > > On Wed, Jul 24, 2019 at 12:59:03AM +0200, Thomas Gleixner wrote:
> > > > > And as we have sys_clock_gettime64() exposed for 32bit anyway you need to
> > > > > deal with that in seccomp independently of the VDSO. It does not make sense
> > > > > to treat sys_clock_gettime() differently than sys_clock_gettime64(). They
> > > > > both expose the same information, but the latter is y2038 safe.
> > > >
> > > > Okay, so combining Andy's ideas on aliasing and "more seccomp flags",
> > > > we could declare that clock_gettime64() is not filterable on 32-bit at
> > > > all without the magic SECCOMP_IGNORE_ALIASES flag or something. Then we
> > > > would alias clock_gettime64 to clock_gettime _before_ the first evaluation
> > > > (unless SECCOMP_IGNORE_ALIASES is set)?
> > > >
> > > > (When was clock_gettime64() introduced? Is it too long ago to do this
> > > > "you can't filter it without a special flag" change?)
> > >
> > > clock_gettime64() and the other sys_*time64() syscalls which address the
> > > y2038 issue were added in 5.1
> >
> > Paul Bolle pointed out that this regression showed up in v5.3-rc1, not
> > v5.2. In Paul's case, systemd-journal is failing.
>
> I think it's getting quite late to start inventing new seccomp
> features to fix this. I think the right solution for 5.3 is to change
> the 32-bit vdso fallback to use the old clock_gettime, i.e.
> clock_gettime32. This is obviously not an acceptable long-term
> solution.

I think there is something else wrong with the fallback path: it seems
to pass the wrong structure in some cases.

arch/x86/include/asm/vdso/gettimeofday.h vdso32:

static __always_inline
long clock_gettime_fallback(clockid_t _clkid, struct __kernel_timespec *_ts)
{
	long ret;

	asm (
		"mov %%ebx, %%edx \n"
		"mov %[clock], %%ebx \n"
		"call __kernel_vsyscall \n"
		"mov %%edx, %%ebx \n"
		: "=a" (ret), "=m" (*_ts)
		: "0" (__NR_clock_gettime64), [clock] "g" (_clkid), "c" (_ts)
		: "edx");

	return ret;
}

arch/x86/include/asm/vdso/gettimeofday.h vdso64:

static __always_inline
long clock_gettime_fallback(clockid_t _clkid, struct __kernel_timespec *_ts)
{
	long ret;

	asm ("syscall" : "=a" (ret), "=m" (*_ts) :
	     "0" (__NR_clock_gettime), "D" (_clkid), "S" (_ts) :
	     "rcx", "r11");

	return ret;
}

lib/vdso/gettimeofday.c:

static __maybe_unused int
__cvdso_clock_gettime32(clockid_t clock, struct old_timespec32 *res)
{
	struct __kernel_timespec ts;
	int ret;

	if (res == NULL)
		goto fallback;

	ret = __cvdso_clock_gettime(clock, &ts);

	if (ret == 0) {
		res->tv_sec = ts.tv_sec;
		res->tv_nsec = ts.tv_nsec;
	}

	return ret;

fallback:
	return clock_gettime_fallback(clock, (struct __kernel_timespec *)res);
}

So we get an 'old_timespec32' pointer from user space, and cast it
to __kernel_timespec in order to pass it to the low-level function
that actually fills in the 64-bit structure.

On a little-endian machine, the first four bytes (the low half of the
64-bit tv_sec) are actually correct here, but the 32-bit tv_nsec field
then receives the upper half of tv_sec, i.e. tv_nsec=0, and the 8 bytes
of the 64-bit tv_nsec overwrite whatever comes after the user space
'timespec'.

[I missed the typecast as an indication of a bug during my review,
sorry about that].

I think adding a clock_gettime32_fallback() function that calls
__NR_clock_gettime is both the simplest fix for this bug, and the
least ugly way to handle it in the long run.

We also need to decide what to do about __cvdso_clock_gettime32()
once we add a compile-time option that makes all time32 syscalls
return an error.
Returning -ENOSYS from the clock_gettime32() fallback is probably a
good idea, but for consistency the __vdso_clock_gettime() call should
either always return the same error in that configuration, or be left
out of the vdso build entirely.

       Arnd
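
PS: To make the overwrite described above concrete, here is a minimal
user-space sketch of what happens when a 16-byte time64 result is
stored through a pointer to the 8-byte time32 structure. This is not
kernel code: the demo_* types and helpers are hypothetical stand-ins,
with layouts assumed to match old_timespec32 and __kernel_timespec on a
32-bit ABI, and a plain memcpy() stands in for the syscall's copy to
user space.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout of the time32 structure user space passes in (8 bytes). */
struct demo_old_timespec32 {
	int32_t tv_sec;
	int32_t tv_nsec;
};

/* Assumed layout of the time64 structure the syscall fills in (16 bytes). */
struct demo_kernel_timespec {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* The caller's timespec plus the 8 bytes that happen to follow it. */
struct demo_user_frame {
	struct demo_old_timespec32 ts;
	unsigned char neighbour[8];
};

_Static_assert(sizeof(struct demo_old_timespec32) == 8, "time32 layout");
_Static_assert(sizeof(struct demo_kernel_timespec) == 16, "time64 layout");
_Static_assert(sizeof(struct demo_user_frame) == 16, "frame has no padding");

/* Hypothetical model of a time64 syscall: always stores 16 bytes at dst. */
static void demo_time64_store(void *dst, int64_t sec, int64_t nsec)
{
	struct demo_kernel_timespec kts = { .tv_sec = sec, .tv_nsec = nsec };

	memcpy(dst, &kts, sizeof(kts));
}

/*
 * Model of the buggy fallback: the caller only expects frame->ts to be
 * written, but the time64 writer is handed the same address.
 * Returns how many neighbouring bytes no longer hold the 0xAA marker.
 */
static int demo_clobbered_bytes(struct demo_user_frame *frame)
{
	int i, clobbered = 0;

	memset(frame, 0, sizeof(*frame));
	memset(frame->neighbour, 0xAA, sizeof(frame->neighbour));

	/* frame starts with the 8-byte ts, yet 16 bytes get stored. */
	demo_time64_store(frame, 1564000000, 123456789);

	for (i = 0; i < (int)sizeof(frame->neighbour); i++)
		if (frame->neighbour[i] != 0xAA)
			clobbered++;

	return clobbered;
}
```

Calling demo_clobbered_bytes() on a frame reports that all 8 bytes
after the caller's timespec were overwritten by the 64-bit tv_nsec. On
a little-endian machine frame->ts.tv_sec still holds the seconds value
and frame->ts.tv_nsec reads 0, which matches the symptom above. A
clock_gettime32_fallback() that issues __NR_clock_gettime would avoid
this, since that syscall writes the 8-byte layout.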