The 05/19/2020 22:31, Arnd Bergmann wrote:
> On Tue, May 19, 2020 at 10:24 PM Adhemerval Zanella
> <adhemerval.zane...@linaro.org> wrote:
> > On 19/05/2020 16:54, Arnd Bergmann wrote:
> > > Jack Schmidt reported a bug for the arm32 clock_gettimeofday64 vdso call
> > > last month: https://github.com/richfelker/musl-cross-make/issues/96 and
> > > https://github.com/raspberrypi/linux/issues/3579
> > >
> > > As Will Deacon pointed out, this was never reported on the mailing list,
> > > so I'll try to summarize what we know, so this can hopefully be resolved
> > > soon.
> > >
> > > - This happened reproducibly on Linux-5.6 on a 32-bit Raspberry Pi patched
> > >   kernel running on a 64-bit Raspberry Pi 4b (bcm2711) when calling
> > >   clock_gettime64(CLOCK_REALTIME)
> >
> > Does it happen with other clocks as well?
>
> Unclear.
>
> > > - The kernel tree is at https://github.com/raspberrypi/linux/, but I could
> > >   see no relevant changes compared to a mainline kernel.
> >
> > Is this bug reproducible with mainline kernel or mainline kernel can't be
> > booted on bcm2711?
>
> Mainline linux-5.6 should boot on that machine but might not have
> all the other features, so I think users tend to use the raspberry pi
> kernel sources for now.
>
> > > - From the report, I see that the returned time value is larger than the
> > >   expected time, by 3.4 to 14.5 million seconds in four samples, my
> > >   guess is that a random number gets added in at some point.
> >
> > What kind code are you using to reproduce it? It is threaded or issue
> > clock_gettime from signal handlers?
>
> The reproducer is very simple without threads or signals,
> see the start of https://github.com/richfelker/musl-cross-make/issues/96
>
> It does rely on calling into the musl wrapper, not the direct vdso
> call.
>
> > > - From other sources, I found that the Raspberry Pi clocksource runs
> > >   at 54 MHz, with a mask value of 0xffffffffffffff.
> > >   From these numbers
> > >   I would expect that reading a completely random hardware register
> > >   value would result in an offset up to 1.33 billion seconds, which is
> > >   around factor 100 more than the error we see, though similar.
> > >
> > > - The test case calls the musl clock_gettime() function, which falls back to
> > >   the clock_gettime64() syscall on kernels prior to 5.5, or to the 32-bit
> > >   clock_gettime() prior to Linux-5.1. As reported in the bug, Linux-4.19 does
> > >   not show the bug.
> > >
> > > - The behavior was not reproduced on the same user space in qemu,
> > >   though I cannot tell whether the exact same kernel binary was used.
> > >
> > > - glibc-2.31 calls the same clock_gettime64() vdso function on arm to
> > >   implement clock_gettime(), but earlier versions did not. I have not
> > >   seen any reports of this bug, which could be explained by users
> > >   generally being on older versions.
> > >
> > > - As far as I can tell, there are no reports of this bug from other users,
> > >   and so far nobody could reproduce it.

note: i could not reproduce it in qemu-system with these configs:

  qemu-system-aarch64 + arm64 kernel + compat vdso
  qemu-system-aarch64 + kvm accel (on cortex-a72) + 32bit arm kernel
  qemu-system-arm + cpu max + 32bit arm kernel

so i think it's something specific to that user's setup
(maybe rpi hw bug or gcc miscompiled the vdso or something
with that particular linux, i built my own linux 5.6 because
i did not know the exact kernel version where the bug was seen)

i don't have access to rpi (or other cortex-a53 where i
can install my own kernel) so this is as far as i got.

> > >
> > > - The current musl git tree has been patched to not call clock_gettime64
> > >   on ARM because of this problem, so it cannot be used for reproducing
> > >   it.
> >
> > So should glibc follow musl and remove arm clock_gettime64 vDSO support
> > or this bug is localized to an specific kernel version running on an
> > specific hardware?
>
> I hope we can figure out what is actually going on soon, there is probably
> no need to change glibc before we have.
>
>       Arnd