Re: [Qemu-devel] [RFC QEMU 0/2] arm/virt: Account for guest pause time

2018-11-08 Thread no-reply
Hi,

This series failed docker-quick@centos7 build test. Please find the testing
commands and their output below. If you have Docker installed, you can
probably reproduce it locally.

Type: series
Message-id: 1541616504-68526-1-git-send-email-bijan.mottahe...@oracle.com
Subject: [Qemu-devel] [RFC QEMU 0/2] arm/virt: Account for guest pause time

=== TEST SCRIPT BEGIN ===
#!/bin/bash
time make docker-test-quick@centos7 SHOW_ENV=1 J=8
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]   patchew/20181105014047.26447-1-sa...@linux.intel.com -> patchew/20181105014047.26447-1-sa...@linux.intel.com
Switched to a new branch 'test'
52275419d0 arm/virt: Account for guest pause time
10b0e76068 arm/virt: Initialize generic timer scale factor dynamically

=== OUTPUT BEGIN ===
  BUILD   centos7
make[1]: Entering directory '/var/tmp/patchew-tester-tmp-q7a5t89k/src'
  GEN     /var/tmp/patchew-tester-tmp-q7a5t89k/src/docker-src.2018-11-08-09.32.34.7613/qemu.tar
Cloning into '/var/tmp/patchew-tester-tmp-q7a5t89k/src/docker-src.2018-11-08-09.32.34.7613/qemu.tar.vroot'...
done.
Your branch is up-to-date with 'origin/test'.
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-q7a5t89k/src/docker-src.2018-11-08-09.32.34.7613/qemu.tar.vroot/dtc'...
Submodule path 'dtc': checked out '88f18909db731a627456f26d779445f84e449536'
Submodule 'ui/keycodemapdb' (git://git.qemu.org/keycodemapdb.git) registered for path 'ui/keycodemapdb'
Cloning into '/var/tmp/patchew-tester-tmp-q7a5t89k/src/docker-src.2018-11-08-09.32.34.7613/qemu.tar.vroot/ui/keycodemapdb'...
Submodule path 'ui/keycodemapdb': checked out '6b3d716e2b6472eb7189d3220552280ef3d832ce'
  COPY    RUNNER
    RUN test-quick in qemu:centos7
Packages installed:
SDL-devel-1.2.15-14.el7.x86_64
bison-3.0.4-1.el7.x86_64
bzip2-1.0.6-13.el7.x86_64
bzip2-devel-1.0.6-13.el7.x86_64
ccache-3.3.4-1.el7.x86_64
csnappy-devel-0-6.20150729gitd7bc683.el7.x86_64
flex-2.5.37-3.el7.x86_64
gcc-4.8.5-28.el7_5.1.x86_64
gettext-0.19.8.1-2.el7.x86_64
git-1.8.3.1-14.el7_5.x86_64
glib2-devel-2.54.2-2.el7.x86_64
libaio-devel-0.3.109-13.el7.x86_64
libepoxy-devel-1.3.1-2.el7_5.x86_64
libfdt-devel-1.4.6-1.el7.x86_64
lzo-devel-2.06-8.el7.x86_64
make-3.82-23.el7.x86_64
mesa-libEGL-devel-17.2.3-8.20171019.el7.x86_64
mesa-libgbm-devel-17.2.3-8.20171019.el7.x86_64
nettle-devel-2.7.1-8.el7.x86_64
package g++ is not installed
package librdmacm-devel is not installed
pixman-devel-0.34.0-1.el7.x86_64
spice-glib-devel-0.34-3.el7_5.1.x86_64
spice-server-devel-0.14.0-2.el7_5.4.x86_64
tar-1.26-34.el7.x86_64
vte-devel-0.28.2-10.el7.x86_64
xen-devel-4.6.6-12.el7.x86_64
zlib-devel-1.2.7-17.el7.x86_64

Environment variables:
PACKAGES=bison bzip2 bzip2-devel ccache csnappy-devel flex  
   g++ gcc gettext git glib2-devel libaio-devel 
libepoxy-devel libfdt-devel librdmacm-devel lzo-devel make 
mesa-libEGL-devel mesa-libgbm-devel nettle-devel pixman-devel 
SDL-devel spice-glib-devel spice-server-devel tar vte-devel 
xen-devel zlib-devel
HOSTNAME=8bd8d5bb7b7b
MAKEFLAGS= -j8
J=8
CCACHE_DIR=/var/tmp/ccache
EXTRA_CONFIGURE_OPTS=
V=
SHOW_ENV=1
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
TARGET_LIST=
SHLVL=1
HOME=/home/patchew
TEST_DIR=/tmp/qemu-test
FEATURES= dtc
DEBUG=
_=/usr/bin/env

Configure options:
--enable-werror --target-list=x86_64-softmmu,aarch64-softmmu --prefix=/tmp/qemu-test/install
No C++ compiler available; disabling C++ specific optional code
Install prefix    /tmp/qemu-test/install
BIOS directory    /tmp/qemu-test/install/share/qemu
firmware path     /tmp/qemu-test/install/share/qemu-firmware
binary directory  /tmp/qemu-test/install/bin
library directory /tmp/qemu-test/install/lib
module directory  /tmp/qemu-test/install/lib/qemu
libexec directory /tmp/qemu-test/install/libexec
include directory /tmp/qemu-test/install/include
config directory  /tmp/qemu-test/install/etc
local state directory   /tmp/qemu-test/install/var
Manual directory  /tmp/qemu-test/install/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path       /tmp/qemu-test/src
GIT binary        git
GIT submodules
C compiler        cc
Host C compiler   cc
C++ compiler
Objective-C compiler cc
ARFLAGS           rv
CFLAGS            -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g
QEMU_CFLAGS       -I/usr/include/pixman-1 -Werror -pthread -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -Wendif-labels -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wform

Re: [Qemu-devel] [RFC QEMU 0/2] arm/virt: Account for guest pause time

2018-11-08 Thread no-reply
Hi,

This series failed docker-mingw@fedora build test. Please find the testing
commands and their output below. If you have Docker installed, you can
probably reproduce it locally.

Type: series
Message-id: 1541616504-68526-1-git-send-email-bijan.mottahe...@oracle.com
Subject: [Qemu-devel] [RFC QEMU 0/2] arm/virt: Account for guest pause time

=== TEST SCRIPT BEGIN ===
#!/bin/bash
time make docker-test-mingw@fedora SHOW_ENV=1 J=8
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]   patchew/20181108141944.15769-1-miny...@acm.org -> patchew/20181108141944.15769-1-miny...@acm.org
Switched to a new branch 'test'
52275419d0 arm/virt: Account for guest pause time
10b0e76068 arm/virt: Initialize generic timer scale factor dynamically

=== OUTPUT BEGIN ===
  BUILD   fedora
make[1]: Entering directory '/var/tmp/patchew-tester-tmp-idp_mvmw/src'
  GEN     /var/tmp/patchew-tester-tmp-idp_mvmw/src/docker-src.2018-11-08-09.29.49.2318/qemu.tar
Cloning into '/var/tmp/patchew-tester-tmp-idp_mvmw/src/docker-src.2018-11-08-09.29.49.2318/qemu.tar.vroot'...
done.
Your branch is up-to-date with 'origin/test'.
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-idp_mvmw/src/docker-src.2018-11-08-09.29.49.2318/qemu.tar.vroot/dtc'...
Submodule path 'dtc': checked out '88f18909db731a627456f26d779445f84e449536'
Submodule 'ui/keycodemapdb' (git://git.qemu.org/keycodemapdb.git) registered for path 'ui/keycodemapdb'
Cloning into '/var/tmp/patchew-tester-tmp-idp_mvmw/src/docker-src.2018-11-08-09.29.49.2318/qemu.tar.vroot/ui/keycodemapdb'...
Submodule path 'ui/keycodemapdb': checked out '6b3d716e2b6472eb7189d3220552280ef3d832ce'
  COPY    RUNNER
    RUN test-mingw in qemu:fedora
Packages installed:
SDL2-devel-2.0.8-5.fc28.x86_64
bc-1.07.1-5.fc28.x86_64
bison-3.0.4-9.fc28.x86_64
bluez-libs-devel-5.50-1.fc28.x86_64
brlapi-devel-0.6.7-19.fc28.x86_64
bzip2-1.0.6-26.fc28.x86_64
bzip2-devel-1.0.6-26.fc28.x86_64
ccache-3.4.2-2.fc28.x86_64
clang-6.0.1-1.fc28.x86_64
device-mapper-multipath-devel-0.7.4-3.git07e7bd5.fc28.x86_64
findutils-4.6.0-19.fc28.x86_64
flex-2.6.1-7.fc28.x86_64
gcc-8.1.1-5.fc28.x86_64
gcc-c++-8.1.1-5.fc28.x86_64
gettext-0.19.8.1-14.fc28.x86_64
git-2.17.1-3.fc28.x86_64
glib2-devel-2.56.1-4.fc28.x86_64
glusterfs-api-devel-4.1.2-2.fc28.x86_64
gnutls-devel-3.6.3-3.fc28.x86_64
gtk3-devel-3.22.30-1.fc28.x86_64
hostname-3.20-3.fc28.x86_64
libaio-devel-0.3.110-11.fc28.x86_64
libasan-8.1.1-5.fc28.x86_64
libattr-devel-2.4.48-3.fc28.x86_64
libcap-devel-2.25-9.fc28.x86_64
libcap-ng-devel-0.7.9-4.fc28.x86_64
libcurl-devel-7.59.0-6.fc28.x86_64
libfdt-devel-1.4.6-5.fc28.x86_64
libpng-devel-1.6.34-6.fc28.x86_64
librbd-devel-12.2.7-1.fc28.x86_64
libssh2-devel-1.8.0-7.fc28.x86_64
libubsan-8.1.1-5.fc28.x86_64
libusbx-devel-1.0.22-1.fc28.x86_64
libxml2-devel-2.9.8-4.fc28.x86_64
llvm-6.0.1-6.fc28.x86_64
lzo-devel-2.08-12.fc28.x86_64
make-4.2.1-6.fc28.x86_64
mingw32-SDL2-2.0.5-3.fc27.noarch
mingw32-bzip2-1.0.6-9.fc27.noarch
mingw32-curl-7.57.0-1.fc28.noarch
mingw32-glib2-2.56.1-1.fc28.noarch
mingw32-gmp-6.1.2-2.fc27.noarch
mingw32-gnutls-3.6.2-1.fc28.noarch
mingw32-gtk3-3.22.30-1.fc28.noarch
mingw32-libjpeg-turbo-1.5.1-3.fc27.noarch
mingw32-libpng-1.6.29-2.fc27.noarch
mingw32-libssh2-1.8.0-3.fc27.noarch
mingw32-libtasn1-4.13-1.fc28.noarch
mingw32-nettle-3.4-1.fc28.noarch
mingw32-pixman-0.34.0-3.fc27.noarch
mingw32-pkg-config-0.28-9.fc27.x86_64
mingw64-SDL2-2.0.5-3.fc27.noarch
mingw64-bzip2-1.0.6-9.fc27.noarch
mingw64-curl-7.57.0-1.fc28.noarch
mingw64-glib2-2.56.1-1.fc28.noarch
mingw64-gmp-6.1.2-2.fc27.noarch
mingw64-gnutls-3.6.2-1.fc28.noarch
mingw64-gtk3-3.22.30-1.fc28.noarch
mingw64-libjpeg-turbo-1.5.1-3.fc27.noarch
mingw64-libpng-1.6.29-2.fc27.noarch
mingw64-libssh2-1.8.0-3.fc27.noarch
mingw64-libtasn1-4.13-1.fc28.noarch
mingw64-nettle-3.4-1.fc28.noarch
mingw64-pixman-0.34.0-3.fc27.noarch
mingw64-pkg-config-0.28-9.fc27.x86_64
ncurses-devel-6.1-5.20180224.fc28.x86_64
nettle-devel-3.4-2.fc28.x86_64
nss-devel-3.38.0-1.0.fc28.x86_64
numactl-devel-2.0.11-8.fc28.x86_64
package PyYAML is not installed
package libjpeg-devel is not installed
perl-5.26.2-413.fc28.x86_64
pixman-devel-0.34.0-8.fc28.x86_64
python3-3.6.5-1.fc28.x86_64
snappy-devel-1.1.7-5.fc28.x86_64
sparse-0.5.2-1.fc28.x86_64
spice-server-devel-0.14.0-4.fc28.x86_64
systemtap-sdt-devel-3.3-1.fc28.x86_64
tar-1.30-3.fc28.x86_64
usbredir-devel-0.8.0-1.fc28.x86_64
virglrenderer-devel-0.6.0-4.20170210git76b3da97b.fc28.x86_64
vte3-devel-0.36.5-6.fc28.x86_64
which-2.21-8.fc28.x86_64
xen-devel-4.10.1-5.fc28.x86_64
zlib-devel-1.2.11-8.fc28.x86_64

Environment variables:
TARGET_LIST=
PACKAGES=bc bison bluez-libs-devel brlapi-devel bzip2 
bzip2-devel ccache clang device-mapper-multipath-devel 
findutils flex gcc gcc-c++ gettext git glib2-devel 
glusterfs-api

[RFC QEMU 0/2] arm/virt: Account for guest pause time

2018-11-07 Thread Bijan Mottahedeh
This patch series addresses two QEMU issues:

  - improper system clock frequency initialization
  - lack of pause (virsh suspend) time accounting

A simple test to reproduce the problem executes one or more instances
of the following command in the guest:

dd if=/dev/zero of=/dev/null &

and then pauses and resumes the guest after a certain delay:

virsh suspend # pauses the guest
sleep 120
virsh resume 

After the guest is resumed, there are soft lockup warning messages
displayed on the console.

A comparison with x86 shows that hwclock and date values diverge after
the above pause and resume sequence for x86 but remain the same for Arm.

Patch 1 initializes the system clock frequency in QEMU the same way the
kernel does.

Patch 2 accumulates the total guest pause time in QEMU and adjusts the
virtual counter offset accordingly before the guest is resumed.

The patches have been tested on an Ampere system.  With the patches
applied, the time behavior matches x86 and the soft lockup messages go
away.


Clock Frequency Initialization
==============================

Arm v8 provides the virtual counter (cntvct), virtual counter offset
(cntvoff), and counter frequency (cntfrq) registers for guest time
management.
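
As a rough illustration of how these three registers relate, the sketch
below models them as plain integers.  The struct and function names are
hypothetical and exist only for this example; real code reads the values
from system registers.  The one architectural fact it relies on is that
the virtual count equals the physical count minus the virtual offset.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model only (hypothetical type, not a kernel or QEMU one). */
struct counter_model {
    uint64_t cntpct;   /* physical count */
    uint64_t cntvoff;  /* virtual offset programmed by the hypervisor */
    uint32_t cntfrq;   /* counter frequency in Hz */
};

/* The virtual count is the physical count minus the virtual offset. */
static uint64_t model_cntvct(const struct counter_model *m)
{
    return m->cntpct - m->cntvoff;
}

/* Whole seconds represented by the virtual count at the given frequency. */
static uint64_t model_virtual_seconds(const struct counter_model *m)
{
    assert(m->cntfrq != 0);
    return model_cntvct(m) / m->cntfrq;
}
```

With a physical count of 1,000,000,000, an offset of 400,000,000 and a
50 MHz counter, the guest would observe a virtual count of 600,000,000,
or 12 seconds.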

Linux Arm platform code initializes the system clock frequency from the
cntfrq_el0 register and sets the value into a statically created device
tree (DT) node.  It is not clear why the timer device node is created
with TIMER_OF_DECLARE().  The DT passed from QEMU to the kernel does not
contain a timer node.

drivers/clocksource/arm_arch_timer.c:

static inline u32 arch_timer_get_cntfrq(void)
{
        return read_sysreg(cntfrq_el0);
}

rate = arch_timer_get_cntfrq();
arch_timer_of_configure_rate(rate, np);

/*
 * For historical reasons, when probing with DT we use whichever (non-zero)
 * rate was probed first, and don't verify that others match. If the first node
 * probed has a clock-frequency property, this overrides the HW register.
 */
static void arch_timer_of_configure_rate(u32 rate, struct device_node *np)
{
        ...
        if (of_property_read_u32(np, "clock-frequency", &arch_timer_rate))
                arch_timer_rate = rate;
        ...
}

TIMER_OF_DECLARE(armv7_arch_timer, "arm,armv7-timer", arch_timer_of_init);
TIMER_OF_DECLARE(armv8_arch_timer, "arm,armv8-timer", arch_timer_of_init);
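
The selection rule described in the comment above can be modelled in a
few lines of standalone C.  This is only a sketch of the precedence
(first non-zero rate wins, then the DT property, then the hardware
register); struct fake_timer_node and choose_timer_rate() are made-up
names for illustration and are not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a DT timer node; not a kernel type. */
struct fake_timer_node {
    bool has_clock_frequency;   /* does the node carry "clock-frequency"? */
    uint32_t clock_frequency;   /* property value in Hz, if present */
};

/*
 * Pick the timer rate: a rate that was already probed (non-zero) is kept,
 * otherwise a clock-frequency property overrides the hardware value.
 */
static uint32_t choose_timer_rate(uint32_t current_rate, uint32_t hw_rate,
                                  const struct fake_timer_node *np)
{
    assert(np != NULL);
    if (current_rate != 0)
        return current_rate;
    if (np->has_clock_frequency)
        return np->clock_frequency;
    return hw_rate;
}
```

With no property in the node, the hardware value (for example
50,000,000) is used, which is the case the text describes for a DT
without a timer node.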


Linux then initializes the clock frequency to 50 MHz.

QEMU, however, hard-codes the clock frequency to 62.5 MHz.

target/arm/cpu.h:

/* Scale factor for generic timers, ie number of ns per tick.
 * This gives a 62.5MHz timer.
 */
#define GTIMER_SCALE 16

The suggested fix is to follow the kernel's arch_timer_get_cntfrq()
approach in order to set system_clock_scale to match the kernel's idea
of clock-frequency, rather than using a hard-coded value.
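
To make the arithmetic concrete, the sketch below converts between a
counter frequency and a scale factor expressed in ns per tick.  The
helper names are assumptions made for this example and do not appear in
QEMU.  Note that integer division truncates, so a frequency that does
not divide one billion evenly loses precision.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SECOND 1000000000ULL

/* Nanoseconds per tick for a counter running at freq_hz (truncating). */
static uint64_t scale_from_freq(uint64_t freq_hz)
{
    assert(freq_hz != 0 && freq_hz <= NS_PER_SECOND);
    return NS_PER_SECOND / freq_hz;
}

/* Counter frequency in Hz implied by a scale of scale_ns per tick. */
static uint64_t freq_from_scale(uint64_t scale_ns)
{
    assert(scale_ns != 0);
    return NS_PER_SECOND / scale_ns;
}
```

A scale of 16 corresponds to 62.5 MHz, as the GTIMER_SCALE comment
states, while the 50 MHz rate seen by the kernel corresponds to a scale
of 20.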

Ultimately, it seems that QEMU should construct the timer DT node and
pass the actual clock frequency value to the kernel that way, but that
raises interface and backward compatibility considerations.
Furthermore, the implications for the ACPI method of probing are not
clear.


Pause Time Accounting
=====================

Linux registers two clock sources, a platform-independent jiffies
clocksource and an Arm-specific arch_sys_counter; the read interface
for the latter reads the virtual counter register:

static struct clocksource clocksource_jiffies = {
        .name       = "jiffies",
        .rating     = 1, /* lowest valid rating */
        .read       = jiffies_read,
        .mask       = CLOCKSOURCE_MASK(32),
        .mult       = TICK_NSEC << JIFFIES_SHIFT, /* details above */
        .shift      = JIFFIES_SHIFT,
        .max_cycles = 10,
};

static struct clocksource clocksource_counter = {
        .name   = "arch_sys_counter",
        .rating = 400,
        .read   = arch_counter_read,
        .mask   = CLOCKSOURCE_MASK(56),
        .flags  = CLOCK_SOURCE_IS_CONTINUOUS,
};

arch_counter_read()
-> arch_timer_read_counter()
   -> arch_counter_get_cntvct()
      -> arch_timer_reg_read_stable(cntvct_el0)
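
For readers less familiar with clocksources, the sketch below shows the
usual way a raw counter reading is turned into elapsed nanoseconds: the
difference between two reads is masked to the counter width and then
scaled with a multiply and a shift.  The helper names and the mult/shift
values used in the example are assumptions chosen for illustration; the
kernel computes its own values.

```c
#include <assert.h>
#include <stdint.h>

/* All-ones mask for a counter that is "bits" wide. */
#define COUNTER_MASK(bits) ((bits) < 64 ? ((1ULL << (bits)) - 1) : ~0ULL)

/* Cycles elapsed between two reads, tolerating wraparound at the mask. */
static uint64_t cycles_delta(uint64_t now, uint64_t last, uint64_t mask)
{
    return (now - last) & mask;
}

/* Convert a cycle count to nanoseconds: ns = (cycles * mult) >> shift. */
static uint64_t cycles_to_ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    assert(shift < 64);
    return (cycles * (uint64_t)mult) >> shift;
}
```

For a 50 MHz counter (20 ns per tick), an example pair is shift = 24 and
mult = 20 << 24; with those values 50,000,000 cycles convert to exactly
one second.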

The virtual counter offset register is set from:

kvm_timer_vcpu_load()
-> set_cntvoff()

The offset is reset to zero from:

kvm_timer_vcpu_put()
-> set_cntvoff()

/*
 * The kernel may decide to run userspace after calling vcpu_put, so
 * we reset cntvoff to 0 to ensure a consistent read between user
 * accesses to the virtual counter and kernel access to the physical
 * counter of non-VHE case. For VHE, the virtual counter uses a fixed
 * virtual offset of zero, so no need to zero CNTVOFF_EL2 register.
 */
if (!has_vhe())
        set_cntvoff(0);

However, the virtual counter offset is not modified anywhere to account
for pause time.  The suggested fix is to add pause time accounting to
QEMU.
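
A minimal sketch of what such accounting could look like is shown below.
It is not taken from the patches: struct pause_tracker and the three
helpers are hypothetical names, and the timestamps are assumed to come
from some monotonic host clock in nanoseconds.  It only accumulates
paused time and converts it to counter ticks; how the result is applied
to the virtual counter offset is left to the patches themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NS_PER_SECOND 1000000000ULL

/* Hypothetical bookkeeping for guest pause time (illustration only). */
struct pause_tracker {
    bool paused;
    uint64_t pause_start_ns;   /* host time when the guest was paused */
    uint64_t total_paused_ns;  /* accumulated pause time */
};

/* Record the moment the guest stops running. */
static void pause_begin(struct pause_tracker *t, uint64_t now_ns)
{
    assert(!t->paused);
    t->paused = true;
    t->pause_start_ns = now_ns;
}

/* Add the length of the pause that just ended to the running total. */
static void pause_end(struct pause_tracker *t, uint64_t now_ns)
{
    assert(t->paused && now_ns >= t->pause_start_ns);
    t->total_paused_ns += now_ns - t->pause_start_ns;
    t->paused = false;
}

/* Accumulated pause time expressed in counter ticks at freq_hz. */
static uint64_t paused_ticks(const struct pause_tracker *t, uint64_t freq_hz)
{
    uint64_t secs = t->total_paused_ns / NS_PER_SECOND;
    uint64_t rem = t->total_paused_ns % NS_PER_SECOND;

    /* Split the multiply so large totals do not overflow 64 bits. */
    return secs * freq_hz + rem * freq_hz / NS_PER_SECOND;
}
```

For the 120 second pause used in the reproduction steps, a 50 MHz
counter corresponds to 6,000,000,000 ticks.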

One potential issue is whether modifying the virtual counter offset
breaks any assumptions, e.g., see the kvm_timer_vcpu_put() comment above.


hwclock vs. date
================

The hwclock on the ends up in 

Re: [RFC QEMU 0/2] arm/virt: Account for guest pause time

2018-11-07 Thread Christoffer Dall
Hi Bijan,

On Tue, Nov 06, 2018 at 04:32:27PM -0800, Bijan Mottahedeh wrote:
> This patch series address two Qemu issues:

This series should primarily go to qemu-devel (as it is a QEMU patch).

Could you please re-send the series to qemu-devel.  Keeping the kvmarm
list on cc is nice, but only a limited set of people following KVM/Arm
development is actively reviewing QEMU patches.


Thanks,

Christoffer


