On Fri, Apr 12, 2024, Zide Chen wrote:
> Currently, the migration worker delays 1-10 us, assuming that one
> KVM_RUN iteration only takes a few microseconds.  But if C-state exit
> latencies are large enough, for example, hundreds or even thousands
> of microseconds on server CPUs, it may happen that it's not able to
> bring the target CPU out of C-state before the migration worker starts
> to migrate it to the next CPU.
> 
> If the system workload is light, most CPUs could be at a certain level
> of C-state, which may result in less successful migrations and fail the
> migration/KVM_RUN ratio sanity check.
> 
> This patch adds a command line option to skip the sanity check in
> this case.
> 
> Additionally, seems it's reasonable to randomize the length of usleep(),
> other than delay in a fixed pattern.

This belongs in a separate patch.  And while it's reasonable on the surface, I
doubt think it buys us anything, and it makes an already non-deterministic test
even less deterministic.  In other words, unless a random sleep time helps find
more bugs or finds the original bug faster, just drop the randomization.

> V2:
> - removed the busy loop implementation
> - add the new "-s" option

This belongs in the ignored part of the patch...
> 
> Signed-off-by: Zide Chen <zide.c...@intel.com>

...down here.

> ---
>  tools/testing/selftests/kvm/rseq_test.c | 37 +++++++++++++++++++++++--
>  1 file changed, 34 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/rseq_test.c 
> b/tools/testing/selftests/kvm/rseq_test.c
> index 28f97fb52044..515cfa32a925 100644
> --- a/tools/testing/selftests/kvm/rseq_test.c
> +++ b/tools/testing/selftests/kvm/rseq_test.c
> @@ -150,7 +150,7 @@ static void *migration_worker(void *__rseq_tid)
>                * Use usleep() for simplicity and to avoid unnecessary kernel
>                * dependencies.
>                */
> -             usleep((i % 10) + 1);
> +             usleep((rand() % 10) + 1);
>       }
>       done = true;
>       return NULL;
> @@ -186,12 +186,35 @@ static void calc_min_max_cpu(void)
>                      "Only one usable CPU, task migration not possible");
>  }
>  
> +static void usage(const char *name)

Uber nit, "help()" is more common than "usage()".

> @@ -254,9 +279,15 @@ int main(int argc, char *argv[])
>        * getcpu() to stabilize.  A 2:1 migration:KVM_RUN ratio is a fairly
>        * conservative ratio on x86-64, which can do _more_ KVM_RUNs than
>        * migrations given the 1us+ delay in the migration task.
> +      *
> +      * Another reason why it may have small migration:KVM_RUN ratio is that,
> +      * on systems with large C-state exit latency, it may happen quite often
> +      * that the scheduler is not able to wake up the target CPU before the
> +      * vCPU thread is scheduled to another CPU.
>        */
> -     TEST_ASSERT(i > (NR_TASK_MIGRATIONS / 2),
> -                 "Only performed %d KVM_RUNs, task stalled too much?", i);
> +     TEST_ASSERT(skip_sanity_check || i > (NR_TASK_MIGRATIONS / 2),
> +                 "Only performed %d KVM_RUNs, task stalled too much? "
> +                 "Try to turn off C-states or run it with the -s option", i);

I think it's worth explicitly telling the user how to reduce CPU wakeup latency.
Also, are C-states called that on other architectures?  E.g. maybe this to avoid
confusing the user?  Not a big deal, e.g. I've no objection whatsoever to the
comment, but it seems easy enough to avoid confusing the user.

                    "Try setting /dev/cpu_dma_latency to reduce CPU wakeup 
latency, "
                    "or run with -s to skip this sanity check", i);

Reply via email to