On 06/03/2015 03:56 AM, Juan Quintela wrote:
"Jason J. Herne" <jjhe...@linux.vnet.ibm.com> wrote:
Provide a method to throttle guest cpu execution. CPUState is augmented with
timeout controls and throttle start/stop functions. To throttle the guest cpu
the caller simply has to call the throttle start function and provide a ratio of
sleep time to normal execution time.

Signed-off-by: Jason J. Herne <jjhe...@linux.vnet.ibm.com>
Reviewed-by: Matthew Rosato <mjros...@linux.vnet.ibm.com>



Hi

sorry that I replied to your previous submission; I had this mail "half done"
since yesterday, and I think everything in it still applies.


---
  cpus.c            | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
  include/qom/cpu.h | 46 +++++++++++++++++++++++++++++++++++++++++
  2 files changed, 108 insertions(+)

diff --git a/cpus.c b/cpus.c
index de6469f..7568357 100644
--- a/cpus.c
+++ b/cpus.c
@@ -64,6 +64,9 @@

  #endif /* CONFIG_LINUX */

+/* Number of ms between cpu throttle operations */
+#define CPU_THROTTLE_TIMESLICE 10


We are checking for throttling on each cpu every 10ms.
But in patch 2 we can see that we only change the throttle rate each
time we call migration_bitmap_sync(), which only happens once per round
through all the pages. Normally auto-converge only matters for machines
with lots of memory, so this is going to happen much less often than
every 10ms (we used to change it every 4 passes). You changed it to
every 2 passes, and you add 0.2. I think I would prefer to just change
it every single pass, but add 0.1 each pass? Simpler, and the end result
would be the same?



Well, we certainly could make it run every pass, but I think it would get
a little too aggressive. The reason is that we do not increment the throttle
rate by adding 0.2 each time; we increment it by multiplying the current rate
by 2. Doing that every pass would double the exponential growth rate. I will
admit the numbers I chose are hardly scientific... I chose them because they
seemed to provide a decent balance of "throttling aggression" in my workloads.
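
For illustration only, here is a small stand-alone sketch of the arithmetic
being discussed (the 0.2 starting value and the variable names are just
assumptions for the example, not values taken from the patch). It compares
doubling the throttle every second sync, doubling it every sync, and adding
0.1 every sync:

/* Illustration only: hypothetical starting values, not code from the patch. */
#include <stdio.h>

int main(void)
{
    double x2_every_2nd = 0.2;  /* doubled on every second bitmap sync */
    double x2_every_sync = 0.2; /* doubled on every bitmap sync */
    double add_per_sync = 0.2;  /* 0.1 added on every bitmap sync */

    for (int pass = 1; pass <= 8; pass++) {
        if (pass % 2 == 0) {
            x2_every_2nd *= 2;
        }
        x2_every_sync *= 2;
        add_per_sync += 0.1;
        printf("sync %d: x2 every 2nd sync = %.1f, x2 every sync = %.1f, "
               "+0.1 every sync = %.1f\n",
               pass, x2_every_2nd, x2_every_sync, add_per_sync);
    }
    return 0;
}

As the output shows, doubling on every sync runs away much faster than the
other two policies, which is the "too aggressive" concern above.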

This is what the old code did (the new code does exactly the same, just in
the previous patch):

-static void mig_sleep_cpu(void *opq)
-{
-    qemu_mutex_unlock_iothread();
-    g_usleep(30*1000);
-    qemu_mutex_lock_iothread();
-}

So, what we are doing, calling async_run_on_cpu() with this function.

To make things short, we end here:

static void flush_queued_work(CPUState *cpu)
{
     struct qemu_work_item *wi;

     if (cpu->queued_work_first == NULL) {
         return;
     }

     while ((wi = cpu->queued_work_first)) {
         cpu->queued_work_first = wi->next;
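         /* Note: wi->func() may drop and re-take the iothread lock
          * (mig_sleep_cpu above does exactly that), so other threads can
          * queue work and modify this list while we are sleeping in the
          * middle of the walk.
          */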
         wi->func(wi->data);
         wi->done = true;
         if (wi->free) {
             g_free(wi);
         }
     }
     cpu->queued_work_last = NULL;
     qemu_cond_broadcast(&qemu_work_cond);
}

So, we are walking a list that is protected by the iothread lock, and we are
dropping the lock in the middle of the walk, just hoping that nothing else
calls async_run_on_cpu() from a different place in the meantime...


Could we change this to something like:

diff --git a/kvm-all.c b/kvm-all.c
index 17a3771..ae9635f 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1787,6 +1787,7 @@ int kvm_cpu_exec(CPUState *cpu)
  {
      struct kvm_run *run = cpu->kvm_run;
      int ret, run_ret;
+    long throttling_time;

      DPRINTF("kvm_cpu_exec()\n");

@@ -1813,8 +1814,15 @@ int kvm_cpu_exec(CPUState *cpu)
               */
              qemu_cpu_kick_self();
          }
+        throttling_time = cpu->throttling_time;
+        cpu->throttling_time = 0;
+        cpu->sleeping_time += throttling_time;
          qemu_mutex_unlock_iothread();

+
+        if (throttling_time) {
+            g_usleep(throttling_time);
+        }
          run_ret = kvm_vcpu_ioctl(cpu, KVM_RUN, 0);

          qemu_mutex_lock_iothread();



And then, in the place where we do the throttling logic today, we just decide
what throttling_time and sleeping_time should be? No need to drop the lock or
queue async work on the cpu.

It adds an if on the fast path, but we already make a call into the hypervisor
each time, so it shouldn't be so bad, no?

For TCG/Xen we would have to search for a different place to put this code,
but you get the idea. The reason for putting it here is that this is where the
iothread lock is already dropped, so we don't lose any locking guarantees; any
other place we put it would need review with respect to this.
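
To make that a bit more concrete, here is a hedged sketch of what the other
half could look like: the periodic throttle code just publishes a sleep amount
per CPU instead of queueing async work. The throttling_time/sleeping_time
fields come from the diff above; cpu_throttle_timer_tick, throttle_timer and
throttle_ratio are made-up names for illustration, not existing QEMU code.

/* Sketch only, under the assumptions stated above. */
static QEMUTimer *throttle_timer;   /* hypothetical periodic timer */
static float throttle_ratio;        /* hypothetical sleep:run ratio */

static void cpu_throttle_timer_tick(void *opaque)
{
    CPUState *cpu;
    /* Microseconds to sleep per 10ms timeslice, from the requested ratio. */
    long sleep_us = CPU_THROTTLE_TIMESLICE * 1000 * throttle_ratio;

    CPU_FOREACH(cpu) {
        /* Just publish the amount to sleep; each vCPU thread picks it up in
         * kvm_cpu_exec() after it has already dropped the iothread lock, so
         * no extra locking or async work queueing is needed. */
        cpu->throttling_time = sleep_us;
    }

    timer_mod(throttle_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL_RT) +
                              CPU_THROTTLE_TIMESLICE);
}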


Jason, I am really, really sorry to have opened this big can of worms on
your patch. Now you know why I was saying that "improving" auto-converge
was not as easy as it looked.

With respect to the last comment, I also think that we can include Jason's
patches. They are an improvement over what we have now. It is just that
they need more improvements.


Yes, I understand what you are suggesting here. I can certainly look into it
if you would prefer that over accepting the current design. The reason I used
the timer and thread approach is that it fit into the old auto-converge code
very nicely. Does it make sense to go forward with my current design (modified
per your earlier suggestions), and then follow up with your proposal at a
later date?

--
-- Jason J. Herne (jjhe...@linux.vnet.ibm.com)

