Re: [Qemu-block] [PATCH] throttle-groups: fix hang when group member leaves

Alberto Garcia Tue, 31 Jul 2018 09:48:27 -0700

On Wed 04 Jul 2018 04:54:10 PM CEST, Stefan Hajnoczi wrote:
> Throttle groups consist of members sharing one throttling state
> (including bps/iops limits).  Round-robin scheduling is used to ensure
> fairness.  If a group member already has a timer pending then other
> groups members do not schedule their own timers.  The next group
> member will have its turn when the existing timer expires.
>
> A hang may occur when a group member leaves while it had a timer
> scheduled.


Ok, I can reproduce this if I run fio with iodepth=1.

We're draining the BDS before removing it from a throttle group, and
therefore there cannot be any pending requests.

So the problem seems to be that when throttle_co_drain_begin() runs the
pending requests from a member using throttle_group_co_restart_queue(),
it simply uses qemu_co_queue_next() and doesn't touch the timer at all.

So it can happen that there's a request in the queue waiting for a
timer, and after that call the request is gone but the timer remains.

The current patch is perhaps not worth touching at this point (we're
about to release QEMU 3.0), but I think that a better solution would be
to either

a) cancel the existing timer and reset tg->any_timer_armed on the given
   tgm after throttle_group_co_restart_queue() and before
   schedule_next_request() if the queue is empty.

b) force the existing timer to run immediately instead of calling
   throttle_group_co_restart_queue(). Seems cleaner, but I haven't tried
   this one yet.

I'll explore them a bit and send a patch.

Berto

Re: [Qemu-block] [PATCH] throttle-groups: fix hang when group member leaves

Reply via email to