On Tue, Nov 26, 2013 at 06:14:27PM +0200, Gleb Natapov wrote:
> On Tue, Nov 26, 2013 at 06:05:37PM +0200, Michael S. Tsirkin wrote:
> > On Tue, Nov 26, 2013 at 02:56:10PM +0200, Gleb Natapov wrote:
> > > On Tue, Nov 26, 2013 at 01:47:03PM +0100, Paolo Bonzini wrote:
> > > > On 26/11/2013 13:40, Zhanghaoyu (A) wrote:
> > > > > When the guest sets irq smp_affinity, a VMEXIT occurs, and the vcpu
> > > > > thread returns from the hypervisor ioctl to QEMU. The vcpu thread
> > > > > then asks the hypervisor to update the irq routing table.
> > > > > In kvm_set_irq_routing, synchronize_rcu is called, so the current
> > > > > vcpu thread is blocked for a long time waiting for the RCU grace
> > > > > period. During this period the vcpu cannot provide service to the VM,
> > > > > so interrupts delivered to this vcpu cannot be handled in time,
> > > > > and the apps running on this vcpu cannot be serviced either.
> > > > > This is unacceptable in some real-time scenarios, e.g. telecom.
> > > > >
> > > > > So, I want to create a single workqueue for each VM to
> > > > > asynchronously perform the RCU synchronization for the irq routing
> > > > > table, and let the vcpu thread return and VMENTRY to service the VM
> > > > > immediately, with no need to block waiting for the RCU grace period.
> > > > > I have implemented a raw patch and tested it in our telecom
> > > > > environment; the above problem disappeared.
> > > >
> > > > I don't think a workqueue is even needed. You just need to use call_rcu
> > > > to free "old" after releasing kvm->irq_lock.
> > > >
> > > > What do you think?
> > > >
> > > It should be rate limited somehow. Since it is guest triggerable, a guest
> > > may cause the host to allocate a lot of memory this way.
> >
> > The checks in __call_rcu() should handle this, I think. These keep a
> > per-CPU counter, which can be adjusted via rcutree.blimit, which defaults
> > to taking evasive action if more than 10K callbacks are waiting on a
> > given CPU.
> >
> >
> Documentation/RCU/checklist.txt has:
>
>         An especially important property of the synchronize_rcu()
>         primitive is that it automatically self-limits: if grace periods
>         are delayed for whatever reason, then the synchronize_rcu()
>         primitive will correspondingly delay updates.  In contrast,
>         code using call_rcu() should explicitly limit update rate in
>         cases where grace periods are delayed, as failing to do so can
>         result in excessive realtime latencies or even OOM conditions.
>

I just asked Paul what this means.

> --
> 			Gleb.
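
To make the trade-off above concrete, here is a rough userspace model of the idea, not kernel code and not the actual KVM patch. It assumes a deferred free in the style of call_rcu() plus an explicit cap on queued callbacks, which is one possible reading of "explicitly limit update rate" in checklist.txt. Every name in it (struct deferred, defer_free, grace_period_end, set_routing, MAX_PENDING) is hypothetical and stands in for the real RCU machinery; the cap of 4 is purely illustrative.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct rcu_head. */
struct deferred {
    struct deferred *next;
    void (*func)(struct deferred *);
};

/* Hypothetical stand-in for the irq routing table. */
struct routing_table {
    struct deferred rcu;   /* kept first so the callback can cast back */
    int nr_entries;
};

#define MAX_PENDING 4      /* illustrative cap, not a kernel default */

static struct deferred *pending_head;
static int pending_count;
static int freed_count;
static int sync_fallbacks;
static struct routing_table *current_table;

/* Callback that reclaims an old table. */
static void free_table(struct deferred *d)
{
    free((struct routing_table *)d);
    freed_count++;
}

/* Models the end of a grace period: run every queued callback. */
static void grace_period_end(void)
{
    while (pending_head) {
        struct deferred *d = pending_head;

        pending_head = d->next;
        pending_count--;
        d->func(d);
    }
}

/* Models call_rcu(): queue the callback and return immediately. */
static void defer_free(struct deferred *d, void (*func)(struct deferred *))
{
    d->func = func;
    d->next = pending_head;
    pending_head = d;
    pending_count++;
}

/*
 * Models the update path: publish the new table, then retire the old one.
 * Below the cap the old table is queued and the caller does not wait.
 * At the cap the caller waits for the grace period instead, which is the
 * self-limiting behaviour that synchronize_rcu() gives for free.
 */
static int set_routing(int nr_entries)
{
    struct routing_table *new = malloc(sizeof(*new));
    struct routing_table *old;

    if (!new)
        return -1;
    new->nr_entries = nr_entries;

    old = current_table;
    current_table = new;   /* kernel code would publish under irq_lock */
    if (!old)
        return 0;

    if (pending_count >= MAX_PENDING) {
        sync_fallbacks++;
        grace_period_end();        /* blocking wait in this model */
        free_table(&old->rcu);
    } else {
        defer_free(&old->rcu, free_table);
    }
    return 0;
}
```

In this model a guest that updates routing in a tight loop can keep at most MAX_PENDING old tables alive; past that point each update pays for a grace period, so host memory use stays bounded. Whether the existing checks in __call_rcu() already give an equivalent bound is exactly the open question in this thread.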