date:20130716

Re: [PATCH RFC V10 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-07-16 Thread Gleb Natapov

On Tue, Jul 16, 2013 at 09:07:53AM +0530, Raghavendra K T wrote:
 On 07/15/2013 04:06 PM, Gleb Natapov wrote:
 On Mon, Jul 15, 2013 at 03:20:06PM +0530, Raghavendra K T wrote:
 On 07/14/2013 06:42 PM, Gleb Natapov wrote:
 On Mon, Jun 24, 2013 at 06:13:42PM +0530, Raghavendra K T wrote:
 kvm : Paravirtual ticketlocks support for linux guests running on KVM 
 hypervisor
 
 From: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
 
 trimming
 [...]
 +
 +static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
 +{
 + struct kvm_lock_waiting *w;
 + int cpu;
 + u64 start;
 + unsigned long flags;
 +
 + w = &__get_cpu_var(lock_waiting);
 + cpu = smp_processor_id();
 + start = spin_time_start();
 +
 + /*
 +  * Make sure an interrupt handler can't upset things in a
 +  * partially setup state.
 +  */
 + local_irq_save(flags);
 +
 + /*
 +  * The ordering protocol on this is that the lock pointer
 +  * may only be set non-NULL if the want ticket is correct.
 +  * If we're updating want, we must first clear lock.
 +  */
 + w->lock = NULL;
 + smp_wmb();
 + w->want = want;
 + smp_wmb();
 + w->lock = lock;
 +
 + add_stats(TAKEN_SLOW, 1);
 +
 + /*
 +  * This uses set_bit, which is atomic but we should not rely on its
 +  * reordering guarantees. So barrier is needed after this call.
 +  */
 + cpumask_set_cpu(cpu, &waiting_cpus);
 +
 + barrier();
 +
 + /*
 +  * Mark entry to slowpath before doing the pickup test to make
 +  * sure we don't deadlock with an unlocker.
 +  */
 + __ticket_enter_slowpath(lock);
 +
 + /*
 +  * check again make sure it didn't become free while
 +  * we weren't looking.
 +  */
 + if (ACCESS_ONCE(lock->tickets.head) == want) {
 + add_stats(TAKEN_SLOW_PICKUP, 1);
 + goto out;
 + }
 +
 + /* Allow interrupts while blocked */
 + local_irq_restore(flags);
 +
 So what happens if an interrupt comes here and the interrupt handler
 takes another spinlock that goes into the slow path? As far as I can see,
 lock_waiting will be overwritten and the cpu will be cleared from the
 waiting_cpus bitmap by the nested kvm_lock_spinning(), so when halt is
 called here after returning from the interrupt handler nobody is going
 to wake this lock waiter. The next random interrupt will fix it, but it
 may be several milliseconds away, or never. We should probably check
 if interrupts were enabled and call native_safe_halt() here.
 
 
 Okay you mean something like below should be done.
 if irq_enabled()
native_safe_halt()
 else
halt()
 
 This has been complex for me to analyze.
 
 So in our discussion the stack would look like this.
 
 spinlock()
kvm_lock_spinning()
-- interrupt here
halt()
 
 
  From the halt if we trace
 
 It is too early to trace the halt since it has not been executed yet. Guest
 stack trace will look something like this:
 
 spinlock(a)
kvm_lock_spinning(a)
 lock_waiting = a
 set bit in waiting_cpus
  -- interrupt here
  spinlock(b)
kvm_lock_spinning(b)
  lock_waiting = b
  set bit in waiting_cpus
  halt()
  unset bit in waiting_cpus
  lock_waiting = NULL
   -- ret from interrupt
 halt()
 
 Now at the time of the last halt above lock_waiting == NULL,
 waiting_cpus is empty and no interrupt is pending, so who will unhalt
 the waiter?
 
 
 Yes. If an interrupt occurs between
 local_irq_restore() and halt(), this is possible. Since this is the
 rarest of rare cases (an irq entering the slowpath and then no
 random irq to do a spurious wakeup), we had never hit this problem in
 the past.
I do not think it is very rare to get an interrupt between
local_irq_restore() and halt() under load, since any interrupt that
occurs between local_irq_save() and local_irq_restore() will be delivered
immediately after local_irq_restore(). Of course the chance of no other
random interrupt waking the lock waiter is very low, but the waiter can sleep
for much longer than needed and this will be noticeable in performance.

BTW, can an NMI handler take spinlocks? If it can, what happens if an NMI is
delivered in a section protected by local_irq_save()/local_irq_restore()?

 
 So I am,
 1. trying to artificially reproduce this.
 
 2. I replaced the halt with below code,
if (arch_irqs_disabled())
 halt();
 
 and ran benchmarks.
 But this results in degradation because it means we go back to
 spinning in the irq-enabled case.
 
Yes, this is not what I proposed.

 3. Now I am analyzing the performance overhead of safe_halt in irq
 enabled case.
   if (arch_irqs_disabled())
halt();
   else
safe_halt();
Use of arch_irqs_disabled() is incorrect here. If you are doing it before
local_irq_restore() it will always be true since you disabled interrupts
yourself; if you do it after, then it is too late since an interrupt can come
between local_irq_restore() and

Re: splice vs execve lockdep trace.

2013-07-16 Thread Dave Chinner

On Mon, Jul 15, 2013 at 08:25:14PM -0700, Linus Torvalds wrote:
On Mon, Jul 15, 2013 at 7:38 PM, Dave Jones da...@redhat.com wrote:

The recent trinity changes shouldn't have really made
any notable difference here.

Hmm. I'm not aware of anything that has changed in this area since
3.10 - neither in execve, xfs or in splice. Not even since 3.9.

It's been there for years.

The pipe -> cred_guard_mutex lock chain is pretty direct, and can be
clearly attributed to splicing into /proc. Now, whether that is a
*good* idea or not is clearly debatable, and I do think that maybe we
should just not splice to/from proc files, but that doesn't seem to be
new, and I don't think it's necessarily *broken* per se, it's just
that splicing into /proc seems somewhat unnecessary, and various proc
files do end up taking locks that can be interesting.

But this is a new way of triggering the inversion, however

At the other end of the spectrum, the cred_guard_mutex -> FS locks
thing from execve() is also pretty clear, and probably not fixable or
necessarily something we'd even want to fix.

But the FS locks -> pipe part is a bit questionable. Honestly, I'd
be much happier if XFS used generic_file_splice_read/write().

And looking more at that, I'm actually starting to think this is an
XFS locking problem. XFS really should not call back to splice while
holding the inode lock.

But that XFS code doesn't seem new either. Is XFS a new thing for you
to test with?

I posted patches to fix this i_mutex/i_iolock inversion a couple of
years ago (july 2011):

https://lkml.org/lkml/2011/7/18/4

And V2 was posted here and reviewed (aug 2011):

http://xfs.9218.n7.nabble.com/PATCH-0-2-splice-i-mutex-vs-splice-write-deadlock-V2-tt4072.html#none

It didn't get picked up by a VFS tree, so it sat moldering until
somebody else reported it (Nov 2012) and I reposted it, only
to have it ignored again:

http://oss.sgi.com/archives/xfs/2012-11/msg00671.html

And I recently discussed it again with Al w.r.t. filesystem freeze
problems he was looking at, and I was waiting for that to settle
down before I posted the fixes again

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com