On Mon, 2010-08-23 at 15:25 +0200, Jan Kiszka wrote:
> Jan Kiszka wrote:
> > When preempting the read side of a Linux rwlock with a critical IPI
> > while the write side was already spinning and Linux is the top-most
> > domain, ipipe_critical_enter will trigger a deadlock.
> >
> > Work around this issue by detecting long stalls while waiting for all
> > CPUs, terminating the request in that case, and restarting it. We then
> > have to make sure that a potential sync function is not executed
> > multiple times, so we synchronize the restart on all CPUs passing
> > through the IPI handler, though now in arbitrary order.
> >
> > Signed-off-by: Jan Kiszka <[email protected]>
> > ---
> >
> > This patch is still under test, and as it usually took at least several
> > hundred reboot cycles to trigger the issue, this test may take some more
> > days. Still, I would like to collect first feedback on the proposal. I'm
> > not fully happy with having to define an arbitrary loop count as the
> > restart condition. Let's hope this behaves properly as the number of
> > CPUs increases.
>
> > The patch ran well for a few thousand reboot cycles inside kvm (until
> > the latter crashed the host - I should have updated it first...), while
> > it used to lock up after a few hundred before.
>
It looks sane to me as well, so I'll merge this. Thanks.
> >
> > I haven't checked all other arches, but at least powerpc requires a
> > similar fix - maybe by pushing the generic bits into generic ipipe code.
> >
> > arch/x86/kernel/ipipe.c | 38 +++++++++++++++++++++++++++++++++++++-
> > 1 files changed, 37 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/x86/kernel/ipipe.c b/arch/x86/kernel/ipipe.c
> > index 76283c3..cc2ca03 100644
> > --- a/arch/x86/kernel/ipipe.c
> > +++ b/arch/x86/kernel/ipipe.c
> > @@ -55,10 +55,14 @@ DEFINE_PER_CPU(struct pt_regs, __ipipe_tick_regs);
> >
> > #ifdef CONFIG_SMP
> >
> > +#define IPIPE_CRITICAL_TIMEOUT 1000000
> > +
> > static cpumask_t __ipipe_cpu_sync_map;
> >
> > static cpumask_t __ipipe_cpu_lock_map;
> >
> > +static cpumask_t __ipipe_cpu_pass_map;
> > +
> > static unsigned long __ipipe_critical_lock;
> >
> > static IPIPE_DEFINE_SPINLOCK(__ipipe_cpu_barrier);
> > @@ -406,6 +410,8 @@ void __ipipe_do_critical_sync(unsigned irq, void *cookie)
> >  	/* Call the sync routine if any. */
> >  	__ipipe_cpu_sync();
> >
> > +	cpu_set(cpu, __ipipe_cpu_pass_map);
> > +
> >  	spin_unlock(&__ipipe_cpu_barrier);
> >
> >  	cpu_clear(cpu, __ipipe_cpu_sync_map);
> > @@ -441,6 +447,7 @@ unsigned long ipipe_critical_enter(void (*syncfn)(void))
> >  {
> >  	int cpu = ipipe_processor_id();
> > +	unsigned long loops;
> >  	cpumask_t lock_map;
> >
> >  	if (!cpu_test_and_set(cpu, __ipipe_cpu_lock_map)) {
> > @@ -451,17 +458,46 @@ unsigned long ipipe_critical_enter(void (*syncfn)(void))
> >  		} while (++n < cpu);
> >  	}
> >
> > +restart:
> >  	spin_lock(&__ipipe_cpu_barrier);
> >
> >  	__ipipe_cpu_sync = syncfn;
> >
> > +	cpus_clear(__ipipe_cpu_pass_map);
> > +	cpu_set(cpu, __ipipe_cpu_pass_map);
> > +
> >  	/* Send the sync IPI to all processors but the current one. */
> >  	apic->send_IPI_allbutself(IPIPE_CRITICAL_VECTOR);
> >
> >  	cpus_andnot(lock_map, cpu_online_map, __ipipe_cpu_lock_map);
> >
> > -	while (!cpus_equal(__ipipe_cpu_sync_map, lock_map))
> > +	loops = IPIPE_CRITICAL_TIMEOUT;
> > +
> > +	while (!cpus_equal(__ipipe_cpu_sync_map, lock_map)) {
> >  		cpu_relax();
> > +
> > +		if (--loops == 0) {
> > +			/*
> > +			 * We ran into a deadlock due to a
> > +			 * contended rwlock. Cancel this round
> > +			 * and retry after some cycles.
> > +			 */
> > +			__ipipe_cpu_sync = NULL;
> > +
> > +			spin_unlock(&__ipipe_cpu_barrier);
> > +
> > +			/*
> > +			 * Ensure all CPUs consumed the IPI to
> > +			 * avoid running __ipipe_cpu_sync
> > +			 * prematurely.
> > +			 */
> > +			while (!cpus_equal(cpu_online_map,
> > +					   __ipipe_cpu_pass_map))
> > +				cpu_relax();
> > +
> > +			goto restart;
> > +		}
> > +	}
> >  	}
> >
> >  	atomic_inc(&__ipipe_critical_count);
>
> Any further comments or change requests? Otherwise I would send out a
> git pull request for this patch and two minor ones in my 34-x86 queue.
>
> Also, I'm considering refactoring the critical sync code, moving it
> completely to kernel/ipipe/core.c. There seem to be only minor
> variations in its "arch-specific" implementations. But that would be
> 2.6.35 material.
>
> BTW, any x86 bits for 2.6.35 in some private branch of yours, Philippe?
> I have "kernel update" on my agenda, and I would like to use that
> version as base. If there are no bits, I would start generating some.
>
> Jan
>
--
Philippe.
_______________________________________________
Adeos-main mailing list
[email protected]
https://mail.gna.org/listinfo/adeos-main