[Issue]

If a cpu that holds psinfo->buf_lock receives an NMI from a panicked cpu
via smp_send_stop(), the panicked cpu hangs in pstore_dump(), called by
kmsg_dump(KMSG_DUMP_PANIC), because it tries to take psinfo->buf_lock,
which the stopped cpu never releases.

To avoid the deadlock, an easy solution is to move kmsg_dump() above
smp_send_stop() in the panic path.

But it is not safe to kick pstore while multiple cpus are still running in
the panic case, because they may touch corrupted data/variables and cause
unnecessary failures.
In that case, we can't guarantee that the panicked cpu logs messages
reliably, because those failures may have harmful effects.

[Solution]

This patch skips taking psinfo->buf_lock when just one cpu is online,
because stopped cpus are marked offline by smp_send_stop()
on some architectures such as x86, powerpc and arm64.

It may be a hack, but it solves my concern about deadlocking on the x86
architecture.
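
For reviewers, the locking decision this patch introduces in pstore_dump()
can be modelled roughly as below. This is only a user-space sketch of the
branch order, not kernel code: the enum and pick_lock_mode() are
hypothetical names that do not appear in the patch, and the two arguments
stand in for in_nmi() and num_online_cpus().

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; none of these are in the patch. */
enum lock_mode {
	LOCK_TRYLOCK,		/* NMI context: spin_trylock(), carry on if it fails */
	LOCK_SPIN_IRQSAVE,	/* several cpus online: spin_lock_irqsave() */
	LOCK_IRQ_ONLY		/* one cpu online: local_irq_save(), no spinlock */
};

/*
 * Mirrors the if / else if / else chain in the patched pstore_dump().
 * in_nmi stands in for in_nmi(), online_cpus for num_online_cpus().
 */
enum lock_mode pick_lock_mode(bool in_nmi, unsigned int online_cpus)
{
	if (in_nmi)
		return LOCK_TRYLOCK;
	if (online_cpus > 1)
		return LOCK_SPIN_IRQSAVE;
	return LOCK_IRQ_ONLY;
}
```

Note that the NMI check comes first, so the trylock path is unchanged by
this patch; only the non-NMI path depends on the online cpu count. The
unlock side must pick the same branch, which is why the patch samples
num_online_cpus() once into cpu_num and reuses it.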

Signed-off-by: Seiji Aguchi <seiji.agu...@hds.com>
---
 fs/pstore/platform.c |   14 +++++++++++---
 1 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
index 947fbe0..ca4d2ab 100644
--- a/fs/pstore/platform.c
+++ b/fs/pstore/platform.c
@@ -107,7 +107,7 @@ static void pstore_dump(struct kmsg_dumper *dumper,
        unsigned long   total = 0;
        const char      *why;
        u64             id;
-       unsigned int    part = 1;
+       unsigned int    part = 1, cpu_num = num_online_cpus();
        unsigned long   flags = 0;
        int             is_locked = 0;
        int             ret;
@@ -118,8 +118,14 @@ static void pstore_dump(struct kmsg_dumper *dumper,
                is_locked = spin_trylock(&psinfo->buf_lock);
                if (!is_locked)
                        pr_err("pstore dump routine blocked in NMI, may corrupt error record\n");
-       } else
+       } else if (cpu_num > 1) {
+               /*
+                * Take a spin lock only when multiple cpus are online.
+                */
                spin_lock_irqsave(&psinfo->buf_lock, flags);
+       } else
+               local_irq_save(flags);
+
        oopscount++;
        while (total < kmsg_bytes) {
                char *dst;
@@ -146,8 +152,10 @@ static void pstore_dump(struct kmsg_dumper *dumper,
        if (in_nmi()) {
                if (is_locked)
                        spin_unlock(&psinfo->buf_lock);
-       } else
+       } else if (cpu_num > 1) {
                spin_unlock_irqrestore(&psinfo->buf_lock, flags);
+       } else
+               local_irq_restore(flags);
 }
 
 static struct kmsg_dumper pstore_dumper = {
-- 1.7.1