> Can all these things really happen (did you run into this problem on a real 
> system?). Or is this just a theoretical problem.  Ugly (but
> practical) hacks might be OK to solve real problems. 

It is a theoretical problem right now.
But it is a timing issue, so it could actually happen.

> But do we really want them to fix problems that actually never happen?

If we find a problem (even a theoretical one), we can't say "It never 
actually happens."

I have some reasons to submit this patch before actually reproducing the problem.

1)
When Linux, including pstore, is used in mission-critical systems, it is too 
late to fix a problem after it has actually happened, because the failure of 
those systems has a great impact on society as a whole.
Customers in this area ask us to fix problems as soon as possible.
On the other hand, this kind of timing issue is hard to reproduce.
So, our support service engineers often work all night to reproduce it.
It is a nightmare for us.

If we can fix it with a small patch in advance, it is really helpful for us.

2)
In the long term, I plan to add kmsg_dump to the kexec path because kdump may 
fail in the real world.
In that case, we need other troubleshooting material, such as pstore, to 
determine the root cause of the failure.

Actually, someone questioned the reliability of kdump at LinuxCon Europe.
http://events.linuxfoundation.org/images/stories/pdf/lceu2012_holzheu.pdf

To convince the kexec maintainer to add kmsg_dump, I need to prove that there 
is no problem in the pstore code that could cause kdump to fail.

Seiji
