Re: [PATCH] printk: Don't discard earlier unprinted messages to make space

2015-10-23 Thread Petr Mladek
On Thu 2015-10-22 14:19:26, David Howells wrote:
> Petr Mladek  wrote:
> 
> > I would expect that the first few messages are printed to the console
> > before the buffer is wrapped. IMHO, in many cases, you are interested
> > into the final messages that describe why the system went down.
> 
> The last message might tell you that the machine panicked because the NMI
> handler triggered due to a spinlocked section taking too long or something.
> This doesn't help if the oops that caused the spinlock to remain held or
> whatever gets discarded from the buffer due to several intervening complaints
> that result secondarily from the initial oops.

There might be flood of messages if you enable debugging or so. In
this case, the interesting messages would be at the end of the buffer.

IMHO, you want both the beginning of the flood and the end when
machine goes down. You do not want the repeated blob in the middle.
Do you really miss the first few lines on the serial console, please?
Did you consider the panic on Oops kernel parameter?


> > If there is no time to print them, you want to have them in the crash dump
> > (ring buffer) at least.
>
> But not at the expense of discarding the first oops report.  *That* one is the
> most important.

But your patch discards the initial messages as well once they are printed.


> Perhaps things could be arranged such that messages *can* be discarded from
> the front of the buffer *provided* they are not oops messages.

Another possibility would be to decide this by the importance level
of the message. For example, we might start ignoring less important
messages when the buffer is getting full. But this should be optional.
It makes sense only when someone print/logs the messages somewhere.

Anyway, be warned that any additional complexity must have very good
reasons. The printk code already is bloated and there is a strong
resistance to make it worse.

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] printk: Don't discard earlier unprinted messages to make space

2015-10-23 Thread Petr Mladek
On Thu 2015-10-22 14:19:26, David Howells wrote:
> Petr Mladek  wrote:
> 
> > I would expect that the first few messages are printed to the console
> > before the buffer is wrapped. IMHO, in many cases, you are interested
> > into the final messages that describe why the system went down.
> 
> The last message might tell you that the machine panicked because the NMI
> handler triggered due to a spinlocked section taking too long or something.
> This doesn't help if the oops that caused the spinlock to remain held or
> whatever gets discarded from the buffer due to several intervening complaints
> that result secondarily from the initial oops.

There might be flood of messages if you enable debugging or so. In
this case, the interesting messages would be at the end of the buffer.

IMHO, you want both the beginning of the flood and the end when
machine goes down. You do not want the repeated blob in the middle.
Do you really miss the first few lines on the serial console, please?
Did you consider the panic on Oops kernel parameter?


> > If there is no time to print them, you want to have them in the crash dump
> > (ring buffer) at least.
>
> But not at the expense of discarding the first oops report.  *That* one is the
> most important.

But your patch discards the initial messages as well once they are printed.


> Perhaps things could be arranged such that messages *can* be discarded from
> the front of the buffer *provided* they are not oops messages.

Another possibility would be to decide this by the importance level
of the message. For example, we might start ignoring less important
messages when the buffer is getting full. But this should be optional.
It makes sense only when someone print/logs the messages somewhere.

Anyway, be warned that any additional complexity must have very good
reasons. The printk code already is bloated and there is a strong
resistance to make it worse.

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] printk: Don't discard earlier unprinted messages to make space

2015-10-22 Thread David Woodhouse
On Thu, 2015-10-22 at 14:19 +0100, David Howells wrote:
> Petr Mladek  wrote:
> 
> > I would expect that the first few messages are printed to the console
> > before the buffer is wrapped. IMHO, in many cases, you are interested
> > into the final messages that describe why the system went down.
> 
> The last message might tell you that the machine panicked because the NMI
> handler triggered due to a spinlocked section taking too long or something.
> This doesn't help if the oops that caused the spinlock to remain held or
> whatever gets discarded from the buffer due to several intervening complaints
> that result secondarily from the initial oops.
> 
> > If there is no time to print them, you want to have them in the crash dump
> > (ring buffer) at least.
> 
> But not at the expense of discarding the first oops report.  *That* one is the
> most important.
> 
> Perhaps things could be arranged such that messages *can* be discarded from
> the front of the buffer *provided* they are not oops messages.

Of course, all this is a symptom of the fact that printk is now lossy.
If you had an option to make it reliable again, you wouldn't need to
argue over *which* messages to lose.

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH] printk: Don't discard earlier unprinted messages to make space

2015-10-22 Thread David Howells
Petr Mladek  wrote:

> I would expect that the first few messages are printed to the console
> before the buffer is wrapped. IMHO, in many cases, you are interested
> into the final messages that describe why the system went down.

The last message might tell you that the machine panicked because the NMI
handler triggered due to a spinlocked section taking too long or something.
This doesn't help if the oops that caused the spinlock to remain held or
whatever gets discarded from the buffer due to several intervening complaints
that result secondarily from the initial oops.

> If there is no time to print them, you want to have them in the crash dump
> (ring buffer) at least.

But not at the expense of discarding the first oops report.  *That* one is the
most important.

Perhaps things could be arranged such that messages *can* be discarded from
the front of the buffer *provided* they are not oops messages.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] printk: Don't discard earlier unprinted messages to make space

2015-10-22 Thread David Woodhouse
On Thu, 2015-10-22 at 14:18 +0200, Jan Kara wrote:
> 
> > 
> > What happened to it? And how do we fix it?
> 
> Hard to fix since you'd easily get RCU stalls and softlockup messages on
> systems with lots of CPUs and heavy printk traffic...

If we have to explicitly opt in to synchronous output, then such things
should be tolerable.

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH] printk: Don't discard earlier unprinted messages to make space

2015-10-22 Thread Petr Mladek
Added Andrew into CC who maintains printk() code.

On Thu 2015-10-22 11:16:50, David Howells wrote:
> printk() currently discards earlier messages to make space for new messages
> arriving.  This has the distinct downside that if the kernel starts
> churning out messages because of some initial incident, the report of the
> initial incident is likely to be lost under a blizzard of:
> 
>   ** NNN printk messages dropped **
> 
> messages from console_unlock().
> 
> The first message generated (typically an oops) is usually the most
> important - the one you want to solve first - so we really want to see
> that.

oops=panic or panic_on_warn kernel parameters might be more useful in
this situation.

I would expect that the first few messages are printed to the console
before the buffer is wrapped. IMHO, in many cases, you are interested
into the final messages that describe why the system went down. If there
is no time to print them, you want to have them in the crash dump
(ring buffer) at least.


> To this end, change log_store() to only write a message into the buffer if
> there is sufficient space to hold that message.  The message may be
> truncated if it will then fit.
>
> This patch could be improved by noting that some messages got discarded
> when next there is space to do so.
>
> Signed-off-by: David Howells 
> ---

[...]

> + if (CIRC_SPACE_TO_END(log_next_idx, log_first_idx, log_buf_len) >= 
> wsize)
> + goto have_space_no_wrap;
> +
> + if (CIRC_SPACE(log_next_idx, log_first_idx, log_buf_len) >= size)
> + goto have_space_wrap;
> +
> + /* Try to truncate the message. */
> + size = truncate_msg(_len, _msg_len, _len, _len);
> + wsize = size + sizeof(struct printk_log);

truncate_msg() currently works only for messages that are
bigger than half of the buffer.

Also if the flood of messages is faster than printing, you might
end up with the buffer full of "" strings with a minimum
of useful information.


> + if (CIRC_SPACE_TO_END(log_next_idx, log_first_idx, log_buf_len) >= 
> wsize)
> + goto have_space_no_wrap;
> +
> + if (CIRC_SPACE(log_next_idx, log_first_idx, log_buf_len) < size)
> + return 0;

Please, try to avoid copy code.

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] printk: Don't discard earlier unprinted messages to make space

2015-10-22 Thread Jan Kara
On Thu 22-10-15 11:28:47, David Woodhouse wrote:
> On Thu, 2015-10-22 at 11:16 +0100, David Howells wrote:
> > printk() currently discards earlier messages to make space for new messages
> > arriving.  This has the distinct downside that if the kernel starts
> > churning out messages because of some initial incident, the report of the
> > initial incident is likely to be lost under a blizzard of:
> > 
> > ** NNN printk messages dropped **
> > 
> > messages from console_unlock().
> > 
> > The first message generated (typically an oops) is usually the most
> > important - the one you want to solve first - so we really want to see
> > that.
> 
> But wait... didn't I watch you muttering on IRC about the actual bug
> you were trying to catch here... and didn't you have a *serial* console
> hooked up?
> 
> What broke such that serial console stopped giving you *every* message?
> 
> Serial console was always *synchronous*.

Not for last 10 years or even more... When there is only a single CPU
calling printk, you are right. But when two CPUs happen to enter printk
code, the first will do the printing and the second printk will return
immediately after appending message to the kernel log buffer and the
message will appear on console when the first CPU gets to it.

> We could do stuff like...
> 
>printk("Going to do foo...\n");
>outb(foo, baz);
>printk("Did foo and the machine didn't catch fire! Now bar\n");
>outb(bar, baz);
>printk("Done\n");
> 
> And with a serial console I could know *precisely* the point at which
> the machine locked up.
> 
> And I could enable the silly debugging levels on things like JFFS2 and
> be sure that with a serial console I could catch *every* printk
> reliably — which led to a number of cases where people would reproduce
> a bug with a serial console and debugging, mail me a huge log file, and
> get a patch back in reply.
> 
> We *need* to have a mode where serial console is actually *reliable*,
> and we can know that the message has been sent out the port before the
> printk() call returns.
> 
> What happened to it? And how do we fix it?

Hard to fix since you'd easily get RCU stalls and softlockup messages on
systems with lots of CPUs and heavy printk traffic...

Honza
-- 
Jan Kara 
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] printk: Don't discard earlier unprinted messages to make space

2015-10-22 Thread David Woodhouse
On Thu, 2015-10-22 at 11:16 +0100, David Howells wrote:
> printk() currently discards earlier messages to make space for new messages
> arriving.  This has the distinct downside that if the kernel starts
> churning out messages because of some initial incident, the report of the
> initial incident is likely to be lost under a blizzard of:
> 
> ** NNN printk messages dropped **
> 
> messages from console_unlock().
> 
> The first message generated (typically an oops) is usually the most
> important - the one you want to solve first - so we really want to see
> that.

But wait... didn't I watch you muttering on IRC about the actual bug
you were trying to catch here... and didn't you have a *serial* console
hooked up?

What broke such that serial console stopped giving you *every* message?

Serial console was always *synchronous*.

We could do stuff like...

   printk("Going to do foo...\n");
   outb(foo, baz);
   printk("Did foo and the machine didn't catch fire! Now bar\n");
   outb(bar, baz);
   printk("Done\n");

And with a serial console I could know *precisely* the point at which
the machine locked up.

And I could enable the silly debugging levels on things like JFFS2 and
be sure that with a serial console I could catch *every* printk
reliably — which led to a number of cases where people would reproduce
a bug with a serial console and debugging, mail me a huge log file, and
get a patch back in reply.

We *need* to have a mode where serial console is actually *reliable*,
and we can know that the message has been sent out the port before the
printk() call returns.

What happened to it? And how do we fix it?

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


[PATCH] printk: Don't discard earlier unprinted messages to make space

2015-10-22 Thread David Howells
printk() currently discards earlier messages to make space for new messages
arriving.  This has the distinct downside that if the kernel starts
churning out messages because of some initial incident, the report of the
initial incident is likely to be lost under a blizzard of:

** NNN printk messages dropped **

messages from console_unlock().

The first message generated (typically an oops) is usually the most
important - the one you want to solve first - so we really want to see
that.

To this end, change log_store() to only write a message into the buffer if
there is sufficient space to hold that message.  The message may be
truncated if it will then fit.

This patch could be improved by noting that some messages got discarded
when next there is space to do so.

Signed-off-by: David Howells 
---

 kernel/printk/printk.c |   96 ++--
 1 file changed, 35 insertions(+), 61 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 8f0324ef72ab..c2099ede0bc8 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -46,6 +46,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -336,48 +337,6 @@ static u32 log_next(u32 idx)
return idx + msg->len;
 }
 
-/*
- * Check whether there is enough free space for the given message.
- *
- * The same values of first_idx and next_idx mean that the buffer
- * is either empty or full.
- *
- * If the buffer is empty, we must respect the position of the indexes.
- * They cannot be reset to the beginning of the buffer.
- */
-static int logbuf_has_space(u32 msg_size, bool empty)
-{
-   u32 free;
-
-   if (log_next_idx > log_first_idx || empty)
-   free = max(log_buf_len - log_next_idx, log_first_idx);
-   else
-   free = log_first_idx - log_next_idx;
-
-   /*
-* We need space also for an empty header that signalizes wrapping
-* of the buffer.
-*/
-   return free >= msg_size + sizeof(struct printk_log);
-}
-
-static int log_make_free_space(u32 msg_size)
-{
-   while (log_first_seq < log_next_seq) {
-   if (logbuf_has_space(msg_size, false))
-   return 0;
-   /* drop old messages until we have enough contiguous space */
-   log_first_idx = log_next(log_first_idx);
-   log_first_seq++;
-   }
-
-   /* sequence numbers are equal, so the log buffer is empty */
-   if (logbuf_has_space(msg_size, true))
-   return 0;
-
-   return -ENOMEM;
-}
-
 /* compute the message size including the padding bytes */
 static u32 msg_used_size(u16 text_len, u16 dict_len, u32 *pad_len)
 {
@@ -423,32 +382,47 @@ static int log_store(int facility, int level,
 const char *text, u16 text_len)
 {
struct printk_log *msg;
-   u32 size, pad_len;
+   u32 size, wsize, pad_len;
u16 trunc_msg_len = 0;
 
/* number of '\0' padding bytes to next message */
size = msg_used_size(text_len, dict_len, _len);
 
-   if (log_make_free_space(size)) {
-   /* truncate the message if it is too long for empty buffer */
-   size = truncate_msg(_len, _msg_len,
-   _len, _len);
-   /* survive when the log buffer is too small for trunc_msg */
-   if (log_make_free_space(size))
-   return 0;
-   }
+   /* See if we have sufficient space to insert a message.  Note we also
+* need to keep a gap right at the end of the buffer to insert a 'wrap
+* here' marker.
+*/
+   wsize = size + sizeof(struct printk_log);
 
-   if (log_next_idx + size + sizeof(struct printk_log) > log_buf_len) {
-   /*
-* This message + an additional empty header does not fit
-* at the end of the buffer. Add an empty header with len == 0
-* to signify a wrap around.
-*/
-   memset(log_buf + log_next_idx, 0, sizeof(struct printk_log));
-   log_next_idx = 0;
-   }
+   if (log_first_seq != log_next_seq && log_first_idx == log_next_idx)
+   return 0; /* Buffer is completely full */
+
+   if (CIRC_SPACE_TO_END(log_next_idx, log_first_idx, log_buf_len) >= 
wsize)
+   goto have_space_no_wrap;
+
+   if (CIRC_SPACE(log_next_idx, log_first_idx, log_buf_len) >= size)
+   goto have_space_wrap;
+
+   /* Try to truncate the message. */
+   size = truncate_msg(_len, _msg_len, _len, _len);
+   wsize = size + sizeof(struct printk_log);
+
+   if (CIRC_SPACE_TO_END(log_next_idx, log_first_idx, log_buf_len) >= 
wsize)
+   goto have_space_no_wrap;
+
+   if (CIRC_SPACE(log_next_idx, log_first_idx, log_buf_len) < size)
+   return 0;
+
+have_space_wrap:
+   /* This message plus an additional empty header does not fit at the end

Re: [PATCH] printk: Don't discard earlier unprinted messages to make space

2015-10-22 Thread David Woodhouse
On Thu, 2015-10-22 at 11:16 +0100, David Howells wrote:
> printk() currently discards earlier messages to make space for new messages
> arriving.  This has the distinct downside that if the kernel starts
> churning out messages because of some initial incident, the report of the
> initial incident is likely to be lost under a blizzard of:
> 
> ** NNN printk messages dropped **
> 
> messages from console_unlock().
> 
> The first message generated (typically an oops) is usually the most
> important - the one you want to solve first - so we really want to see
> that.

But wait... didn't I watch you muttering on IRC about the actual bug
you were trying to catch here... and didn't you have a *serial* console
hooked up?

What broke such that serial console stopped giving you *every* message?

Serial console was always *synchronous*.

We could do stuff like...

   printk("Going to do foo...\n");
   outb(foo, baz);
   printk("Did foo and the machine didn't catch fire! Now bar\n");
   outb(bar, baz);
   printk("Done\n");

And with a serial console I could know *precisely* the point at which
the machine locked up.

And I could enable the silly debugging levels on things like JFFS2 and
be sure that with a serial console I could catch *every* printk
reliably — which led to a number of cases where people would reproduce
a bug with a serial console and debugging, mail me a huge log file, and
get a patch back in reply.

We *need* to have a mode where serial console is actually *reliable*,
and we can know that the message has been sent out the port before the
printk() call returns.

What happened to it? And how do we fix it?

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH] printk: Don't discard earlier unprinted messages to make space

2015-10-22 Thread Jan Kara
On Thu 22-10-15 11:28:47, David Woodhouse wrote:
> On Thu, 2015-10-22 at 11:16 +0100, David Howells wrote:
> > printk() currently discards earlier messages to make space for new messages
> > arriving.  This has the distinct downside that if the kernel starts
> > churning out messages because of some initial incident, the report of the
> > initial incident is likely to be lost under a blizzard of:
> > 
> > ** NNN printk messages dropped **
> > 
> > messages from console_unlock().
> > 
> > The first message generated (typically an oops) is usually the most
> > important - the one you want to solve first - so we really want to see
> > that.
> 
> But wait... didn't I watch you muttering on IRC about the actual bug
> you were trying to catch here... and didn't you have a *serial* console
> hooked up?
> 
> What broke such that serial console stopped giving you *every* message?
> 
> Serial console was always *synchronous*.

Not for last 10 years or even more... When there is only a single CPU
calling printk, you are right. But when two CPUs happen to enter printk
code, the first will do the printing and the second printk will return
immediately after appending message to the kernel log buffer and the
message will appear on console when the first CPU gets to it.

> We could do stuff like...
> 
>printk("Going to do foo...\n");
>outb(foo, baz);
>printk("Did foo and the machine didn't catch fire! Now bar\n");
>outb(bar, baz);
>printk("Done\n");
> 
> And with a serial console I could know *precisely* the point at which
> the machine locked up.
> 
> And I could enable the silly debugging levels on things like JFFS2 and
> be sure that with a serial console I could catch *every* printk
> reliably — which led to a number of cases where people would reproduce
> a bug with a serial console and debugging, mail me a huge log file, and
> get a patch back in reply.
> 
> We *need* to have a mode where serial console is actually *reliable*,
> and we can know that the message has been sent out the port before the
> printk() call returns.
> 
> What happened to it? And how do we fix it?

Hard to fix since you'd easily get RCU stalls and softlockup messages on
systems with lots of CPUs and heavy printk traffic...

Honza
-- 
Jan Kara 
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] printk: Don't discard earlier unprinted messages to make space

2015-10-22 Thread David Howells
printk() currently discards earlier messages to make space for new messages
arriving.  This has the distinct downside that if the kernel starts
churning out messages because of some initial incident, the report of the
initial incident is likely to be lost under a blizzard of:

** NNN printk messages dropped **

messages from console_unlock().

The first message generated (typically an oops) is usually the most
important - the one you want to solve first - so we really want to see
that.

To this end, change log_store() to only write a message into the buffer if
there is sufficient space to hold that message.  The message may be
truncated if it will then fit.

This patch could be improved by noting that some messages got discarded
when next there is space to do so.

Signed-off-by: David Howells 
---

 kernel/printk/printk.c |   96 ++--
 1 file changed, 35 insertions(+), 61 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 8f0324ef72ab..c2099ede0bc8 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -46,6 +46,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -336,48 +337,6 @@ static u32 log_next(u32 idx)
return idx + msg->len;
 }
 
-/*
- * Check whether there is enough free space for the given message.
- *
- * The same values of first_idx and next_idx mean that the buffer
- * is either empty or full.
- *
- * If the buffer is empty, we must respect the position of the indexes.
- * They cannot be reset to the beginning of the buffer.
- */
-static int logbuf_has_space(u32 msg_size, bool empty)
-{
-   u32 free;
-
-   if (log_next_idx > log_first_idx || empty)
-   free = max(log_buf_len - log_next_idx, log_first_idx);
-   else
-   free = log_first_idx - log_next_idx;
-
-   /*
-* We need space also for an empty header that signalizes wrapping
-* of the buffer.
-*/
-   return free >= msg_size + sizeof(struct printk_log);
-}
-
-static int log_make_free_space(u32 msg_size)
-{
-   while (log_first_seq < log_next_seq) {
-   if (logbuf_has_space(msg_size, false))
-   return 0;
-   /* drop old messages until we have enough contiguous space */
-   log_first_idx = log_next(log_first_idx);
-   log_first_seq++;
-   }
-
-   /* sequence numbers are equal, so the log buffer is empty */
-   if (logbuf_has_space(msg_size, true))
-   return 0;
-
-   return -ENOMEM;
-}
-
 /* compute the message size including the padding bytes */
 static u32 msg_used_size(u16 text_len, u16 dict_len, u32 *pad_len)
 {
@@ -423,32 +382,47 @@ static int log_store(int facility, int level,
 const char *text, u16 text_len)
 {
struct printk_log *msg;
-   u32 size, pad_len;
+   u32 size, wsize, pad_len;
u16 trunc_msg_len = 0;
 
/* number of '\0' padding bytes to next message */
size = msg_used_size(text_len, dict_len, _len);
 
-   if (log_make_free_space(size)) {
-   /* truncate the message if it is too long for empty buffer */
-   size = truncate_msg(_len, _msg_len,
-   _len, _len);
-   /* survive when the log buffer is too small for trunc_msg */
-   if (log_make_free_space(size))
-   return 0;
-   }
+   /* See if we have sufficient space to insert a message.  Note we also
+* need to keep a gap right at the end of the buffer to insert a 'wrap
+* here' marker.
+*/
+   wsize = size + sizeof(struct printk_log);
 
-   if (log_next_idx + size + sizeof(struct printk_log) > log_buf_len) {
-   /*
-* This message + an additional empty header does not fit
-* at the end of the buffer. Add an empty header with len == 0
-* to signify a wrap around.
-*/
-   memset(log_buf + log_next_idx, 0, sizeof(struct printk_log));
-   log_next_idx = 0;
-   }
+   if (log_first_seq != log_next_seq && log_first_idx == log_next_idx)
+   return 0; /* Buffer is completely full */
+
+   if (CIRC_SPACE_TO_END(log_next_idx, log_first_idx, log_buf_len) >= 
wsize)
+   goto have_space_no_wrap;
+
+   if (CIRC_SPACE(log_next_idx, log_first_idx, log_buf_len) >= size)
+   goto have_space_wrap;
+
+   /* Try to truncate the message. */
+   size = truncate_msg(_len, _msg_len, _len, _len);
+   wsize = size + sizeof(struct printk_log);
+
+   if (CIRC_SPACE_TO_END(log_next_idx, log_first_idx, log_buf_len) >= 
wsize)
+   goto have_space_no_wrap;
+
+   if (CIRC_SPACE(log_next_idx, log_first_idx, log_buf_len) < size)
+   return 0;
+
+have_space_wrap:
+   /* This message plus an additional empty header 

Re: [PATCH] printk: Don't discard earlier unprinted messages to make space

2015-10-22 Thread David Woodhouse
On Thu, 2015-10-22 at 14:18 +0200, Jan Kara wrote:
> 
> > 
> > What happened to it? And how do we fix it?
> 
> Hard to fix since you'd easily get RCU stalls and softlockup messages on
> systems with lots of CPUs and heavy printk traffic...

If we have to explicitly opt in to synchronous output, then such things
should be tolerable.

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH] printk: Don't discard earlier unprinted messages to make space

2015-10-22 Thread David Woodhouse
On Thu, 2015-10-22 at 14:19 +0100, David Howells wrote:
> Petr Mladek  wrote:
> 
> > I would expect that the first few messages are printed to the console
> > before the buffer is wrapped. IMHO, in many cases, you are interested
> > into the final messages that describe why the system went down.
> 
> The last message might tell you that the machine panicked because the NMI
> handler triggered due to a spinlocked section taking too long or something.
> This doesn't help if the oops that caused the spinlock to remain held or
> whatever gets discarded from the buffer due to several intervening complaints
> that result secondarily from the initial oops.
> 
> > If there is no time to print them, you want to have them in the crash dump
> > (ring buffer) at least.
> 
> But not at the expense of discarding the first oops report.  *That* one is the
> most important.
> 
> Perhaps things could be arranged such that messages *can* be discarded from
> the front of the buffer *provided* they are not oops messages.

Of course, all this is a symptom of the fact that printk is now lossy.
If you had an option to make it reliable again, you wouldn't need to
argue over *which* messages to lose.

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH] printk: Don't discard earlier unprinted messages to make space

2015-10-22 Thread Petr Mladek
Added Andrew into CC who maintains printk() code.

On Thu 2015-10-22 11:16:50, David Howells wrote:
> printk() currently discards earlier messages to make space for new messages
> arriving.  This has the distinct downside that if the kernel starts
> churning out messages because of some initial incident, the report of the
> initial incident is likely to be lost under a blizzard of:
> 
>   ** NNN printk messages dropped **
> 
> messages from console_unlock().
> 
> The first message generated (typically an oops) is usually the most
> important - the one you want to solve first - so we really want to see
> that.

oops=panic or panic_on_warn kernel parameters might be more useful in
this situation.

I would expect that the first few messages are printed to the console
before the buffer is wrapped. IMHO, in many cases, you are interested
into the final messages that describe why the system went down. If there
is no time to print them, you want to have them in the crash dump
(ring buffer) at least.


> To this end, change log_store() to only write a message into the buffer if
> there is sufficient space to hold that message.  The message may be
> truncated if it will then fit.
>
> This patch could be improved by noting that some messages got discarded
> when next there is space to do so.
>
> Signed-off-by: David Howells 
> ---

[...]

> + if (CIRC_SPACE_TO_END(log_next_idx, log_first_idx, log_buf_len) >= 
> wsize)
> + goto have_space_no_wrap;
> +
> + if (CIRC_SPACE(log_next_idx, log_first_idx, log_buf_len) >= size)
> + goto have_space_wrap;
> +
> + /* Try to truncate the message. */
> + size = truncate_msg(_len, _msg_len, _len, _len);
> + wsize = size + sizeof(struct printk_log);

truncate_msg() currently works only for messages that are
bigger than half of the buffer.

Also if the flood of messages is faster than printing, you might
end up with the buffer full of "" strings with a minimum
of useful information.


> + if (CIRC_SPACE_TO_END(log_next_idx, log_first_idx, log_buf_len) >= 
> wsize)
> + goto have_space_no_wrap;
> +
> + if (CIRC_SPACE(log_next_idx, log_first_idx, log_buf_len) < size)
> + return 0;

Please, try to avoid copy code.

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] printk: Don't discard earlier unprinted messages to make space

2015-10-22 Thread David Howells
Petr Mladek  wrote:

> I would expect that the first few messages are printed to the console
> before the buffer is wrapped. IMHO, in many cases, you are interested
> into the final messages that describe why the system went down.

The last message might tell you that the machine panicked because the NMI
handler triggered due to a spinlocked section taking too long or something.
This doesn't help if the oops that caused the spinlock to remain held or
whatever gets discarded from the buffer due to several intervening complaints
that result secondarily from the initial oops.

> If there is no time to print them, you want to have them in the crash dump
> (ring buffer) at least.

But not at the expense of discarding the first oops report.  *That* one is the
most important.

Perhaps things could be arranged such that messages *can* be discarded from
the front of the buffer *provided* they are not oops messages.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/