Re: What is wrong?
On Tue, Mar 04, 2014 at 10:54:04AM +0200, Leon Pollak wrote: > I will recheck everything and try. Meanwhile, the news are not good: our > guys say that it appears that the additional sync DOES NOT SOLVE the > issue. Gonna be honest, I have a tough time explaining this. :( Unfortunately I don't have a board here with a hardware write protect which would make things easier to verify. - Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: What is wrong?
Hello, all. I am really sorry for the silence - I was on the business trip and returned today. I will recheck everything and try. Meanwhile, the news are not good: our guys say that it appears that the additional sync DOES NOT SOLVE the issue. I ask for excuse, but as I did not know the exact processing, I was mistaken and, probably, used already gc-ted unit for tests. Sorry, again. BR On Tuesday 04 March 2014 00:33:25 Brian Norris wrote: > On Mon, Mar 03, 2014 at 03:13:36PM -0600, Andrew Ruder wrote: > > On Thu, Feb 27, 2014 at 01:22:08PM -0800, Brian Norris wrote: > > > Perhaps Richard or Andrew can comment on whether this patch should > > > help you. But I think JFFS2 on NAND uses write-buffered support > > > which can be affected by this bug. > > > > Definitely sounds like the same issue and I'm kind of glad to see it > > crop up in another filesystem. > > We haven't confirmed that the *patch* actually affects Leon's problem; > just that if he runs an additional 'sync' it solves his problem. > Leon, did you get to try the patch? > > Anyway, should commit 807612db2f9940b9fa6deaef054eb16d51bd3e00 be > marked for -stable? > > Brian -- Leon -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: What is wrong?
On Mon, Mar 03, 2014 at 03:13:36PM -0600, Andrew Ruder wrote: > On Thu, Feb 27, 2014 at 01:22:08PM -0800, Brian Norris wrote: > > Perhaps Richard or Andrew can comment on whether this patch should help > > you. But I think JFFS2 on NAND uses write-buffered support which can be > > affected by this bug. > > Definitely sounds like the same issue and I'm kind of glad to see it > crop up in another filesystem. We haven't confirmed that the *patch* actually affects Leon's problem; just that if he runs an additional 'sync' it solves his problem. Leon, did you get to try the patch? Anyway, should commit 807612db2f9940b9fa6deaef054eb16d51bd3e00 be marked for -stable? Brian -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: What is wrong?
On Mon, Mar 03, 2014 at 03:13:36PM -0600, Andrew Ruder wrote: On Thu, Feb 27, 2014 at 01:22:08PM -0800, Brian Norris wrote: Perhaps Richard or Andrew can comment on whether this patch should help you. But I think JFFS2 on NAND uses write-buffered support which can be affected by this bug. Definitely sounds like the same issue and I'm kind of glad to see it crop up in another filesystem. We haven't confirmed that the *patch* actually affects Leon's problem; just that if he runs an additional 'sync' it solves his problem. Leon, did you get to try the patch? Anyway, should commit 807612db2f9940b9fa6deaef054eb16d51bd3e00 be marked for -stable? Brian -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: What is wrong?
Hello, all. I am really sorry for the silence - I was on the business trip and returned today. I will recheck everything and try. Meanwhile, the news are not good: our guys say that it appears that the additional sync DOES NOT SOLVE the issue. I ask for excuse, but as I did not know the exact processing, I was mistaken and, probably, used already gc-ted unit for tests. Sorry, again. BR On Tuesday 04 March 2014 00:33:25 Brian Norris wrote: On Mon, Mar 03, 2014 at 03:13:36PM -0600, Andrew Ruder wrote: On Thu, Feb 27, 2014 at 01:22:08PM -0800, Brian Norris wrote: Perhaps Richard or Andrew can comment on whether this patch should help you. But I think JFFS2 on NAND uses write-buffered support which can be affected by this bug. Definitely sounds like the same issue and I'm kind of glad to see it crop up in another filesystem. We haven't confirmed that the *patch* actually affects Leon's problem; just that if he runs an additional 'sync' it solves his problem. Leon, did you get to try the patch? Anyway, should commit 807612db2f9940b9fa6deaef054eb16d51bd3e00 be marked for -stable? Brian -- Leon -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: What is wrong?
On Tue, Mar 04, 2014 at 10:54:04AM +0200, Leon Pollak wrote: I will recheck everything and try. Meanwhile, the news are not good: our guys say that it appears that the additional sync DOES NOT SOLVE the issue. Gonna be honest, I have a tough time explaining this. :( Unfortunately I don't have a board here with a hardware write protect which would make things easier to verify. - Andy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: What is wrong?
On Thu, Feb 27, 2014 at 01:22:08PM -0800, Brian Norris wrote: > Perhaps Richard or Andrew can comment on whether this patch should help > you. But I think JFFS2 on NAND uses write-buffered support which can be > affected by this bug. Definitely sounds like the same issue and I'm kind of glad to see it crop up in another filesystem. Also glad you Cc'd me with the URL because I had the painful task of recreating this issue on another filesystem on my TODO list as I didn't think it had ever been committed. Cheers, Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: What is wrong?
On Thu, Feb 27, 2014 at 01:22:08PM -0800, Brian Norris wrote: Perhaps Richard or Andrew can comment on whether this patch should help you. But I think JFFS2 on NAND uses write-buffered support which can be affected by this bug. Definitely sounds like the same issue and I'm kind of glad to see it crop up in another filesystem. Also glad you Cc'd me with the URL because I had the painful task of recreating this issue on another filesystem on my TODO list as I didn't think it had ever been committed. Cheers, Andy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: What is wrong?
+ others Hi Leon, Can you please keep the CC list intact? And please try to reply below the quotes and trim context, rather than top-posting. Thanks! On Thu, Feb 27, 2014 at 02:00:25PM +0200, Leon Pollak wrote: > I am VERY(!) thankful to you for the answer. > First, I am calm now that there is no any error on my side and the > system remains clean despite these messages. > Second, yes, the workaround worked. That's nice to hear, but that is (as you note) a workaround. You should not need an extra sync after remounting read-only. Do you think you can try the linked patch? commit 807612db2f9940b9fa6deaef054eb16d51bd3e00 Author: Andrew Ruder Date: Thu Jan 30 09:26:54 2014 -0600 fs/super.c: sync ro remount after blocking writers Perhaps Richard or Andrew can comment on whether this patch should help you. But I think JFFS2 on NAND uses write-buffered support which can be affected by this bug. > May thanks to you for your help!!! You're welcome. I have a few other questions: are you using NOR or NAND (it looks like maybe NAND)? Leaving most context intact for others, below. > On Wednesday 26 February 2014 17:11:48 you wrote: > > On Wed, Feb 26, 2014 at 04:07:21PM +0200, Leon Pollak wrote: > > > The NAND is write protected by HW and the partition is mounted as > > > RO. > > > At some moment I need to update a small file. > > > So I do: > > > - HW write protect off, > > > - remount RW, > > > - update file, > > > - sync, > > > - remount RO, > > > - write protect on. > > > > > > Looking at linux console I see a lot of messages like: > > > Erase at 0x0040 failed immediately: errno -5 > > > Erase at 0x003e failed immediately: errno -5 > > > .. > > > Erase at 0x0034 failed immediately: errno -5 > > > jffs2_flush_wbuf(): Write failed with -5 > > > Write of 2016 bytes at 0x002578a0 failed. returned -5, retlen 0 > > > Not marking the space at 0x002578a0 as dirty because the flash > > > driver > > > returned retlen zero. > > > > > > > > > This is repeated for a long time, but everything seems work OK. > > > The sequential starts and even file updates are also OK, without > > > error messages. > > > > > > What do I do wrong? Thanks a lot. > > > > It's possible you're seeing symptoms of this: > > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit > > /?id=807612db2f9940b9fa6deaef054eb16d51bd3e00 > > > > It seems like maybe JFFS2 is still doing some GC and/or write flushing > > after the remount. > > > > Could try this? > > > > - HW write protect off, > > - remount RW, > > - update file, > > - sync, > > - remount RO, > > - sync, <-- add this, to see if you're experiencing any > > writeback after remount > > - write protect on. Regards, Brian -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: What is wrong?
+ others Hi Leon, Can you please keep the CC list intact? And please try to reply below the quotes and trim context, rather than top-posting. Thanks! On Thu, Feb 27, 2014 at 02:00:25PM +0200, Leon Pollak wrote: I am VERY(!) thankful to you for the answer. First, I am calm now that there is no any error on my side and the system remains clean despite these messages. Second, yes, the workaround worked. That's nice to hear, but that is (as you note) a workaround. You should not need an extra sync after remounting read-only. Do you think you can try the linked patch? commit 807612db2f9940b9fa6deaef054eb16d51bd3e00 Author: Andrew Ruder andrew.ru...@elecsyscorp.com Date: Thu Jan 30 09:26:54 2014 -0600 fs/super.c: sync ro remount after blocking writers Perhaps Richard or Andrew can comment on whether this patch should help you. But I think JFFS2 on NAND uses write-buffered support which can be affected by this bug. May thanks to you for your help!!! You're welcome. I have a few other questions: are you using NOR or NAND (it looks like maybe NAND)? Leaving most context intact for others, below. On Wednesday 26 February 2014 17:11:48 you wrote: On Wed, Feb 26, 2014 at 04:07:21PM +0200, Leon Pollak wrote: The NAND is write protected by HW and the partition is mounted as RO. At some moment I need to update a small file. So I do: - HW write protect off, - remount RW, - update file, - sync, - remount RO, - write protect on. Looking at linux console I see a lot of messages like: Erase at 0x0040 failed immediately: errno -5 Erase at 0x003e failed immediately: errno -5 .. Erase at 0x0034 failed immediately: errno -5 jffs2_flush_wbuf(): Write failed with -5 Write of 2016 bytes at 0x002578a0 failed. returned -5, retlen 0 Not marking the space at 0x002578a0 as dirty because the flash driver returned retlen zero. This is repeated for a long time, but everything seems work OK. The sequential starts and even file updates are also OK, without error messages. What do I do wrong? Thanks a lot. It's possible you're seeing symptoms of this: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit /?id=807612db2f9940b9fa6deaef054eb16d51bd3e00 It seems like maybe JFFS2 is still doing some GC and/or write flushing after the remount. Could try this? - HW write protect off, - remount RW, - update file, - sync, - remount RO, - sync, -- add this, to see if you're experiencing any writeback after remount - write protect on. Regards, Brian -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors
* H. Peter Anvin wrote: > On 08/12/2013 07:31 AM, Peter Zijlstra wrote: > > On Wed, Aug 07, 2013 at 07:20:44PM +0200, Borislav Petkov wrote: > >> Besides, BUG can be disabled in CONFIG_EXPERT. > > > > There was some email on this subject a while ago; disabling BUG() is > > very risky and can cause all kinds of horrid. IIRC the consensus back > > then was to remove this 'feature' and have BUG always at least lock up > > the machine hard. > > Yes, always trap but leave the metadata as optional. I think those patches are going forward, so we can just regard BUG() as always existing from a program flow POV. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors
On 08/12/2013 07:31 AM, Peter Zijlstra wrote: > On Wed, Aug 07, 2013 at 07:20:44PM +0200, Borislav Petkov wrote: >> Besides, BUG can be disabled in CONFIG_EXPERT. > > There was some email on this subject a while ago; disabling BUG() is > very risky and can cause all kinds of horrid. IIRC the consensus back > then was to remove this 'feature' and have BUG always at least lock up > the machine hard. > Yes, always trap but leave the metadata as optional. -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors
On Wed, Aug 07, 2013 at 07:20:44PM +0200, Borislav Petkov wrote: > Besides, BUG can be disabled in CONFIG_EXPERT. There was some email on this subject a while ago; disabling BUG() is very risky and can cause all kinds of horrid. IIRC the consensus back then was to remove this 'feature' and have BUG always at least lock up the machine hard. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors
On Wed, Aug 07, 2013 at 07:20:44PM +0200, Borislav Petkov wrote: Besides, BUG can be disabled in CONFIG_EXPERT. There was some email on this subject a while ago; disabling BUG() is very risky and can cause all kinds of horrid. IIRC the consensus back then was to remove this 'feature' and have BUG always at least lock up the machine hard. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors
On 08/12/2013 07:31 AM, Peter Zijlstra wrote: On Wed, Aug 07, 2013 at 07:20:44PM +0200, Borislav Petkov wrote: Besides, BUG can be disabled in CONFIG_EXPERT. There was some email on this subject a while ago; disabling BUG() is very risky and can cause all kinds of horrid. IIRC the consensus back then was to remove this 'feature' and have BUG always at least lock up the machine hard. Yes, always trap but leave the metadata as optional. -hpa -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors
* H. Peter Anvin h...@zytor.com wrote: On 08/12/2013 07:31 AM, Peter Zijlstra wrote: On Wed, Aug 07, 2013 at 07:20:44PM +0200, Borislav Petkov wrote: Besides, BUG can be disabled in CONFIG_EXPERT. There was some email on this subject a while ago; disabling BUG() is very risky and can cause all kinds of horrid. IIRC the consensus back then was to remove this 'feature' and have BUG always at least lock up the machine hard. Yes, always trap but leave the metadata as optional. I think those patches are going forward, so we can just regard BUG() as always existing from a program flow POV. Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors
On 08/07/2013 11:41 AM, Borislav Petkov wrote: > On Wed, Aug 07, 2013 at 10:51:43AM -0700, H. Peter Anvin wrote: >> A bigger issue is probably if panic-on-bug should be the default, with >> !panic being an opt-in debugging option. > > Yes, it might make sense although embedded wants to disable CONFIG_BUG > on systems which cannot report errors: > There is another thread going on to convert BUG to trap even on machines where CONFIG_BUG is disabled, i.e. CONFIG_BUG would control the metadata only. With the configuration option off, you would get the address of the failure only. -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors
On Wed, Aug 07, 2013 at 10:51:43AM -0700, H. Peter Anvin wrote: > A bigger issue is probably if panic-on-bug should be the default, with > !panic being an opt-in debugging option. Yes, it might make sense although embedded wants to disable CONFIG_BUG on systems which cannot report errors: │ CONFIG_BUG: │ │ Disabling this option eliminates support for BUG and WARN, reducing │ the size of your kernel image and potentially quietly ignoring │ numerous fatal conditions. You should only consider disabling this │ option for embedded systems with no facilities for reporting errors. │ Just say Y. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors
On 08/07/2013 10:46 AM, Steven Rostedt wrote: > On Wed, 2013-08-07 at 19:37 +0200, Borislav Petkov wrote: >> On Wed, Aug 07, 2013 at 01:33:06PM -0400, Steven Rostedt wrote: >>> Right, and this code keeps the same logic as it was before. If it was >>> disabled by CONFIG_EXPERT, it stays disabled, but at least you get to >>> see a warning that your kernel may be corrupt now :-) >> >> Don't we really want to panic instead of running a corrupt kernel? IOW, >> to change the logic to panic unconditionally because the image in memory >> has been violated and not in a good way, at that :-) > > Well, there's lots of places that use BUG() for a corrupt kernel. If you > are stupid enough to disable it, you get what you asked for. > A bigger issue is probably if panic-on-bug should be the default, with !panic being an opt-in debugging option. -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors
On Wed, 2013-08-07 at 19:37 +0200, Borislav Petkov wrote: > On Wed, Aug 07, 2013 at 01:33:06PM -0400, Steven Rostedt wrote: > > Right, and this code keeps the same logic as it was before. If it was > > disabled by CONFIG_EXPERT, it stays disabled, but at least you get to > > see a warning that your kernel may be corrupt now :-) > > Don't we really want to panic instead of running a corrupt kernel? IOW, > to change the logic to panic unconditionally because the image in memory > has been violated and not in a good way, at that :-) Well, there's lots of places that use BUG() for a corrupt kernel. If you are stupid enough to disable it, you get what you asked for. -- Steve -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors
On Wed, Aug 07, 2013 at 01:33:06PM -0400, Steven Rostedt wrote: > Right, and this code keeps the same logic as it was before. If it was > disabled by CONFIG_EXPERT, it stays disabled, but at least you get to > see a warning that your kernel may be corrupt now :-) Don't we really want to panic instead of running a corrupt kernel? IOW, to change the logic to panic unconditionally because the image in memory has been violated and not in a good way, at that :-) -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors
On Wed, 2013-08-07 at 19:20 +0200, Borislav Petkov wrote: > > > +static void bug_at(unsigned char *ip, int line) > > +{ > > + /* > > +* The location is not an op that we were expecting. > > +* Something went wrong. Crash the box, as something could be > > +* corrupting the kernel. > > +*/ > > + pr_warning("Unexpected op at %pS [%p] (%02x %02x %02x %02x %02x) > > %s:%d\n", > > + ip, ip, ip[0], ip[1], ip[2], ip[3], ip[4], __FILE__, line); > > + BUG(); > > Why not simply > > panic("Unexpected...") > > ? Because the patch replaced BUG(). > > Besides, BUG can be disabled in CONFIG_EXPERT. > Right, and this code keeps the same logic as it was before. If it was disabled by CONFIG_EXPERT, it stays disabled, but at least you get to see a warning that your kernel may be corrupt now :-) -- Steve -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors
On Wed, Aug 07, 2013 at 12:49:38PM -0400, Steven Rostedt wrote: > From: Steven Rostedt > > When modifying text sections for jump labels, a paranoid check is > performed. If the check fails, the system "bugs". But why it failed > is not shown. > > The BUG_ON()s in the jump label update code is replaced with bug_at(ip). > This is a function that will show what pointer failed, and what was > at the location of the failure that made jump label panic. > > Signed-off-by: Steven Rostedt > --- > arch/x86/kernel/jump_label.c | 21 ++--- > 1 file changed, 18 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c > index 24cf2b2..912a528 100644 > --- a/arch/x86/kernel/jump_label.c > +++ b/arch/x86/kernel/jump_label.c > @@ -24,6 +24,18 @@ union jump_code_union { > } __attribute__((packed)); > }; > > +static void bug_at(unsigned char *ip, int line) > +{ > + /* > + * The location is not an op that we were expecting. > + * Something went wrong. Crash the box, as something could be > + * corrupting the kernel. > + */ > + pr_warning("Unexpected op at %pS [%p] (%02x %02x %02x %02x %02x) > %s:%d\n", > +ip, ip, ip[0], ip[1], ip[2], ip[3], ip[4], __FILE__, line); > + BUG(); Why not simply panic("Unexpected...") ? Besides, BUG can be disabled in CONFIG_EXPERT. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 4/4] x86/jump-label: Show where and what was wrong on errors
From: Steven Rostedt When modifying text sections for jump labels, a paranoid check is performed. If the check fails, the system "bugs". But why it failed is not shown. The BUG_ON()s in the jump label update code is replaced with bug_at(ip). This is a function that will show what pointer failed, and what was at the location of the failure that made jump label panic. Signed-off-by: Steven Rostedt --- arch/x86/kernel/jump_label.c | 21 ++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c index 24cf2b2..912a528 100644 --- a/arch/x86/kernel/jump_label.c +++ b/arch/x86/kernel/jump_label.c @@ -24,6 +24,18 @@ union jump_code_union { } __attribute__((packed)); }; +static void bug_at(unsigned char *ip, int line) +{ + /* +* The location is not an op that we were expecting. +* Something went wrong. Crash the box, as something could be +* corrupting the kernel. +*/ + pr_warning("Unexpected op at %pS [%p] (%02x %02x %02x %02x %02x) %s:%d\n", + ip, ip, ip[0], ip[1], ip[2], ip[3], ip[4], __FILE__, line); + BUG(); +} + static void __jump_label_transform(struct jump_entry *entry, enum jump_label_type type, void *(*poker)(void *, const void *, size_t), @@ -37,7 +49,8 @@ static void __jump_label_transform(struct jump_entry *entry, * We are enabling this jump label. If it is not a nop * then something must have gone wrong. */ - BUG_ON(memcmp((void *)entry->code, ideal_nop, 5) != 0); + if (unlikely(memcmp((void *)entry->code, ideal_nop, 5) != 0)) + bug_at((void *)entry->code, __LINE__); code.jump = 0xe9; code.offset = entry->target - @@ -51,12 +64,14 @@ static void __jump_label_transform(struct jump_entry *entry, */ if (init) { const unsigned char default_nop[] = { STATIC_KEY_INIT_NOP }; - BUG_ON(memcmp((void *)entry->code, default_nop, 5) != 0); + if (unlikely(memcmp((void *)entry->code, default_nop, 5) != 0)) + bug_at((void *)entry->code, __LINE__); } else { code.jump = 0xe9; code.offset = entry->target - (entry->code + JUMP_LABEL_NOP_SIZE); - BUG_ON(memcmp((void *)entry->code, , 5) != 0); + if (unlikely(memcmp((void *)entry->code, , 5) != 0)) + bug_at((void *)entry->code, __LINE__); } memcpy(, ideal_nops[NOP_ATOMIC5], JUMP_LABEL_NOP_SIZE); } -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 4/4] x86/jump-label: Show where and what was wrong on errors
From: Steven Rostedt srost...@redhat.com When modifying text sections for jump labels, a paranoid check is performed. If the check fails, the system bugs. But why it failed is not shown. The BUG_ON()s in the jump label update code is replaced with bug_at(ip). This is a function that will show what pointer failed, and what was at the location of the failure that made jump label panic. Signed-off-by: Steven Rostedt rost...@goodmis.org --- arch/x86/kernel/jump_label.c | 21 ++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c index 24cf2b2..912a528 100644 --- a/arch/x86/kernel/jump_label.c +++ b/arch/x86/kernel/jump_label.c @@ -24,6 +24,18 @@ union jump_code_union { } __attribute__((packed)); }; +static void bug_at(unsigned char *ip, int line) +{ + /* +* The location is not an op that we were expecting. +* Something went wrong. Crash the box, as something could be +* corrupting the kernel. +*/ + pr_warning(Unexpected op at %pS [%p] (%02x %02x %02x %02x %02x) %s:%d\n, + ip, ip, ip[0], ip[1], ip[2], ip[3], ip[4], __FILE__, line); + BUG(); +} + static void __jump_label_transform(struct jump_entry *entry, enum jump_label_type type, void *(*poker)(void *, const void *, size_t), @@ -37,7 +49,8 @@ static void __jump_label_transform(struct jump_entry *entry, * We are enabling this jump label. If it is not a nop * then something must have gone wrong. */ - BUG_ON(memcmp((void *)entry-code, ideal_nop, 5) != 0); + if (unlikely(memcmp((void *)entry-code, ideal_nop, 5) != 0)) + bug_at((void *)entry-code, __LINE__); code.jump = 0xe9; code.offset = entry-target - @@ -51,12 +64,14 @@ static void __jump_label_transform(struct jump_entry *entry, */ if (init) { const unsigned char default_nop[] = { STATIC_KEY_INIT_NOP }; - BUG_ON(memcmp((void *)entry-code, default_nop, 5) != 0); + if (unlikely(memcmp((void *)entry-code, default_nop, 5) != 0)) + bug_at((void *)entry-code, __LINE__); } else { code.jump = 0xe9; code.offset = entry-target - (entry-code + JUMP_LABEL_NOP_SIZE); - BUG_ON(memcmp((void *)entry-code, code, 5) != 0); + if (unlikely(memcmp((void *)entry-code, code, 5) != 0)) + bug_at((void *)entry-code, __LINE__); } memcpy(code, ideal_nops[NOP_ATOMIC5], JUMP_LABEL_NOP_SIZE); } -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors
On Wed, Aug 07, 2013 at 12:49:38PM -0400, Steven Rostedt wrote: From: Steven Rostedt srost...@redhat.com When modifying text sections for jump labels, a paranoid check is performed. If the check fails, the system bugs. But why it failed is not shown. The BUG_ON()s in the jump label update code is replaced with bug_at(ip). This is a function that will show what pointer failed, and what was at the location of the failure that made jump label panic. Signed-off-by: Steven Rostedt rost...@goodmis.org --- arch/x86/kernel/jump_label.c | 21 ++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c index 24cf2b2..912a528 100644 --- a/arch/x86/kernel/jump_label.c +++ b/arch/x86/kernel/jump_label.c @@ -24,6 +24,18 @@ union jump_code_union { } __attribute__((packed)); }; +static void bug_at(unsigned char *ip, int line) +{ + /* + * The location is not an op that we were expecting. + * Something went wrong. Crash the box, as something could be + * corrupting the kernel. + */ + pr_warning(Unexpected op at %pS [%p] (%02x %02x %02x %02x %02x) %s:%d\n, +ip, ip, ip[0], ip[1], ip[2], ip[3], ip[4], __FILE__, line); + BUG(); Why not simply panic(Unexpected...) ? Besides, BUG can be disabled in CONFIG_EXPERT. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors
On Wed, 2013-08-07 at 19:20 +0200, Borislav Petkov wrote: +static void bug_at(unsigned char *ip, int line) +{ + /* +* The location is not an op that we were expecting. +* Something went wrong. Crash the box, as something could be +* corrupting the kernel. +*/ + pr_warning(Unexpected op at %pS [%p] (%02x %02x %02x %02x %02x) %s:%d\n, + ip, ip, ip[0], ip[1], ip[2], ip[3], ip[4], __FILE__, line); + BUG(); Why not simply panic(Unexpected...) ? Because the patch replaced BUG(). Besides, BUG can be disabled in CONFIG_EXPERT. Right, and this code keeps the same logic as it was before. If it was disabled by CONFIG_EXPERT, it stays disabled, but at least you get to see a warning that your kernel may be corrupt now :-) -- Steve -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors
On Wed, Aug 07, 2013 at 01:33:06PM -0400, Steven Rostedt wrote: Right, and this code keeps the same logic as it was before. If it was disabled by CONFIG_EXPERT, it stays disabled, but at least you get to see a warning that your kernel may be corrupt now :-) Don't we really want to panic instead of running a corrupt kernel? IOW, to change the logic to panic unconditionally because the image in memory has been violated and not in a good way, at that :-) -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors
On Wed, 2013-08-07 at 19:37 +0200, Borislav Petkov wrote: On Wed, Aug 07, 2013 at 01:33:06PM -0400, Steven Rostedt wrote: Right, and this code keeps the same logic as it was before. If it was disabled by CONFIG_EXPERT, it stays disabled, but at least you get to see a warning that your kernel may be corrupt now :-) Don't we really want to panic instead of running a corrupt kernel? IOW, to change the logic to panic unconditionally because the image in memory has been violated and not in a good way, at that :-) Well, there's lots of places that use BUG() for a corrupt kernel. If you are stupid enough to disable it, you get what you asked for. -- Steve -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors
On 08/07/2013 10:46 AM, Steven Rostedt wrote: On Wed, 2013-08-07 at 19:37 +0200, Borislav Petkov wrote: On Wed, Aug 07, 2013 at 01:33:06PM -0400, Steven Rostedt wrote: Right, and this code keeps the same logic as it was before. If it was disabled by CONFIG_EXPERT, it stays disabled, but at least you get to see a warning that your kernel may be corrupt now :-) Don't we really want to panic instead of running a corrupt kernel? IOW, to change the logic to panic unconditionally because the image in memory has been violated and not in a good way, at that :-) Well, there's lots of places that use BUG() for a corrupt kernel. If you are stupid enough to disable it, you get what you asked for. A bigger issue is probably if panic-on-bug should be the default, with !panic being an opt-in debugging option. -hpa -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors
On Wed, Aug 07, 2013 at 10:51:43AM -0700, H. Peter Anvin wrote: A bigger issue is probably if panic-on-bug should be the default, with !panic being an opt-in debugging option. Yes, it might make sense although embedded wants to disable CONFIG_BUG on systems which cannot report errors: │ CONFIG_BUG: │ │ Disabling this option eliminates support for BUG and WARN, reducing │ the size of your kernel image and potentially quietly ignoring │ numerous fatal conditions. You should only consider disabling this │ option for embedded systems with no facilities for reporting errors. │ Just say Y. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors
On 08/07/2013 11:41 AM, Borislav Petkov wrote: On Wed, Aug 07, 2013 at 10:51:43AM -0700, H. Peter Anvin wrote: A bigger issue is probably if panic-on-bug should be the default, with !panic being an opt-in debugging option. Yes, it might make sense although embedded wants to disable CONFIG_BUG on systems which cannot report errors: There is another thread going on to convert BUG to trap even on machines where CONFIG_BUG is disabled, i.e. CONFIG_BUG would control the metadata only. With the configuration option off, you would get the address of the failure only. -hpa -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/