Re: What is wrong?

2014-03-04 Thread Andrew Ruder
On Tue, Mar 04, 2014 at 10:54:04AM +0200, Leon Pollak wrote:
> I will recheck everything and try. Meanwhile, the news are not good: our 
> guys say that it appears that the additional sync DOES NOT SOLVE the 
> issue.

Gonna be honest, I have a tough time explaining this. :(  Unfortunately
I don't have a board here with a hardware write protect which would make
things easier to verify.

- Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: What is wrong?

2014-03-04 Thread Leon Pollak
Hello, all.

I am really sorry for the silence - I was on the business trip and 
returned today.

I will recheck everything and try. Meanwhile, the news are not good: our 
guys say that it appears that the additional sync DOES NOT SOLVE the 
issue.
I ask for excuse, but as I did not know the exact processing, I was 
mistaken and, probably, used already gc-ted unit for tests.

Sorry, again.

BR

On Tuesday 04 March 2014 00:33:25 Brian Norris wrote:
> On Mon, Mar 03, 2014 at 03:13:36PM -0600, Andrew Ruder wrote:
> > On Thu, Feb 27, 2014 at 01:22:08PM -0800, Brian Norris wrote:
> > > Perhaps Richard or Andrew can comment on whether this patch should
> > > help you. But I think JFFS2 on NAND uses write-buffered support
> > > which can be affected by this bug.
> > 
> > Definitely sounds like the same issue and I'm kind of glad to see it
> > crop up in another filesystem.
> 
> We haven't confirmed that the *patch* actually affects Leon's problem;
> just that if he runs an additional 'sync' it solves his problem.
> Leon, did you get to try the patch?
> 
> Anyway, should commit 807612db2f9940b9fa6deaef054eb16d51bd3e00 be
> marked for -stable?
> 
> Brian

-- 
Leon
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: What is wrong?

2014-03-04 Thread Brian Norris
On Mon, Mar 03, 2014 at 03:13:36PM -0600, Andrew Ruder wrote:
> On Thu, Feb 27, 2014 at 01:22:08PM -0800, Brian Norris wrote:
> > Perhaps Richard or Andrew can comment on whether this patch should help
> > you. But I think JFFS2 on NAND uses write-buffered support which can be
> > affected by this bug.
> 
> Definitely sounds like the same issue and I'm kind of glad to see it
> crop up in another filesystem.

We haven't confirmed that the *patch* actually affects Leon's problem;
just that if he runs an additional 'sync' it solves his problem. Leon,
did you get to try the patch?

Anyway, should commit 807612db2f9940b9fa6deaef054eb16d51bd3e00 be marked
for -stable? 

Brian
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: What is wrong?

2014-03-04 Thread Brian Norris
On Mon, Mar 03, 2014 at 03:13:36PM -0600, Andrew Ruder wrote:
 On Thu, Feb 27, 2014 at 01:22:08PM -0800, Brian Norris wrote:
  Perhaps Richard or Andrew can comment on whether this patch should help
  you. But I think JFFS2 on NAND uses write-buffered support which can be
  affected by this bug.
 
 Definitely sounds like the same issue and I'm kind of glad to see it
 crop up in another filesystem.

We haven't confirmed that the *patch* actually affects Leon's problem;
just that if he runs an additional 'sync' it solves his problem. Leon,
did you get to try the patch?

Anyway, should commit 807612db2f9940b9fa6deaef054eb16d51bd3e00 be marked
for -stable? 

Brian
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: What is wrong?

2014-03-04 Thread Leon Pollak
Hello, all.

I am really sorry for the silence - I was on the business trip and 
returned today.

I will recheck everything and try. Meanwhile, the news are not good: our 
guys say that it appears that the additional sync DOES NOT SOLVE the 
issue.
I ask for excuse, but as I did not know the exact processing, I was 
mistaken and, probably, used already gc-ted unit for tests.

Sorry, again.

BR

On Tuesday 04 March 2014 00:33:25 Brian Norris wrote:
 On Mon, Mar 03, 2014 at 03:13:36PM -0600, Andrew Ruder wrote:
  On Thu, Feb 27, 2014 at 01:22:08PM -0800, Brian Norris wrote:
   Perhaps Richard or Andrew can comment on whether this patch should
   help you. But I think JFFS2 on NAND uses write-buffered support
   which can be affected by this bug.
  
  Definitely sounds like the same issue and I'm kind of glad to see it
  crop up in another filesystem.
 
 We haven't confirmed that the *patch* actually affects Leon's problem;
 just that if he runs an additional 'sync' it solves his problem.
 Leon, did you get to try the patch?
 
 Anyway, should commit 807612db2f9940b9fa6deaef054eb16d51bd3e00 be
 marked for -stable?
 
 Brian

-- 
Leon
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: What is wrong?

2014-03-04 Thread Andrew Ruder
On Tue, Mar 04, 2014 at 10:54:04AM +0200, Leon Pollak wrote:
 I will recheck everything and try. Meanwhile, the news are not good: our 
 guys say that it appears that the additional sync DOES NOT SOLVE the 
 issue.

Gonna be honest, I have a tough time explaining this. :(  Unfortunately
I don't have a board here with a hardware write protect which would make
things easier to verify.

- Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: What is wrong?

2014-03-03 Thread Andrew Ruder
On Thu, Feb 27, 2014 at 01:22:08PM -0800, Brian Norris wrote:
> Perhaps Richard or Andrew can comment on whether this patch should help
> you. But I think JFFS2 on NAND uses write-buffered support which can be
> affected by this bug.

Definitely sounds like the same issue and I'm kind of glad to see it
crop up in another filesystem.  Also glad you Cc'd me with the URL
because I had the painful task of recreating this issue on another
filesystem on my TODO list as I didn't think it had ever been committed.

Cheers,
Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: What is wrong?

2014-03-03 Thread Andrew Ruder
On Thu, Feb 27, 2014 at 01:22:08PM -0800, Brian Norris wrote:
 Perhaps Richard or Andrew can comment on whether this patch should help
 you. But I think JFFS2 on NAND uses write-buffered support which can be
 affected by this bug.

Definitely sounds like the same issue and I'm kind of glad to see it
crop up in another filesystem.  Also glad you Cc'd me with the URL
because I had the painful task of recreating this issue on another
filesystem on my TODO list as I didn't think it had ever been committed.

Cheers,
Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: What is wrong?

2014-02-27 Thread Brian Norris
+ others

Hi Leon,

Can you please keep the CC list intact? And please try to reply below
the quotes and trim context, rather than top-posting. Thanks!

On Thu, Feb 27, 2014 at 02:00:25PM +0200, Leon Pollak wrote:
> I am VERY(!) thankful to you for the answer.
> First, I am calm now that there is no any error on my side and the 
> system remains clean despite these messages.
> Second, yes, the workaround worked.

That's nice to hear, but that is (as you note) a workaround. You should
not need an extra sync after remounting read-only. Do you think you can
try the linked patch?

commit 807612db2f9940b9fa6deaef054eb16d51bd3e00
Author: Andrew Ruder 
Date:   Thu Jan 30 09:26:54 2014 -0600

fs/super.c: sync ro remount after blocking writers

Perhaps Richard or Andrew can comment on whether this patch should help
you. But I think JFFS2 on NAND uses write-buffered support which can be
affected by this bug.

> May thanks to you for your help!!!

You're welcome.

I have a few other questions: are you using NOR or NAND (it looks like
maybe NAND)?

Leaving most context intact for others, below.

> On Wednesday 26 February 2014 17:11:48 you wrote:
> > On Wed, Feb 26, 2014 at 04:07:21PM +0200, Leon Pollak wrote:
> > > The NAND is write protected by HW and the partition is mounted as
> > > RO.
> > > At some moment I need to update a small file.
> > > So I do:
> > > - HW write protect off,
> > > - remount RW,
> > > - update file,
> > > - sync,
> > > - remount RO,
> > > - write protect on.
> > > 
> > > Looking at linux console I see a lot of messages like:
> > > Erase at 0x0040 failed immediately: errno -5
> > > Erase at 0x003e failed immediately: errno -5
> > > ..
> > > Erase at 0x0034 failed immediately: errno -5
> > > jffs2_flush_wbuf(): Write failed with -5
> > > Write of 2016 bytes at 0x002578a0 failed. returned -5, retlen 0
> > > Not marking the space at 0x002578a0 as dirty because the flash
> > > driver
> > > returned retlen zero.
> > > 
> > > 
> > > This is repeated for a long time, but everything seems work OK.
> > > The sequential starts and even file updates are also OK, without
> > > error messages.
> > > 
> > > What do I do wrong? Thanks a lot.
> > 
> > It's possible you're seeing symptoms of this:
> > 
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit
> > /?id=807612db2f9940b9fa6deaef054eb16d51bd3e00
> > 
> > It seems like maybe JFFS2 is still doing some GC and/or write flushing
> > after the remount.
> > 
> > Could try this?
> > 
> >  - HW write protect off,
> >  - remount RW,
> >  - update file,
> >  - sync,
> >  - remount RO,
> >  - sync, <-- add this, to see if you're experiencing any
> >  writeback after remount
> >  - write protect on.

Regards,
Brian
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: What is wrong?

2014-02-27 Thread Brian Norris
+ others

Hi Leon,

Can you please keep the CC list intact? And please try to reply below
the quotes and trim context, rather than top-posting. Thanks!

On Thu, Feb 27, 2014 at 02:00:25PM +0200, Leon Pollak wrote:
 I am VERY(!) thankful to you for the answer.
 First, I am calm now that there is no any error on my side and the 
 system remains clean despite these messages.
 Second, yes, the workaround worked.

That's nice to hear, but that is (as you note) a workaround. You should
not need an extra sync after remounting read-only. Do you think you can
try the linked patch?

commit 807612db2f9940b9fa6deaef054eb16d51bd3e00
Author: Andrew Ruder andrew.ru...@elecsyscorp.com
Date:   Thu Jan 30 09:26:54 2014 -0600

fs/super.c: sync ro remount after blocking writers

Perhaps Richard or Andrew can comment on whether this patch should help
you. But I think JFFS2 on NAND uses write-buffered support which can be
affected by this bug.

 May thanks to you for your help!!!

You're welcome.

I have a few other questions: are you using NOR or NAND (it looks like
maybe NAND)?

Leaving most context intact for others, below.

 On Wednesday 26 February 2014 17:11:48 you wrote:
  On Wed, Feb 26, 2014 at 04:07:21PM +0200, Leon Pollak wrote:
   The NAND is write protected by HW and the partition is mounted as
   RO.
   At some moment I need to update a small file.
   So I do:
   - HW write protect off,
   - remount RW,
   - update file,
   - sync,
   - remount RO,
   - write protect on.
   
   Looking at linux console I see a lot of messages like:
   Erase at 0x0040 failed immediately: errno -5
   Erase at 0x003e failed immediately: errno -5
   ..
   Erase at 0x0034 failed immediately: errno -5
   jffs2_flush_wbuf(): Write failed with -5
   Write of 2016 bytes at 0x002578a0 failed. returned -5, retlen 0
   Not marking the space at 0x002578a0 as dirty because the flash
   driver
   returned retlen zero.
   
   
   This is repeated for a long time, but everything seems work OK.
   The sequential starts and even file updates are also OK, without
   error messages.
   
   What do I do wrong? Thanks a lot.
  
  It's possible you're seeing symptoms of this:
  
  https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit
  /?id=807612db2f9940b9fa6deaef054eb16d51bd3e00
  
  It seems like maybe JFFS2 is still doing some GC and/or write flushing
  after the remount.
  
  Could try this?
  
   - HW write protect off,
   - remount RW,
   - update file,
   - sync,
   - remount RO,
   - sync, -- add this, to see if you're experiencing any
   writeback after remount
   - write protect on.

Regards,
Brian
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors

2013-08-12 Thread Ingo Molnar

* H. Peter Anvin  wrote:

> On 08/12/2013 07:31 AM, Peter Zijlstra wrote:
> > On Wed, Aug 07, 2013 at 07:20:44PM +0200, Borislav Petkov wrote:
> >> Besides, BUG can be disabled in CONFIG_EXPERT.
> > 
> > There was some email on this subject a while ago; disabling BUG() is 
> > very risky and can cause all kinds of horrid. IIRC the consensus back 
> > then was to remove this 'feature' and have BUG always at least lock up 
> > the machine hard.
> 
> Yes, always trap but leave the metadata as optional.

I think those patches are going forward, so we can just regard BUG() as 
always existing from a program flow POV.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors

2013-08-12 Thread H. Peter Anvin
On 08/12/2013 07:31 AM, Peter Zijlstra wrote:
> On Wed, Aug 07, 2013 at 07:20:44PM +0200, Borislav Petkov wrote:
>> Besides, BUG can be disabled in CONFIG_EXPERT.
> 
> There was some email on this subject a while ago; disabling BUG() is
> very risky and can cause all kinds of horrid. IIRC the consensus back
> then was to remove this 'feature' and have BUG always at least lock up
> the machine hard.
> 

Yes, always trap but leave the metadata as optional.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors

2013-08-12 Thread Peter Zijlstra
On Wed, Aug 07, 2013 at 07:20:44PM +0200, Borislav Petkov wrote:
> Besides, BUG can be disabled in CONFIG_EXPERT.

There was some email on this subject a while ago; disabling BUG() is
very risky and can cause all kinds of horrid. IIRC the consensus back
then was to remove this 'feature' and have BUG always at least lock up
the machine hard.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors

2013-08-12 Thread Peter Zijlstra
On Wed, Aug 07, 2013 at 07:20:44PM +0200, Borislav Petkov wrote:
 Besides, BUG can be disabled in CONFIG_EXPERT.

There was some email on this subject a while ago; disabling BUG() is
very risky and can cause all kinds of horrid. IIRC the consensus back
then was to remove this 'feature' and have BUG always at least lock up
the machine hard.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors

2013-08-12 Thread H. Peter Anvin
On 08/12/2013 07:31 AM, Peter Zijlstra wrote:
 On Wed, Aug 07, 2013 at 07:20:44PM +0200, Borislav Petkov wrote:
 Besides, BUG can be disabled in CONFIG_EXPERT.
 
 There was some email on this subject a while ago; disabling BUG() is
 very risky and can cause all kinds of horrid. IIRC the consensus back
 then was to remove this 'feature' and have BUG always at least lock up
 the machine hard.
 

Yes, always trap but leave the metadata as optional.

-hpa

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors

2013-08-12 Thread Ingo Molnar

* H. Peter Anvin h...@zytor.com wrote:

 On 08/12/2013 07:31 AM, Peter Zijlstra wrote:
  On Wed, Aug 07, 2013 at 07:20:44PM +0200, Borislav Petkov wrote:
  Besides, BUG can be disabled in CONFIG_EXPERT.
  
  There was some email on this subject a while ago; disabling BUG() is 
  very risky and can cause all kinds of horrid. IIRC the consensus back 
  then was to remove this 'feature' and have BUG always at least lock up 
  the machine hard.
 
 Yes, always trap but leave the metadata as optional.

I think those patches are going forward, so we can just regard BUG() as 
always existing from a program flow POV.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors

2013-08-07 Thread H. Peter Anvin
On 08/07/2013 11:41 AM, Borislav Petkov wrote:
> On Wed, Aug 07, 2013 at 10:51:43AM -0700, H. Peter Anvin wrote:
>> A bigger issue is probably if panic-on-bug should be the default, with
>> !panic being an opt-in debugging option.
> 
> Yes, it might make sense although embedded wants to disable CONFIG_BUG
> on systems which cannot report errors:
> 

There is another thread going on to convert BUG to trap even on machines
where CONFIG_BUG is disabled, i.e. CONFIG_BUG would control the metadata
only.  With the configuration option off, you would get the address of
the failure only.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors

2013-08-07 Thread Borislav Petkov
On Wed, Aug 07, 2013 at 10:51:43AM -0700, H. Peter Anvin wrote:
> A bigger issue is probably if panic-on-bug should be the default, with
> !panic being an opt-in debugging option.

Yes, it might make sense although embedded wants to disable CONFIG_BUG
on systems which cannot report errors:

  │ CONFIG_BUG:
  │
  │ Disabling this option eliminates support for BUG and WARN, reducing
  │ the size of your kernel image and potentially quietly ignoring
  │ numerous fatal conditions. You should only consider disabling this
  │ option for embedded systems with no facilities for reporting errors.
  │ Just say Y.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors

2013-08-07 Thread H. Peter Anvin
On 08/07/2013 10:46 AM, Steven Rostedt wrote:
> On Wed, 2013-08-07 at 19:37 +0200, Borislav Petkov wrote:
>> On Wed, Aug 07, 2013 at 01:33:06PM -0400, Steven Rostedt wrote:
>>> Right, and this code keeps the same logic as it was before. If it was
>>> disabled by CONFIG_EXPERT, it stays disabled, but at least you get to
>>> see a warning that your kernel may be corrupt now :-)
>>
>> Don't we really want to panic instead of running a corrupt kernel? IOW,
>> to change the logic to panic unconditionally because the image in memory
>> has been violated and not in a good way, at that :-)
> 
> Well, there's lots of places that use BUG() for a corrupt kernel. If you
> are stupid enough to disable it, you get what you asked for.
> 

A bigger issue is probably if panic-on-bug should be the default, with
!panic being an opt-in debugging option.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors

2013-08-07 Thread Steven Rostedt
On Wed, 2013-08-07 at 19:37 +0200, Borislav Petkov wrote:
> On Wed, Aug 07, 2013 at 01:33:06PM -0400, Steven Rostedt wrote:
> > Right, and this code keeps the same logic as it was before. If it was
> > disabled by CONFIG_EXPERT, it stays disabled, but at least you get to
> > see a warning that your kernel may be corrupt now :-)
> 
> Don't we really want to panic instead of running a corrupt kernel? IOW,
> to change the logic to panic unconditionally because the image in memory
> has been violated and not in a good way, at that :-)

Well, there's lots of places that use BUG() for a corrupt kernel. If you
are stupid enough to disable it, you get what you asked for.

-- Steve


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors

2013-08-07 Thread Borislav Petkov
On Wed, Aug 07, 2013 at 01:33:06PM -0400, Steven Rostedt wrote:
> Right, and this code keeps the same logic as it was before. If it was
> disabled by CONFIG_EXPERT, it stays disabled, but at least you get to
> see a warning that your kernel may be corrupt now :-)

Don't we really want to panic instead of running a corrupt kernel? IOW,
to change the logic to panic unconditionally because the image in memory
has been violated and not in a good way, at that :-)

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors

2013-08-07 Thread Steven Rostedt
On Wed, 2013-08-07 at 19:20 +0200, Borislav Petkov wrote:
>   
> > +static void bug_at(unsigned char *ip, int line)
> > +{
> > +   /*
> > +* The location is not an op that we were expecting.
> > +* Something went wrong. Crash the box, as something could be
> > +* corrupting the kernel.
> > +*/
> > +   pr_warning("Unexpected op at %pS [%p] (%02x %02x %02x %02x %02x) 
> > %s:%d\n",
> > +  ip, ip, ip[0], ip[1], ip[2], ip[3], ip[4], __FILE__, line);
> > +   BUG();
> 
> Why not simply
> 
>   panic("Unexpected...")
> 
> ?

Because the patch replaced BUG().

> 
> Besides, BUG can be disabled in CONFIG_EXPERT.
> 

Right, and this code keeps the same logic as it was before. If it was
disabled by CONFIG_EXPERT, it stays disabled, but at least you get to
see a warning that your kernel may be corrupt now :-)

-- Steve


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors

2013-08-07 Thread Borislav Petkov
On Wed, Aug 07, 2013 at 12:49:38PM -0400, Steven Rostedt wrote:
> From: Steven Rostedt 
> 
> When modifying text sections for jump labels, a paranoid check is
> performed. If the check fails, the system "bugs". But why it failed
> is not shown.
> 
> The BUG_ON()s in the jump label update code is replaced with bug_at(ip).
> This is a function that will show what pointer failed, and what was
> at the location of the failure that made jump label panic.
> 
> Signed-off-by: Steven Rostedt 
> ---
>  arch/x86/kernel/jump_label.c |   21 ++---
>  1 file changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
> index 24cf2b2..912a528 100644
> --- a/arch/x86/kernel/jump_label.c
> +++ b/arch/x86/kernel/jump_label.c
> @@ -24,6 +24,18 @@ union jump_code_union {
>   } __attribute__((packed));
>  };
>  
> +static void bug_at(unsigned char *ip, int line)
> +{
> + /*
> +  * The location is not an op that we were expecting.
> +  * Something went wrong. Crash the box, as something could be
> +  * corrupting the kernel.
> +  */
> + pr_warning("Unexpected op at %pS [%p] (%02x %02x %02x %02x %02x) 
> %s:%d\n",
> +ip, ip, ip[0], ip[1], ip[2], ip[3], ip[4], __FILE__, line);
> + BUG();

Why not simply

panic("Unexpected...")

?

Besides, BUG can be disabled in CONFIG_EXPERT.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 4/4] x86/jump-label: Show where and what was wrong on errors

2013-08-07 Thread Steven Rostedt
From: Steven Rostedt 

When modifying text sections for jump labels, a paranoid check is
performed. If the check fails, the system "bugs". But why it failed
is not shown.

The BUG_ON()s in the jump label update code is replaced with bug_at(ip).
This is a function that will show what pointer failed, and what was
at the location of the failure that made jump label panic.

Signed-off-by: Steven Rostedt 
---
 arch/x86/kernel/jump_label.c |   21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
index 24cf2b2..912a528 100644
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -24,6 +24,18 @@ union jump_code_union {
} __attribute__((packed));
 };
 
+static void bug_at(unsigned char *ip, int line)
+{
+   /*
+* The location is not an op that we were expecting.
+* Something went wrong. Crash the box, as something could be
+* corrupting the kernel.
+*/
+   pr_warning("Unexpected op at %pS [%p] (%02x %02x %02x %02x %02x) 
%s:%d\n",
+  ip, ip, ip[0], ip[1], ip[2], ip[3], ip[4], __FILE__, line);
+   BUG();
+}
+
 static void __jump_label_transform(struct jump_entry *entry,
   enum jump_label_type type,
   void *(*poker)(void *, const void *, size_t),
@@ -37,7 +49,8 @@ static void __jump_label_transform(struct jump_entry *entry,
 * We are enabling this jump label. If it is not a nop
 * then something must have gone wrong.
 */
-   BUG_ON(memcmp((void *)entry->code, ideal_nop, 5) != 0);
+   if (unlikely(memcmp((void *)entry->code, ideal_nop, 5) != 0))
+   bug_at((void *)entry->code, __LINE__);
 
code.jump = 0xe9;
code.offset = entry->target -
@@ -51,12 +64,14 @@ static void __jump_label_transform(struct jump_entry *entry,
 */
if (init) {
const unsigned char default_nop[] = { 
STATIC_KEY_INIT_NOP };
-   BUG_ON(memcmp((void *)entry->code, default_nop, 5) != 
0);
+   if (unlikely(memcmp((void *)entry->code, default_nop, 
5) != 0))
+   bug_at((void *)entry->code, __LINE__);
} else {
code.jump = 0xe9;
code.offset = entry->target -
(entry->code + JUMP_LABEL_NOP_SIZE);
-   BUG_ON(memcmp((void *)entry->code, , 5) != 0);
+   if (unlikely(memcmp((void *)entry->code, , 5) != 
0))
+   bug_at((void *)entry->code, __LINE__);
}
memcpy(, ideal_nops[NOP_ATOMIC5], JUMP_LABEL_NOP_SIZE);
}
-- 
1.7.10.4


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 4/4] x86/jump-label: Show where and what was wrong on errors

2013-08-07 Thread Steven Rostedt
From: Steven Rostedt srost...@redhat.com

When modifying text sections for jump labels, a paranoid check is
performed. If the check fails, the system bugs. But why it failed
is not shown.

The BUG_ON()s in the jump label update code is replaced with bug_at(ip).
This is a function that will show what pointer failed, and what was
at the location of the failure that made jump label panic.

Signed-off-by: Steven Rostedt rost...@goodmis.org
---
 arch/x86/kernel/jump_label.c |   21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
index 24cf2b2..912a528 100644
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -24,6 +24,18 @@ union jump_code_union {
} __attribute__((packed));
 };
 
+static void bug_at(unsigned char *ip, int line)
+{
+   /*
+* The location is not an op that we were expecting.
+* Something went wrong. Crash the box, as something could be
+* corrupting the kernel.
+*/
+   pr_warning(Unexpected op at %pS [%p] (%02x %02x %02x %02x %02x) 
%s:%d\n,
+  ip, ip, ip[0], ip[1], ip[2], ip[3], ip[4], __FILE__, line);
+   BUG();
+}
+
 static void __jump_label_transform(struct jump_entry *entry,
   enum jump_label_type type,
   void *(*poker)(void *, const void *, size_t),
@@ -37,7 +49,8 @@ static void __jump_label_transform(struct jump_entry *entry,
 * We are enabling this jump label. If it is not a nop
 * then something must have gone wrong.
 */
-   BUG_ON(memcmp((void *)entry-code, ideal_nop, 5) != 0);
+   if (unlikely(memcmp((void *)entry-code, ideal_nop, 5) != 0))
+   bug_at((void *)entry-code, __LINE__);
 
code.jump = 0xe9;
code.offset = entry-target -
@@ -51,12 +64,14 @@ static void __jump_label_transform(struct jump_entry *entry,
 */
if (init) {
const unsigned char default_nop[] = { 
STATIC_KEY_INIT_NOP };
-   BUG_ON(memcmp((void *)entry-code, default_nop, 5) != 
0);
+   if (unlikely(memcmp((void *)entry-code, default_nop, 
5) != 0))
+   bug_at((void *)entry-code, __LINE__);
} else {
code.jump = 0xe9;
code.offset = entry-target -
(entry-code + JUMP_LABEL_NOP_SIZE);
-   BUG_ON(memcmp((void *)entry-code, code, 5) != 0);
+   if (unlikely(memcmp((void *)entry-code, code, 5) != 
0))
+   bug_at((void *)entry-code, __LINE__);
}
memcpy(code, ideal_nops[NOP_ATOMIC5], JUMP_LABEL_NOP_SIZE);
}
-- 
1.7.10.4


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors

2013-08-07 Thread Borislav Petkov
On Wed, Aug 07, 2013 at 12:49:38PM -0400, Steven Rostedt wrote:
 From: Steven Rostedt srost...@redhat.com
 
 When modifying text sections for jump labels, a paranoid check is
 performed. If the check fails, the system bugs. But why it failed
 is not shown.
 
 The BUG_ON()s in the jump label update code is replaced with bug_at(ip).
 This is a function that will show what pointer failed, and what was
 at the location of the failure that made jump label panic.
 
 Signed-off-by: Steven Rostedt rost...@goodmis.org
 ---
  arch/x86/kernel/jump_label.c |   21 ++---
  1 file changed, 18 insertions(+), 3 deletions(-)
 
 diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
 index 24cf2b2..912a528 100644
 --- a/arch/x86/kernel/jump_label.c
 +++ b/arch/x86/kernel/jump_label.c
 @@ -24,6 +24,18 @@ union jump_code_union {
   } __attribute__((packed));
  };
  
 +static void bug_at(unsigned char *ip, int line)
 +{
 + /*
 +  * The location is not an op that we were expecting.
 +  * Something went wrong. Crash the box, as something could be
 +  * corrupting the kernel.
 +  */
 + pr_warning(Unexpected op at %pS [%p] (%02x %02x %02x %02x %02x) 
 %s:%d\n,
 +ip, ip, ip[0], ip[1], ip[2], ip[3], ip[4], __FILE__, line);
 + BUG();

Why not simply

panic(Unexpected...)

?

Besides, BUG can be disabled in CONFIG_EXPERT.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors

2013-08-07 Thread Steven Rostedt
On Wed, 2013-08-07 at 19:20 +0200, Borislav Petkov wrote:
   
  +static void bug_at(unsigned char *ip, int line)
  +{
  +   /*
  +* The location is not an op that we were expecting.
  +* Something went wrong. Crash the box, as something could be
  +* corrupting the kernel.
  +*/
  +   pr_warning(Unexpected op at %pS [%p] (%02x %02x %02x %02x %02x) 
  %s:%d\n,
  +  ip, ip, ip[0], ip[1], ip[2], ip[3], ip[4], __FILE__, line);
  +   BUG();
 
 Why not simply
 
   panic(Unexpected...)
 
 ?

Because the patch replaced BUG().

 
 Besides, BUG can be disabled in CONFIG_EXPERT.
 

Right, and this code keeps the same logic as it was before. If it was
disabled by CONFIG_EXPERT, it stays disabled, but at least you get to
see a warning that your kernel may be corrupt now :-)

-- Steve


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors

2013-08-07 Thread Borislav Petkov
On Wed, Aug 07, 2013 at 01:33:06PM -0400, Steven Rostedt wrote:
 Right, and this code keeps the same logic as it was before. If it was
 disabled by CONFIG_EXPERT, it stays disabled, but at least you get to
 see a warning that your kernel may be corrupt now :-)

Don't we really want to panic instead of running a corrupt kernel? IOW,
to change the logic to panic unconditionally because the image in memory
has been violated and not in a good way, at that :-)

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors

2013-08-07 Thread Steven Rostedt
On Wed, 2013-08-07 at 19:37 +0200, Borislav Petkov wrote:
 On Wed, Aug 07, 2013 at 01:33:06PM -0400, Steven Rostedt wrote:
  Right, and this code keeps the same logic as it was before. If it was
  disabled by CONFIG_EXPERT, it stays disabled, but at least you get to
  see a warning that your kernel may be corrupt now :-)
 
 Don't we really want to panic instead of running a corrupt kernel? IOW,
 to change the logic to panic unconditionally because the image in memory
 has been violated and not in a good way, at that :-)

Well, there's lots of places that use BUG() for a corrupt kernel. If you
are stupid enough to disable it, you get what you asked for.

-- Steve


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors

2013-08-07 Thread H. Peter Anvin
On 08/07/2013 10:46 AM, Steven Rostedt wrote:
 On Wed, 2013-08-07 at 19:37 +0200, Borislav Petkov wrote:
 On Wed, Aug 07, 2013 at 01:33:06PM -0400, Steven Rostedt wrote:
 Right, and this code keeps the same logic as it was before. If it was
 disabled by CONFIG_EXPERT, it stays disabled, but at least you get to
 see a warning that your kernel may be corrupt now :-)

 Don't we really want to panic instead of running a corrupt kernel? IOW,
 to change the logic to panic unconditionally because the image in memory
 has been violated and not in a good way, at that :-)
 
 Well, there's lots of places that use BUG() for a corrupt kernel. If you
 are stupid enough to disable it, you get what you asked for.
 

A bigger issue is probably if panic-on-bug should be the default, with
!panic being an opt-in debugging option.

-hpa


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors

2013-08-07 Thread Borislav Petkov
On Wed, Aug 07, 2013 at 10:51:43AM -0700, H. Peter Anvin wrote:
 A bigger issue is probably if panic-on-bug should be the default, with
 !panic being an opt-in debugging option.

Yes, it might make sense although embedded wants to disable CONFIG_BUG
on systems which cannot report errors:

  │ CONFIG_BUG:
  │
  │ Disabling this option eliminates support for BUG and WARN, reducing
  │ the size of your kernel image and potentially quietly ignoring
  │ numerous fatal conditions. You should only consider disabling this
  │ option for embedded systems with no facilities for reporting errors.
  │ Just say Y.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/4] x86/jump-label: Show where and what was wrong on errors

2013-08-07 Thread H. Peter Anvin
On 08/07/2013 11:41 AM, Borislav Petkov wrote:
 On Wed, Aug 07, 2013 at 10:51:43AM -0700, H. Peter Anvin wrote:
 A bigger issue is probably if panic-on-bug should be the default, with
 !panic being an opt-in debugging option.
 
 Yes, it might make sense although embedded wants to disable CONFIG_BUG
 on systems which cannot report errors:
 

There is another thread going on to convert BUG to trap even on machines
where CONFIG_BUG is disabled, i.e. CONFIG_BUG would control the metadata
only.  With the configuration option off, you would get the address of
the failure only.

-hpa


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/