Re: mm: BUG: Bad page state in process ksmd

2014-03-27 Thread Hugh Dickins
On Wed, 26 Mar 2014, Sasha Levin wrote:
> On 03/26/2014 03:55 PM, Andrew Morton wrote:
> > On Wed, 26 Mar 2014 11:13:27 -0400 Sasha Levin wrote:
> > > Out of curiosity, is there a reason not to do bad flag checks when actually
> > > setting flag? Obviously it'll be slower but it'll be easier catching these
> > > issues.
> > 
> > Tricky.  Each code site must determine what are and are not valid page
> > states depending upon the current context.  The one place where we've
> > made that effort is at the point where a page is returned to the free
> > page pool.  Any other sites would require similar amounts of effort and
> > each one would be different from all the others.
> > 
> > We do this in a small way all over the place, against individual page
> > flags.  grep PageLocked */*.c.
> 
> What if we define generic page types and group page flags under them?
> It would be easier to put these checks in key sites around the code
> and no need to fully customize them to each site.
> 
> For example, swap_readpage() is doing this:
> 
> VM_BUG_ON_PAGE(!PageLocked(page), page);
> VM_BUG_ON_PAGE(PageUptodate(page), page);
> 
> But what if instead of that we'd do:
> 
>   VM_BUG_ON_PAGE(!PageSwap(page), page);
> 
> Where PageSwap would test "not locked", "uptodate", and in addition
> a set of "sanity" flags which it didn't make sense to test individually
> everywhere (PageError()? PageReclaim()?).
> 
> I can add the infrastructure if that sounds good (and people promise to
> work with me on defining page types). I'd be happy to do all the testing
> involved in getting this to work right.

Sorry, I don't understand how you see that as a good idea.  I wonder
if you have cleverly put that suggestion into the thread, to push me
into a more timely response to the BUG than you usually get ?-)

It seems a bad idea to me in at least three ways: expending more
developer time on establishing what set of page flags to test at
each site; expending more developer time on fixing all the false
positives that would result; and spoiling the greppability of the
source tree by hiding flag checks in obscure combinations.

Page flags are separate flags because they are largely
independent.

Developers have inserted the VM_BUG_ONs they think are needed,
please leave them at that.  There may be a good case for removing
some of the older ones that have served their purpose (we rather
overused PageLocked checks in 2.4 for example), but not for
putting effort into adding more to what's there.

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: mm: BUG: Bad page state in process ksmd

2014-03-27 Thread Sasha Levin

On 03/27/2014 11:21 AM, Hugh Dickins wrote:
> I've thought about this some, and slept on it, but don't yet see
> how it comes about.  I'll have to come back to it later.
>
> Was it a one-off, or do you find it fairly easy to reproduce?
>
> If the latter, it would be interesting to know if it comes from
> recent changes or not.  mm/mlock.c does appear to have been under
> continuous revision for several releases (but barely changed in next).

I can't say it's easy to reproduce but it did happen 5-6 times at this point.

As far as I can tell there were no big changes in trinity for the last week
or so while we were in lsf/mm, and this issue being reproducible makes me
believe it has something to do with recent changes to mm code.


Thanks,
Sasha


Re: mm: BUG: Bad page state in process ksmd

2014-03-27 Thread Hugh Dickins
On Wed, 26 Mar 2014, Sasha Levin wrote:
> Hi all,
> 
> While fuzzing with trinity inside a KVM tools guest running the latest -next
> kernel I've stumbled on the following.
> 
> Out of curiosity, is there a reason not to do bad flag checks when actually
> setting flag? Obviously it'll be slower but it'll be easier catching these
> issues.

I don't see how it would help here.

> 
> [ 3926.683948] BUG: Bad page state in process ksmd  pfn:5a6246
> [ 3926.689336] page:ea0016989180 count:0 mapcount:0 mapping: (null) index:
> [ 3926.696507] page flags: 0x56f8028001c(referenced|uptodate|dirty|swapbacked|mlock
> [ 3926.709201] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> [ 3926.711216] bad because of flags:
> [ 3926.712136] page flags: 0x20(mlocked)
> [ 3926.713574] Modules linked in:
> [ 3926.714466] CPU: 26 PID: 3864 Comm: ksmd Tainted: GW 3.14.0-rc7-next-201
> [ 3926.720942]  85688060 8806ec7abc38 844bd702 2fa0
> [ 3926.728107]  ea0016989180 8806ec7abc68 844b158f 000f8000
> [ 3926.730563]   000f8000 85688060 8806ec7abcb8
> [ 3926.737653] Call Trace:
> [ 3926.738347]  dump_stack (lib/dump_stack.c:52)
> [ 3926.739841]  bad_page (arch/x86/include/asm/atomic.h:38 include/linux/mm.h:432 mm/page_alloc.c:339)
> [ 3926.741296]  free_pages_prepare (mm/page_alloc.c:644 mm/page_alloc.c:738)
> [ 3926.742818]  free_hot_cold_page (mm/page_alloc.c:1371)
> [ 3926.749425]  __put_single_page (mm/swap.c:71)
> [ 3926.751074]  put_page (mm/swap.c:237)
> [ 3926.752398]  ksm_do_scan (mm/ksm.c:1480 mm/ksm.c:1704)
> [ 3926.753957]  ksm_scan_thread (mm/ksm.c:1723)
> [ 3926.755940]  ? bit_waitqueue (kernel/sched/wait.c:291)
> [ 3926.758644]  ? ksm_do_scan (mm/ksm.c:1715)
> [ 3926.760420]  kthread (kernel/kthread.c:219)
> [ 3926.761605]  ? kthread_create_on_node (kernel/kthread.c:185)
> [ 3926.763149]  ret_from_fork (arch/x86/kernel/entry_64.S:555)
> [ 3926.764323]  ? kthread_create_on_node (kernel/kthread.c:185)

I've thought about this some, and slept on it, but don't yet see
how it comes about.  I'll have to come back to it later.

Was it a one-off, or do you find it fairly easy to reproduce?

If the latter, it would be interesting to know if it comes from
recent changes or not.  mm/mlock.c does appear to have been under
continuous revision for several releases (but barely changed in next).

Thanks,
Hugh


Re: mm: BUG: Bad page state in process ksmd

2014-03-26 Thread Sasha Levin

On 03/26/2014 03:55 PM, Andrew Morton wrote:
> On Wed, 26 Mar 2014 11:13:27 -0400 Sasha Levin wrote:
>
> > Out of curiosity, is there a reason not to do bad flag checks when actually
> > setting flag? Obviously it'll be slower but it'll be easier catching these
> > issues.
>
> Tricky.  Each code site must determine what are and are not valid page
> states depending upon the current context.  The one place where we've
> made that effort is at the point where a page is returned to the free
> page pool.  Any other sites would require similar amounts of effort and
> each one would be different from all the others.
>
> We do this in a small way all over the place, against individual page
> flags.  grep PageLocked */*.c.

What if we define generic page types and group page flags under them?
It would be easier to put these checks in key sites around the code
and no need to fully customize them to each site.

For example, swap_readpage() is doing this:

  VM_BUG_ON_PAGE(!PageLocked(page), page);
  VM_BUG_ON_PAGE(PageUptodate(page), page);

But what if instead of that we'd do:

  VM_BUG_ON_PAGE(!PageSwap(page), page);

Where PageSwap would test "not locked", "uptodate", and in addition
a set of "sanity" flags which it didn't make sense to test individually
everywhere (PageError()? PageReclaim()?).

I can add the infrastructure if that sounds good (and people promise to
work with me on defining page types). I'd be happy to do all the testing
involved in getting this to work right.
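
A minimal user-space sketch of what such a grouped check might look like. This is an illustration under assumptions, not kernel code: the struct, the bit positions, and the helper name sketch_page_swap_ok() are all hypothetical, and the choice of "sanity" flags (error, reclaim) is only a guess taken from the question marks above. The predicate follows the two existing VM_BUG_ON_PAGE() checks, so it passes when the page is locked and not yet uptodate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag numbering, for illustration only. */
enum sketch_pageflags {
    SK_PG_locked,
    SK_PG_error,
    SK_PG_uptodate,
    SK_PG_reclaim,
};

struct sketch_page {
    unsigned long flags;
};

#define SK_BIT(nr) (1UL << (nr))

/*
 * Grouped check for a page about to be read from swap:
 * it must be locked, and must not be uptodate or carry
 * any of the assumed "sanity" flags.
 */
static bool sketch_page_swap_ok(const struct sketch_page *p)
{
    const unsigned long must_set = SK_BIT(SK_PG_locked);
    const unsigned long must_clear = SK_BIT(SK_PG_uptodate) |
                                     SK_BIT(SK_PG_error) |
                                     SK_BIT(SK_PG_reclaim);

    assert(p != NULL);
    return (p->flags & must_set) == must_set &&
           (p->flags & must_clear) == 0;
}
```

A call site would then carry a single assertion on sketch_page_swap_ok(page) in place of the two per-flag checks; the individual flag names no longer appear at the call site.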


Thanks,
Sasha


Re: mm: BUG: Bad page state in process ksmd

2014-03-26 Thread Andrew Morton
On Wed, 26 Mar 2014 11:13:27 -0400 Sasha Levin  wrote:

> Hi all,
> 
> While fuzzing with trinity inside a KVM tools guest running the latest -next
> kernel I've stumbled on the following.

(cc Hugh)

> Out of curiosity, is there a reason not to do bad flag checks when actually
> setting flag? Obviously it'll be slower but it'll be easier catching these
> issues.

Tricky.  Each code site must determine what are and are not valid page
states depending upon the current context.  The one place where we've
made that effort is at the point where a page is returned to the free
page pool.  Any other sites would require similar amounts of effort and
each one would be different from all the others.

We do this in a small way all over the place, against individual page
flags.  grep PageLocked */*.c.
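
As a rough user-space sketch of the free-time check described above (not the kernel's actual code): a mask names the flags that must already be clear when a page goes back to the free pool, and any overlap with the page's flags is reported. The struct, the bit positions, and the DEMO_ names are hypothetical, and the mask is an assumed subset chosen only to mirror the report below, where mlocked alone is flagged as bad.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical bit positions; the real layout depends on kernel version and config. */
enum demo_pageflags {
    DEMO_PG_locked,
    DEMO_PG_referenced,
    DEMO_PG_uptodate,
    DEMO_PG_dirty,
    DEMO_PG_writeback,
    DEMO_PG_swapbacked,
    DEMO_PG_mlocked,
};

struct demo_page {
    unsigned long flags;
};

#define DEMO_BIT(nr) (1UL << (nr))

/* Assumed subset of flags that must be clear at free time. */
#define DEMO_FLAGS_CHECK_AT_FREE \
    (DEMO_BIT(DEMO_PG_locked) | DEMO_BIT(DEMO_PG_writeback) | \
     DEMO_BIT(DEMO_PG_mlocked))

/* Return the offending flags, or 0 if none are set. */
static unsigned long demo_bad_flags_at_free(const struct demo_page *page)
{
    assert(page != NULL);
    return page->flags & DEMO_FLAGS_CHECK_AT_FREE;
}

/* Report and reject a page that still carries a forbidden flag. */
static bool demo_free_page_check(const struct demo_page *page)
{
    unsigned long bad = demo_bad_flags_at_free(page);

    if (bad) {
        fprintf(stderr, "bad page state: flags 0x%lx, bad because of 0x%lx\n",
                page->flags, bad);
        return false;
    }
    return true;
}
```

The check is cheap because the free path is the one context where the set of valid states is known in advance; at any other site the mask would have to be worked out separately.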

> [ 3926.683948] BUG: Bad page state in process ksmd  pfn:5a6246
> [ 3926.689336] page:ea0016989180 count:0 mapcount:0 mapping: (null) index:
> [ 3926.696507] page flags: 0x56f8028001c(referenced|uptodate|dirty|swapbacked|mlock
> [ 3926.709201] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> [ 3926.711216] bad because of flags:
> [ 3926.712136] page flags: 0x20(mlocked)
> [ 3926.713574] Modules linked in:
> [ 3926.714466] CPU: 26 PID: 3864 Comm: ksmd Tainted: GW 3.14.0-rc7-next-201
> [ 3926.720942]  85688060 8806ec7abc38 844bd702 2fa0
> [ 3926.728107]  ea0016989180 8806ec7abc68 844b158f 000f8000
> [ 3926.730563]   000f8000 85688060 8806ec7abcb8
> [ 3926.737653] Call Trace:
> [ 3926.738347]  dump_stack (lib/dump_stack.c:52)
> [ 3926.739841]  bad_page (arch/x86/include/asm/atomic.h:38 include/linux/mm.h:432 mm/page_alloc.c:339)
> [ 3926.741296]  free_pages_prepare (mm/page_alloc.c:644 mm/page_alloc.c:738)
> [ 3926.742818]  free_hot_cold_page (mm/page_alloc.c:1371)
> [ 3926.749425]  __put_single_page (mm/swap.c:71)
> [ 3926.751074]  put_page (mm/swap.c:237)
> [ 3926.752398]  ksm_do_scan (mm/ksm.c:1480 mm/ksm.c:1704)
> [ 3926.753957]  ksm_scan_thread (mm/ksm.c:1723)
> [ 3926.755940]  ? bit_waitqueue (kernel/sched/wait.c:291)
> [ 3926.758644]  ? ksm_do_scan (mm/ksm.c:1715)
> [ 3926.760420]  kthread (kernel/kthread.c:219)
> [ 3926.761605]  ? kthread_create_on_node (kernel/kthread.c:185)
> [ 3926.763149]  ret_from_fork (arch/x86/kernel/entry_64.S:555)
> [ 3926.764323]  ? kthread_create_on_node (kernel/kthread.c:185)



mm: BUG: Bad page state in process ksmd

2014-03-26 Thread Sasha Levin

Hi all,

While fuzzing with trinity inside a KVM tools guest running the latest -next
kernel I've stumbled on the following.

Out of curiosity, is there a reason not to do bad flag checks when actually
setting flag? Obviously it'll be slower but it'll be easier catching these
issues.

[ 3926.683948] BUG: Bad page state in process ksmd  pfn:5a6246
[ 3926.689336] page:ea0016989180 count:0 mapcount:0 mapping: (null) index:
[ 3926.696507] page flags: 0x56f8028001c(referenced|uptodate|dirty|swapbacked|mlock
[ 3926.709201] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
[ 3926.711216] bad because of flags:
[ 3926.712136] page flags: 0x20(mlocked)
[ 3926.713574] Modules linked in:
[ 3926.714466] CPU: 26 PID: 3864 Comm: ksmd Tainted: GW 3.14.0-rc7-next-201
[ 3926.720942]  85688060 8806ec7abc38 844bd702 2fa0
[ 3926.728107]  ea0016989180 8806ec7abc68 844b158f 000f8000
[ 3926.730563]   000f8000 85688060 8806ec7abcb8
[ 3926.737653] Call Trace:
[ 3926.738347]  dump_stack (lib/dump_stack.c:52)
[ 3926.739841]  bad_page (arch/x86/include/asm/atomic.h:38 include/linux/mm.h:432 mm/page_alloc.c:339)
[ 3926.741296]  free_pages_prepare (mm/page_alloc.c:644 mm/page_alloc.c:738)
[ 3926.742818]  free_hot_cold_page (mm/page_alloc.c:1371)
[ 3926.749425]  __put_single_page (mm/swap.c:71)
[ 3926.751074]  put_page (mm/swap.c:237)
[ 3926.752398]  ksm_do_scan (mm/ksm.c:1480 mm/ksm.c:1704)
[ 3926.753957]  ksm_scan_thread (mm/ksm.c:1723)
[ 3926.755940]  ? bit_waitqueue (kernel/sched/wait.c:291)
[ 3926.758644]  ? ksm_do_scan (mm/ksm.c:1715)
[ 3926.760420]  kthread (kernel/kthread.c:219)
[ 3926.761605]  ? kthread_create_on_node (kernel/kthread.c:185)
[ 3926.763149]  ret_from_fork (arch/x86/kernel/entry_64.S:555)
[ 3926.764323]  ? kthread_create_on_node (kernel/kthread.c:185)


Thanks,
Sasha

