Re: [PATCH V5] Allow compaction of unevictable pages
On Mon, Mar 16, 2015 at 09:49:56AM -0400, Eric B Munson wrote: > On Mon, 16 Mar 2015, Vlastimil Babka wrote: > > > [CC += linux-api@] > > > > Since this is a kernel-user-space API change, please CC linux-api@. > > The kernel source file Documentation/SubmitChecklist notes that all > > Linux kernel patches that change userspace interfaces should be CCed > > to linux-...@vger.kernel.org, so that the various parties who are > > interested in API changes are informed. For further information, see > > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.kernel.org_doc_man-2Dpages_linux-2Dapi-2Dml.html&d=AwIC-g&c=96ZbZZcaMF4w0F4jpN6LZg&r=aUmMDRRT0nx4IfILbQLv8xzE0wB9sQxTHI3QrQ2lkBU&m=GUotTNnv26L0HxtXrBgiHqu6kwW3ufx2_TQpXIA216c&s=IFFYQ7Zr-4SIaF3slOZqiSP_noyva42kCwVRxxDm5wo&e= > > Added to the Cc list, thanks. > > > > > > > On 03/13/2015 09:19 PM, Michal Hocko wrote: > > >On Fri 13-03-15 15:09:15, Eric B Munson wrote: > > >>On Fri, 13 Mar 2015, Rik van Riel wrote: > > >> > > >>>On 03/13/2015 01:26 PM, Eric B Munson wrote: > > >>> > > --- a/mm/compaction.c > > +++ b/mm/compaction.c > > @@ -1046,6 +1046,8 @@ typedef enum { > > ISOLATE_SUCCESS,/* Pages isolated, migrate */ > > } isolate_migrate_t; > > > > +int sysctl_compact_unevictable; > > > > A comment here would be useful I think, as well as explicit default > > value. Maybe also __read_mostly although I don't know how much that > > matters. > > I am going to sit on V6 for a couple of days incase anyone from rt wants > to chime in. But these will be in V6. > > > > > I also wonder if it might be confusing that "compact_memory" is a > > write-only trigger that doesn't even show under "sysctl -a", while > > "compact_unevictable" is a read/write setting. But I don't have a > > better suggestion right now. > > Does allow_unevictable_compaction sound better? It feels too much like > variable naming conventions from other languages which seems to > encourage verbosity to me, but does indicate a difference from > compact_memory. > > > > > + > > /* > > * Isolate all pages that can be migrated from the first suitable > > block, > > * starting at the block pointed to by the migrate scanner pfn within > > >>> > > >>>I suspect that the use cases where users absolutely do not want > > >>>unevictable pages migrated are special cases, and it may make > > >>>sense to enable sysctl_compact_unevictable by default. > > >> > > >>Given that sysctl_compact_unevictable=0 is the way the kernel behaves > > >>now and the push back against always enabling compaction on unevictable > > >>pages, I left the default to be the behavior as it is today. > > > > > >The question is _why_ we have this behavior now. Is it intentional? > > > > It's there since 748446bb6 ("mm: compaction: memory compaction > > core"). Commit c53919adc0 ("mm: vmscan: remove lumpy reclaim") > > changes the comment in __isolate_lru_page() handling of unevictable > > pages to mention compaction explicitly. It could have been > > accidental in 748446bb6 though, maybe it just reused > > __isolate_lru_page() for compaction - it seems that the skipping of > > unevictable was initially meant to optimize lumpy reclaim. > > > > >e46a28790e59 (CMA: migrate mlocked pages) is a precedence in that > > > > Well, CMA and realtime kernels are probably mutually exclusive enough. > > > > >direction. Vlastimil has then changed that by edc2ca612496 (mm, > > >compaction: move pageblock checks up from isolate_migratepages_range()). > > >There is no mention about mlock pages so I guess it was more an > > >unintentional side effect of the patch. At least that is my current > > >understanding. I might be wrong here. > > > > Although that commit did change unintentionally more details that I > > would have liked (unfortunately), I think you are wrong on this one. > > ISOLATE_UNEVICTABLE is still passed from > > isolate_migratepages_range() which is used by CMA, while the > > compaction variant isolate_migratepages() does not pass it. So it's > > kept CMA-specific as before. > > > > >The thing about RT is that it is not usable with the upstream kernel > > >without the RT patchset AFAIU. So the default should be reflect what is > > >better for the standard kernel. RT loads have to tune the system anyway > > >so it is not so surprising they would disable this option as well. We > > >should help those guys and do not require them to touch the code but the > > >knob is reasonable IMHO. > > > > > >Especially when your changelog suggests that having this enabled by > > >default is beneficial for the standard kernel. > > > > I agree, but if there's a danger of becoming too of a bikeshed > > topic, I'm fine with keeping the default same as current behavior > > and changing it later. Or maybe we should ask some -rt mailing list > > instead of just Peter and Thomas? > > According to the rt wiki, there is no -rt development list so lkml is > it. I will chan
Re: [PATCH V5] Allow compaction of unevictable pages
On 03/16/2015 02:49 PM, Eric B Munson wrote: On Mon, 16 Mar 2015, Vlastimil Babka wrote: [CC += linux-api@] Since this is a kernel-user-space API change, please CC linux-api@. The kernel source file Documentation/SubmitChecklist notes that all Linux kernel patches that change userspace interfaces should be CCed to linux-...@vger.kernel.org, so that the various parties who are interested in API changes are informed. For further information, see https://urldefense.proofpoint.com/v2/url?u=https-3A__www.kernel.org_doc_man-2Dpages_linux-2Dapi-2Dml.html&d=AwIC-g&c=96ZbZZcaMF4w0F4jpN6LZg&r=aUmMDRRT0nx4IfILbQLv8xzE0wB9sQxTHI3QrQ2lkBU&m=GUotTNnv26L0HxtXrBgiHqu6kwW3ufx2_TQpXIA216c&s=IFFYQ7Zr-4SIaF3slOZqiSP_noyva42kCwVRxxDm5wo&e= Added to the Cc list, thanks. On 03/13/2015 09:19 PM, Michal Hocko wrote: On Fri 13-03-15 15:09:15, Eric B Munson wrote: On Fri, 13 Mar 2015, Rik van Riel wrote: On 03/13/2015 01:26 PM, Eric B Munson wrote: --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1046,6 +1046,8 @@ typedef enum { ISOLATE_SUCCESS,/* Pages isolated, migrate */ } isolate_migrate_t; +int sysctl_compact_unevictable; A comment here would be useful I think, as well as explicit default value. Maybe also __read_mostly although I don't know how much that matters. I am going to sit on V6 for a couple of days incase anyone from rt wants to chime in. But these will be in V6. I also wonder if it might be confusing that "compact_memory" is a write-only trigger that doesn't even show under "sysctl -a", while "compact_unevictable" is a read/write setting. But I don't have a better suggestion right now. Does allow_unevictable_compaction sound better? It feels too much like For sorting purposes, maybe compact_unevictable_allowed? variable naming conventions from other languages which seems to encourage verbosity to me, but does indicate a difference from compact_memory. If it sounds too awkward/long and nobody else has better suggestion, then just keep it as "compact_unevictable". -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V5] Allow compaction of unevictable pages
On Fri, 13 Mar 2015, David Rientjes wrote: > It would be really disappointing to not enable this by default for !rt > kernels. We haven't migrated mlocked pages in the past by way of memory > compaction because it can theoretically result in consistent minor page > faults, but I haven't yet heard a !rt objection to enabling this. > > If the rt patchset is going to carry a patch to disable this, then the > question arises: why not just carry an ISOLATE_UNEVICTABLE patch instead? > I think you've done the due diligence required to allow this to be > disabled at any time in a very easy way from userspace by the new tunable. > I think it should be enabled and I'd be very surprised to hear any other > objection about it other than it's different from the status quo. Compaction can alrady be disabled and thus you can also disable migration of mlocked pages. In general low latency requires that no expensive kernel processing is being done. Thus the rest of compaction processing also needs to be disabled. That means that allowing compaction handling mlocked pages would be ok. RT loads and low latency configurations (like my environment) will selective disable compaction to avoid creating additional latencies. This could be done only for specific nodes and processors if necessary. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V5] Allow compaction of unevictable pages
On Mon, 16 Mar 2015, Vlastimil Babka wrote: > [CC += linux-api@] > > Since this is a kernel-user-space API change, please CC linux-api@. > The kernel source file Documentation/SubmitChecklist notes that all > Linux kernel patches that change userspace interfaces should be CCed > to linux-...@vger.kernel.org, so that the various parties who are > interested in API changes are informed. For further information, see > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.kernel.org_doc_man-2Dpages_linux-2Dapi-2Dml.html&d=AwIC-g&c=96ZbZZcaMF4w0F4jpN6LZg&r=aUmMDRRT0nx4IfILbQLv8xzE0wB9sQxTHI3QrQ2lkBU&m=GUotTNnv26L0HxtXrBgiHqu6kwW3ufx2_TQpXIA216c&s=IFFYQ7Zr-4SIaF3slOZqiSP_noyva42kCwVRxxDm5wo&e= Added to the Cc list, thanks. > > > On 03/13/2015 09:19 PM, Michal Hocko wrote: > >On Fri 13-03-15 15:09:15, Eric B Munson wrote: > >>On Fri, 13 Mar 2015, Rik van Riel wrote: > >> > >>>On 03/13/2015 01:26 PM, Eric B Munson wrote: > >>> > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -1046,6 +1046,8 @@ typedef enum { > ISOLATE_SUCCESS,/* Pages isolated, migrate */ > } isolate_migrate_t; > > +int sysctl_compact_unevictable; > > A comment here would be useful I think, as well as explicit default > value. Maybe also __read_mostly although I don't know how much that > matters. I am going to sit on V6 for a couple of days incase anyone from rt wants to chime in. But these will be in V6. > > I also wonder if it might be confusing that "compact_memory" is a > write-only trigger that doesn't even show under "sysctl -a", while > "compact_unevictable" is a read/write setting. But I don't have a > better suggestion right now. Does allow_unevictable_compaction sound better? It feels too much like variable naming conventions from other languages which seems to encourage verbosity to me, but does indicate a difference from compact_memory. > > + > /* > * Isolate all pages that can be migrated from the first suitable block, > * starting at the block pointed to by the migrate scanner pfn within > >>> > >>>I suspect that the use cases where users absolutely do not want > >>>unevictable pages migrated are special cases, and it may make > >>>sense to enable sysctl_compact_unevictable by default. > >> > >>Given that sysctl_compact_unevictable=0 is the way the kernel behaves > >>now and the push back against always enabling compaction on unevictable > >>pages, I left the default to be the behavior as it is today. > > > >The question is _why_ we have this behavior now. Is it intentional? > > It's there since 748446bb6 ("mm: compaction: memory compaction > core"). Commit c53919adc0 ("mm: vmscan: remove lumpy reclaim") > changes the comment in __isolate_lru_page() handling of unevictable > pages to mention compaction explicitly. It could have been > accidental in 748446bb6 though, maybe it just reused > __isolate_lru_page() for compaction - it seems that the skipping of > unevictable was initially meant to optimize lumpy reclaim. > > >e46a28790e59 (CMA: migrate mlocked pages) is a precedence in that > > Well, CMA and realtime kernels are probably mutually exclusive enough. > > >direction. Vlastimil has then changed that by edc2ca612496 (mm, > >compaction: move pageblock checks up from isolate_migratepages_range()). > >There is no mention about mlock pages so I guess it was more an > >unintentional side effect of the patch. At least that is my current > >understanding. I might be wrong here. > > Although that commit did change unintentionally more details that I > would have liked (unfortunately), I think you are wrong on this one. > ISOLATE_UNEVICTABLE is still passed from > isolate_migratepages_range() which is used by CMA, while the > compaction variant isolate_migratepages() does not pass it. So it's > kept CMA-specific as before. > > >The thing about RT is that it is not usable with the upstream kernel > >without the RT patchset AFAIU. So the default should be reflect what is > >better for the standard kernel. RT loads have to tune the system anyway > >so it is not so surprising they would disable this option as well. We > >should help those guys and do not require them to touch the code but the > >knob is reasonable IMHO. > > > >Especially when your changelog suggests that having this enabled by > >default is beneficial for the standard kernel. > > I agree, but if there's a danger of becoming too of a bikeshed > topic, I'm fine with keeping the default same as current behavior > and changing it later. Or maybe we should ask some -rt mailing list > instead of just Peter and Thomas? According to the rt wiki, there is no -rt development list so lkml is it. I will change the default to 1 for V6 if I don't hear otherwise by the time I get back around to spinning V6. signature.asc Description: Digital signature
Re: [PATCH V5] Allow compaction of unevictable pages
[CC += linux-api@] Since this is a kernel-user-space API change, please CC linux-api@. The kernel source file Documentation/SubmitChecklist notes that all Linux kernel patches that change userspace interfaces should be CCed to linux-...@vger.kernel.org, so that the various parties who are interested in API changes are informed. For further information, see https://www.kernel.org/doc/man-pages/linux-api-ml.html On 03/13/2015 09:19 PM, Michal Hocko wrote: On Fri 13-03-15 15:09:15, Eric B Munson wrote: On Fri, 13 Mar 2015, Rik van Riel wrote: On 03/13/2015 01:26 PM, Eric B Munson wrote: --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1046,6 +1046,8 @@ typedef enum { ISOLATE_SUCCESS,/* Pages isolated, migrate */ } isolate_migrate_t; +int sysctl_compact_unevictable; A comment here would be useful I think, as well as explicit default value. Maybe also __read_mostly although I don't know how much that matters. I also wonder if it might be confusing that "compact_memory" is a write-only trigger that doesn't even show under "sysctl -a", while "compact_unevictable" is a read/write setting. But I don't have a better suggestion right now. + /* * Isolate all pages that can be migrated from the first suitable block, * starting at the block pointed to by the migrate scanner pfn within I suspect that the use cases where users absolutely do not want unevictable pages migrated are special cases, and it may make sense to enable sysctl_compact_unevictable by default. Given that sysctl_compact_unevictable=0 is the way the kernel behaves now and the push back against always enabling compaction on unevictable pages, I left the default to be the behavior as it is today. The question is _why_ we have this behavior now. Is it intentional? It's there since 748446bb6 ("mm: compaction: memory compaction core"). Commit c53919adc0 ("mm: vmscan: remove lumpy reclaim") changes the comment in __isolate_lru_page() handling of unevictable pages to mention compaction explicitly. It could have been accidental in 748446bb6 though, maybe it just reused __isolate_lru_page() for compaction - it seems that the skipping of unevictable was initially meant to optimize lumpy reclaim. e46a28790e59 (CMA: migrate mlocked pages) is a precedence in that Well, CMA and realtime kernels are probably mutually exclusive enough. direction. Vlastimil has then changed that by edc2ca612496 (mm, compaction: move pageblock checks up from isolate_migratepages_range()). There is no mention about mlock pages so I guess it was more an unintentional side effect of the patch. At least that is my current understanding. I might be wrong here. Although that commit did change unintentionally more details that I would have liked (unfortunately), I think you are wrong on this one. ISOLATE_UNEVICTABLE is still passed from isolate_migratepages_range() which is used by CMA, while the compaction variant isolate_migratepages() does not pass it. So it's kept CMA-specific as before. The thing about RT is that it is not usable with the upstream kernel without the RT patchset AFAIU. So the default should be reflect what is better for the standard kernel. RT loads have to tune the system anyway so it is not so surprising they would disable this option as well. We should help those guys and do not require them to touch the code but the knob is reasonable IMHO. Especially when your changelog suggests that having this enabled by default is beneficial for the standard kernel. I agree, but if there's a danger of becoming too of a bikeshed topic, I'm fine with keeping the default same as current behavior and changing it later. Or maybe we should ask some -rt mailing list instead of just Peter and Thomas? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V5] Allow compaction of unevictable pages
On 03/13/2015 07:18 PM, David Rientjes wrote: > On Fri, 13 Mar 2015, Eric B Munson wrote: > --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1046,6 +1046,8 @@ typedef enum { ISOLATE_SUCCESS,/* Pages isolated, migrate */ } isolate_migrate_t; +int sysctl_compact_unevictable; + /* * Isolate all pages that can be migrated from the first suitable block, * starting at the block pointed to by the migrate scanner pfn within >>> >>> I suspect that the use cases where users absolutely do not want >>> unevictable pages migrated are special cases, and it may make >>> sense to enable sysctl_compact_unevictable by default. >> >> Given that sysctl_compact_unevictable=0 is the way the kernel behaves >> now and the push back against always enabling compaction on unevictable >> pages, I left the default to be the behavior as it is today. I agree >> that this is likely the minority case, but I'd really like Peter Z or >> someone else from real time to say that they are okay with the default >> changing. >> > > It would be really disappointing to not enable this by default for !rt > kernels. We haven't migrated mlocked pages in the past by way of memory > compaction because it can theoretically result in consistent minor page > faults, but I haven't yet heard a !rt objection to enabling this. > > If the rt patchset is going to carry a patch to disable this It does not have to carry a patch to disable something that can be disabled at run time. The smaller the realtime patchset has to be, the better. -- All rights reversed -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V5] Allow compaction of unevictable pages
On Fri, 13 Mar 2015, Eric B Munson wrote: > > > --- a/mm/compaction.c > > > +++ b/mm/compaction.c > > > @@ -1046,6 +1046,8 @@ typedef enum { > > > ISOLATE_SUCCESS,/* Pages isolated, migrate */ > > > } isolate_migrate_t; > > > > > > +int sysctl_compact_unevictable; > > > + > > > /* > > > * Isolate all pages that can be migrated from the first suitable block, > > > * starting at the block pointed to by the migrate scanner pfn within > > > > I suspect that the use cases where users absolutely do not want > > unevictable pages migrated are special cases, and it may make > > sense to enable sysctl_compact_unevictable by default. > > Given that sysctl_compact_unevictable=0 is the way the kernel behaves > now and the push back against always enabling compaction on unevictable > pages, I left the default to be the behavior as it is today. I agree > that this is likely the minority case, but I'd really like Peter Z or > someone else from real time to say that they are okay with the default > changing. > It would be really disappointing to not enable this by default for !rt kernels. We haven't migrated mlocked pages in the past by way of memory compaction because it can theoretically result in consistent minor page faults, but I haven't yet heard a !rt objection to enabling this. If the rt patchset is going to carry a patch to disable this, then the question arises: why not just carry an ISOLATE_UNEVICTABLE patch instead? I think you've done the due diligence required to allow this to be disabled at any time in a very easy way from userspace by the new tunable. I think it should be enabled and I'd be very surprised to hear any other objection about it other than it's different from the status quo. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V5] Allow compaction of unevictable pages
On Fri 13-03-15 15:09:15, Eric B Munson wrote: > On Fri, 13 Mar 2015, Rik van Riel wrote: > > > On 03/13/2015 01:26 PM, Eric B Munson wrote: > > > > > --- a/mm/compaction.c > > > +++ b/mm/compaction.c > > > @@ -1046,6 +1046,8 @@ typedef enum { > > > ISOLATE_SUCCESS,/* Pages isolated, migrate */ > > > } isolate_migrate_t; > > > > > > +int sysctl_compact_unevictable; > > > + > > > /* > > > * Isolate all pages that can be migrated from the first suitable block, > > > * starting at the block pointed to by the migrate scanner pfn within > > > > I suspect that the use cases where users absolutely do not want > > unevictable pages migrated are special cases, and it may make > > sense to enable sysctl_compact_unevictable by default. > > Given that sysctl_compact_unevictable=0 is the way the kernel behaves > now and the push back against always enabling compaction on unevictable > pages, I left the default to be the behavior as it is today. The question is _why_ we have this behavior now. Is it intentional? e46a28790e59 (CMA: migrate mlocked pages) is a precedence in that direction. Vlastimil has then changed that by edc2ca612496 (mm, compaction: move pageblock checks up from isolate_migratepages_range()). There is no mention about mlock pages so I guess it was more an unintentional side effect of the patch. At least that is my current understanding. I might be wrong here. The thing about RT is that it is not usable with the upstream kernel without the RT patchset AFAIU. So the default should be reflect what is better for the standard kernel. RT loads have to tune the system anyway so it is not so surprising they would disable this option as well. We should help those guys and do not require them to touch the code but the knob is reasonable IMHO. Especially when your changelog suggests that having this enabled by default is beneficial for the standard kernel. -- Michal Hocko SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V5] Allow compaction of unevictable pages
On Fri 13-03-15 13:26:37, Eric B Munson wrote: [...] > +compact_unevictable > + > +Available only when CONFIG_COMPACTION is set. When set to 1, compaction is > +allowed to examine the unevictable lru (mlocked pages) for pages to compact. > +This should be used on systems where stalls for minor page faults are an > +acceptable trade for large contiguous free memory. Set to 0 to prevent > +compaction from moving pages that are unevictable. It is not clear which behavior is default from this text. -- Michal Hocko SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V5] Allow compaction of unevictable pages
On Fri, 13 Mar 2015, Rik van Riel wrote: > On 03/13/2015 01:26 PM, Eric B Munson wrote: > > > --- a/mm/compaction.c > > +++ b/mm/compaction.c > > @@ -1046,6 +1046,8 @@ typedef enum { > > ISOLATE_SUCCESS,/* Pages isolated, migrate */ > > } isolate_migrate_t; > > > > +int sysctl_compact_unevictable; > > + > > /* > > * Isolate all pages that can be migrated from the first suitable block, > > * starting at the block pointed to by the migrate scanner pfn within > > I suspect that the use cases where users absolutely do not want > unevictable pages migrated are special cases, and it may make > sense to enable sysctl_compact_unevictable by default. Given that sysctl_compact_unevictable=0 is the way the kernel behaves now and the push back against always enabling compaction on unevictable pages, I left the default to be the behavior as it is today. I agree that this is likely the minority case, but I'd really like Peter Z or someone else from real time to say that they are okay with the default changing. signature.asc Description: Digital signature
Re: [PATCH V5] Allow compaction of unevictable pages
On 03/13/2015 01:26 PM, Eric B Munson wrote: > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -1046,6 +1046,8 @@ typedef enum { > ISOLATE_SUCCESS,/* Pages isolated, migrate */ > } isolate_migrate_t; > > +int sysctl_compact_unevictable; > + > /* > * Isolate all pages that can be migrated from the first suitable block, > * starting at the block pointed to by the migrate scanner pfn within I suspect that the use cases where users absolutely do not want unevictable pages migrated are special cases, and it may make sense to enable sysctl_compact_unevictable by default. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH V5] Allow compaction of unevictable pages
Currently, pages which are marked as unevictable are protected from compaction, but not from other types of migration. The POSIX real time extension explicitly states that mlock() will prevent a major page fault, but the spirit of is is that mlock() should give a process the ability to control sources of latency, including minor page faults. However, the mlock manpage only explicitly says that a locked page will not be written to swap and this can cause some confusion. The compaction code today, does not give a developer who wants to avoid swap but wants to have large contiguous areas available any method to achieve this state. This patch introduces a sysctl for controlling compaction behavoir with respect to the unevictable lru. Users that demand no page faults after a page is present can set compact_unevictable to 0 and users who need the large contiguous areas can enable compaction on locked memory by setting it to 1. To illustrate this problem I wrote a quick test program that mmaps a large number of 1MB files filled with random data. These maps are created locked and read only. Then every other mmap is unmapped and I attempt to allocate huge pages to the static huge page pool. When the compact_unevictable sysctl is 0, I cannot allocate hugepages after fragmenting memory. When the value is set to 1, allocations succeed. Signed-off-by: Eric B Munson Cc: Vlastimil Babka Cc: Thomas Gleixner Cc: Christoph Lameter Cc: Peter Zijlstra Cc: Mel Gorman Cc: David Rientjes Cc: Rik van Riel Cc: Michal Hocko Cc: linux...@kvack.org Cc: linux-kernel@vger.kernel.org --- Changes from V3: * Updated changelog * Restrict valid input to 0 or 1 in sysctl * Add documentation to sysctl/vm.txt Documentation/sysctl/vm.txt | 11 +++ include/linux/compaction.h |1 + kernel/sysctl.c |9 + mm/compaction.c |3 +++ 4 files changed, 24 insertions(+) diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt index 902b457..812f0d4 100644 --- a/Documentation/sysctl/vm.txt +++ b/Documentation/sysctl/vm.txt @@ -21,6 +21,7 @@ Currently, these files are in /proc/sys/vm: - admin_reserve_kbytes - block_dump - compact_memory +- compact_unevictable - dirty_background_bytes - dirty_background_ratio - dirty_bytes @@ -106,6 +107,16 @@ huge pages although processes will also directly compact memory as required. == +compact_unevictable + +Available only when CONFIG_COMPACTION is set. When set to 1, compaction is +allowed to examine the unevictable lru (mlocked pages) for pages to compact. +This should be used on systems where stalls for minor page faults are an +acceptable trade for large contiguous free memory. Set to 0 to prevent +compaction from moving pages that are unevictable. + +== + dirty_background_bytes Contains the amount of dirty memory at which the background kernel diff --git a/include/linux/compaction.h b/include/linux/compaction.h index a014559..9dd7e7c 100644 --- a/include/linux/compaction.h +++ b/include/linux/compaction.h @@ -34,6 +34,7 @@ extern int sysctl_compaction_handler(struct ctl_table *table, int write, extern int sysctl_extfrag_threshold; extern int sysctl_extfrag_handler(struct ctl_table *table, int write, void __user *buffer, size_t *length, loff_t *ppos); +extern int sysctl_compact_unevictable; extern int fragmentation_index(struct zone *zone, unsigned int order); extern unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order, diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 88ea2d6..9272568 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -1313,6 +1313,15 @@ static struct ctl_table vm_table[] = { .extra1 = &min_extfrag_threshold, .extra2 = &max_extfrag_threshold, }, + { + .procname = "compact_unevictable", + .data = &sysctl_compact_unevictable, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec, + .extra1 = &zero, + .extra2 = &one, + }, #endif /* CONFIG_COMPACTION */ { diff --git a/mm/compaction.c b/mm/compaction.c index 8c0d945..342b221 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1046,6 +1046,8 @@ typedef enum { ISOLATE_SUCCESS,/* Pages isolated, migrate */ } isolate_migrate_t; +int sysctl_compact_unevictable; + /* * Isolate all pages that can be migrated from the first suitable block, * starting at the block pointed to by the migrate scanner pfn within @@ -1057,6 +1059,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone, unsigned long low_pfn, end_pfn; struct page *page; const isolate_mode_t isolate_mode = +