Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat
On Tue, 13 May 2014, Jianyu Zhan wrote:
> >> This means they guarantee that even they are preemted the vm
> >> counter won't be modified incorrectly. Because the counter is page-related
> >> (e.g., a new anon page added), and they are exclusively hold the pte lock.
> >
> >But there are multiple pte locks for numerous page. Another process could
> >modify the counter because the pte lock for a different page was
> >available which would cause counter corruption.
> >
> >
> >> So, as you concludes in the other mail that __modd_zone_page_stat
> >> couldn't be used.
> >> in mlocked_vma_newpage, then what qualifies other call sites for using
> >> it, in the same situation?
>
> Thanks, now everything is clear.
>
> I've renewed the patch, would you please review it? Thanks!
>
> ---<8---
> mm: use the light version __mod_zone_page_state in mlocked_vma_newpage()
>
> mlocked_vma_newpage() is called with pte lock held(a spinlock), which
> implies preemtion disabled, and the vm stat counter is not modified from
> interrupt context, so we need not use an irq-safe mod_zone_page_state() here,
> using a light-weight version __mod_zone_page_state() would be OK.
>
> This patch also documents __mod_zone_page_state() and some of its
> callsites. The comment above __mod_zone_page_state() is from Hugh
> Dickins, and acked by Christoph.
>
> Most credits to Hugh and Christoph for the clarification on the usage of
> the __mod_zone_page_state().
>
> Suggested-by: Andrew Morton
> Signed-off-by: Hugh Dickins

No, I didn't sign it off, just offered some text (which you already
acknowledged graciously).  Andrew, please delete that line, but...

> Signed-off-by: Jianyu Zhan

Acked-by: Hugh Dickins

Yes, this is good - and even better is the version which Andrew has
slipped into mmotm, removing the duplicated comment line from
page_remove_rmap(), and reformatting the comments to fit better.

It is a little random, in that we can easily see other
__mod_zone_page_states to which the same comment could be applied;
but that's okay, please leave it as is, you've made the point enough
to help other people get it, and it would get boring to repeat it more.

Hugh

> ---
>  mm/internal.h | 7 ++-
>  mm/rmap.c     | 10 ++
>  mm/vmstat.c   | 4 +++-
>  3 files changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 07b6736..53d439e 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -196,7 +196,12 @@ static inline int mlocked_vma_newpage(struct vm_area_struct *vma,
>  		return 0;
>
>  	if (!TestSetPageMlocked(page)) {
> -		mod_zone_page_state(page_zone(page), NR_MLOCK,
> +		/*
> +		 * We use the irq-unsafe __mod_zone_page_stat because
> +		 * this counter is not modified from interrupt context, and the
> +		 * pte lock is held(spinlock), which implies preemtion disabled.
> +		 */
> +		__mod_zone_page_state(page_zone(page), NR_MLOCK,
>  				    hpage_nr_pages(page));
>  		count_vm_event(UNEVICTABLE_PGMLOCKED);
>  	}
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 9c3e773..2fa4375 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -986,6 +986,11 @@ void do_page_add_anon_rmap(struct page *page,
>  {
>  	int first = atomic_inc_and_test(&page->_mapcount);
>  	if (first) {
> +		/*
> +		 * We use the irq-unsafe __{inc|mod}_zone_page_stat because
> +		 * these counters are not modified in interrupt context, and
> +		 * pte lock(a spinlock) is held, which implies preemtion
> disabled.
> +		 */
>  		if (PageTransHuge(page))
>  			__inc_zone_page_state(page,
>  					      NR_ANON_TRANSPARENT_HUGEPAGES);
> @@ -1077,6 +1082,11 @@ void page_remove_rmap(struct page *page)
>  	/*
>  	 * Hugepages are not counted in NR_ANON_PAGES nor NR_FILE_MAPPED
>  	 * and not charged by memcg for now.
> +	 *
> +	 * We use the irq-unsafe __{inc|mod}_zone_page_stat because
> +	 * these counters are not modified in interrupt context, and
> +	 * these counters are not modified in interrupt context, and
> +	 * pte lock(a spinlock) is held, which implies preemtion disabled.
>  	 */
>  	if (unlikely(PageHuge(page)))
>  		goto out;
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 302dd07..704928e 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -207,7 +207,9 @@ void set_pgdat_percpu_threshold(pg_data_t *pgdat,
>  }
>
>  /*
> - * For use when we know that interrupts are disabled.
> + * For use when we know that interrupts are disabled,
> + * or when we know that preemption is disabled and that
> + * particular counter cannot be updated from interrupt context.
>   */
>  void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
>  				int delta)
> --
> 2.0.0-rc1
Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat
>Preemption is mis-spelled throughout.
>
>Otherwise
>
>Reviewed-by: Christoph Lameter

Oops, corrected.  Thanks.

-<8-
From 1d0a080429542accfeac7de03e159a0bea12abba Mon Sep 17 00:00:00 2001
From: Jianyu Zhan
Date: Tue, 13 May 2014 00:19:08 +0800
Subject: [PATCH] mm: use the light version __mod_zone_page_state in
 mlocked_vma_newpage()

mlocked_vma_newpage() is called with pte lock held(a spinlock), which
implies preemtion disabled, and the vm stat counter is not modified from
interrupt context, so we need not use an irq-safe mod_zone_page_state() here,
using a light-weight version __mod_zone_page_state() would be OK.

This patch also documents __mod_zone_page_state() and some of its
callsites. The comment above __mod_zone_page_state() is from Hugh
Dickins, and acked by Christoph.

Most credits to Hugh and Christoph for the clarification on the usage of
the __mod_zone_page_state().

Suggested-by: Andrew Morton
Signed-off-by: Hugh Dickins
Reviewed-by: Christoph Lameter
Signed-off-by: Jianyu Zhan
---
 mm/internal.h | 7 ++-
 mm/rmap.c     | 9 +
 mm/vmstat.c   | 4 +++-
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 07b6736..d6a4868 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -196,7 +196,12 @@ static inline int mlocked_vma_newpage(struct vm_area_struct *vma,
 		return 0;
 
 	if (!TestSetPageMlocked(page)) {
-		mod_zone_page_state(page_zone(page), NR_MLOCK,
+		/*
+		 * We use the irq-unsafe __mod_zone_page_stat because
+		 * this counter is not modified from interrupt context, and the
+		 * pte lock is held(spinlock), which implies preemption disabled.
+		 */
+		__mod_zone_page_state(page_zone(page), NR_MLOCK,
 				    hpage_nr_pages(page));
 		count_vm_event(UNEVICTABLE_PGMLOCKED);
 	}
diff --git a/mm/rmap.c b/mm/rmap.c
index 9c3e773..fa73194 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -986,6 +986,11 @@ void do_page_add_anon_rmap(struct page *page,
 {
 	int first = atomic_inc_and_test(&page->_mapcount);
 	if (first) {
+		/*
+		 * We use the irq-unsafe __{inc|mod}_zone_page_stat because
+		 * these counters are not modified from interrupt context, and
+		 * pte lock(a spinlock) is held, which implies preemption disabled.
+		 */
 		if (PageTransHuge(page))
 			__inc_zone_page_state(page,
 					      NR_ANON_TRANSPARENT_HUGEPAGES);
@@ -1077,6 +1082,10 @@ void page_remove_rmap(struct page *page)
 	/*
 	 * Hugepages are not counted in NR_ANON_PAGES nor NR_FILE_MAPPED
 	 * and not charged by memcg for now.
+	 *
+	 * We use the irq-unsafe __{inc|mod}_zone_page_stat because
+	 * these counters are not modified from interrupt context, and
+	 * pte lock(a spinlock) is held, which implies preemption disabled.
 	 */
 	if (unlikely(PageHuge(page)))
 		goto out;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 302dd07..704928e 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -207,7 +207,9 @@ void set_pgdat_percpu_threshold(pg_data_t *pgdat,
 }
 
 /*
- * For use when we know that interrupts are disabled.
+ * For use when we know that interrupts are disabled,
+ * or when we know that preemption is disabled and that
+ * particular counter cannot be updated from interrupt context.
  */
 void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
 				int delta)
-- 
2.0.0-rc1
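For context: the difference between the irq-safe and the light-weight helper
is only the interrupt protection wrapped around the same per-cpu update.  In
the plain (non-cmpxchg) configuration of mm/vmstat.c of that era, the irq-safe
wrapper is roughly the following -- a simplified sketch, not a verbatim copy:

/* Irq-safe variant: pays for a local_irq_save/restore on every update. */
void mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
			 int delta)
{
	unsigned long flags;

	local_irq_save(flags);		/* close the interrupt race window */
	__mod_zone_page_state(zone, item, delta);
	local_irq_restore(flags);
}

So a call site that already runs with preemption disabled (for example under
the pte lock) and whose counter is never touched from interrupt context can
skip that save/restore overhead by calling __mod_zone_page_state() directly,
which is what the patch above does for NR_MLOCK.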
Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat
On Tue, 13 May 2014, Jianyu Zhan wrote:

> diff --git a/mm/internal.h b/mm/internal.h
> index 07b6736..53d439e 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -196,7 +196,12 @@ static inline int mlocked_vma_newpage(struct vm_area_struct *vma,
>  		return 0;
>
>  	if (!TestSetPageMlocked(page)) {
> -		mod_zone_page_state(page_zone(page), NR_MLOCK,
> +		/*
> +		 * We use the irq-unsafe __mod_zone_page_stat because
> +		 * this counter is not modified from interrupt context, and the
> +		 * pte lock is held(spinlock), which implies preemtion disabled.

Preemption is mis-spelled throughout.

Otherwise

Reviewed-by: Christoph Lameter
Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat
>> This means they guarantee that even they are preemted the vm
>> counter won't be modified incorrectly. Because the counter is page-related
>> (e.g., a new anon page added), and they are exclusively hold the pte lock.
>
>But there are multiple pte locks for numerous page. Another process could
>modify the counter because the pte lock for a different page was
>available which would cause counter corruption.
>
>
>> So, as you concludes in the other mail that __modd_zone_page_stat
>> couldn't be used.
>> in mlocked_vma_newpage, then what qualifies other call sites for using
>> it, in the same situation?

Thanks, now everything is clear.

I've renewed the patch, would you please review it? Thanks!

---<8---
mm: use the light version __mod_zone_page_state in mlocked_vma_newpage()

mlocked_vma_newpage() is called with pte lock held(a spinlock), which
implies preemtion disabled, and the vm stat counter is not modified from
interrupt context, so we need not use an irq-safe mod_zone_page_state() here,
using a light-weight version __mod_zone_page_state() would be OK.

This patch also documents __mod_zone_page_state() and some of its
callsites. The comment above __mod_zone_page_state() is from Hugh
Dickins, and acked by Christoph.

Most credits to Hugh and Christoph for the clarification on the usage of
the __mod_zone_page_state().

Suggested-by: Andrew Morton
Signed-off-by: Hugh Dickins
Signed-off-by: Jianyu Zhan
---
 mm/internal.h | 7 ++-
 mm/rmap.c     | 10 ++
 mm/vmstat.c   | 4 +++-
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 07b6736..53d439e 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -196,7 +196,12 @@ static inline int mlocked_vma_newpage(struct vm_area_struct *vma,
 		return 0;
 
 	if (!TestSetPageMlocked(page)) {
-		mod_zone_page_state(page_zone(page), NR_MLOCK,
+		/*
+		 * We use the irq-unsafe __mod_zone_page_stat because
+		 * this counter is not modified from interrupt context, and the
+		 * pte lock is held(spinlock), which implies preemtion disabled.
+		 */
+		__mod_zone_page_state(page_zone(page), NR_MLOCK,
 				    hpage_nr_pages(page));
 		count_vm_event(UNEVICTABLE_PGMLOCKED);
 	}
diff --git a/mm/rmap.c b/mm/rmap.c
index 9c3e773..2fa4375 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -986,6 +986,11 @@ void do_page_add_anon_rmap(struct page *page,
 {
 	int first = atomic_inc_and_test(&page->_mapcount);
 	if (first) {
+		/*
+		 * We use the irq-unsafe __{inc|mod}_zone_page_stat because
+		 * these counters are not modified in interrupt context, and
+		 * pte lock(a spinlock) is held, which implies preemtion disabled.
+		 */
 		if (PageTransHuge(page))
 			__inc_zone_page_state(page,
 					      NR_ANON_TRANSPARENT_HUGEPAGES);
@@ -1077,6 +1082,11 @@ void page_remove_rmap(struct page *page)
 	/*
 	 * Hugepages are not counted in NR_ANON_PAGES nor NR_FILE_MAPPED
 	 * and not charged by memcg for now.
+	 *
+	 * We use the irq-unsafe __{inc|mod}_zone_page_stat because
+	 * these counters are not modified in interrupt context, and
+	 * these counters are not modified in interrupt context, and
+	 * pte lock(a spinlock) is held, which implies preemtion disabled.
 	 */
 	if (unlikely(PageHuge(page)))
 		goto out;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 302dd07..704928e 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -207,7 +207,9 @@ void set_pgdat_percpu_threshold(pg_data_t *pgdat,
 }
 
 /*
- * For use when we know that interrupts are disabled.
+ * For use when we know that interrupts are disabled,
+ * or when we know that preemption is disabled and that
+ * particular counter cannot be updated from interrupt context.
  */
 void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
 				int delta)
-- 
2.0.0-rc1
Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat
On Mon, 12 May 2014, Jianyu Zhan wrote:

> This means they guarantee that even they are preemted the vm
> counter won't be modified incorrectly. Because the counter is page-related
> (e.g., a new anon page added), and they are exclusively hold the pte lock.

Ok. if these locations hold the pte lock then preemption is disabled and
you are ok to use __mod_zone_page_state. Has nothing to do with the page.
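For readers wondering why holding the pte lock implies that preemption is
disabled: spin_lock() itself disables preemption on an SMP/preemptible kernel
and only re-enables it at the matching unlock.  A schematic sketch of the
core of the lock path (lockdep and contention handling omitted, so not the
exact kernel code):

/* Schematic core of spin_lock() on an SMP/preemptible kernel. */
static inline void __raw_spin_lock(raw_spinlock_t *lock)
{
	preempt_disable();		/* preemption is off from here ... */
	do_raw_spin_lock(lock);		/* ... until the matching unlock   */
}

Everything done between spin_lock(ptl) and spin_unlock(ptl) -- including a
__mod_zone_page_state() call -- therefore runs with preemption disabled,
which is one half of the rule; the other half is that the counter must never
be updated from interrupt context.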
Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat
On Mon, 12 May 2014, Jianyu Zhan wrote:

> This means they guarantee that even they are preemted the vm
> counter won't be modified incorrectly. Because the counter is page-related
> (e.g., a new anon page added), and they are exclusively hold the pte lock.

But there are multiple pte locks for numerous page. Another process could
modify the counter because the pte lock for a different page was
available which would cause counter corruption.

> So, as you concludes in the other mail that __modd_zone_page_stat
> couldn't be used.
> in mlocked_vma_newpage, then what qualifies other call sites for using
> it, in the same situation?

Preemption should be off in those functions because a spinlock is being
held.
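To make the corruption scenario concrete, here is a hypothetical interleaving
on one CPU, only possible if the two-step counter update ran with neither
interrupts nor preemption disabled (counter values invented for illustration):

	per-cpu diff for the counter is 5

	task A                           task B (same CPU, different pte lock)
	------                           -------------------------------------
	x = read per-cpu diff   /* 5 */
	    --- preempted ---
	                                 x = read per-cpu diff   /* 5 */
	                                 write per-cpu diff = 6
	    --- runs again ---
	write per-cpu diff = 6           <-- task B's increment is lost

Keeping preemption off for the whole read-modify-write (which any held
spinlock, such as a pte lock, guarantees) closes exactly this window.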
Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat
On Mon, May 12, 2014 at 10:01 PM, Christoph Lameter wrote:
> >
> >/*
> > * For use when we know that interrupts are disabled,
> > * or when we know that preemption is disabled and that
> > * particular counter cannot be updated from interrupt context.
> > */
>
> The description above looks ok to me. The problem is that you are
> considering the page related data structures as an issue for races and not
> the data structures relevant for vm statistics.

Hi, Christoph,

Yep. I did.  Let me restate the point here.

To use __mod_zone_page_stat when we know that either

  1. interrupts are disabled, or
  2. preemption is disabled and that particular counter cannot be
     updated from interrupt context.

For those call sites currently using __mod_zone_page_stat, they just
guarantees the counter is never modified from an interrupt context, but
doesn't disable preemption.

This means they guarantee that even they are preemted the vm
counter won't be modified incorrectly. Because the counter is page-related
(e.g., a new anon page added), and they are exclusively hold the pte lock.

This is why I emphasized on 'the page related data structures as an issue
for races'.

So, as you concludes in the other mail that __modd_zone_page_stat
couldn't be used.
in mlocked_vma_newpage, then what qualifies other call sites for using
it, in the same situation?

See:

void page_add_new_anon_rmap(struct page *page,
	struct vm_area_struct *vma, unsigned long address)
{
	VM_BUG_ON(address < vma->vm_start || address >= vma->vm_end);
	SetPageSwapBacked(page);
	atomic_set(&page->_mapcount, 0); /* increment count (starts at -1) */
	if (PageTransHuge(page))
		__inc_zone_page_state(page, NR_ANON_TRANSPARENT_HUGEPAGES);
	__mod_zone_page_state(page_zone(page), NR_ANON_PAGES,
			hpage_nr_pages(page));          <--- using it.
	__page_set_anon_rmap(page, vma, address, 1);
	if (!mlocked_vma_newpage(vma, page)) {          <--- couldn't use it ?
		SetPageActive(page);
		lru_cache_add(page);
	} else
		add_page_to_unevictable_list(page);
}

Hope I express it clearly enough. Thanks.
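What qualifies those call sites is the context they are called from: the
anon-rmap functions above run while the fault handler is still holding the
page table lock.  A heavily condensed sketch of that caller, loosely based on
do_anonymous_page() of that era (ordering and error handling omitted, so only
illustrative):

	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
	...
	page_add_new_anon_rmap(page, vma, address);	/* ptl held: preemption off */
	...
	pte_unmap_unlock(page_table, ptl);		/* ptl dropped: preemption back on */

So the vm counters touched inside page_add_new_anon_rmap() are updated with
preemption disabled, and they are not counters that interrupt handlers
modify, which is what the two-part rule above actually requires.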
Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat
On Sun, 11 May 2014, Jianyu Zhan wrote:

> >
> >/*
> > * For use when we know that interrupts are disabled,
> > * or when we know that preemption is disabled and that
> > * particular counter cannot be updated from interrupt context.
> > */
>
> Seconded. Christoph, would you please write a comment? I've written
> a new one based on Hugh's, would you please also take a look? Thanks.

The description above looks ok to me. The problem is that you are
considering the page related data structures as an issue for races and not
the data structures relevant for vm statistics.

> It is essential to have such gurantees, because __mod_zone_page_stat()
> is a two-step operation : read-percpu-couter-then-modify-it.
> (Need comments. Christoph, do I misunderstand it?)

Yup.

> mlocked_vma_newpage() is only called in fault path by
> page_add_new_anon_rmap(), which is called on a *new* page.
> And such page is initially only visible via the pagetables, and the
> pte is locked while calling page_add_new_anon_rmap(), so we need not
> use an irq-safe mod_zone_page_state() here, using a light-weight version
> __mod_zone_page_state() would be OK.

This is wrong.. What you could say is that preemption is off and that the
counter is never incremented from an interrupt context that could race
with it. If this is the case then it would be safe.
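The "data structures relevant for vm statistics" are the per-cpu
differentials kept for each zone, not anything hanging off the struct page.
A lightly simplified sketch of __mod_zone_page_state() from mm/vmstat.c of
that era (paraphrased from memory, details trimmed) shows the two-step update
being discussed:

void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
			   int delta)
{
	struct per_cpu_pageset __percpu *pcp = zone->pageset;
	s8 __percpu *p = pcp->vm_stat_diff + item;
	long x, t;

	x = delta + __this_cpu_read(*p);	/* step 1: read this CPU's diff   */
	t = __this_cpu_read(pcp->stat_threshold);

	if (unlikely(x > t || x < -t)) {	/* overflow: fold into zone total */
		zone_page_state_add(x, zone, item);
		x = 0;
	}
	__this_cpu_write(*p, x);		/* step 2: write the diff back    */
}

Nothing here looks at the page itself; the race to worry about is another
update of the same per-cpu differential slipping in between the read and the
write, which is exactly what "preemption disabled, and the counter is never
modified from interrupt context" rules out.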
Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat
>Your original __mod_zone_page_state happened to be correct;
>but you have no understanding of why it was correct, so its
>comment was very wrong, even after you changed the irq wording.
>
>This series just propagates your misunderstanding further,
>while providing an object lesson in how not to present a series.
>
>Sorry, you have quickly developed an unenviable reputation for
>patches which waste developers' time: please consider your
>patches much more carefully before posting them.

Hi, Hugh.

I'm sorry for my misunderstanding in the previous patches. I should have
given them more thought. I'm really sorry. {palmface} X 3

And I am really appreciated for your detailed comments and your patience
with me such a noob;-)

Today I've spent some time on digging into the history to figure out what
is under the hood.

>I'd prefer to let Christoph write the definitive version,
>but my first stab at it would be:
>
>/*
> * For use when we know that interrupts are disabled,
> * or when we know that preemption is disabled and that
> * particular counter cannot be updated from interrupt context.
> */

Seconded. Christoph, would you please write a comment? I've written
a new one based on Hugh's, would you please also take a look? Thanks.

>Once upon a time, from 2.6.16 to 2.6.32, there was indeed a relevant
>and helpful comment in __page_set_anon_rmap():
>	/*
>	 * nr_mapped state can be updated without turning off
>	 * interrupts because it is not modified via interrupt.
>	 */
>	__inc_page_state(nr_mapped);
>
>The comment survived the replacement of nr_mapped, but eventually
>it got cleaned away completely.
>
>It is safe to use the irq-unsafe __mod_zone_page_stat on counters
>which are never modified via interrupt.

>You are right that the comment is not good enough, but I don't trust
>your version either. Since percpu variables are involved, it's important
>that preemption be disabled too (see comment above __inc_zone_state).

Thanks for clarifying this. I checked the history, this is my
understanding, correct me if I am wrong, thanks!

__mod_zone_page_stat() should be called with irq-off, this is a strongest
gurantee for safety. For a weaker gurantee, preemte-disable is needed and
in this situation, the item counter in question should not be modified in
interrupt context.

It is essential to have such gurantees, because __mod_zone_page_stat()
is a two-step operation : read-percpu-couter-then-modify-it.
(Need comments. Christoph, do I misunderstand it?)

For for all other call sites of __mod_zone_page_stat() in the patch, I
think your comment is right, it is because they are not modified in
interrupt context, so we could safely use __mod_zone_page_stat().

But for the call site in page_add_new_anon_rmap(), I think my previous
wording is appropriate:

mlocked_vma_newpage() is only called in fault path by
page_add_new_anon_rmap() on a *new* page, which is initially only visible
via the pagetables, and the pte is locked while calling
page_add_new_anon_rmap(), so we are not afraid of others seeing it a this
point, not even modifying it. so whether is could be modified in interrupt
context or not does matter, but it is not the key point here.

I've renewed the patch. And in this patch, I also change
page_add_new_anon_rmap to use __mod_zone_page_stat(), since they are quite
relative to be in one patch.

Thanks for your review. I am appreciated for your comments.

Andrew, Christoph, would you please review it? Thansk.

-<8-
mm: use the light version __mod_zone_page_state in mlocked_vma_newpage()

mlocked_vma_newpage() is only called in fault path by
page_add_new_anon_rmap(), which is called on a *new* page.
And such page is initially only visible via the pagetables, and the
pte is locked while calling page_add_new_anon_rmap(), so we need not
use an irq-safe mod_zone_page_state() here, using a light-weight version
__mod_zone_page_state() would be OK.

This patch also documents __mod_zone_page_state() and some of its
callsites.

Suggested-by: Andrew Morton
Suggested-by: Hugh Dickins
Signed-off-by: Jianyu Zhan
---
 mm/internal.h | 9 -
 mm/rmap.c     | 11 +++
 mm/vmstat.c   | 5 -
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 07b6736..7140c9b 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -196,7 +196,14 @@ static inline int mlocked_vma_newpage(struct vm_area_struct *vma,
 		return 0;
 
 	if (!TestSetPageMlocked(page)) {
-		mod_zone_page_state(page_zone(page), NR_MLOCK,
+		/*
+		 * We use the irq-unsafe __mod_zone_page_stat because
+		 * 1. this counter is not modified in interrupt context, and
+		 * 2. pte lock is held, and this a newpage, which is initially
+		 *    only visible via the pagetables. So this would exclude
+		 *    racy processes from preemting us and to modify it.
+
Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat
On Sat, 10 May 2014, Jianyu Zhan wrote:

> __mod_zone_page_stat() is not irq-safe, so it should be used carefully.
> And it is not appropirately documented now. This patch adds comment for
> it, and also documents for some of its call sites.
>
> Suggested-by: Andrew Morton
> Signed-off-by: Jianyu Zhan

Your original __mod_zone_page_state happened to be correct;
but you have no understanding of why it was correct, so its
comment was very wrong, even after you changed the irq wording.

This series just propagates your misunderstanding further,
while providing an object lesson in how not to present a series.

Sorry, you have quickly developed an unenviable reputation for
patches which waste developers' time: please consider your
patches much more carefully before posting them.

> ---
>  mm/page_alloc.c | 2 ++
>  mm/rmap.c       | 6 ++
>  mm/vmstat.c     | 16 +++-
>  3 files changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5dba293..9d6f474 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -659,6 +659,8 @@ static inline int free_pages_check(struct page *page)
>   *
>   * And clear the zone's pages_scanned counter, to hold off the "all pages are
>   * pinned" detection logic.
> + *
> + * Note: this function should be used with irq disabled.

Correct, but I don't see that that needed saying.  This is a static
function which is being used as intended: just because it matched your
search for "__mod_zone_page_state" is not a reason for you to add that
comment; irq disabled is only one of its prerequisites.

>   */
>  static void free_pcppages_bulk(struct zone *zone, int count,
>  					struct per_cpu_pages *pcp)
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 9c3e773..6078a30 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -979,6 +979,8 @@ void page_add_anon_rmap(struct page *page,
>  /*
>   * Special version of the above for do_swap_page, which often runs
>   * into pages that are exclusively owned by the current process.
> + * So we could use the irq-unsafe version __{inc|mod}_zone_page_stat
> + * here without others racing change it in between.

And yet you can immediately see them being used without any test for
"exclusive" below: why is that?  Think about it.

>   * Everybody else should continue to use page_add_anon_rmap above.
>   */
>  void do_page_add_anon_rmap(struct page *page,
> @@ -1077,6 +1079,10 @@ void page_remove_rmap(struct page *page)
>  	/*
>  	 * Hugepages are not counted in NR_ANON_PAGES nor NR_FILE_MAPPED
>  	 * and not charged by memcg for now.
> +	 *
> +	 * And we are the last user of this page, so it is safe to use
> +	 * the irq-unsafe version __{mod|dec}_zone_page here, since we
> +	 * have no racer.

Again, the code is correct to be using the irq-unsafe version,
but your comment is doubly wrong.

We are not necessarily the last user of this page, merely the one
that just now brought the mapcount down to 0.

But think: what bearing would being the last user of this page
have on the safety of using __mod_zone_page_state to adjust per-zone
counters?  None at all.  A page does not move from one zone to another
(though its contents might be migrated from one page to another
when safe to do so).

Once upon a time, from 2.6.16 to 2.6.32, there was indeed a relevant
and helpful comment in __page_set_anon_rmap():
	/*
	 * nr_mapped state can be updated without turning off
	 * interrupts because it is not modified via interrupt.
	 */
	__inc_page_state(nr_mapped);

The comment survived the replacement of nr_mapped, but eventually
it got cleaned away completely.
It is safe to use the irq-unsafe __mod_zone_page_stat on counters
which are never modified via interrupt.

>   */
>  	if (unlikely(PageHuge(page)))
>  		goto out;
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 302dd07..778f154 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -207,7 +207,21 @@ void set_pgdat_percpu_threshold(pg_data_t *pgdat,
>  }
>
>  /*
> - * For use when we know that interrupts are disabled.
> + * Optimized modificatoin function.
> + *
> + * The code basically does the modification in two steps:
> + *
> + * 1. read the current counter based on the processor number
> + * 2. modificate the counter write it back.
> + *
> + * So this function should be used with the guarantee that
> + *
> + * 1. interrupts are disabled, or
> + * 2. interrupts are enabled, but no other sites would race to
> + *    modify this counter in between.
> + *
> + * Otherwise, an irq-safe version mod_zone_page_state() should
> + * be used instead.

You are right that the comment is not good enough, but I don't trust
your version either.  Since percpu variables are involved, it's important
that preemption be disabled too (see comment above __inc_zone_state).

I'd prefer to let Christoph write the definitive version,
but my first stab at it would be:

/*
 * For use when we know that interrupts are disabled,
 * or when we know that preemption is disabled and that
 * particular counter cannot be updated from interrupt context.
 */
[PATCH 1/3] mm: add comment for __mod_zone_page_stat
__mod_zone_page_stat() is not irq-safe, so it should be used carefully.
And it is not appropirately documented now. This patch adds comment for
it, and also documents for some of its call sites.

Suggested-by: Andrew Morton
Signed-off-by: Jianyu Zhan
---
 mm/page_alloc.c | 2 ++
 mm/rmap.c       | 6 ++
 mm/vmstat.c     | 16 +++-
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5dba293..9d6f474 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -659,6 +659,8 @@ static inline int free_pages_check(struct page *page)
  *
  * And clear the zone's pages_scanned counter, to hold off the "all pages are
  * pinned" detection logic.
+ *
+ * Note: this function should be used with irq disabled.
  */
 static void free_pcppages_bulk(struct zone *zone, int count,
 					struct per_cpu_pages *pcp)
diff --git a/mm/rmap.c b/mm/rmap.c
index 9c3e773..6078a30 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -979,6 +979,8 @@ void page_add_anon_rmap(struct page *page,
 /*
  * Special version of the above for do_swap_page, which often runs
  * into pages that are exclusively owned by the current process.
+ * So we could use the irq-unsafe version __{inc|mod}_zone_page_stat
+ * here without others racing change it in between.
  * Everybody else should continue to use page_add_anon_rmap above.
  */
 void do_page_add_anon_rmap(struct page *page,
@@ -1077,6 +1079,10 @@ void page_remove_rmap(struct page *page)
 	/*
 	 * Hugepages are not counted in NR_ANON_PAGES nor NR_FILE_MAPPED
 	 * and not charged by memcg for now.
+	 *
+	 * And we are the last user of this page, so it is safe to use
+	 * the irq-unsafe version __{mod|dec}_zone_page here, since we
+	 * have no racer.
 	 */
 	if (unlikely(PageHuge(page)))
 		goto out;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 302dd07..778f154 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -207,7 +207,21 @@ void set_pgdat_percpu_threshold(pg_data_t *pgdat,
 }
 
 /*
- * For use when we know that interrupts are disabled.
+ * Optimized modificatoin function.
+ *
+ * The code basically does the modification in two steps:
+ *
+ * 1. read the current counter based on the processor number
+ * 2. modificate the counter write it back.
+ *
+ * So this function should be used with the guarantee that
+ *
+ * 1. interrupts are disabled, or
+ * 2. interrupts are enabled, but no other sites would race to
+ *    modify this counter in between.
+ *
+ * Otherwise, an irq-safe version mod_zone_page_state() should
+ * be used instead.
  */
 void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
 				int delta)
-- 
2.0.0-rc1